Introduction
This document describes the use of TITAN, its configuration, and troubleshooting in CURWB deployments.
TITAN Basics
TITAN is a critical feature that provides high availability through hardware redundancy in CURWB deployments. It can be configured at several points in a Fluidity network. The most common applications are enabling TITAN on Core Network Global Gateways, Local Mesh End Gateways, and onboard vehicle radios. It works on both Layer 2 and Layer 3 networks.
When enabled, TITAN provides fast failover from a Primary device to a Secondary device in less than 500 milliseconds, and the Secondary device immediately resumes CURWB MPLS communications.
The example seen here shows TITAN failover in all three scenarios:
- Core Network Global Gateways
- Local Mesh End Gateways
- Onboard Vehicle radios
How does TITAN work?
To fully understand TITAN, it is essential to become familiar with AutoTap, a network-loop prevention mechanism that allows CURWB devices to detect wired connections and permits only a single dedicated ingress/egress route to and from the Mesh End or network core.
Radios with the same passphrase, connected to the same network switch on the same broadcast domain, act as a single unit with multiple antennas.
The CURWB mesh protocol detects wired connections among radios and builds routes automatically. The result is like having a single AP with multiple wireless interfaces.
The AutoTap functionality prevents network loops in such configurations. Only the radio elected as Primary (lowest numerical Mesh ID) in a physically connected group publishes MAC address information. Traffic is seen coming only from the radio elected as the Primary radio of the connected group.
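Because the Mesh IDs in this document are written in dotted four-octet form (for example, 5.66.194.36), "lowest numerical Mesh ID" is easy to misread as a text comparison. The following minimal Python sketch assumes that a Mesh ID compares like a 32-bit number, in the same way as an IPv4 address; the function name `autotap_primary` is hypothetical and not part of any CURWB tool. With the vehicle radios from the test scenario, it selects 5.66.194.36, which matches the Primary observed in that test, whereas a plain string comparison would pick 5.246.2.120.

```python
import ipaddress


def autotap_primary(mesh_ids: list[str]) -> str:
    """Return the Mesh ID that would be elected Primary: the numerically lowest one.

    Hypothetical helper. Mesh IDs are assumed to compare as 32-bit numbers,
    in the same way as dotted IPv4 addresses.
    """
    return min(mesh_ids, key=ipaddress.IPv4Address)


vehicle_group = ["5.246.2.120", "5.66.194.36"]
print(autotap_primary(vehicle_group))  # numeric comparison: 5.66.194.36
print(min(vehicle_group))              # naive text comparison: 5.246.2.120
```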
Fixed Infrastructure
The user configures two Mesh End units identically and connects them to the same switch. The two devices exchange information to elect a Primary, and the other unit stays on standby. When a failure occurs, the standby unit takes over within 500 ms and reconnects all the Mesh Points to the system. For fixed networks, TITAN can only be enabled on Mesh End units, and the Mesh Points automatically establish a connection with whichever Mesh End took over.
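The takeover can be pictured as a standby unit that watches for status messages from the Primary and promotes itself when they stop. The minimal Python sketch below is a hypothetical model, not the CURWB implementation: `StandbyMonitor`, its methods, and the status-message interpretation are assumptions, and `timeout_ms` is only assumed to correspond to the `mpls fastfail timeout` value shown in the Configuration section.

```python
from dataclasses import dataclass


@dataclass
class StandbyMonitor:
    """Hypothetical model of a standby Mesh End watching the Primary."""

    timeout_ms: int            # assumed to play the role of the fastfail timeout
    last_heard_ms: int = 0     # time of the last status message from the Primary
    role: str = "standby"

    def on_status_message(self, now_ms: int) -> None:
        # The Primary is alive; restart the failure-detection window.
        self.last_heard_ms = now_ms

    def tick(self, now_ms: int) -> str:
        # Promote to Primary once the Primary has been silent longer than the timeout.
        if self.role == "standby" and now_ms - self.last_heard_ms > self.timeout_ms:
            self.role = "primary"
        return self.role


monitor = StandbyMonitor(timeout_ms=150)
monitor.on_status_message(now_ms=1000)
print(monitor.tick(now_ms=1100))  # standby: only 100 ms of silence
print(monitor.tick(now_ms=1151))  # primary: 151 ms of silence exceeds 150 ms
```

In this model, a shorter timeout detects failures sooner but also promotes the standby after brief, harmless gaps in status messages, which is one way to read the Troubleshooting advice against setting the fast fail timeout too low.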
Vehicles for Mobility
The process is the same as in a fixed network: the units must be connected to the same switch and share the same configuration. The algorithm elects one unit as Primary and the other as Secondary. If the Primary fails, the Secondary takes over within 500 ms and establishes a connection with the closest trackside unit. The only difference in mobility is that TITAN can be enabled on Mesh Point units. In that case, the Fluidity feature supersedes the operating mode of the radio.
Trackside Radio
When a trackside radio can't communicate with the backbone network, the system forces the vehicle(s) to connect to the closest trackside radio as an immediate response to the failure. The process is the same as for fixed networks, but more than one active standby trackside radio may be available. On the trackside, the backup is not a radio in standby mode but a fully operational, active radio that can cover for the failure.
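The vehicle-side reaction to a trackside failure amounts to choosing another active trackside radio. The Python sketch below assumes each candidate reports whether it still reaches the backbone, plus a single link-quality score standing in for "closest"; the `Trackside` fields, the score, and `pick_trackside` are hypothetical, since the actual metric Fluidity uses is not described here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Trackside:
    name: str
    backbone_ok: bool      # hypothetical: can this radio still reach the backbone?
    link_quality: float    # hypothetical score, higher is better ("closest")


def pick_trackside(candidates: list[Trackside]) -> Optional[Trackside]:
    """Return the best trackside radio that still reaches the backbone, or None."""
    usable = [t for t in candidates if t.backbone_ok]
    return max(usable, key=lambda t: t.link_quality, default=None)


candidates = [
    Trackside("ts-a", backbone_ok=False, link_quality=0.9),  # failed uplink
    Trackside("ts-b", backbone_ok=True, link_quality=0.6),
    Trackside("ts-c", backbone_ok=True, link_quality=0.4),
]
print(pick_trackside(candidates).name)  # ts-b
```

Because every trackside radio is active rather than on standby, the candidate list can contain several usable radios, which is the difference from the Mesh End case.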
Gateway connected to the corporate network
Just as with Mesh Ends in a fixed network, the gateways (FM1000 and FM10000) work together to elect a Primary, and the backup takes over on failure.
Primary Election
All CURWB units connected to the same wired broadcast domain and configured with the same passphrase perform a distributed Primary election process every few seconds. The Primary unit is an edge point of the CURWB MPLS network, that is, a device where user traffic can enter or leave the mesh. Secondary units act as MPLS relay points. For each neighbor, the algorithm computes a precedence value based on the role of the unit (mesh-end or mesh-point) and its Mesh ID. Mesh-ends are assigned a higher priority than mesh-points, and among units with the same priority, the unit with the lowest Mesh ID is preferred. The election relies on a dedicated signaling protocol that runs continuously in the network and guarantees that all units elect the same Primary.
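The election rule can be expressed as a sort key: role first, then Mesh ID. The minimal Python sketch below assumes that the precedence value can be modeled as the tuple (role rank, numeric Mesh ID), where the smaller tuple wins; the real precedence encoding and signaling protocol are not described here, and `Unit`, `precedence`, and `elect_primary` are hypothetical names.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    mesh_id: str   # dotted form, for example "5.246.226.200"
    role: str      # "mesh-end" or "mesh-point"


def precedence(unit: Unit) -> tuple[int, int]:
    """Smaller tuples win: mesh-ends rank ahead of mesh-points, then lowest Mesh ID."""
    role_rank = 0 if unit.role == "mesh-end" else 1
    return (role_rank, int(ipaddress.IPv4Address(unit.mesh_id)))


def elect_primary(units: list[Unit]) -> Unit:
    # Every unit evaluates the same deterministic rule over the same neighbor set,
    # so all of them arrive at the same Primary.
    return min(units, key=precedence)


neighbors = [
    Unit("5.1.1.1", "mesh-point"),       # hypothetical: lowest ID, but only a mesh-point
    Unit("5.246.227.8", "mesh-end"),
    Unit("5.246.226.200", "mesh-end"),
]
print(elect_primary(neighbors).mesh_id)  # 5.246.226.200
```

Evaluating the rule over `neighbors` in any order gives the same result, which is the property that lets a distributed election converge on a single Primary.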
Mesh End failover
During normal operation, the Primary and Secondary Mesh Ends continuously communicate with each other about their status and exchange network reachability information. In particular, the Primary periodically sends updates to the Secondary about its internal forwarding table and multicast routes.
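The value of these periodic updates is that the Secondary already holds a recent copy of the Primary's state when it takes over. The Python sketch below models that idea with hypothetical classes; the table contents (a MAC-to-Mesh-ID map and a multicast group map) are illustrative assumptions, not the actual CURWB data structures.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class MeshEndState:
    # Illustrative contents only: MAC address -> Mesh ID of the unit behind it.
    forwarding_table: dict[str, str] = field(default_factory=dict)
    # Illustrative contents only: multicast group -> set of Mesh IDs to reach.
    multicast_routes: dict[str, set[str]] = field(default_factory=dict)


class SecondaryMeshEnd:
    """Hypothetical Secondary that keeps the latest state pushed by the Primary."""

    def __init__(self) -> None:
        self.snapshot = MeshEndState()

    def on_update(self, state: MeshEndState) -> None:
        # Store a private copy so later changes on the Primary do not alter it.
        self.snapshot = copy.deepcopy(state)

    def take_over(self) -> MeshEndState:
        # Resume forwarding from the last received state instead of an empty table.
        return self.snapshot


primary_state = MeshEndState(
    forwarding_table={"00:00:5e:00:53:01": "5.66.194.36"},
    multicast_routes={"239.1.1.1": {"5.66.194.36"}},
)
secondary = SecondaryMeshEnd()
secondary.on_update(primary_state)
primary_state.forwarding_table.clear()          # the Primary changes after the update
print(secondary.take_over().forwarding_table)   # still holds the entry from the last update
```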
Configuration
In the basic TITAN configuration, a deployment needs two gateways (Mesh Ends): one Primary and one Secondary.
Both the Primary and Secondary units must have the following TITAN configuration:
configure mpls fastfail status enabled
configure mpls fastfail timeout 150
config mpls unicast-flood enabled
config mpls arp-unicast disabled
config spanning-tree link-guard 40
config arp gratuitous enabled
configure arp gratuitous delay 150
In Layer 3 configurations, if HA is required on each Mesh End, two Mesh Ends are needed in each case, and the TITAN configuration above must be applied to both.
To configure TITAN on a vehicle, the vehicle first needs two radios. If the Primary fails, the Secondary takes over communication. In this scenario, both the vehicle radios and the Mesh End of the Fluidity network need the TITAN configuration.
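Because every TITAN unit must carry the same settings, it can help to generate the command list once and apply the identical output to each unit. The Python sketch below simply reproduces the commands from the Configuration section with the numeric values as parameters; `titan_commands` and the unit labels are hypothetical, and how the lines are delivered to each radio is not covered.

```python
def titan_commands(fastfail_timeout: int = 150,
                   link_guard: int = 40,
                   gratuitous_delay: int = 150) -> list[str]:
    """Return the TITAN command set from this document, with its numeric values parameterized."""
    return [
        "configure mpls fastfail status enabled",
        f"configure mpls fastfail timeout {fastfail_timeout}",
        "config mpls unicast-flood enabled",
        "config mpls arp-unicast disabled",
        f"config spanning-tree link-guard {link_guard}",
        "config arp gratuitous enabled",
        f"configure arp gratuitous delay {gratuitous_delay}",
    ]


# Render the list once and reuse it, so the units cannot drift apart.
commands = titan_commands()
for unit in ("primary", "secondary"):  # hypothetical unit labels
    for line in commands:
        print(f"[{unit}] {line}")
```

The default arguments match the values used in this document; any change is then applied to every unit at once.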
Test Scenario
Our current network topology includes seven radios. In this setup, the Mesh End radios have their wireless interfaces disabled; they serve only as gateways and are not part of the trackside radio system. The primary Mesh End unit has IP address 10.122.136.50 and the secondary unit has IP address 10.122.136.47.
We have three trackside radios (10.122.136.9, 10.122.136.16, and 10.122.136.15). The trackside radio with IP address 10.122.136.9 is hardwired into the core network infrastructure and also provides a backhaul link to a pair of trailer radios, 10.122.136.15 and 10.122.136.16. These fixed-infrastructure backhaul links operate at 5240 MHz. Together, the three radios provide wireless coverage at 5180 MHz to the mobile vehicle (IP address 10.122.136.13).
The mobile vehicle is equipped with two radios: 10.122.136.13 as the Primary and 10.122.136.14 as the Secondary. Both radios are connected to a single switch. The secondary radio is not shown here.
Mesh End failover
Step 1: Both the Primary and Secondary Mesh Ends are connected to the network and active. The radio with the lower Mesh ID acts as the Primary Mesh End.
Step 2: When the Primary goes down, the Secondary Mesh End takes over and acts as the Mesh End for the entire network. Note that the failed Primary Mesh End is now missing from the list of infrastructure radios.
Step 3: The failed Primary radio is back and operational. However, it waits for the preemption delay to expire while it learns the network topology.
Step 4: Once the preemption timer expires, the radio with Mesh ID 5.246.226.200 takes over the Primary role again, and the radio with Mesh ID 5.246.227.8 becomes the Secondary again.
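Steps 1 to 4 describe a simple timeline. The Python sketch below reproduces it for a single failure, under the assumptions that failover itself is instantaneous (it actually takes up to 500 ms) and that the recovered unit reclaims the Primary role exactly when the preemption delay expires. `primary_at` and its parameters are hypothetical, and the delay value in the example is arbitrary rather than a CURWB default.

```python
def primary_at(t: float, down_at: float, back_at: float, preemption_delay: float,
               preferred: str = "5.246.226.200", backup: str = "5.246.227.8") -> str:
    """Return the Mesh ID acting as Primary at time t (seconds) in a single-failure scenario."""
    if t < down_at:
        return preferred   # Step 1: the lower Mesh ID is Primary
    if t < back_at + preemption_delay:
        return backup      # Steps 2 and 3: backup keeps the role, even after the preferred unit returns
    return preferred       # Step 4: preemption delay expired, the preferred unit takes over again


# Preferred unit fails at t=10 s and returns at t=60 s; 30 s preemption delay (arbitrary example value).
for t in (5, 20, 70, 95):
    print(t, primary_at(t, down_at=10, back_at=60, preemption_delay=30))
```

The same timeline applies to the vehicle test that follows, with 5.66.194.36 as the preferred radio and 5.246.2.120 as the backup.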
Testing failover on the Vehicle Radio
This lab network is a Fluidity network with one vehicle connected to the trackside. The vehicle has two radios: 10.122.136.13 (Mesh ID 5.66.194.36, Primary) and 10.122.136.14 (Mesh ID 5.246.2.120, Secondary).
Step 1: Both the Primary and Secondary vehicle radios are online. The radio with the lower Mesh ID acts as the Primary and the other as the Secondary. Depending on wireless quality, both radios can communicate with the trackside radio, but all downstream communication to the onboard network always goes through the Primary radio. With TITAN, the Secondary vehicle radio becomes Primary within 500 ms of a failure.
This screenshot shows the MPLS tunnel from the Mesh End to the vehicle radios.
Step 2: When we shut down the Primary radio (10.122.136.13), the vehicle fails over to the Secondary, and 10.122.136.14 becomes the Primary.
Step 3: The failed Primary onboard radio is powered back on and operational. However, although this radio connects to the network, it waits for the preemption delay and does not actively participate in the Fluidity network.
As seen in this screenshot, 5.66.194.36 is back online but still acts as Secondary during the preemption delay, while 5.246.2.120 continues to manage communication. The MPLS tunnel also shows that 5.246.2.120 is communicating with the trackside radio.
Troubleshooting TITAN
- The TITAN configuration must be identical on all participating radios.
- Depending on the size of the deployment, the preemption delay might need to be increased. This ensures that a recovered unit does not take over the Primary role before it has learned the topology.
- Setting the fast fail timeout too low can make the network unstable. A value of 150 ms can be used in most deployments.
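The first point can be checked mechanically by comparing the TITAN-related lines captured from each unit. The Python sketch below is a hypothetical helper that compares configuration lines as literal text, ignoring order, blank lines, and surrounding whitespace. It treats `config` and `configure` as different words even though this document uses both spellings, so normalize both captures to one form before comparing.

```python
def config_drift(primary_lines: list[str], secondary_lines: list[str]) -> dict[str, list[str]]:
    """Return the configuration lines that appear on only one of the two units."""
    primary = {line.strip() for line in primary_lines if line.strip()}
    secondary = {line.strip() for line in secondary_lines if line.strip()}
    return {
        "only_primary": sorted(primary - secondary),
        "only_secondary": sorted(secondary - primary),
    }


primary_cfg = [
    "configure mpls fastfail status enabled",
    "configure mpls fastfail timeout 150",
]
secondary_cfg = [
    "configure mpls fastfail status enabled",
    "configure mpls fastfail timeout 100",  # mismatch a check like this should flag
]
print(config_drift(primary_cfg, secondary_cfg))
```

An empty result for both keys means the two captures match line for line.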