Contents
- Troubleshooting
- Installation and Connections
- Debug Commands
- Remote Debug Commands
- AP Console Access
- Cable Modem Serial Port Access From an AP
- Configuration
- Mesh Access Point CLI Commands
- Mesh Access Point Debug Commands
- Defining Mesh Access Point Role
- Backhaul Algorithm
- Passive Beaconing (Anti-Stranding)
- Dynamic Frequency Selection
- DFS in RAP
- DFS in MAP
- Preparation in a DFS Environment
- Monitoring DFS
- Frequency Planning
- Good Signal-to-Noise Ratios
- Access Point Placement
- Check Packet Error Rate
- Bridge Group Name Misconfiguration
- Misconfiguration of the Mesh Access Point IP Address
- Misconfiguration of DHCP
- Identifying the Node Exclusion Algorithm
- Throughput Analysis
Troubleshooting
This chapter describes troubleshooting information and contains the following section:
Installation and Connections
Debug Commands
The following two commands are very helpful to see the messages being exchanged between mesh access points and the controller.
(Cisco Controller) > debug capwap events enable (Cisco Controller) > debug disable-all
You can use the debug command to see the flow of packet exchanges that occur between the mesh access point and the controller. The mesh access point initiates the discovery process. An exchange of credentials takes place during the join phase to authenticate that the mesh access point is allowed to join the mesh network.
Upon a successful join completion, the mesh access point sends a CAPWAP configuration request. The controller responds with a configuration response. When a Configure Response is received from the controller, the mesh access point evaluates each configuration element and then implements them.
Remote Debug Commands
You can log on to the mesh access point console for debugging either through a direct connection to the AP console port or through the remote debug feature on the controller.
To invoke remote debug on the controller, enter the following commands:
(Cisco Controller) > debug ap enable ap-name (Cisco Controller) > debug ap command command ap-name
AP Console Access
AP1500s have a console port. A console cable is not shipped with the mesh access point. For the 1550 series access points, console ports are easily accessible and you need not open the access point box.
The AP1500s have console access security embedded in the code to prevent unauthorized access on the console port and provide enhanced security.
The login ID and password for console access are configured from the controller. You can use the following commands to push the username/password combination to the specified mesh access point or all access points:
<Cisco Controller> config ap username cisco password cisco ?
all Configures the Username/Password for all connected APs.
<Cisco AP> Enter the name of the Cisco AP.
<Cisco Controller> config ap username cisco password cisco all
You must verify whether the username/password pushed from the controller is used as user-id and password on the mesh access point. It is a nonvolatile setting. Once set, a login ID and password are saved in the private configuration of the mesh access point.
Once you have a successful login, the trap is sent to the Cisco Prime Infrastructure. If a user fails to log on three times consecutively, login failure traps are sent to the controller and Cisco Prime Infrastructure.
Caution | A mesh access point must be reset to the factory default settings before moving from one location to another. |
Cable Modem Serial Port Access From an AP
Commands can be sent to the cable modem from the privileged mode of the CLI. Use the command to take a text string and send it to the cable modem UART interface. The cable modem interprets the text string as one of its own commands. The cable modem response is captured and displayed on the Cisco IOS console. Up to 9600 characters are displayed from the cable modem. Any text that is greater than 4800 characters is truncated.
The modem commands are only operational on mesh APs that have devices connected to the UART port originally intended for the cable modem. If the commands are used on a mesh AP that does not have a cable modem (or any other device connected to the UART), the commands are accepted, however, but they do not produce any returned output. No errors are explicitly flagged.
Configuration
Enter the following command from the privileged mode of the MAP:
AP#send cmodem timeout-value modem-command
The modem command is any command or text to send to the cable modem. The range of timeout value is 1 to 300 seconds. However, if the captured data equals 9600 characters, any text beyond that is truncated and the response, irrespective of the timeout value and is immediately displayed on the AP console.
Enabling the Cable Modem Console Port
Note | For the AP1572EC, AP1572IC, AP1552C, and AP1552CU, the cable modem must be enabled. |
-
Enable the cable modem console port through SNMP by entering this command to the IP address of the cable modem: snmpset –c private IP_ADDRESS cmConsoleMode.0 i N
Using the OID, enter this command:snmpset –c private IP_ADDRESS 1.3.6.1.4.1.1429.77.1.4.7.0 i N
Where IP_ADDRESS is any IPv4 address and N is an integer, 2 to enable read-write, 1 for read-only, or 0 to disable.Example: snmpset -c private 209.165.200.224 cmConsoleMode.0 i 2
-
Enable the cable modem console port through the configuration file. The configuration file (with a .cm extension) is loaded into the cable modem head end. It is pushed to the cable modem as part of the join process. Enter the following line to the cable modem configuration file: SA-CM-MIB::cmConsoleMode.0 = INTEGER: readWrite(2)
Using the OID, enter this line:SA-CM-MIB::cmConsoleMode.0 = INTEGER: readWrite(2)
Resetting the AP1572xC/AP1552C Through the Cable Modem
An AP can be reset by entering an SNMP command to the Cable Modem, which resides inside the access point. For this feature to work, you must enable the cable modem console port.
Snmpset -v2c -c public IP ADDRESS 1.3.6.1.4.1.1429.77.1.3.17.0 i 1Where the IP ADDRESS is the IPv4 address of the cable modem.
Mesh Access Point CLI Commands
You can enter these commands directly on the mesh access point using the AP console port or you can use the remote debug feature from the controller:
Mesh Access Point Debug Commands
You can enter these commands directly on the mesh access point using the AP console port or you can use the remote debug feature from the controller.
debug mesh ethernet config—Debugs access and trunk port configuration associated with VLAN tagging.
debug mesh ethernet registration—Debugs the VLAN registration protocol. This command is associated with VLAN tagging.
debug mesh forwarding table—Debugs the forwarding table containing bridge groups.
debugs mesh forwarding packet bridge-group—Debugs the bridge group configuration.
Defining Mesh Access Point Role
By default, AP1500s are shipped with a radio role set to MAP. You must reconfigure a mesh access point to act as a RAP.
Backhaul Algorithm
A backhaul is used to create only the wireless connection between mesh access points.
The backhaul interface by default is 802.11a. You cannot change the backhaul interface to 802.11b/g.
The "auto" data rate is selected by default for AP1500s.
The backhaul algorithm has been designed to fight against stranded mesh access point conditions. This algorithm also adds a high-level of resiliency for each mesh node.
The algorithm can be summarized as follows:
-
A MAP always sets the Ethernet port as the primary backhaul if it is UP; otherwise, it is the 802.11a radio (this feature gives the network administrator the ability to configure it as a RAP the first time and recover it in-house). For fast convergence of the network, we recommend that you do not connect any Ethernet device to the MAP for its initial joining to the mesh network.
-
A MAP failing to connect to a WLAN controller on an Ethernet port that is UP, sets the 802.11a radio as the primary backhaul. Failing to find a neighbor or failing to connect to a WLAN controller via any neighbor on the 802.11a radio causes the primary backhaul to be UP on the Ethernet port again. A MAP gives preference to the parent which has the same BGN.
-
A MAP connected to a controller over an Ethernet port does not build a mesh topology (unlike a RAP).
-
A RAP always sets the Ethernet port as the primary backhaul.
-
If the Ethernet port on a RAP is DOWN, or a RAP fails to connect to a controller on an Ethernet port that is UP, the 802.11a radio is set as the primary backhaul. Failing to find a neighbor or failing to connect to a controller via any neighbor on the 802.11a radio makes the RAP go to the SCAN state after 15 minutes and starts with the Ethernet port first.
Keeping the roles of mesh nodes distinct using the above algorithm greatly helps to avoid a mesh access point from being in an unknown state and becoming stranded in a live network.
Passive Beaconing (Anti-Stranding)
When enabled, passive beaconing allows a stranded mesh access point to broadcast its debug messages over-the-air using a 802.11b/g radio. A neighboring mesh access point that is listening to the stranded mesh access point and has a connection to a controller, can pass those messages to the controller over CAPWAP. Passive beaconing prevents a mesh access point that has no wired connection from being stranded.
Debug logs can also be sent as distress beacons on a nonbackhaul radio so that a neighboring mesh access point can be dedicated to listen for the beacons.
The following steps are automatically initiated at the controller when a mesh access point loses its connection to the controller:
You only have to know the MAC address of the stranded AP to make use of this feature.
A mesh access point is considered stranded if it goes through a lonely timer reboot. When the lonely timer reboot is triggered, the mesh access point, which is now stranded, enables passive beaconing, the anti-stranding feature.
This feature can be divided into three parts:
Deployed mesh access points constantly look for stranded mesh access points. Periodically, mesh access points send a list of stranded mesh access points and SNR information to the controller. The controller maintains a list of the stranded mesh access points within its network.
When the debug mesh astools troubleshoot mac-addr start command is entered, the controller runs through the list to find the MAC address of the stranded mesh access point.
A message is sent to the best neighbor to start listening to the stranded access point. The listening mesh access point gets the distress beacons from the stranded mesh access point and sends it to the controller.
Once a mesh access point takes the role of a listener, it does not purge the stranded mesh access point from its internal list until it stops listening to the stranded mesh access point. While a stranded mesh access point is being debugged, if a neighbor of that mesh access point reports a better SNR to the controller than the current listener by some percentage, then the listener of the stranded mesh access point is changed to the new listener (with better SNR) immediately.
End-user commands are as follows:
config mesh astools [enable | disable]—Enables or disables the astools on the mesh access points. If disabled, APs no longer sends a stranded AP list to the controller.
show mesh astools stats—Shows the list of stranded APs and their listeners if they have any.
debug mesh astools troubleshoot mac-addr start—Sends a message to the best neighbor of the mac-addr to start listening.
debug mesh astools troubleshoot mac-addr stop—Sends a message to the best neighbor of the mac-addr to stop listening.
clear mesh stranded [all | mac of b/g radio]—Clears stranded AP entries.
The controller console is swamped with debug messages from stranded APs for 30 minutes.
Dynamic Frequency Selection
Previously, devices employing radar operated in frequency subbands without other competing services. However, controlling regulatory bodies are attempting to open and share these bands with new services like wireless mesh LANs (IEEE 802.11).
To protect existing radar services, the regulatory bodies require that devices wishing to share the newly opened frequency subband behave in accordance with the Dynamic Frequency Selection (DFS) protocol. DFS dictates that to be compliant, a radio device must be capable of detecting the presence of radar signals. When a radio detects a radar signal, it is required to stop transmitting to for at least 30 minutes to protect that service. The radio then selects a different channel to transmit on but only after monitoring it. If no radar is detected on the projected channel for at least one minute, then the new radio service device may begin transmissions on that channel.
The AP performs a DFS scan on the new DFS channel for 60 seconds. However, if a neighboring AP is already using that new DFS channel, the AP does not perform the DFS scan.
The process for a radio to detect and identify a radar signal is a complicated task that sometimes leads to incorrect detects. Incorrect radar detections can occur due to a large number of factors, including due to uncertainties of the RF environment and the ability of the access point to reliably detect actual on-channel radar.
The 802.11h standard addresses DFS and Transmit Power Control (TPC) as it relates to the 5-GHz band. Use DFS to avoid interference with radar and TPC to avoid interference with satellite feeder links.
Note | DFS is mandatory in the USA for 5250 to 5350 and 5470 to 5725 frequency bands. DFS and TPC are mandatory for these same bands in Europe. |
- DFS in RAP
- DFS in MAP
- Preparation in a DFS Environment
- Monitoring DFS
- Frequency Planning
- Good Signal-to-Noise Ratios
- Access Point Placement
- Check Packet Error Rate
DFS in RAP
The RAP performs the following steps as a response to radar detection:
The RAP sends a message to the controller that the channel is infected with radar. The channel is marked as infected on the RAP and on the controller.
The RAP blocks the channel for 30 minutes. This 30-minute period is called the nonoccupancy period.
The controller sends a TRAP, which indicates that the radar has been detected on the channel. A TRAP remains until the nonoccupancy period expires.
The RAP has 10 seconds to move away from the channel. This period is called the channel move time, which is defined as the time for the system to clear the channel and is measured from the end of the radar burst to the end of the final transmission on the channel.
The RAP enters the quiet mode. In the quiet mode, the RAP stops data transmissions. Beacons are still generated and probe responses are still delivered. The quiet mode exists until the channel move time is over (10 seconds).
The controller picks up a new random channel and sends the channel information to the RAP.
The RAP receives the new channel information and sends channel change frames (unicast, encrypted) to the MAP, and each MAP sends the same information to its lower children down the sector. Each mesh access point sends the channel change frames once every 100 msecs for a total of five times.
The RAP tunes to the new channel and enters into the silent mode. During the silent mode, only the receiver is ON. The RAP keeps scanning the new channel for any radar presence for 60 seconds. This process is called channel availability check (CAC).
The MAP tunes to the new channel and enters into the silent mode. During the silent mode, only the receiver is ON. The MAP keeps scanning the new channel for any radar presence for 60 seconds.
If radar is not detected, the RAP resumes full functionality on this new channel and the whole sector tunes to this new channel.
DFS in MAP
The MAP performs the following steps as a response to radar detection:
The MAP sends a radar seen indication to the parent and ultimately to the RAP indicating that the channel is infected. The RAP sends this message to the controller. The message appears to be coming from the RAP. The MAP, RAP, and controller mark the channel as infected for 30 minutes.
The MAP blocks the channel for 30 minutes. This 30-minute period is called the nonoccupancy period.
The controller sends a TRAP, which indicates that the radar has been detected on the channel. The TRAP remains until the nonoccupancy period expires.
The MAP has 10 seconds to move away from the channel. This is called the channel move time, which is defined as the time for the system to clear the channel and is measured from the end of the radar burst to the end of the final transmission on the channel.
The MAP enters the quiet mode. In the quiet mode, the MAP stops data transmissions. Beacons are still generated and probe responses are still delivered. The quiet mode exists until the channel move time is over (10 seconds).
The controller picks up a new random channel and sends the channel to the RAP.
The RAP receives the new channel information and sends channel change frames (unicast, encrypted) to a MAP, and each MAP sends the same information to its lower children down the sector. Each mesh access point sends the channel change frames once every 100 msecs for a total of five times.
Each mesh access point tunes to the new channel and enters into the silent mode. During the silent mode, only the receiver is ON. There is no packet transmission. An AP keeps scanning the new channel for any radar presence for 60 seconds. This process is called the channel availability check (CAC). The MAP should not disconnect from the controller. The network should remain stable during this one-minute period.
DFS functionality allows a MAP that detects a radar signal to transmit that up to the RAP, which then acts as if it has experienced radar and moves the sector. This process is called the coordinated channel change. This functionally can be turned on or off on the controller. The coordinated channel change is enabled by default.
To enable DFS, enter the following command:
(Cisco Controller) > config mesh full-sector-dfs enable
To verify that DFS is enabled on the network, enter the following command:
(Cisco Controller) > show network summary
Note | Ensure that none of your mesh access points are using a default BGN. |
Note | A repeated radar event on the MAP (radar triggers once, and then almost immediately again), causes the MAP to disconnect. |
Preparation in a DFS Environment
This section describes how to prepare in a DFS environment:
To verify that your controller is set to the correct country domain, enter the following command:
(Cisco Controller) > show country
To check the mesh access point country and the channel setting on the controller, enter the following command:
(Cisco Controller)> show ap config 802.11a ap-name
To identify channels available for mesh, enter the following command:
(Cisco Controller)> show ap config 802.11a ap-name
Look for the allowed channel list.
Allowed Channel List....................... 100,104,108,112,116,120,124, ......................................... 128,132,136,140
To identify channels available for mesh on the AP console (or use remote debug from the controller, enter the following command:
ap1520-rap # show mesh channels HW: Dot11Radio1, Channels: 100, 104, 108, 112, 116, 120, 124, 128, 132, 136, 140
An asterisk next to a channel indicates that radar has been seen on the channel.
To invoke remote debug, enter the following commands:
(Cisco Controller) > debug ap enable ap-name (Cisco Controller) > debug ap command command ap-name
Debug commands to see radar detection and past radar detections on the DFS channel are as follows:
show mesh dfs channel channel-number show mesh dfs history
Information similar to the following appears.
ap1520-rap # show mesh dfs channel 132
Channel 132 is available Time elapsed since radar last detected: 0 day(s), 7 hour(s), 6 minute(s), 51 second(s).
The RAP should be run through the channels to determine whether there is active radar on each of the channels.
ap1520-rap # show mesh dfs channel 132 Radar detected on channel 132, channel becomes unusable (Time Elapsed: 0 day(s), 7 hour(s), 7 minute(s), 11 second(s)). Channel is set to 100 (Time Elapsed: 0 day(s), 7 hour(s), 7 minute(s), 11 second(s)). Radar detected on channel 116, channel becomes unusable (Time Elapsed: 0 day(s), 7 hour(s), 6 minute(s), 42 second(s)). Channel is set to 64 (Time Elapsed: 0 day(s), 7 hour(s), 6 minute(s), 42 second(s)). Channel 132 becomes usable (Time Elapsed: 0 day(s), 6 hour(s), 37 minute(s), 10 second(s)). Channel 116 becomes usable (Time Elapsed: 0 day(s), 6 hour(s), 36 minute(s), 42 second(s)).
Monitoring DFS
The DFS history should be run every morning or more frequently to detect the radar. This information does not get erased and is stored on the mesh access point flash. Therefore, you only need to match the times.
ap1520-rap # show controller dot11Radio 1
Information similar to the following appears:
interface Dot11Radio1 Radio Hammer 5, Base Address 001c.0e6c.9c00, BBlock version 0.00, Software version 0.05.30 Serial number: FOC11174XCW Number of supported simultaneous BSSID on Dot11Radio1: 16 Carrier Set: ETSI (OFDM) (EU) (-E) Uniform Spreading Required: Yes Current Frequency: 5540 MHz Channel 108 (DFS enabled) Allowed Frequencies: *5500(100) *5520(104) *5540(108) *5560(112) *5580(116) *560 0(120) *5620(124) *5640(128) *5660(132) *5680(136) *5700(140) * = May only be selected by Dynamic Frequency Selection (DFS) Listen Frequencies: 5180(36) 5200(40) 5220(44) 5240(48) 5260(52) 5280(56) 5300(6 0) 5320(64) 5500(100) 5520(104) 5540(108) 5560(112) 5580(116) 5660(132) 5680(136 ) 5700(140) 5745(149) 5765(153) 5785(157) 5805(161) 5825(165) 4950(20) 4955(21) 4960(22) 4965(23) 4970(24) 4975(25) 4980(26)
Note | An asterisk indicates that this channel has DFS enabled. |
Frequency Planning
Use alternate adjacent channels in adjacent sectors. If you have two RAPs deployed at the same location, you must leave one channel in between.
Weather radars operate within the 5600- to 5650-MHz band, which means that channels 124 and 128 might be affected, but also channels 120 and 132 might suffer from weather radar activity.
If the mesh access point does detect radar, the controller and the mesh access point both will retain the channel as the configured channel. The controller retains it in volatile memory associated with the mesh access point, and the mesh access point has it stored in its flash as configuration. After the 30 minute quiet period, the controller returns the mesh access point to the static value, regardless of whether the mesh access point has been configured with a new channel or not. In order to overcome this, configure the mesh access point with a new channel, and reboot the mesh access point.
Once radar is reliably detected on a channel, that channel, and the two surrounding channels, should be added to the RRM exclusion list, as follows:
(Cisco Controller) > config advanced 802.11a channel delete channel
A mesh access point goes to a new channel that is picked by RRM, and it does not consider excluded channels.
If a radar is detected on channel 124, for instance, channels 120, 124, and 128 should be added to the exclusion list. In addition, do not configure RAP to operate on those channels.
Good Signal-to-Noise Ratios
For European installations, the minimum recommendation is increased to 20 dB of signal-to-noise ratio (SNR). The extra dBs are used to mitigate the effects of radar interference with packet reception, which is not observed in non-DFS environments.
Access Point Placement
Collocated mesh access points should have a minimum of 10 feet (3.048 meters) of vertical separation or 100 (30.48 meters) feet of horizontal separation.
Check Packet Error Rate
Mesh access points that have an high error rate, greater than 1 percent, should have mitigation applied to them, by changing the channels for noise and interference, adding additional mesh access points in the transmission path, moving the mesh access points to different sectors, or adding additional mesh access points.
Bridge Group Name Misconfiguration
A mesh access point can be wrongly provisioned with a bridgegroupname and placed in a group other than it was intended. Depending on the network design, this mesh access point might or might not be able to reach out and find its correct sector or tree. If it cannot reach a compatible sector, the mesh access point can become stranded.
To recover a stranded mesh access point, the concept of default bridgegroupname has been introduced in the software. When a mesh access point is unable to connect to any other mesh access point with its configured bridgegroupname, it attempts to connect with the bridgegroupname of default.
The algorithm of detecting this strand condition and recovery is as follows:
Passively scans and finds all neighbor nodes regardless of their bridgegroupname.
The mesh access point attempts to connect to the neighbors heard with my own bridgegroupname using AWPP.
If Step 2 fails, attempts to connect with default bridgegroupname using AWPP.
For each failed attempt in Step 3, it adds the neighbor to an exclusion list and attempts to connect the next best neighbor.
If the AP fails to connect with all neighbors in Step 4, it reboots the mesh access point.
If connected with a default bridgegroupname for 15 minutes, the mesh access point goes into a scan state.
When an mesh access point is able to connect with the default bridgegroupname, the parent node reports the mesh access point as a default child/node/neighbor entry on the controller, so that a network administrator is Cisco Prime Infrastructure. Such a mesh access point behaves as a normal (nonmesh) access point and accepts any client, other mesh nodes as its children, and it passes any data traffic through.
Note | Do not confuse an unassigned BGN (null value) with DEFAULT, which is a mode that the access point uses to connect when it cannot find its own BGN. |
To check the current state of a mesh access point’s BGN, enter the following command:
(Cisco Controller)> show mesh path Map3:5f:ff:60 00:0B:85:5F:FA:60 state UPDATED NEIGH PARENT DEFAULT (106B), snrUp 48, snrDown 48, linkSnr 49 00:0B:85:5F:FB:10 state UPDATED NEIGH PARENT BEACON (86B) snrUp 72, snrDown 63, linkSrn 57 00:0B:85:5F:FA:60 is RAP
To check the current state of a mesh access point’s BGN, check the neighbor information for the mesh access point (GUI) as follows:
Choose Wireless > All APs > AP Name > Neighbor info .
Misconfiguration of the Mesh Access Point IP Address
Although most Layer 3 networks are deployed using DHCP IP address management, some network administrators might prefer the manual IP address management and allocating IP addresses statically to each mesh node. Manual mesh access point IP address management can be a nightmare for large networks, but it might make sense in small to medium size networks (such as 10 to 100 mesh nodes) because the number of mesh nodes are relatively small compared to client hosts.
Statically configuring the IP address on a mesh node has the possibility of putting a MAP on a wrong network, such as a subnet or VLAN. This mistake could prevent a mesh access point from successfully resolving the IP gateway and failing to discover a WLAN controller. In such a scenario, the mesh access point falls back to its DHCP mechanism and automatically attempts to find a DHCP server and obtains an IP address from it. This fallback mechanism prevents a mesh node from being potentially stranded from a wrongly configured static IP address and allows it to obtain a correct address from a DHCP server on the network.
When you are manually allocating IP addresses, we recommend that you make IP addressing changes from the furthest mesh access point child first and then work your way back to the RAP. This recommendation also applies if you relocate equipment. For example, if you uninstall a mesh access point and redeploy it in another physical location of the mesh network that has a different addressed subnet.
Another option is to take a controller in Layer 2 mode with a RAP to the location with the misconfigured MAP. Set the bridge group name on the RAP to match the MAP that needs the configuration change. Add the MAP’s MAC address to the controller. When the misconfigured MAP comes up in the mesh access point summary detail, configure it with an IP address.
Misconfiguration of DHCP
Despite the DHCP fallback mechanism, there is still a possibility that a mesh access point can become stranded, if any of the following conditions exist:
There is a DHCP server on the network, but it does not offer an IP address to the AP, or if it gives a wrong IP address to the AP (for example, on a wrong VLAN or subnet).
These conditions can strand a mesh access point that is configured with or without a wrong static IP address or with DHCP. Therefore, you must ensure that when a mesh access point is unable to connect after exhausting all DHCP discovery attempts or DHCP retry counts or IP gateway resolution retry counts, it attempts to find a controller in Layer 2 mode. In other words, a mesh access point attempts to discover a controller in Layer 3 mode first and in this mode, attempts with both static IP (if configured) or DHCP (if possible). The AP then attempts to discover a controller in Layer 2 mode. After finishing a number of Layer 3 and Layer 2 mode attempts, the mesh access point changes its parent node and re-attempts DHCP discovery. Additionally, the software exclusion-lists notes the parent node through which it was unable to obtain the correct IP address.
Identifying the Node Exclusion Algorithm
Depending on the mesh network design, a node might find another node “best” according to its routing metric (even recursively true), yet it is unable to provide the node with a connection to the correct controller or correct network. It is the typical honeypot access point scenario caused by either misplacement, provisioning, design of the network, or by the dynamic nature of an RF environment exhibiting conditions that optimize the AWPP routing metric for a particular link in a persistent or transient manner. Such conditions are generally difficult to recover from in most networks and could blackhole or sinkhole a node completely, taking it out from the network. Possible symptoms include, but are not limited to the following:
A node connects to the honeypot but cannot resolve the IP gateway when configured with the static IP address, or cannot obtain the correct IP address from the DHCP server, or cannot connect to a WLAN controller.
A node ping-pongs between a few honeypots or circles between many honeypots (in worst-case scenarios).
Cisco mesh software resolves this difficult scenario by using a sophisticated node exclusion-listing algorithm. This node exclusion-listing algorithm uses an exponential backoff and advance technique much like the TCP sliding window or 802.11 MAC.
The basic idea relies on the following five steps:
Honeypot detection—The honeypots are first detected via the following steps:
Honeypot conviction—When a honeypot is detected, it is placed in a exclusion-list database with its conviction period to remain on the list. The default is 32 minutes. Other nodes are then attempted as parents in the following order, falling back to the next, upon failing the current mechanism:
Nonhoneypot credit—It is often possible that a node is not a really a honeypot, but appears to be due to some transient back-end condition, such as the following:
The DHCP server is either not up-and-running yet, has failed temporarily, or requires a reboot.
The WLAN controller is either not up-and-running yet, has failed temporarily, or requires a reboot.
The Ethernet cable on the RAP was accidentally disconnected.
Such nonhoneypots must be credited properly from their serving times so that a node can come back to them as soon as possible.
Honeypot expiration—Upon expiration, an exclusion-list node must be removed from the exclusion-list database and return to a normal state for future consideration by AWPP.
Honeypot reporting—Honeypots are reported to the controller via an LWAPP mesh neighbor message to the controller, which shows these on the Bridging Information page. A message is also displayed the first-time an exclusion-listed neighbor is seen. In a subsequent software release, an SNMP trap is generated on the controller for this condition so that Cisco Prime Infrastructure can record the occurrence.
Figure 8. Excluded Neighbor
Because many nodes might be attempting to join or rejoin the network after an expected or unexpected event, a hold-off time of 16 minutes is implemented, which means that no nodes are exclusion-listed during this period of time after system initialization.
This exponential backoff and advance algorithm is unique and has the following properties:
It allows a node to correctly identify the parent nodes whether it is a true honeypot or is just experiencing temporary outage conditions.
It credits the good parent nodes according to the time it has enabled a node to stay connected with the network. The crediting requires less and less time to bring the exclusion-list conviction period to be very low for real transient conditions and not so low for transient to moderate outages.
It has a built-in hysteresis for encountering the initial condition issue where many nodes try to discover each other only to find that those nodes are not really meant to be in the same network.
It has a built-in memory for nodes that can appear as neighbors sporadically so they are not accidentally considered as parents if they were, or are supposed to be, on the exclusion-list database.
The node exclusion-listing algorithm guards the mesh network against serious stranding. It integrates into AWPP in such a way that a node can quickly reconverge and find the correct network.
Throughput Analysis
Throughput depends on packet error rate and hop count.
Capacity and throughput are orthogonal concepts. Throughput is one user's experience at node N and the total area capacity is calculated over the entire sector of N-nodes and is based on the number of ingress and egress RAP, assuming separate noninterfering channels.
For example, 4 RAPs at 10 Mbps each deliver 40 Mbps total capacity. So, one user at 2 hops out, logically under each RAP, could get 5 Mbps each of TPUT, but consume 40 Mbps of the backhaul capacity.
With the Cisco Mesh solution, the per-hop latency is less than 10 msecs, and the typical latency numbers per hop range from 1 to 3 msecs. Overall jitter is also less than 3 msecs.
Throughput depends on the type of traffic being passed through the network: User Datagram Protocol (UDP) or Transmission Control Protocol (TCP). UDP sends a packet over Ethernet with a source and destination address and a UDP protocol header. It does not expect an acknowledgement (ACK). There is no assurance that the packet is delivered at the application layer.
TCP is similar to UDP but it is a reliable packet delivery mechanism. There are packet acknowledgments and a sliding window technique is used to allow the sender to transmit multiple packets before waiting for an ACK. There is a maximum amount of data the client transmits (called a TCP socket buffer window) before it stops sending data. Sequence numbers track packets sent and ensure that they arrive in the correct order. TCP uses cumulative ACKs and the receiver reports how much of the current stream has been received. An ACK might cover any number of packets, up to the TCP window size.
TCP uses slow start and multiplicative decrease to respond to network congestion or packet loss. When a packet is lost, the TCP window is cut in half and the back-off retransmission timer is increased exponentially. Wireless is subject to packet loss due to interference issues and TCP reacts to this packet loss. A slow start recovery algorithm is also used to avoid swamping a connection when recovering from packet loss. The effect of these algorithms in a lossy network environment is to lessen the overall throughput of a traffic stream.
By default, the maximum segment size (MSS) of TCP is 1460 bytes, which results in a 1500-byte IP datagram. TCP fragments any data packet that is larger than 1460 bytes, which can cause at least a 30-percent throughput drop. In addition, the controller encapsulates IP datagrams in the 48-byte CAPWAP tunnel header as shown in Figure 1. Any data packet that is longer than 1394 bytes is also fragmented by the controller, which results in up to a 15-percent throughput decrease.