A Cisco Workgroup Bridge (WGB) is a very useful tool for the design and deployment of a wireless network because it gives non-wireless devices mobility. A WGB has many configurable details around roaming, security, and access that affect a deployment, depending on your needs.
In code versions 12.4(25d)JA and later, Cisco introduced a set of commands and changes in order to optimize the use of a WGB in high-speed roaming environments.
This document covers different aspects of how a WGB works, including roaming algorithm decision points, and how to configure it for the intended usage model.
Cisco recommends that you have knowledge of these topics:
Cisco Wireless LAN solution
Cisco Workgroup Bridge
This document is not restricted to specific software and hardware versions.
The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, make sure that you understand the potential impact of any command.
Refer to Cisco Technical Tips Conventions for more information on document conventions.
A WGB is basically an access point (AP) configured to act as a wireless client towards an infrastructure, and to provide Layer 2 connectivity for the devices connected to its Ethernet interface.
A typical WGB deployment has these components:
WGB device, normally with at least one radio and one Ethernet interface
A wireless infrastructure, normally called root AP, which can be either Autonomous or Unified.
One or more wired client devices connected to the WGB. This document does not cover mixed role scenarios (one radio as WGB, one radio as root on same AP).
There are three main types of WGB:
Cisco WGB: Any Cisco IOS®-based AP configured as a WGB (1130, 1240, 1250, etc). This mode uses the Inter-Access Point Protocol (IAPP) to inform the network infrastructure of the devices that the WGB has learned on its Ethernet interface. In this case, the Wireless LAN Controller (WLC) or root AP has Layer 2 visibility of the devices "hanging" from the WGB.
Non-Cisco WGB: A third-party device that acts as a WGB and connects one or more wired devices to the wireless infrastructure. These devices do not support IAPP. They either allow only a single wired device, or provide a MAC address translation mechanism that hides all their wired clients behind a single 802.11 MAC address. If the infrastructure is a WLC, these devices need special handling of Address Resolution Protocol (ARP) and DHCP frames because of the security checks and frame handling done on controllers.
Cisco AP configured as "Universal WGB": This mode suppresses the IAPP mechanism, so the WGB can be used towards either Cisco infrastructure or third-party root APs. In this case, the WGB takes the MAC address of its Ethernet client, which limits the number of devices behind it to one.
The next section focuses on the scenario of a Cisco WGB used either towards autonomous or WLC infrastructure.
Typical WGB use examples include:
Connecting a wired printer to the network
Different manufacturing deployments, where it is not feasible or practical to run a cable to the wired device
In-vehicle deployments, where the WGB provides connectivity from a car, metro train, etc, to an outdoor wireless network
Wired cameras
Each example has its own requirements in terms of:
Bandwidth needed to support the application that will run on top of the wireless infrastructure
Roaming delay tolerance - How long does it take for the WGB to move from the current AP to the next one while the device is moving?
Forwarding time tolerance - How many frames are lost on each roam?
A printer does not move much, so its roaming requirements are lower. A train-mounted WGB, on the other hand, needs fine tuning of the roaming component in order to ensure correct behavior while it is moving around.
A video stream can have a large bandwidth requirement, so it needs high wireless data rates. However, a telemetry application might only need a few frames from time to time.
It is important that the requirements are properly defined from the beginning, as they affect not only the configuration of the WGB, but also how the wireless infrastructure has to be designed. For example, AP placement, distance, power levels, and enabled rates all affect roaming characteristics, so all of them are crucial if high-speed roaming is needed.
In general, you must know these details:
What is the needed bandwidth for the application?
What is the roaming delay tolerance?
Can the application properly handle network disconnections? Is there an additional backup mechanism?
Can the application handle packet loss properly? (Even on the best wireless design, you must expect a percentage of packet loss.)
This document does not address the details on how to design a RF environment for high speed roaming/outdoor. Refer to the Outdoor Mesh deployment guide.
For a wireless device, roaming is a very critical part of its functionality.
Basically, roaming means the capability to go from one AP to another, both belonging to the same wireless infrastructure.
As roaming requires a change from the current AP to the next, there is a resulting disconnection, or time without service. This disconnection can be short, for example less than 200 ms on voice deployments, or much longer, even seconds, if the security policy enforces a full authentication on each roam event.
Roaming is needed so the device can find a new parent with hopefully better signal, and it can continue to access the network infrastructure properly. At the same time, too many roams can cause multiple disconnections or time without service, which affects access. It is important for a mobile device, such as a WGB, to have a good roaming algorithm with enough configuration capabilities to adapt to different RF environments and data needs.
Triggers: Each client implementation has one or more triggers or events that, when met, cause the device to move to another parent AP. Examples: beacon loss (the device no longer hears the regular beacons from the AP), packet retries, signal level, no data received, deauthentication frame received, low data rate in use, etc. The possible triggers can differ from one client implementation to another because they are not fully standardized. Simpler devices might have a poor trigger set, which causes bad roams (sticky clients) or unnecessary roams. The WGB supports all of the triggers listed above.
Scan time: The wireless device (WGB) spends some time searching for potential parents. This normally implies visiting different channels, either probing actively or listening passively for APs. While the radio scans, the WGB is not forwarding data. From this scan, the WGB builds a valid set of parents that it can roam to.
Parent selection: After scan time, the WGB can check the potential parents, select the best one and trigger the association/authentication process. Sometimes, the decision point can be to remain on the current parent if there is not a significant benefit from a roaming event (remember that roaming too much can be bad).
Association/Authentication: The WGB proceeds to associate to the new AP, which normally covers both 802.11 authentication and association phases, plus completing the security policy configured on the SSID (WPA 2-PSK, CCKM, None, etc.).
Traffic Forwarding Restore: The WGB updates network infrastructure of its known wired clients through IAPP updates after roaming. After this point, the traffic to/from the wired clients to the network resumes.
One important aspect of roaming on mobile devices is the security policy implemented on the infrastructure. There are several options, each with good and bad points. These are the most important ones:
Open: Basically no security. This is the fastest and simplest of all policies. Its main problem is that it does not restrict unauthorized access to the infrastructure and gives no protection against attacks, which limits its usage to very specific scenarios. For example, mines where no external attacks are possible due to the sheer nature of the deployment.
MAC address authentication: Basically the same level of security as open, as MAC address spoofing is a trivial attack. Not recommended due to the added time needed to complete the MAC validation, which slows down roaming.
WPA2-PSK: Offers a good level of encryption (AES-CCMP), but authentication security depends on the quality of the pre-shared key. A random password of at least 12 characters is recommended. As with any pre-shared key method, the key is used on multiple devices, so if it is compromised it must be changed on all equipment. The roaming speed is acceptable, as the process completes in 6 frame exchanges, and you can calculate upper and lower time bounds for it because it does not involve any external equipment (no RADIUS server, etc). In general, this method is the preferred one after balancing problems and benefits.
WPA2 with 802.1x: This improves on the previous method by using a per device/user credential, which can be changed individually. The main problem is that this method does not roam well when the device is moving fast or when short roaming times are needed. In general, it uses the same 6 frames plus the EAP exchange, which takes 4 or more frames depending on the EAP type selected and the certificate sizes. Normally, this adds up to between 10 and 20 frames, plus the added delay of RADIUS server processing.
WPA2+CCKM: This mechanism offers good protection. It uses 802.1x for the initial authentication, then does a quick exchange of just 2 frames on each roam event, which gives a very quick roaming time. The main problem is that after a failed roam it falls back to a full 802.1x authentication, and only resumes using CCKM once that completes. If the application on top of the WGB can tolerate an occasional long roaming time in case of problems, this can be the best option, ahead of PSK.
This document does not cover not-recommended technologies that have security issues such as LEAP, WPA-TKIP, WEP, etc.
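The frame counts quoted for each policy can be turned into a rough per-roam time budget. The sketch below is illustrative only: the function name, the per-frame time, and the RADIUS delay are assumptions made for the example, not measured or documented values.

```python
# Frames exchanged per roam, as quoted in the policy list above.
# (min_frames, max_frames); the 802.1x range is the "normally 10 to 20" figure.
FRAMES_PER_ROAM = {
    "wpa2-psk": (6, 6),
    "wpa2-802.1x": (10, 20),
    "wpa2-cckm": (2, 2),
}


def roam_time_bounds_ms(policy, per_frame_ms=2.0, radius_delay_ms=0.0):
    """Return (lower, upper) estimated authentication time per roam, in ms.

    per_frame_ms and radius_delay_ms are assumed values for illustration;
    measure them in the real RF environment before relying on the result.
    """
    min_frames, max_frames = FRAMES_PER_ROAM[policy]
    # Only the full 802.1x exchange involves the RADIUS server on every roam.
    extra = radius_delay_ms if policy == "wpa2-802.1x" else 0.0
    return (min_frames * per_frame_ms + extra, max_frames * per_frame_ms + extra)
```

Note that the CCKM figure only covers a successful fast roam. After a failed roam, CCKM falls back to the 802.1x budget.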
On the WGB, WPA2-PSK is fairly simple to configure. You need the SSID definition and the proper encryption on the radio.
dot11 ssid wgbpsk
   vlan 32
   authentication open
   authentication key-management wpa version 2
   wpa-psk ascii YourReallySecurePSK!
   no ids mfp client

interface Dot11Radio0
 ssid wgbpsk
 encryption mode ciphers aes-ccm
 station-role workgroup-bridge
Your SSID name and pre-shared key have to match your network infrastructure.
The WPA2 with 802.1x configuration builds on top of the previous one, with the addition of EAP profiles and the authentication method:
dot11 ssid wlan1
   authentication open eap eap
   authentication network-eap eap
   authentication key-management wpa version 2
   dot1x credentials wgb
   dot1x eap profile eapfast
   no ids mfp client
eap profile eapfast
!--- This covers the EAP method type used on your network.
 method fast
!
!
dot1x credentials wgb
!--- This is your WGB username/password.
 username cisco
 password 7 1511021F0725
interface Dot11Radio0
 encryption mode ciphers aes-ccm
 ssid wlan1
WPA2+CCKM is only one step on top of WPA2 with 802.1x, with just one minor change: the CCKM flag on the SSID configuration. This assumes the WLAN is configured for CCKM only on the WLC side:
dot11 ssid wlan1
   authentication open eap eap
   authentication network-eap eap
   authentication key-management cckm
   dot1x credentials wgb
   dot1x eap profile eapfast
   no ids mfp client
A quick check on the WGB can report the encryption and key management in use, for example, in CCKM:
wgb-1260#sh dot11 associations al
Address           : 0024.97f2.75a0     Name             : lap1140-etsi-1
IP Address        : 192.168.40.10      Interface        : Dot11Radio 0
Device            : LWAPP-Parent       Software Version : NONE
CCX Version       : 5                  Client MFP       : Off
State             : EAP-Assoc          Parent           : -
SSID              : wlan1
VLAN              : 0
Hops to Infra     : 0                  Association Id   : 1
Tunnel Address    : 0.0.0.0
Key Mgmt type     : CCKM               Encryption       : AES-CCMP
Current Rate      : m7.-               Capability       : WMM ShortHdr ShortSlot
Supported Rates   : 48.0 54.0 m0. m1. m2. m3. m4. m5. m6. m7.
Voice Rates       : disabled           Bandwidth        : 20 MHz
Signal Strength   : -59 dBm            Connected for    : 72 seconds
Signal to Noise   : 41 dB              Activity Timeout : 8 seconds
Power-save        : Off                Last Activity    : 7 seconds ago
Apsd DE AC(s)     : NONE
Packets Input     : 12064              Packets Output   : 136
Bytes Input       : 2892798            Bytes Output     : 19514
Duplicates Rcvd   : 87                 Data Retries     : 8
Decrypt Failed    : 0                  RTS Retries      : 0
MIC Failed        : 0                  MIC Missing      : 0
Packets Redirected: 0                  Redirect Filtered: 0
On the WGB, you can modify several parameters that affect the roaming algorithm.
By default, the WGB retransmits a frame up to 64 times. If the frame is still not acknowledged (ACK) by the parent, the WGB assumes that the parent is no longer valid and starts a scan/roaming process. Consider this an asynchronous roaming trigger, because it can fire at any moment that a transmission fails.
The command to configure this goes inside the dot11 interface, and it takes these options:
packet retries NUM [drop]
NUM: A value between 1 and 128, with a default of 64. A good number for a quick roaming trigger is usually 32. Using a lower number is not advisable in most RF environments.
drop: If not present, the WGB starts a roaming event when maximum retries are reached. When present, the WGB does not start new roaming and uses other triggers, such as beacon loss and signal.
WGB can implement a pro-active signal scan for the current parent and start a new roaming process when the signal falls below an expected level.
This process takes two parameters:
A timer, which wakes up the check process every X seconds
An RSSI level, which is used to start a roaming process if the current signal is below it.
For example:
in d0
 mobile station period 4 threshold 75
The period should not be shorter than the time the WGB takes to complete an authentication process, in order to prevent a "roaming loop" in some conditions and to avoid overly aggressive roaming behavior. In general, it should be tested to see what accommodates the application needs.
For PSK, it can be lower than for EAP-based methods (typically 2 and 4 seconds for very aggressive applications).
The RSSI level is expressed as a positive integer, although it is basically a normal measured level in -dBm. You should use a slightly higher number than the minimum needed to keep your data rate working properly. For example, if your desired minimum rate is 6 Mbps, a threshold RSSI of -87 dBm should be sufficient. For 48 Mbps, you need -70 dBm, and so on.
Note: This command can also trigger a "roaming by data rate change", which is too aggressive. It must be used together with minimum-rate for good results.
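Because the threshold is entered as a positive integer while RSSI is normally quoted as a negative dBm value, the sign is easy to get wrong. The helper below is a minimal sketch of that conversion. The function name is hypothetical, and it only formats the command shown above without checking the value ranges accepted by IOS.

```python
def mobile_station_period_cmd(period_s, rssi_dbm):
    """Build the 'mobile station period' command from a negative dBm level.

    Hypothetical helper for illustration: it converts a measured level such
    as -75 dBm into the positive threshold that the command expects.
    """
    if period_s < 1:
        raise ValueError("period must be at least 1 second")
    if rssi_dbm >= 0:
        raise ValueError("RSSI must be given as a negative dBm value")
    return f"mobile station period {period_s} threshold {abs(rssi_dbm)}"
```

For example, `mobile_station_period_cmd(4, -75)` returns the command line used in the example above.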
Starting with 12.4(25d)JA, Cisco added a configurable parameter that controls when the WGB triggers a new roaming event if the current data rate to the parent is below a given value.
This is helpful to ensure a desired lower bound on speed is kept in order to support video or voice applications.
Before this command was available, the WGB frequently triggered a roam whenever the rate was found to be lower than at the previous check. Basically, at check X+1, if the rate was lower than at check X, the WGB started a roaming process. In the logs you would see this message:
*Mar 1 00:36:43.490: %DOT11-4-UPLINK_DOWN: Interface Dot11Radio1, parent lost: Had to lower data rate
This is too aggressive, and normally the only solution was to configure a single data rate on both the WGB and the parent APs.
Now, the recommended way is to always configure this command whenever a mobile station period command is used:
in d0
 mobile station minimum-rate 2.0
With this, a new roaming process is only triggered if the current rate is lower than the configured value. This reduces unnecessary roams and helps maintain the expected rate.
Note: The message "Had to lower data rate" is still expected to occur with this configuration, but it should now only be seen if the WGB was transmitting at a rate lower than the configured value when the mobile station period check ran.
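The interaction between the period check, the RSSI threshold, and minimum-rate can be summarized as a small decision function. This is a simplified model of the behavior described above, written for illustration. It is not the actual IOS implementation, and the function and parameter names are assumptions.

```python
def periodic_roam_reason(rssi_dbm, rate_mbps, threshold,
                         minimum_rate_mbps=None, previous_rate_mbps=None):
    """Return why the periodic check would start a roam, or None.

    threshold is the positive integer from 'mobile station period ... threshold'.
    minimum_rate_mbps models 'mobile station minimum-rate'; when it is None,
    the older behavior applies (roam if the rate dropped since the last check).
    """
    if rssi_dbm < -threshold:
        return "signal"
    if minimum_rate_mbps is not None:
        # With minimum-rate configured, only rates under the floor trigger a roam.
        if rate_mbps < minimum_rate_mbps:
            return "rate"
    elif previous_rate_mbps is not None and rate_mbps < previous_rate_mbps:
        # Without minimum-rate, any rate decrease since the last check triggers.
        return "rate"
    return None
```

In this model, a drop from 24 Mbps to 12 Mbps triggers a roam without minimum-rate, but not when minimum-rate is set to 6.0.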
The WGB scans all "country channels" during a roaming event. This means that, depending on the radio domain, it scans channels 1 to 11 or 1 to 13 on the 2.4 GHz band.
Each scanned channel takes some time. On 802.11bg, this is around 10 to 13 ms. On 802.11a, it can be up to 150 ms if the channel is DFS enabled, because the WGB does not probe there and only scans passively.
A good optimization is to restrict the scanned channels to use only the ones in service by the infrastructure. This is especially important on 802.11a, as the channel list is large, and the time per channel can be long if DFS is in use.
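The effect of trimming the scan list can be estimated from the per-channel figures quoted above. The sketch below computes an upper bound only. It assumes every actively probed channel costs the 13 ms worst case and every DFS channel the full 150 ms, and the function name is hypothetical.

```python
ACTIVE_SCAN_MAX_MS = 13    # upper end of the 10 to 13 ms figure for 802.11bg
DFS_PASSIVE_MAX_MS = 150   # worst case for a DFS-enabled 802.11a channel


def scan_time_upper_ms(active_channels, dfs_channels=0):
    """Upper-bound estimate of one full scan pass, in milliseconds."""
    return active_channels * ACTIVE_SCAN_MAX_MS + dfs_channels * DFS_PASSIVE_MAX_MS
```

Under these assumptions, scanning all 11 channels of the 2.4 GHz band costs up to 143 ms, while a 1/6/11 scan list costs up to 39 ms.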
There are three points to consider when designing a channel plan for WGB roaming:
For the 2.4 GHz band, try to stick to 1/6/11 to minimize adjacent-channel interference. Any other channel plan, such as one with 4 channels, tends to be difficult to engineer properly from an RF point of view without increasing interference.
Using a single channel for all APs is a good idea from a scan point of view, as it eliminates the radio channel change time from the scan time. This only makes sense if the total number of clients to support is very low and there are no high bandwidth requirements. Be aware that few environments can benefit from this option, so use it with care.
For the 5 GHz band, if your local regulations allow it, using indoor non-DFS channels (36 to 48) allows a faster scan time, as the WGB can actively probe each one instead of listening passively for a longer time.
The channel plan in use for your deployment might need to accommodate other requirements. Use the general RF design recommendations.
In order to configure the scan channel list:
in d0
 mobile station scan 1 6 11
Note: Mobile station only shows up when using the WGB role on the radio.
Note: Make sure your WGB scan list matches your infrastructure channel list. If not, the WGB will not find your available APs.
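A mismatch between the scan list and the infrastructure channels is easy to check before deployment. The snippet below is a minimal sketch with a hypothetical helper name. It assumes both lists use the same notation (channel numbers or frequencies, not a mix).

```python
def missing_from_scan_list(scan_channels, infra_channels):
    """Return infrastructure channels that the WGB would never scan."""
    return sorted(set(infra_channels) - set(scan_channels))
```

An empty result means every infrastructure channel is covered. Any channel returned is one where the WGB cannot find APs.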
Starting with 12.4(25a)JA, there are several new commands that optimize the recovery timers used when a problem is found. They are only available when the AP is in WGB mode.
wgb-1260(config)#workgroup-bridge timeouts ?
  assoc-response  Association Response time-out value
  auth-response   Authentication Response time-out value
  client-add      client-add time-out value
  eap-timeout     EAP Timeout value
  iapp-refresh    IAPP Refresh time-out value
The assoc-response, auth-response, and client-add timers indicate how long the WGB waits for the parent AP to answer before it considers the AP dead and tries the next candidate. The default values are 5 seconds, which is too long for some applications. The minimum timer is 800 ms, which is recommended for most mobile applications.
With eap-timeout, the WGB sets a maximum time to wait until the full EAP authentication process is completed. This works from an EAP supplicant point of view in order to restart the process if the EAP authenticator does not answer. The default value is 60 seconds. Be careful never to configure a value lower than the actual time needed to complete a full 802.1x authentication. Normally, setting this to 2 to 4 seconds is correct for most deployments.
For iapp-refresh, the WGB by default generates an IAPP bulk update to the parent AP after roaming in order to inform it of the known wired clients. There is a second retransmission around 10 seconds after association. This timer allows a "fast retry" of the IAPP bulk update after association, in order to overcome the possibility that the first IAPP update was lost due to RF conditions or because the encryption keys were not yet installed on the parent AP. For fast roaming scenarios, 100 ms can be used. However, use this with care if a large number of WGBs are in use, because it significantly increases the total number of IAPP frames sent to the infrastructure after each roam.
Example with aggressive values:
workgroup-bridge timeouts eap-timeout 4
workgroup-bridge timeouts iapp-refresh 100
workgroup-bridge timeouts auth-response 800
workgroup-bridge timeouts assoc-response 800
workgroup-bridge timeouts client-add 800
These have been successfully tested on mobile WGB deployment scenarios.
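The two constraints stated above (response timers no lower than 800 ms, and an eap-timeout no shorter than a real full authentication) can be checked mechanically before a configuration is pushed. The following is a sketch only. The function name and the dictionary layout are assumptions, and the measured authentication time has to come from your own lab tests.

```python
MIN_RESPONSE_TIMEOUT_MS = 800  # minimum timer value quoted in the text


def check_timeouts(timeouts, measured_full_auth_s):
    """Return a list of problems found in a workgroup-bridge timeouts plan.

    timeouts maps the keyword to its value: eap-timeout in seconds,
    the response and client-add timers in milliseconds.
    """
    problems = []
    for name in ("auth-response", "assoc-response", "client-add"):
        if timeouts[name] < MIN_RESPONSE_TIMEOUT_MS:
            problems.append(f"{name} is below {MIN_RESPONSE_TIMEOUT_MS} ms")
    if timeouts["eap-timeout"] < measured_full_auth_s:
        problems.append("eap-timeout is shorter than a full 802.1x authentication")
    return problems
```

With the aggressive values from the example and a measured full authentication of 1.5 seconds, the function returns an empty list.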
There are other minor changes to take into consideration for WGB deployment scenarios:
Reduce RTS retries: rts retries 32. This can save some RF time in aggressive scenarios. Normally, this is not needed.
Antenna type: If using a single antenna (no diversity), you should configure the radio to improve general performance:
antenna transmit right-a
antenna receive right-a
Antenna diversity is desirable, but not always possible when physically installing antennas on the vehicle. Proper antenna selection is critical for roaming. As little as 2 dB can be a huge difference on general roaming average times.
In order to save some milliseconds, reduce the console logging level to errors only: logging console errors. Do not disable it completely because that can negatively affect roaming performance in some conditions.
Ideally, use Telnet or SSH from the Ethernet side to collect debugs or logs. This has a much lower impact on performance in comparison to logging debugs over the console: logging monitor debugging.
The command that shows what is occurring from a WGB roaming point of view is debug dot11 dot11 0 trace print uplink. This has a low impact on the CPU, but do not enable other debug options unless instructed because each one might increase the total roaming time.
Try to use SNTP when possible. This keeps the WGB time in sync, which is extremely helpful for troubleshooting.
MFP can be useful from a security point of view. However, a drawback is that on roaming failure scenarios, the WGB does not accept de-auth frames from the AP parent to trigger a new roaming if the encryption key between both of them has gone wrong for any reason.
On these rare failure scenarios, the WGB can take up to 5 seconds to trigger a new scan, if the current parent can be heard with good RF signal. There is a "catch-all" detection mechanism that WGB can trigger if no valid data frames are received during that time.
By default, the WGB tries to use the client MFP if the SSID has WPA2 AES in use.
It is recommended to disable client MFP if fast recovery times are needed (WGB to react to non-protected deauth frames). This is a compromise between security needs and fast recovery times. The decision depends on what is more important for the deployment scenario.
dot11 ssid wgbpsk
   no ids mfp client
Refer to the Synchronize IOS Supplicant Clocks and Save Time Setting to NVRAM section of Release Notes for Cisco Aironet Access Points and Bridges for Cisco IOS Release 12.4(21a)JY.
Keep in mind that a Universal WGB (uWGB) might never get a chance to do an SNTP sync, because it is typically associated with the attached device's MAC address and the uWGB BVI does not have network access. Therefore, for a uWGB, it is recommended at minimum to get a good clock sync saved in NVRAM at deployment. If the attached Ethernet device can act as an NTP source (and is itself kept updated as a client through its uWGB connection), then it is possible to have the uWGB do its SNTP sync from it, as an effective NTP reflection point.
no service pad
service timestamps debug datetime msec
service timestamps log datetime msec
service password-encryption
!
hostname wgb-1260
!
logging rate-limit console 9
logging console errors
!
clock timezone CET 1
no ip domain lookup
!
!
dot11 syslog
!
!
dot11 ssid wgbpsk
   vlan 32
   authentication open
   authentication key-management wpa version 2
   wpa-psk ascii 7 060506324F41584B56
   no ids mfp client
!
!
!
!
!
!
username Cisco password 7 13261E010803
!
!
bridge irb
!
!
interface Dot11Radio0
 no ip address
 no ip route-cache
 !
 encryption mode ciphers aes-ccm
 !
 ssid wgbpsk
 !
 antenna transmit right-a
 antenna receive right-a
 packet retries 32
 station-role workgroup-bridge
 rts retries 32
 mobile station scan 2412 2437 2462
 mobile station minimum-rate 6.0
 mobile station period 3 threshold 70
 bridge-group 1
!
interface GigabitEthernet0
 no ip address
 no ip route-cache
 duplex auto
 speed auto
 no keepalive
 bridge-group 1
!
interface BVI1
 ip address 192.168.32.67 255.255.255.0
 no ip route-cache
!
ip default-gateway 192.168.32.1
no ip http server
no ip http secure-server
bridge 1 route ip
sntp server 192.168.32.1
clock save interval 1
workgroup-bridge timeouts eap-timeout 4
workgroup-bridge timeouts iapp-refresh 100
workgroup-bridge timeouts auth-response 800
workgroup-bridge timeouts assoc-response 800
workgroup-bridge timeouts client-add 800
If any issues occur, it is important to capture the output of the debug dot11 dot11 0 trace print uplink command as a first step. This provides a good view of what is occurring with the roaming process.
This is an example of a roaming event in which the current parent is also the best candidate:
Sep 27 11:42:38.797: %DOT11-4-UPLINK_DOWN: Interface Dot11Radio0, parent lost: Signal strength too low
Sep 27 11:42:38.797: CDD051F1-0 Uplink: Lost AP, Signal strength too low
This is the low-signal trigger being met. It depends on the mobile station period X threshold Y command. The first message is always sent to the console, and the second is part of the uplink debug traces. It is not a problem, but part of the normal WGB process.
Sep 27 11:42:38.798: CDD052C7-0 Uplink: Wait for driver to stop
The Uplink process forces a radio queue purge before it starts a channel scan. This step can take from a few milliseconds to several seconds depending on channel utilization and queue depth. Data frames are not timed out. Voice frames have a time comparison done, and thus should be dropped faster. Some delay might be observed in noisy environments.
Sep 27 11:42:38.798: CDD05371-0 Uplink: Enabling active scan
Sep 27 11:42:38.799: CDD05386-0 Uplink: Scanning
This is the actual channel scan taking place. It parks the radio approximately 10 to 13 ms per configured channel.
Sep 27 11:42:38.802: CDD064CD-0 Uplink: Rcvd response from 0021.d835.ade0 channel 1 3695
This is the list of probe responses received. First number is the channel, second is microseconds taken to receive it.
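When many traces have to be reviewed, the probe response lines can be extracted with a small parser. The regular expression below is a sketch based only on the single sample line shown above. Other software versions might format the line differently, and the function name is hypothetical.

```python
import re

# Pattern derived from the sample "Rcvd response" trace line in this document.
PROBE_RE = re.compile(
    r"Uplink: Rcvd response from (?P<bssid>[0-9a-fA-F.]+) "
    r"channel (?P<channel>\d+) (?P<usec>\d+)"
)


def parse_probe_response(line):
    """Return (bssid, channel, microseconds) or None if the line does not match."""
    match = PROBE_RE.search(line)
    if match is None:
        return None
    return match.group("bssid"), int(match.group("channel")), int(match.group("usec"))
```

Lines that are not probe responses return None, so the function can be applied to a whole captured log one line at a time.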
Sep 27 11:42:38.808: CDD078F1-0 Uplink: Compare1 0021.d835.ade0 - Rssi 58dBm, Hops 0, Count 0, load 0
Sep 27 11:42:38.809: CDD07929-0 Uplink: Compare2 0021.d835.cce0 - Rssi 46dBm, Hops 0, Count 0, load 0
This is the actual comparison between the candidate parents.
Sep 27 11:42:38.809: CDD07BDB-0 Uplink: Same as previous, send null data packet
This is the parent selection. Here the best candidate is the current parent, so the WGB stays on it.
Sep 27 11:42:38.809: CDD07BF7-0 Uplink: Done
Sep 27 11:42:38.808: %DOT11-4-UPLINK_ESTABLISHED: Interface Dot11Radio0, Associated To AP AP1 0021.d835.ade0 [None WPAv2 PSK]
Roaming completed.
This is the point where the roaming is "finished". Traffic resumes as soon as IAPP frames are processed by the parent.
Parent Compare information
Sep 27 14:16:47.590: F515B1FF-0 Uplink: Compare1 0021.d835.7620 - Rssi 60dBm, Hops 0, Count 0, load 3
Sep 27 14:16:47.591: F515B238-0 Uplink: Compare2 0021.d835.e8b0 - Rssi 58dBm, Hops 0, Count -1, load 0
The Compare1 line prints the actual association count, then the actual hops and load. If the "current" AP is still the one the WGB is associated to, the count is reduced by 1 so that the WGB itself is not included in the number.
The Compare2 line prints the differences, which is why it is possible to see a negative number. If the AP under test has a higher number than the current one, you see a negative value.
Depending on current association count, load, signal difference, mobile threshold value, the WGB might or might not select a new parent.
The comparison is always between two APs, with the selected AP replacing the current for next iteration. Therefore, some of the decisions can be due to RSSI on one loop, or due to other factors on the next test.
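To follow a chain of comparisons across a long debug capture, the Compare lines can be parsed into structured records. This is a minimal sketch. The pattern is inferred from the sample lines in this document only, the function name is hypothetical, and the fields are reported exactly as printed without interpreting them.

```python
import re

# Pattern derived from the Compare1/Compare2 sample lines in this document.
COMPARE_RE = re.compile(
    r"Uplink: Compare(?P<slot>[12]) (?P<bssid>[0-9a-fA-F.]+) - "
    r"Rssi (?P<rssi>\d+)dBm, Hops (?P<hops>-?\d+), "
    r"Count (?P<count>-?\d+), load (?P<load>-?\d+)"
)


def parse_compare(line):
    """Return the fields of a Compare trace line as a dict, or None."""
    match = COMPARE_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "slot": int(fields["slot"]),      # 1 = Compare1, 2 = Compare2
        "bssid": fields["bssid"],
        "rssi": int(fields["rssi"]),      # printed as a positive number
        "hops": int(fields["hops"]),
        "count": int(fields["count"]),    # can be negative on Compare2
        "load": int(fields["load"]),
    }
```

Apply the function to one log line at a time. Consecutive Compare1 and Compare2 records then show which pair of APs was evaluated in each iteration.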