Introduction
This document provides a brief overview of basic configuration guidelines for large-scale wireless deployments that use the AireOS Wireless LAN Controller (WLC) with RADIUS authentication against the Cisco Identity Services Engine (ISE) or the Cisco Secure Access Control Server (ACS). It references other documents that provide greater technical detail.
Symptoms Observed
This Authentication, Authorization, and Accounting (AAA) meltdown state is typically encountered in university environments. This section describes the symptoms and logs usually seen in such an environment.
1. Monitor RADIUS Performance
The 802.1X (dot1x) client experiences a long delay, with many retries, when it attempts to authenticate.
Use the command show radius auth statistics (GUI: Monitor > Statistics > RADIUS Servers) in order to look for problems. Specifically look for large numbers of Retries, Rejects, and Timeouts. Here is an example:
Server Index..................................... 2
Server Address................................... 192.168.88.1
Msg Round Trip Time.............................. 3 (msec)
First Requests................................... 1256
Retry Requests................................... 5688
Accept Responses................................. 22
Reject Responses................................. 1
Challenge Responses.............................. 96
Malformed Msgs................................... 0
Bad Authenticator Msgs........................... 0
Pending Requests................................. 1
Timeout Requests................................. 6824
Unknowntype Msgs................................. 0
Other Drops...................................... 0
Look for:
- A high ratio of Retry Requests to First Requests (should be no more than 10%)
- A high ratio of Reject Responses to Accept Responses
- A high ratio of Timeout Requests to First Requests (should be no more than 5%)
If there are problems, check for:
- Misconfigured clients
- Network reachability problems between the WLC and the RADIUS server
- Problems between the RADIUS server and the backend database, if in use, such as with Active Directory (AD)
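The ratio checks above can be sketched in a few lines of Python. This is only an illustration, assuming that the output of show radius auth statistics has been captured as text in the dotted "Name.... value" layout shown in the example; the function names and the dictionary keys of the report are hypothetical, and the 10% and 5% limits are the ones stated above.

```python
import re

# Sample output copied from the example in this section.
SAMPLE = """\
Server Index..................................... 2
Server Address................................... 192.168.88.1
Msg Round Trip Time.............................. 3 (msec)
First Requests................................... 1256
Retry Requests................................... 5688
Accept Responses................................. 22
Reject Responses................................. 1
Challenge Responses.............................. 96
Malformed Msgs................................... 0
Bad Authenticator Msgs........................... 0
Pending Requests................................. 1
Timeout Requests................................. 6824
Unknowntype Msgs................................. 0
Other Drops...................................... 0
"""

# A counter line is a name, a run of dots, an integer, and an optional unit.
COUNTER_RE = re.compile(r"^(.+?)\.{2,}\s*(\d+)\s*(?:\(\w+\))?$")


def parse_radius_stats(text):
    """Map counter names to integers; non-integer lines (the IP address) are skipped."""
    stats = {}
    for line in text.splitlines():
        match = COUNTER_RE.match(line.strip())
        if match:
            stats[match.group(1).strip()] = int(match.group(2))
    return stats


def radius_health(stats, retry_limit=0.10, timeout_limit=0.05):
    """Compute the three ratios and flag the two that have stated limits."""
    first = stats["First Requests"]
    accepts = stats["Accept Responses"]
    retry_ratio = stats["Retry Requests"] / first if first else 0.0
    timeout_ratio = stats["Timeout Requests"] / first if first else 0.0
    # No limit is given for rejects in the text, so the ratio is only reported.
    reject_ratio = stats["Reject Responses"] / accepts if accepts else float("inf")
    return {
        "retry_ratio": retry_ratio,
        "timeout_ratio": timeout_ratio,
        "reject_ratio": reject_ratio,
        "retry_exceeded": retry_ratio > retry_limit,
        "timeout_exceeded": timeout_ratio > timeout_limit,
    }


stats = parse_radius_stats(SAMPLE)
report = radius_health(stats)
for key, value in report.items():
    print(key, value)
```

For the example server, retries are roughly 453% of first requests and timeouts roughly 543%, so both limits are exceeded by a wide margin.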
2. The WLC Reports a Full RADIUS Queue in the Message Logs
The WLC logs this message about the RADIUS queue:
Univ-WISM2-02: *aaa QueueReader: Dec 02 14:25:31.565: #AAA-3-3TXQUEUE_ADD_FAILED:
radius_db.c:889 Transmission queue full. Que name: Radius queue. Dropping
sessionpackets.
host = x.x.x.x.
3. Debug AAA
A debug of AAA shows this message:
*aaaQueueReader: Dec 02 21 09:19:52.198: xx:xx:xx:xx:xx:xx Returning AAA Error
'Out of Memory' (-2) for mobile xx:xx:xx:xx:xx:xx
A debug of AAA can also return the AAA Error 'Timeout' (-5) for mobile devices. This error indicates that the AAA server is unreachable, and it is followed by client deauthorization.
4. RADIUS Server is Too Busy and Does Not Respond
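Here is an example of the WLC trap log (columns: Log, System Time, Trap), in which the RADIUS server repeatedly changes between available and unavailable: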
0 Wed Aug 20 15:30:40 2014 RADIUS auth-server x.x.x.x:1812 available
1 Wed Aug 20 15:30:40 2014 RADIUS auth-server x.x.x.x:1812 available
2 Wed Aug 20 15:30:40 2014 RADIUS server x.x.x.x:1812 activated on WLAN 6
3 Wed Aug 20 15:30:40 2014 RADIUS server x.x.x.x:1812 deactivated on WLAN 6
4 Wed Aug 20 15:30:40 2014 RADIUS auth-server x.x.x.x:1812 unavailable
5 Wed Aug 20 15:30:40 2014 RADIUS server x.x.x.x:1812 failed to respond to request
(ID 22) for client 68:96:7b:0e:46:7f / user 'user1@univ1.edu'
6 Wed Aug 20 15:29:57 2014 User Larry_Dull_231730 logged Out. Client MAC:84:a6:c8:
87:13:9c, Client IP:198.21.137.22, AP MAC:c0:7b:bc:cf:af:40, AP Name:Dot1x-AP
7 Wed Aug 20 15:28:42 2014 RADIUS server x.x.x.x:1812 failed to respond to request
(ID 183) for client 48:d7:05:7d:93:a5 / user ' user2@univ2.edu '
8 Wed Aug 20 15:28:42 2014 RADIUS auth-server x.x.x.x:1812 unavailable
9 Wed Aug 20 15:28:42 2014 RADIUS server x.x.x.x:1812 failed to respond to request
(ID 154) for client 40:0e:85:76:00:68 / user ' user1@univ1.edu '
10 Wed Aug 20 15:28:41 2014 RADIUS auth-server x.x.x.x:1812 available
11 Wed Aug 20 15:28:41 2014 RADIUS auth-server x.x.x.x:1812 unavailable
12 Wed Aug 20 15:28:41 2014 RADIUS server x.x.x.x:1812 failed to respond to request
(ID 99) for client 50:2e:5c:ea:e4:ba / user ' user3@univ3.edu '
13 Wed Aug 20 15:28:38 2014 RADIUS auth-server x.x.x.x:1812 available
14 Wed Aug 20 15:28:38 2014 RADIUS auth-server x.x.x.x:1812 unavailable
15 Wed Aug 20 15:28:38 2014 RADIUS server x.x.x.x:1812 failed to respond to request
(ID 30) for client b4:18:d1:60:6b:51 / user ' user1@univ1.edu '
16 Wed Aug 20 15:28:38 2014 RADIUS auth-server x.x.x.x:1812 available
17 Wed Aug 20 15:28:38 2014 RADIUS server x.x.x.x:1812 activated on WLAN 6
18 Wed Aug 20 15:28:38 2014 RADIUS server x.x.x.x:1812 deactivated on WLAN 6
19 Wed Aug 20 15:28:38 2014 RADIUS auth-server x.x.x.x:1812 unavailable
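A quick way to quantify this flapping is to count the state changes and the unanswered requests per server. The sketch below assumes that the trap log has been saved as text in the layout shown above; the excerpt is a subset of those entries, and the function name and the "no_response" label are hypothetical.

```python
import re
from collections import Counter

# A few entries copied from the trap log above.
TRAP_LOG = """\
0 Wed Aug 20 15:30:40 2014 RADIUS auth-server x.x.x.x:1812 available
4 Wed Aug 20 15:30:40 2014 RADIUS auth-server x.x.x.x:1812 unavailable
5 Wed Aug 20 15:30:40 2014 RADIUS server x.x.x.x:1812 failed to respond to request
(ID 22) for client 68:96:7b:0e:46:7f / user 'user1@univ1.edu'
8 Wed Aug 20 15:28:42 2014 RADIUS auth-server x.x.x.x:1812 unavailable
10 Wed Aug 20 15:28:41 2014 RADIUS auth-server x.x.x.x:1812 available
"""

# "unavailable" contains "available", so match the whole word at end of line.
STATE_RE = re.compile(r"RADIUS auth-server (\S+) (available|unavailable)\s*$")
NO_REPLY_RE = re.compile(r"RADIUS server (\S+) failed to respond to request")


def summarize_traps(text):
    """Count (server, event) pairs for state changes and unanswered requests."""
    counts = Counter()
    for line in text.splitlines():
        state = STATE_RE.search(line)
        if state:
            counts[(state.group(1), state.group(2))] += 1
            continue
        no_reply = NO_REPLY_RE.search(line)
        if no_reply:
            counts[(no_reply.group(1), "no_response")] += 1
    return counts


summary = summarize_traps(TRAP_LOG)
for (server, event), count in sorted(summary.items()):
    print(server, event, count)
```

Many available/unavailable pairs for the same server within a few seconds, as in the log above, suggest a server that is overloaded rather than one that is down.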
Best Practice Tuning
WLC-Side Tuning
- Extensible Authentication Protocol (EAP): Make 802.1X client exclusion work.
  - Enable client exclusion globally for 802.1X.
  - Set client exclusion on the 802.1X Wireless LANs (WLANs) to at least 120 seconds.
  - Set EAP timers as described in the 802.1X Client Exclusion on an AireOS WLC article.
- Set RADIUS retransmission timeouts to at least five seconds.
- Set Session-Timeout to at least eight hours.
- Disable Aggressive Failover, so that a single misbehaving supplicant cannot cause the WLC to fail over between the RADIUS servers.
- Configure Fast Secure Roaming for your clients.
  - Make sure that Microsoft Windows EAP clients use Wi-Fi Protected Access 2 (WPA2)/Advanced Encryption Standard (AES) so that they can use Opportunistic Key Caching (OKC).
  - If you can segregate Apple iOS clients to their own WLAN, then you can enable 802.11r on that WLAN.
  - Enable Cisco Centralized Key Management (CCKM) for any WLAN that supports 792x phones. Do not enable CCKM on any Service Set Identifier (SSID) that supports Microsoft Windows or Android clients, because they tend to have problematic CCKM implementations.
  - Enable Sticky Key Caching (SKC) for any EAP WLAN that supports Mac OS X and/or Android clients.
Refer to 802.11 WLAN Roaming and Fast-Secure Roaming on CUWN for more information.
Note: Monitor your WLC Pairwise Master Key (PMK) cache usage at peak times with the show pmk-cache all command. If you reach your maximum PMK-cache size, or get close to it, then you will probably have to disable SKC.
- If you use ISE with profiling, then use WLC-side DHCP/HTTP profiling. This wraps the profiling data into a RADIUS Accounting packet that is easily load-balanced, which ensures that all data for an endpoint reaches the same Policy Service Node (PSN).
- Make sure that interim accounting is off unless you need it for byte-based billing services. Otherwise interim accounting only adds load with no additional benefit.
- Run the best available WLC code, that is, a stable release that is recommended for your controller platform.
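The numeric guidelines in this list can be expressed as a simple checklist. The sketch below is only an illustration: the WlcAaaSettings class and its field names are hypothetical and are not WLC CLI keywords, and the values would have to be collected from the controller configuration by other means. The thresholds are the ones stated above (120 seconds, five seconds, eight hours).

```python
from dataclasses import dataclass


@dataclass
class WlcAaaSettings:
    """Hypothetical container for the values discussed in this section."""
    client_exclusion_seconds: int
    radius_retransmit_timeout_seconds: int
    session_timeout_seconds: int
    aggressive_failover_enabled: bool
    interim_accounting_enabled: bool


def check_settings(settings):
    """Return one finding per guideline that the settings do not meet."""
    findings = []
    if settings.client_exclusion_seconds < 120:
        findings.append("Client exclusion is shorter than 120 seconds.")
    if settings.radius_retransmit_timeout_seconds < 5:
        findings.append("RADIUS retransmission timeout is shorter than 5 seconds.")
    if settings.session_timeout_seconds < 8 * 3600:
        findings.append("Session-Timeout is shorter than 8 hours.")
    if settings.aggressive_failover_enabled:
        findings.append("Aggressive Failover is enabled.")
    if settings.interim_accounting_enabled:
        findings.append("Interim accounting is on; keep it only for byte-based billing.")
    return findings


# Example values chosen for illustration only.
risky = WlcAaaSettings(60, 2, 1800, True, True)
tuned = WlcAaaSettings(120, 5, 8 * 3600, False, False)

for finding in check_settings(risky):
    print(finding)
```

A configuration that meets every guideline, such as the tuned example, returns an empty list.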
RADIUS Server-Side Tuning
- Reduce the logging rate. Most RADIUS servers let you configure which log messages they store. With the ACS or the ISE, an administrator can choose which categories are logged to the monitoring database. For example, if accounting data is sent off the RADIUS server and viewed with another application, such as a syslog server, then do not also write that data to the local database. On the ISE, ensure that log suppression remains enabled at all times. If it must be disabled for troubleshooting purposes, go to Administration > System > Logging > Collection Filters and use the Bypass Suppression option in order to disable suppression for an individual endpoint or user. In ISE Version 1.3 and later, you can also right-click an endpoint in the live authentication log in order to disable suppression.
- Ensure that backend authentication latency is low (AD, Lightweight Directory Access Protocol (LDAP), Rivest, Shamir, Adleman (RSA)). If you use the ACS or the ISE, you can run the authentication summary reports in order to monitor both average and peak latency on a per-server basis. The longer it takes to process a request, the lower the authentication rate that the ACS or the ISE can sustain. 95% of the time, high latency is due to a slow response from a backend database.
- Disable Protected Extensible Authentication Protocol (PEAP) Password Retries. Most devices do not support password retries inside the PEAP tunnel, so a retry from the EAP server causes the device to stop responding and restart with a new EAP session. This produces EAP timeouts instead of rejects, which means that client exclusion is never triggered.
- Disable Unused EAP Protocols. This is not critical but does add some efficiency to the EAP exchange and ensures that a client cannot use a weak or unintended EAP method.
- Enable PEAP Session Resume and Fast Reconnect.
- Do not send MAC authentications to the AD if not needed. This is a common misconfiguration that increases the load on the domain controllers that the ISE authenticates against. These lookups often result in negative searches, which are time consuming and increase the average latency.
- Use the Device Sensor where applicable (ISE specific).
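The relationship between backend latency and authentication rate noted in the list above can be illustrated with a simplified model. The sketch assumes that the RADIUS server can keep a fixed number of requests in flight at once, so the sustainable rate is that number divided by the average time per request. The figure of 100 concurrent requests and the two latency values are illustrative assumptions only and are not ACS or ISE specifications.

```python
def max_auth_rate(concurrent_requests, latency_seconds):
    """Upper bound on requests per second under the simplified model."""
    if latency_seconds <= 0:
        raise ValueError("latency_seconds must be positive")
    return concurrent_requests / latency_seconds


# Illustrative values only: 100 requests in flight, fast versus slow backend.
fast_backend = max_auth_rate(100, 0.05)
slow_backend = max_auth_rate(100, 0.5)

print("fast backend:", fast_backend, "requests per second")
print("slow backend:", slow_backend, "requests per second")
```

In this model, a tenfold increase in backend latency cuts the sustainable authentication rate by the same factor, which is why slow directory lookups have such a large effect during peak load.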