This document discusses general and specific causes of slow performance on ATM networks and procedures to help troubleshoot the problem. The focus of this document is troubleshooting IP performance issues, specifically on ATM networks. Performance is typically measured in terms of delay and throughput. Performance is often tested with FTP or another TCP/IP application to transfer a file between two end devices and measure the time that the transfer takes. When the throughput seen during the file transfer does not equal the bandwidth available over the ATM circuit, this is perceived as a performance problem. Many factors, such as TCP window settings, MTU, packet loss, and delay, determine the throughput seen across an ATM circuit. This document addresses issues that affect performance over ATM routed permanent virtual circuits (PVCs), switched virtual circuits (SVCs), and LAN Emulation (LANE) implementations. The causes of performance issues are common to routed PVC, SVC, and LANE implementations.
There are no specific requirements for this document.
This document is not restricted to specific software and hardware versions.
For more information on document conventions, refer to the Cisco Technical Tips Conventions.
The first step when you troubleshoot any performance-related issue is to select a single source device and a single destination device to test between. Identify the conditions under which the problem occurs and those under which it does not. Select test devices that reduce the complexity of the problem. For example, do not test between devices that are ten router hops apart if the problem already exists when you go through two routers.
Once the test devices are selected, determine whether the performance is related to the inherent nature of TCP applications or whether the problem is caused by other factors. Ping between the end devices to determine whether packet loss occurs and to measure the round-trip delay of the ping packets. Perform ping tests with different packet sizes to determine whether the size of the packet affects the packet loss. Run the ping tests from the end devices under test, not from routers. The Round Trip Time (RTT) that you see when you ping to or from a router may not be accurate, because the ping process is a low-priority process on the router and the router may not answer the ping immediately.
A customer has an ATM PVC between New York and Los Angeles. The virtual circuit (VC) is configured with a Sustained Cell Rate (SCR) of 45 Mbps. The customer tests this circuit by transferring a file using FTP from an FTP server to a client and discovers that the throughput for the file transfer is about 7.3 Mbps. When they use TFTP, the throughput drops to 58 Kbps. The ping response time between the client and the server is approximately 70 ms.
The first thing to understand in this example is that TCP provides reliable transport of data between devices. The sender sends data in a stream in which bytes are identified by sequence numbers. The receiver acknowledges that it has received the data by sending the sequence number (acknowledgment number) of the next byte of data that it expects to receive. The receiver also advertises its window size to the sender to indicate how much data it can accept.
TCP/IP end devices typically include the ability to configure TCP/IP Window sizes.
If devices have their TCP Window sizes set too low, those devices may not be able to utilize the entire bandwidth of an ATM VC.
The RTT on an ATM VC can dramatically reduce the TCP throughput if the Window size is too low.
An end device sends approximately one Window size worth of traffic in bytes per RTT.
For example, if the RTT is 70 ms, use this formula to calculate the necessary Window size to fill up an entire DS3 of bandwidth:
0.07 s * 45 Mbps * 1 byte/8 bits = 393,750 bytes
Standard TCP allows a maximum window size of approximately 64,000 bytes (the TCP window field is 16 bits). The TCP window scale option allows the window size to be much larger, but only if the devices on both ends and the FTP application support the option.
With the window size set to 64,000 bytes and an RTT of 70 ms, use this formula to solve for the throughput:
0.07 s * x * 1 byte/8 bits = 64,000 bytes
where x = 7.314 Mbps
If the FTP application only supports a Window size of 32,000 bytes, use this formula.
0.07 s * x * 1 byte/8 bits = 32,000 bytes
where x = 3.657 Mbps
With TFTP, the sender sends 512-byte packets and must receive an acknowledgment for each packet before it sends the next one. In the best case, that is one packet every 70 ms. Use this throughput calculation:
1 packet / 0.070 s = 14.286 packets/second
512 bytes/packet * 8 bits/byte * 14.286 packets/second = 58.5 kbps
This calculation demonstrates that the delay across a link and the TCP window size can dramatically affect the throughput measured across that link when TCP/IP applications are used to measure it. Identify the expected throughput for each TCP connection. If FTP is used to test throughput, start multiple file transfers between different clients and servers to identify whether the throughput is limited by the inherent nature of TCP/IP or whether there are other problems with the ATM circuit. If the TCP application limits the throughput, multiple servers should be able to send at the same time and at similar rates.
Next, prove that you can transmit traffic across the link at the SCR of the circuit. To do this, use a traffic source that does not use TCP, send a stream of data across the ATM VC, and verify that the received rate equals the sent rate. One way to generate traffic across an ATM circuit is to send extended ping packets from a router with a timeout value of 0. This proves that you can send traffic across the link at the configured rate of the circuit.
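For example, an extended ping from a Cisco router with a timeout of 0 sends packets back to back and can generate a sustained stream of traffic. This is only a sketch; the target address, repeat count, and datagram size are placeholders that you adapt to your own test:

```
Router# ping
Protocol [ip]:
Target IP address: 192.168.1.2
Repeat count [5]: 10000
Datagram size [100]: 1500
Timeout in seconds [2]: 0
Extended commands [n]: n
Sweep range of sizes [n]: n
```

While the ping runs, compare the output rate on the sending router's ATM interface with the input rate on the receiving side to verify that traffic passes at the configured rate of the circuit.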
Solution: Increase the TCP/IP window size.
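On end hosts, the window size is adjusted in the operating system's TCP settings. If a Cisco router itself terminates the test TCP sessions, the window it advertises can be raised with the ip tcp window-size global configuration command; this is a sketch only, and the value shown is an example rather than a recommendation:

```
Router(config)# ip tcp window-size 65535
```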
Important: Even with a very small RTT and a window size big enough to theoretically fill the SCR, you can never reach the SCR because of ATM overhead. Consider the example of 512-byte packets sent across a 4 Mbps (SCR = PCR) AAL5SNAP PVC and calculate the real IP throughput that is measured. Assume the TCP window size and the RTT are such that the source can send data at 4 Mbps. First, ATM Adaptation Layer 5 (AAL5) and LLC/SNAP encapsulation each introduce 8 bytes of overhead, and padding may be required so that the AAL5 protocol data unit (PDU) fills an integral number of 48-byte cell payloads. Then each 53-byte cell carries 5 bytes of header overhead. In this case the AAL5 PDU is 512 + 8 + 8 = 528 bytes (no padding necessary). These 528 bytes require 11 cells, so for each 512-byte packet, 583 bytes (11 * 53) are sent on the wire. In other words, 71 bytes of overhead are introduced, which means that only about 88% of the bandwidth can be carried as IP packets. Therefore, on the 4 Mbps PVC, the usable IP throughput is only about 3.5 Mbps.
The smaller the packet size, the bigger the overhead and the lower the throughput.
The most common cause of performance problems is packet loss across ATM circuits. Any cell loss across an ATM circuit results in performance degradation. Packet loss causes retransmissions and TCP window size reductions, which result in lower throughput. Usually, a simple ping test identifies whether packets are lost between the two devices. Cyclic redundancy check (CRC) errors and cell/packet drops on ATM circuits result in the retransmission of data. If ATM cells are discarded by an ATM switch because of policing or buffer exhaustion, CRC errors are seen on the end device when the cells are reassembled into packets. ATM edge devices may drop or delay packets when the outbound packet rate on a VC exceeds the traffic shaping rate configured on the VC.
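As a starting point, these IOS exec commands (the interface and VPI/VCI values are examples) display the error and drop counters that normally point to cell or packet loss on a router ATM interface:

```
Router# show interfaces atm 1/0
Router# show atm vc
Router# show atm pvc 1/100
```

Look for CRC errors, input/output drops, and per-VC packet drop counters, and check the far-end device and any intermediate ATM switches as well.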
See these documents for details on troubleshooting the most common causes of packet loss across ATM networks:
Solution: Troubleshoot and eliminate any packet loss.
The amount of time that it takes for a packet to travel from source to destination, and for an acknowledgment to return to the sender, can dramatically affect the throughput seen over that circuit. The delay over an ATM circuit may be the result of normal transmission delay; it takes less time to send a packet from New York to Washington than from New York to Los Angeles when the ATM circuits are the same speed. Other sources of delay are queuing delay through routers and switches and processing delay through Layer 3 routing devices. The processing delay associated with routing devices depends heavily on the platform used and the switching path. The details of routing delay and internal hardware delay are beyond the scope of this document. This delay affects any router, regardless of interface type, and is normally negligible compared to the delay associated with the transmission and queuing of packets. However, if a router process switches traffic, it can introduce significant delay and must be taken into consideration.
Delay is typically measured with the use of ping packets between end devices to determine the average and maximum round-trip delay. Delay measurements should be conducted during peak use as well as periods of inactivity. This helps to determine if the delay can be attributed to queuing delay on congested interfaces.
Congestion of interfaces results in queuing delay. Congestion typically results from bandwidth mismatches. For example, if a circuit through an ATM switch traverses from an OC-12 interface to a DS3 ATM interface, you might experience queuing delay because cells arrive on the OC-12 interface faster than they can be output on the DS3 interface. ATM edge routers that are configured for traffic shaping restrict the rate at which they output traffic on the interface. If the arrival rate of traffic destined to the ATM VC is greater than the traffic shaping rate on the interface, packets/cells are queued on the interface. Typically, queuing delay is the delay that causes performance issues.
Solution: Implement IP to ATM Class of Service (CoS) features for differentiated service. Utilize features like Class Based Weighted Fair Queuing (CBWFQ) and Low Latency Queuing (LLQ) to reduce or eliminate queuing delay for mission critical traffic. Increase the bandwidth of virtual circuits to eliminate congestion.
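As an illustration only, a service policy similar to this sketch could be attached to an ATM PVC to give latency-sensitive traffic priority treatment; the class match criteria, bandwidth values, addressing, and PVC/shaping parameters are hypothetical and must be adapted to your traffic and contract:

```
class-map match-all VOICE
 match ip dscp ef
!
policy-map ATM-PVC-POLICY
 ! LLQ: strict-priority queue (kbps) for latency-sensitive traffic
 class VOICE
  priority 1000
 ! CBWFQ fair queuing for the remaining traffic
 class class-default
  fair-queue
!
interface ATM1/0.1 point-to-point
 ip address 10.1.1.1 255.255.255.252
 ! Shape the VC to the contracted PCR/SCR (kbps) and attach the policy
 pvc 1/100
  vbr-nrt 45000 45000
  service-policy output ATM-PVC-POLICY
```

The queuing policy only controls which traffic waits during congestion; increasing the VC bandwidth, as noted above, is what removes the congestion itself.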
ATM PVCs and SVCs have Quality of Service (QoS) parameters associated with each circuit. A traffic contract is established between the ATM edge device and the network. When PVCs are used, this contract is manually configured in the ATM network (ATM switches). With SVCs, ATM signaling is used to establish this contract. ATM edge devices traffic shape data to conform with the specified contract. ATM Network Devices (ATM switches) monitor the traffic on the circuit for conformance with the specified contract and tag (mark) or discard (police) traffic that does not conform.
If an ATM edge device has a Peak Cell Rate (PCR)/SCR configured for a rate higher than is provisioned in the network, packet loss is a likely result. The traffic shaping rates configured on the edge device should match what is provisioned end-to-end through the network, so verify that the configuration matches across all devices in the path. If the edge device sends cells into the network that do not conform to the contract provisioned throughout the network, cells are typically discarded within the network. This is usually detected by CRC errors on the far end when the receiver attempts to reassemble the packet.
An ATM edge device with PCR/SCR configured for a rate lower than is provisioned in the network causes degraded performance. In this situation, the network is configured to provide more bandwidth than the edge device sends. This condition may result in additional queuing delay and even output queue drops on the edge ATM router's egress interface.
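As a brief sketch (the VPI/VCI and rates are placeholders), the shaping configured on the router PVC should simply mirror the values provisioned on the ATM switches for that circuit:

```
interface ATM1/0.1 point-to-point
 pvc 1/100
  ! PCR and SCR (kbps) and MBS (cells) must match the values
  ! provisioned end-to-end on the ATM switches for this VC
  vbr-nrt 45000 45000 100
```

Compare the output of show atm pvc 1/100 on the router with the rates provisioned on each switch in the path.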
SVCs are configured on the edge devices, but the network may not establish the SVC end-to-end with the same traffic parameters. The same concepts and problems that apply to PVCs apply to SVCs. The network may not set up the SVC end-to-end with the same QoS classes and parameters. This type of problem is typically caused by a bug or by interoperability issues. When an SVC is signaled, the calling party specifies the QoS and traffic shaping parameters in the forward and backward directions. It can happen that the called party does not install the SVC with the proper shaping parameters. The configuration of strict traffic shaping on router interfaces can prevent SVCs from being set up with shaping parameters other than those configured.
The user must trace the path of the SVC through the network and verify that it is established with the use of the QoS class and parameters that are configured on the originating device.
Solution: Eliminate traffic shaping/policing configuration mismatches. If SVCs are used, verify that they are set up end-to-end with the correct shaping/policing parameters. Configure strict traffic shaping on ATM router interfaces with the atm sig-traffic-shaping strict command.
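This minimal sketch (the interface number is a placeholder) shows where the atm sig-traffic-shaping strict command is applied; with it configured, the router does not bring up an SVC whose shaping parameters differ from those configured, rather than accepting the connection with other values:

```
interface ATM1/0
 ! Do not allow SVCs to be established with shaping parameters
 ! other than those configured on this interface
 atm sig-traffic-shaping strict
```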
SVCs that are configured for Unspecified Bit Rate (UBR) may be set up over non-optimal paths. A UBR VC is limited in bandwidth to the line rate of the links that the VC traverses. Therefore, if a high-speed link goes down, the VCs that traverse that link may be reestablished over a slower link. Even when the high-speed link is restored, the VCs are not torn down and reestablished over the faster link, because the slower path still satisfies the requested (unspecified) QoS parameters. This problem is very common in LANE networks that have alternate paths through the network. In cases where the alternate paths are the same link speed, the failure of one of the links causes all of the SVCs to be routed over the same path. This situation can dramatically affect the throughput and performance of the network, since the effective bandwidth of the network is cut in half.
Even Variable Bit Rate (VBR) and Constant Bit Rate (CBR) SVCs may get routed over non-optimal paths. End devices request specific traffic parameters (PCR, SCR, Maximum Burst Size {MBS}). The goal of Private Network-Network Interface (PNNI) and ATM signaling is to provide a path that meets the QoS requirements of the request. In the case of CBR and VBR-rt calls, this also includes Maximum Cell Transfer Delay. A path may satisfy the requirements specified by the requester from the bandwidth point-of-view, but not be the optimal path. This problem is common when there are paths with longer delay that still meet the bandwidth requirements for VBR and CBR VCs. This may be perceived as a performance issue to the customer who now sees larger delay characteristics across the network.
Solution: SVCs across an ATM network are established on demand and are typically not torn down and rerouted over a different path unless the SVC is torn down (due to inactivity or released for other reasons). Cisco LightStream 1010 and Catalyst 8500 ATM switches provide the Soft PVC Route Optimization Feature. This feature provides the ability to dynamically reroute a Soft PVC when a better route is available. A similar functionality is not available for SVCs that do not terminate on the ATM switches.
One possible solution to this problem is to use PVCs between the ATM edge devices and the connected ATM switches. Soft PVCs with Route Optimization configured between ATM switches provide the ability to reroute the traffic from non-optimal paths after link failure and subsequent recovery.
Configure the idle timeout interval for SVCs to be low so that idle SVCs are torn down and reestablished more frequently. Use the idle-timeout seconds [minimum-rate] command to change the amount of time and the traffic rate below which the SVC is torn down. This may not prove very effective, since the VC has to become inactive before it can be rerouted over the optimal path.
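As a sketch only (the subinterface number, SVC name, and NSAP address are placeholders), the command is applied under the SVC definition on the router:

```
interface ATM1/0.1 multipoint
 svc TEST-SVC nsap 47.009181000000001122334401.00112233AABB.00
  ! Tear the SVC down after 120 seconds with less than 5 kbps of traffic
  idle-timeout 120 5
```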
If all else fails, make sure that the optimal path has been restored to operation and then bounce one of the ATM interfaces associated with the slow speed redundant path or one of the router interfaces that terminates the SVC.
The architecture of the PA-A1 ATM port adapter and its lack of onboard memory can result in degraded performance. This problem may manifest itself as aborts, overruns, ignores, and CRC errors on the interface. The problem is compounded when the port adapter is used in a Cisco 7200 router with an NPE-100, NPE-175, NPE-225, or NPE-300.
Refer to Troubleshooting Input Errors on PA-A1 ATM Port Adapters for additional information.
Solution: Replace PA-A1 ATM Port Adapters with PA-A3 (at least revision 2) or PA-A6 ATM Port Adapters.
The PA-A3 hardware revision 1 does not use the onboard static RAM (SRAM) on the port adapter to reassemble cells into packets. Instead, the adapter forwards the cells across the peripheral component interconnect (PCI) bus to the Versatile Interface Processor (VIP) or Network Processing Engine (NPE) host memory, where the packets are reassembled. This results in performance problems similar to those seen with the PA-A1 ATM port adapter.
Refer to Troubleshooting Input and Output Errors on PA-A3 ATM Port Adapters for additional information.
Solution: Replace PA-A3 hardware revision 1 ATM Port Adapters with PA-A3 (at least revision 2) or PA-A6 ATM Port Adapters.
The PA-A3-OC3SMM, PA-A3-OC3SMI, and PA-A3-OC3SML are designed to provide maximum switching performance when a single port adapter is installed in a single VIP2-50. A single PA-A3-OC3SMM, PA-A3-OC3SMI, or PA-A3-OC3SML in a VIP2-50 provides up to approximately 85,000 packets per second of switching capacity in each direction with 64-byte packets. Note that a single PA-A3-OC3SMM, PA-A3-OC3SMI, or PA-A3-OC3SML can, by itself, consume the entire switching capacity of a single VIP2-50.
For applications that require maximum port density or lower system cost, dual port adapter configurations with the OC-3/STM-1 version of the PA-A3 in the same VIP2-50 are now supported. The two port adapters in the same VIP2-50 share approximately 95,000 packets per second of switching capacity in each direction with 64-byte packets.
The VIP2-50 provides up to 400 megabits per second (Mbps) of aggregate bandwidth, depending on the port adapter combination. In most dual port adapter configurations with the PA-A3-OC3SMM, PA-A3-OC3SMI, or PA-A3-OC3SML, the combination of port adapters exceeds this aggregate bandwidth capacity.
Consequently, the performance shared between two port adapters installed in the same VIP2-50 is limited by the aggregate switching capacity (95 kpps) at small packet sizes and by the aggregate bandwidth (400 Mbps) at large packet sizes.
These performance caveats must be considered when you design ATM networks with the PA-A3-OC3SMM, PA-A3-OC3SMI, or PA-A3-OC3SML. Depending on the design, the performance of dual port adapters in the same VIP2-50 may or may not be acceptable.
Refer to PA-A1 and PA-A3 VIP2 Configurations Supported for additional information.
An excessive number of end systems in a single LANE emulated LAN (ELAN) can significantly degrade the performance of all the end stations. An ELAN represents a broadcast domain: all workstations and servers within the ELAN receive broadcast, and possibly multicast, traffic from all other devices in the ELAN. If the level of broadcast traffic is high relative to the processing capability of the workstations, the performance of the workstations suffers.
Solution: Restrict the number of end stations within a single ELAN to less than 500. Monitor the network for Broadcast/Multicast storms that may adversely affect server/workstation performance.
Refer to LANE Design Recommendations for additional information.
Other problems that can lead to poor performance in a LANE network are excessive LANE ARP (LE-ARP) activity and Spanning Tree topology changes. These problems lead to unresolved LE-ARPs, which cause traffic to be sent over the BUS. They can also lead to high CPU utilization on the LECs in the network, which can cause further performance problems. More information about these problems can be found in Troubleshooting Spanning-Tree over LANE.
Configure Spanning Tree PortFast on the host ports of LANE-attached Ethernet switches to reduce Spanning Tree topology changes. Configure local LE-ARP reverification on Catalyst 5000 and 6000 switches configured for LANE to reduce LE-ARP traffic.
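On a Catalyst switch that runs Cisco IOS software, PortFast is enabled per host port as in this sketch (the interface is a placeholder); on CatOS-based Catalyst 5000/6000 switches the equivalent is configured with the set spantree portfast command:

```
interface FastEthernet0/1
 ! End-station port only: skip listening/learning so that host port
 ! flaps do not generate Spanning Tree topology changes
 spanning-tree portfast
```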
With LANE version 1, SVCs are set up with the UBR service category. LANE version 2 supports the ability for Data Direct SVCs to be established with other service categories, such as VBR-nrt. One third-party vendor has a bug in its LANE client implementation that can cause the Data Direct SVCs set up to Cisco devices to be VBR-nrt with an SCR of 4 Kbps. If your ATM backbone consists of OC-3 (155 Mbps) and OC-12 (622 Mbps) trunk links and you set up SVCs over those trunks with a Sustained Cell Rate of 4 Kbps, performance suffers. While this particular problem is not common, it points out an important step when you troubleshoot performance issues over ATM circuits: you must trace the path that your SVCs take through the network and confirm that the VC has been established with the desired service category and traffic parameters.
LANE Data Direct VCs are bi-directional point-to-point SVCs that are set up between two LAN Emulation Clients (LECs) and are used to exchange data between those clients. LANE clients send LE-ARP requests to learn the ATM addresses associated with a MAC address. They then attempt to set up a Data Direct VC to that ATM address. Prior to the Data Direct VC establishment, LANE clients flood unknown unicast packets to the Broadcast and Unknown Server (BUS). A LANE client may fail to establish a Data Direct VC to another LEC for the purpose of sending unicast data to it. If this happens, performance degradation may result. The problem is significant if the device chosen to perform the BUS services is underpowered, inadequate, or overloaded. In addition, some platforms may rate limit unicasts that are forwarded to the BUS. The Catalyst 2900XL LANE module is one such box that throttles unicast traffic sent to the BUS while Catalyst 5000 and Catalyst 6000 do not.
The Data Direct SVC may fail to be established or be used for any of these reasons:
The LEC does not receive a response to the LE-ARP request.
The SVC cannot be created because of ATM routing or signaling issues.
LANE Flush Message Protocol failure. Once the Data Direct VC is established, the LEC sends a Flush request on the Multicast Send VC to ensure that all the data frames that were sent over the BUS have reached their destination. When the LEC that sent the Flush request receives a response, it begins to send data over the Data Direct VC. The Flush mechanism can be disabled with the no lane client flush command (a brief configuration sketch follows this list).
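This is a minimal sketch of where that command sits in a LANE client configuration (the subinterface number and ELAN name are placeholders); disable the flush mechanism only after you have confirmed that flush protocol failures are the cause of the problem:

```
interface ATM2/0.1 multipoint
 lane client ethernet ELAN1
 ! Disable the LANE flush message protocol on this LEC
 no lane client flush
```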
UBR VCs on Inverse Multiplexing over ATM (IMA) interfaces are set up with a PCR of 1.5 Mbps instead of the sum of all the up/up physical interfaces configured in the IMA group. This condition degrades performance, since the VC is traffic shaped at a rate lower than the combined bandwidth of all the links in the IMA group.
Originally, the bandwidth of an IMA group interface was limited to the minimum number of active IMA links needed to keep the IMA interface up. The command that defines this value is ima active-links-minimum. For example, if four physical ATM interfaces are configured as members of IMA group zero and the ima active-links-minimum value is set to one, the bandwidth is equal to one T1 (1.5 Mbps), not 6 Mbps.
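For reference, this is a minimal sketch of the configuration described in the example (slot/port numbers, VPI/VCI, and addresses are placeholders): four T1 ATM interfaces are placed in IMA group 0, and the group stays up as long as one member link is active:

```
interface ATM1/0
 ima-group 0
interface ATM1/1
 ima-group 0
interface ATM1/2
 ima-group 0
interface ATM1/3
 ima-group 0
!
interface ATM1/ima0
 ip address 10.1.1.1 255.255.255.252
 ! The group remains up with a single active T1
 ima active-links-minimum 1
 pvc 0/100
  protocol ip 10.1.1.2 broadcast
```

With the original behavior, UBR VCs on ATM1/ima0 in this sketch are shaped to the bandwidth of one T1 (1.5 Mbps) rather than the roughly 6 Mbps available across the four member links.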
Cisco bug ID CSCdr12395 (registered customers only) changes this behavior. The PA-A3-8T1IMA adapter now uses the bandwidth of all up/up ATM physical interfaces configured as IMA group members.
Cisco bug IDs CSCdt67354 (registered customers only) and CSCdv67523 (registered customers only) are subsequent enhancement requests to update the IMA group VC bandwidth when an interface is added to or removed from the IMA group, is shut/no shut, bounces due to a link failure, or changes at the remote end. The changes implemented in Cisco bug ID CSCdr12395 (registered customers only) set the IMA group bandwidth to the total bandwidth of its member links only when the IMA group comes up. Changes to the IMA group after the initial up status are not reflected.
Refer to Troubleshooting ATM Links on the 7x00 IMA Port Adapter for additional information.
Revision | Publish Date | Comments
---|---|---
1.0 | 04-Aug-2004 | Initial Release