This chapter describes how to troubleshoot ATM problems that are seen when you transport Layer 2 frames/Layer 3 packets over a WAN backbone. It reviews:
How frames or packets are segmented into ATM cells
What the important show commands are and how to interpret them
How to detect and troubleshoot incorrect shaping or policing
Note: The information in this chapter is applicable to all Cisco devices as it focuses solely on the technology itself, not on hardware or software dependency.
Asynchronous Transfer Mode (ATM) is a technology that was defined by the ITU-T, formerly known as the CCITT, in the early 1990s. The related standards describe a transport technology where information is carried in small fixed-length data units called cells.
In an ATM network, a clear distinction can be made between the devices that support the applications, called End-Systems (ES) and the devices that only relay the cells. These relaying devices are intermediate-systems (IS) or ATM switches. Examples of ESs are routers and LAN Emulation (LANE) modules. Examples of ISs are LS1010, 8540MSR, BPX.
This is a representation of an ATM network:
ATM, amongst other things, defines how to segment and re-assemble different types of information. ATM can transport video, voice, and data. Proper quality of service (QoS) is reserved and guaranteed by the ATM network. Since any type of information can be segmented into cells in accordance to the related standard, ATM is a flexible tool and can therefore be used in many environments. These environments can be classified into two main categories:
LAN switched environment: LANE is most commonly used. Typically, there is little QoS in this dynamic environment since ATM connections are built and removed on demand.
WAN environment: There are two players:
Telco: Typically offers very precise quality of service in a static environment. The ATM network of a telephone company is made of ATM switches. Since the telephone company offers an ATM service, it is referred to here as an ATM service provider.
Enterprise: Typically requests an ATM service from the ATM service provider.
This chapter focuses solely on ATM connections in an enterprise WAN environment. End-systems in such an environment are routers 99% of the time, so the rest of this document uses only the word router. Those routers exchange packets [1]. IP is used as the reference protocol, and all explanations are valid for other Layer 3 protocols, such as IPX and AppleTalk. From the enterprise point of view, the network looks similar to this:
There is typically a traffic contract on the quality of service that is respected by the enterprise routers and the ATM service provider. Initially, it looks quite simple with only two devices in the picture and the cloud of the ATM provider that is not visible from the enterprise point of view. Unfortunately, the problems in this environment are not trivial because you do not have full visibility on the equipment of the ATM provider.
There are no specific requirements for this document.
This document is not restricted to specific software and hardware versions.
AAL (ATM Adaptation Layer) adapts user information, which includes data, voice, video, and so forth, to a format that can be easily divided into ATM cells. Once you have an AAL-PDU, it is passed to the Segmentation and Reassembly (SAR) layer that segments this large packet into ATM cells. AAL5 is the AAL type most commonly used for the transportation of data. Data here also includes Voice over IP. The SAR process for AAL5 is illustrated in this diagram.
At the destination router, the reverse process is applied. A special bit in the cell header is set to 1 in the last cell of an AAL5 packet, which lets the destination router easily identify where the packet ends.
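The segmentation arithmetic can be sketched in a few lines of Python. This is only an illustration: the function name `aal5_cells` is hypothetical, and the sketch assumes the standard 8-byte AAL5 CPCS-PDU trailer and 48-byte cell payload, with padding inserted so that the trailer ends exactly on a cell boundary.

```python
CELL_PAYLOAD = 48   # payload bytes carried by each 53-byte ATM cell
AAL5_TRAILER = 8    # CPCS-PDU trailer: UU (1), CPI (1), Length (2), CRC-32 (4)


def aal5_cells(sdu_len):
    """Return (cells, pad_bytes) needed to carry sdu_len bytes over AAL5.

    sdu_len is the size of the data handed to AAL5, including any
    encapsulation header (for example LLC/SNAP) that precedes the packet.
    """
    if sdu_len < 0:
        raise ValueError("sdu_len must be non-negative")
    unpadded = sdu_len + AAL5_TRAILER
    cells = -(-unpadded // CELL_PAYLOAD)        # ceiling division
    pad = cells * CELL_PAYLOAD - unpadded       # filler between data and trailer
    return cells, pad


if __name__ == "__main__":
    for size in (40, 41, 1508):
        cells, pad = aal5_cells(size)
        print(f"{size} bytes -> {cells} cell(s), {pad} pad byte(s)")
```

A 40-byte payload fits in exactly one cell, while one extra byte forces a second, almost empty cell. This is why small changes in packet size can change the cell count, and therefore the exposure to cell loss.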
The whole process, usually implemented in hardware, works efficiently. These are the two main problems that can arise:
One or more cells can arrive corrupted at the destination, damaged by either the transmitter or a device in the ATM network. The only field in the cell that performs a type of cyclic redundancy check (CRC) is the Header Error Control (HEC) field. As the name suggests, it only checks the cell header.
One or more cells can be discarded in the network of the provider.
This is how you can examine the impact of those two problems at the destination router and how to detect them:
If one cell is corrupted, the number of cells is still the same. The CPCS-PDU frame is reassembled with the correct size, so the router finds that the length field is correct. However, because one cell is corrupted, the whole frame is corrupted. Therefore, the CRC computed over the received AAL5 CPCS-PDU frame is different from the CRC field that was originally sent.
If one cell is missing at the destination, both the size and the CRC are different from those contained in the CPCS-PDU frame.
Whatever the real problem is, an incorrect CRC is detected at the destination. The administrator of the routers can detect this by checking the interface statistics. One CRC error causes the input error counter to be incremented by one [2]. The show interface atm command output illustrates this behavior:
Medina#show interface atm 3/0
ATM3/0 is up, line protocol is up
  Hardware is ENHANCED ATM PA
  MTU 4470 bytes, sub MTU 4470, BW 149760 Kbit, DLY 80 usec,
  reliability 255/255, txload 1/255, rxload 1/255
  Encapsulation ATM, loopback not set
  Keepalive not supported
  Encapsulation(s): AAL5
  4096 maximum active VCs, 2 current VCCs
  VC idle disconnect time: 300 seconds
  Signalling vc = 1, vpi = 0, vci = 5
  UNI Version = 4.0, Link Side = user
  0 carrier transitions
  Last input 00:00:07, output 00:00:07, output hang never
  Last clearing of "show interface" counters never
  Input queue: 0/75/0 (size/max/drops); Total output drops: 0
  Queueing strategy: Per VC Queueing
  5 minute input rate 0 bits/sec, 0 packets/sec
  5 minute output rate 0 bits/sec, 0 packets/sec
  104 packets input, 2704 bytes, 0 no buffer
  Received 0 broadcasts, 0 runts, 0 giants, 0 throttles
  32 input errors, 32 CRC, 0 frame, 0 overrun, 0 ignored, 0 abort
  106 packets output, 2353 bytes, 0 underruns
  0 output errors, 0 collisions, 1 interface resets
  0 output buffer failures, 0 output buffers swapped out
In the previous output, the input error counter indicates 32 errors (32 input errors). If the router has been configured for multiple PVCs, relying only on the global interface counter might not be adequate, since the input error counter might aggregate the errors of multiple PVCs. It is recommended to use the show atm pvc vpi/vci command in this scenario. For example:
Medina#show atm pvc 0/36
ATM3/0.1: VCD: 4, VPI: 0, VCI: 36
VBR-NRT, PeakRate: 2000, Average Rate: 1000, Burst Cells: 32
AAL5-LLC/SNAP, etype:0x0, Flags: 0x20, VCmode: 0x0
OAM frequency: 0 second(s), OAM retry frequency: 1 second(s), OAM retry frequen)
OAM up retry count: 3, OAM down retry count: 5
OAM Loopback status: OAM Disabled
OAM VC state: Not Managed
ILMI VC state: Not Managed
InARP frequency: 15 minutes(s)
Transmit priority 2
InPkts: 24972, OutPkts: 25032, InBytes: 6778670, OutBytes: 6751812
InPRoc: 24972, OutPRoc: 25219, Broadcasts: 0
InFast: 0, OutFast: 0, InAS: 0, OutAS: 0
InPktDrops: 0, OutPktDrops: 0
CrcErrors: 0, SarTimeOuts: 0, OverSizedSDUs: 0
OAM cells received: 0
F5 InEndloop: 0, F5 InSegloop: 0, F5 InAIS: 0, F5 InRDI: 0
F4 InEndloop: 0, F4 InSegloop: 0, F4 InAIS: 0, F4 InRDI: 0
OAM cells sent: 0
F5 OutEndloop: 0, F5 OutSegloop: 0, F5 OutRDI: 0
F4 OutEndloop: 0, F4 OutSegloop: 0, F4 OutRDI: 0
OAM cell drops: 0
Status: UP
In this output [3], the CRC error counter indicates the number of CRC errors for the CPCS-PDU frame. Both commands were typed on the same router. Since no CRC errors (CrcErrors) can be seen in the statistics for PVC 0/36, you can conclude that the input errors of the show interface command were due to another PVC.
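When many PVCs must be checked, the comparison between the interface counter and the per-PVC counter can be scripted. The following Python sketch is an illustration only: the function names are hypothetical, and the regular expressions assume the counter wording seen in the two outputs above, which can vary with the card and the software release.

```python
import re


def interface_errors(show_interface_output):
    """Return (input_errors, crc_errors) from 'show interface atm' text, or None."""
    match = re.search(r"(\d+) input errors, (\d+) CRC", show_interface_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def pvc_crc_errors(show_atm_pvc_output):
    """Return the CrcErrors counter from 'show atm pvc vpi/vci' text, or None."""
    match = re.search(r"CrcErrors:\s*(\d+)", show_atm_pvc_output)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    # Counter lines copied from the sample outputs in this chapter.
    interface_text = "32 input errors, 32 CRC, 0 frame, 0 overrun, 0 ignored, 0 abort"
    pvc_text = "CrcErrors: 0, SarTimeOuts: 0, OverSizedSDUs: 0"

    _, interface_crc = interface_errors(interface_text)
    if interface_crc and pvc_crc_errors(pvc_text) == 0:
        print("CRC errors are present on the interface but not on this PVC")
```

The text would typically be captured from a terminal session. How it is collected is outside the scope of this sketch.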
Note: One input error does not always mean one packet loss. The cell discarded by the ATM provider can be the last one of the frame. Therefore, the cell discarded had this special bit set to one. The only way for the destination to find the frame boundaries is to check this bit. As a result, the destination router, at reassembly time, concatenates all cells it receives until a cell with this bit set to 1 is found. If the last cell of a frame is discarded, two CPCS-PDU frames are lost, and this results in only one CRC and length error.
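The effect described in this note can be reproduced with a small Python model. It is a simplified sketch, not a description of any real SAR chip: cells are modeled as hypothetical `(frame_id, is_last)` pairs, and the receiver does nothing but concatenate cells until it sees the end-of-frame bit.

```python
def make_frame(frame_id, n_cells):
    """Build the cells of one frame; only the final cell has the last-cell bit set."""
    return [(frame_id, index == n_cells - 1) for index in range(n_cells)]


def reassemble(cells):
    """Concatenate received cells until a cell with the last-cell bit arrives."""
    frames, current = [], []
    for frame_id, is_last in cells:
        current.append(frame_id)
        if is_last:
            frames.append(current)
            current = []
    return frames


if __name__ == "__main__":
    stream = make_frame("A", 3) + make_frame("B", 3) + make_frame("C", 3)
    damaged = stream[:2] + stream[3:]   # the network discards the last cell of A
    for frame in reassemble(damaged):
        print(frame)
```

In the damaged stream, the receiver builds a single frame out of the remaining cells of A followed by all the cells of B. That frame fails its CRC and length checks once, yet both A and B are lost, while C is reassembled normally.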
Traffic shaping refers to an action done by the source of the ATM traffic. Policing refers to actions done by the ATM switches, usually on the side of the provider.
Traffic shaping is the action of adapting the cell flow to a specific traffic contract. This is illustrated in this diagram.
Policing is the action of checking if the cell flow respects a specific traffic contract. This is illustrated in this diagram:
Note: These diagrams do not imply that traffic shaping and policing refer to a common contract and use a similar algorithm. Misconfigured policing or shaping often leads to cells that are discarded by the policer. Even if shaping and policing are both set to the same values, policing can start to discard cells. This is usually due to a poor shaper or a policer that malfunctions.
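To see why a policer can discard cells even when both sides use the same rate, it helps to model the policer. The Python sketch below implements the virtual-scheduling form of the Generic Cell Rate Algorithm (GCRA) described in the ATM Forum Traffic Management specification. It is an illustration under assumptions: time is counted in arbitrary integer ticks, `increment` is the contracted spacing between cells, `limit` is the tolerance granted by the policer, and real switches can implement policing differently.

```python
def gcra(arrival_times, increment, limit):
    """Return one conformance flag per cell for GCRA(increment, limit).

    A cell conforms if it does not arrive earlier than the theoretical
    arrival time (TAT) minus the tolerance. Non-conforming cells leave
    the TAT unchanged.
    """
    tat = None
    verdicts = []
    for now in arrival_times:
        if tat is None or now >= tat - limit:
            verdicts.append(True)
            tat = max(now, now if tat is None else tat) + increment
        else:
            verdicts.append(False)
    return verdicts


if __name__ == "__main__":
    perfect = [0, 10, 20, 30]   # shaper that spaces cells exactly
    jittery = [0, 10, 19, 30]   # same average rate, one cell is one tick early
    print(gcra(perfect, increment=10, limit=0))
    print(gcra(jittery, increment=10, limit=0))
    print(gcra(jittery, increment=10, limit=1))
```

With zero tolerance, the cell that arrives one tick early is declared non-conforming although the average rate matches the contract. With a tolerance of one tick, the same stream passes. This is one way in which a slightly imprecise shaper and a strict policer can disagree.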
This section only provides an introduction to traffic shaping. You can find more details in the Traffic Management specification available on the ATM Forum website.
In ATM, traffic shaping works by inserting equal time intervals between the cells. For example, although an OC-3/STM-1 connection runs at 155 Mbit/sec, only ~149 Mbit/sec can be used to forward ATM cells [4]. As a result, the maximum rate is 353,208 cells per second (353,208 * 53 * 8 bits fit in the OC-3c/STM-1 frame payload in one second). If you request a connection of 74.5 Mbit/sec (half the line rate), equal gaps of 2.83 microseconds are inserted between cells. 2.83 microseconds is the time needed to send one cell at OC-3c/STM-1 (1/353,208 second). Because you requested half the line rate, the router sends one cell, waits an equal amount of time, and then starts over again.
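The arithmetic in this paragraph can be checked with a short Python sketch. The function names are illustrative, and the usable line rate of 149,760 kbit/sec is taken from the BW value in the show interface output earlier in this chapter.

```python
CELL_BITS = 53 * 8  # one ATM cell: 5-byte header plus 48-byte payload


def cells_per_second(rate_bps):
    """Number of whole-cell slots per second at the given bit rate."""
    return rate_bps / CELL_BITS


def cell_time_us(rate_bps):
    """Time in microseconds needed to transmit one cell at the given bit rate."""
    return 1e6 * CELL_BITS / rate_bps


def idle_slots_between_cells(line_bps, shaped_bps):
    """Empty cell slots a shaper leaves between two cells of the connection."""
    return line_bps / shaped_bps - 1


if __name__ == "__main__":
    line = 149_760_000  # usable OC-3c/STM-1 rate in bit/sec
    print(round(cells_per_second(line)))             # cell slots per second
    print(round(cell_time_us(line), 2))              # microseconds per cell
    print(idle_slots_between_cells(line, line / 2))  # half the line rate
```

At half the line rate the shaper leaves exactly one idle cell slot after every cell, which matches the "send one cell, wait an equal amount of time" description.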
The most classic traffic requested is Variable Bit-Rate (VBR) traffic shaping:
VBR traffic shaping is an effective approach for a busy network. The parameters used are Peak Cell Rate (PCR), Sustainable Cell Rate (SCR) and Maximum Burst Size (MBS). Once a traffic contract has been agreed, cell transmission within the VBR parameters is guaranteed by the ATM network. The number of cells allowed to exceed the SCR is set by the MBS and bound by the PCR.
These are the definitions of these parameters:
PCR—Maximum rate at which the source can send cells
SCR—A bound placed on the long term average cell rate
MBS—Maximum number of cells that can be sent above the SCR at the PCR
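As a rough illustration of how these three parameters relate, the following Python sketch converts the values of the sample PVC shown earlier (PeakRate 2000, Average Rate 1000, Burst Cells 32) into cell rates. It assumes that the configured kbit/sec rates count the full 53-byte cell, which might not hold on every card. The function names are hypothetical.

```python
CELL_BITS = 53 * 8  # bits in one ATM cell, header included


def kbps_to_cells_per_second(rate_kbps):
    """Convert a rate in kbit/sec to cells per second (assumes whole-cell accounting)."""
    return rate_kbps * 1000 / CELL_BITS


def burst_duration_ms(pcr_kbps, mbs_cells):
    """Time in milliseconds needed to send mbs_cells back to back at the PCR."""
    return 1000 * mbs_cells / kbps_to_cells_per_second(pcr_kbps)


if __name__ == "__main__":
    pcr_kbps, scr_kbps, mbs = 2000, 1000, 32
    print(round(kbps_to_cells_per_second(pcr_kbps)), "cells/sec at PCR")
    print(round(kbps_to_cells_per_second(scr_kbps)), "cells/sec at SCR")
    print(round(burst_duration_ms(pcr_kbps, mbs), 3), "ms for a full burst at PCR")
```

Under these assumptions, a full burst of 32 cells at the peak rate lasts less than 7 milliseconds, after which the source has to fall back toward the sustainable rate.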
A common source of problems is the incorrect configuration of the ATM mapping. After you configure the PVC itself, you must tell the router which PVC to use in order to reach a specific destination. There are three ways you can ensure the right mapping:
If you put the PVC on a point-to-point subinterface, the router assumes that there is only one point-to-point PVC configured on the subinterface. Therefore, any IP packet with a destination IP address in the same subnet is forwarded on this VC. This is the simplest way to configure the mapping and is therefore the recommended method.
If you put the PVC in a point-to-multipoint subinterface or in the main interface, you have to create a static mapping. See the Troubleshooting section for a sample configuration.
You can use Inverse ARP in order to create the mapping automatically. See Important Commands for more information.
The two most common symptoms that indicate information is lost between the two routers are:
Slow TCP connections due to cells that are discarded in the ATM cloud, which results in IP packets being discarded and in a high number of retransmissions. TCP itself believes this is due to congestion and tries to lower its transmitting window, which results in a very slow TCP connection. This affects all TCP-based protocols such as Telnet or FTP.
Large IP packets tend to fail while small packets cross the ATM network with no problems. This is again due to cells that are discarded.
Concentrate on the second symptom, which helps detect the problem. Assume that, for every 100 cells transmitted by the source router, the provider discards the last one due to policing. If a ping has a data portion of 100 bytes, 3 ATM cells are needed in order to send it, because 3 x 48 bytes are required to contain the ICMP echo request. In practice, this means that the first 33 pings succeed, since their 99 cells are seen as within contract by the provider. The 34th ping fails because one of its cells is discarded.
If you keep the same setup but, instead of small ICMP echos (pings), use 1500-byte packets, you need 32 cells in order to transmit each large packet (32 x 48 = 1536 bytes, the smallest multiple of 48 above the packet size). If the network discards one cell out of one hundred, approximately one packet out of three or four is discarded. A simple and efficient way to prove that you have a policing issue is therefore to raise the packet size.
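The reasoning above can be turned into a small Python model. It is deliberately idealized: it assumes that exactly every hundredth cell is discarded, that only the echo requests are affected, that the ping size counts the whole IP packet, and that each packet carries an 8-byte LLC/SNAP header plus the 8-byte AAL5 trailer. All function names are illustrative.

```python
CELL_PAYLOAD = 48
OVERHEAD = 8 + 8  # LLC/SNAP header plus AAL5 trailer, in bytes (assumed)


def cells_per_packet(ip_len):
    """Cells needed for one IP packet of ip_len bytes."""
    return -(-(ip_len + OVERHEAD) // CELL_PAYLOAD)  # ceiling division


def ping_results(ip_len, count, drop_every=100):
    """Return a string of '!' (success) and '.' (loss), one character per ping."""
    per_packet = cells_per_packet(ip_len)
    sent = 0
    marks = []
    for _ in range(count):
        before, sent = sent, sent + per_packet
        # The packet is lost if a multiple of drop_every falls among its cells.
        lost = sent // drop_every > before // drop_every
        marks.append("." if lost else "!")
    return "".join(marks)


if __name__ == "__main__":
    for size in (100, 1500, 3000):
        marks = ping_results(size, 100)
        print(f"{size:>4} bytes: {marks.count('!')}/100 succeed")
```

In this model, the first loss for 100-byte pings is the 34th ping, 32 of 100 pings are lost at 1500 bytes, and 63 of 100 are lost at 3000 bytes. Real tests do not match these figures exactly, but the trend is the useful signal: the loss rate grows with the packet size.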
In practice, you can generate large pings from the router itself.
Medina#ping
Protocol [ip]:
Target IP address: 10.2.1.2
Repeat count [5]: 100
Datagram size [100]: 1500
Timeout in seconds [2]: 2
Extended commands [n]:
Sweep range of sizes [n]:
Type escape sequence to abort.
Sending 100, 1500-byte ICMP Echos to 10.2.1.2, timeout is 2 seconds:
!!!.!!.!!!.!!.!!!.!!.!!!.!!.!!!.!!.!!!.!!.!!!.!!.!!!.!!.!!!.!!.!!!.!!.!!!.!!.!!!
.!!.!!!.!!.!!!.!!.!
Success rate is 72 percent (72/100).
If the real problem is related to policing, running the same test with larger packets generates a different result:
Medina#ping
Protocol [ip]:
Target IP address: 10.2.1.2
Repeat count [5]: 100
Datagram size [100]: 3000
Timeout in seconds [2]: 2
Extended commands [n]:
Sweep range of sizes [n]:
Type escape sequence to abort.
Sending 100, 3000-byte ICMP Echos to 10.2.1.2, timeout is 2 seconds:
!.!.!..!.!.!..!.!..!.!...!..!.!.!..!.!.!.!.!.!.!..!..!.!...!..!.!.!..!.!.!..!.!.
!..!.!..!.!.!.!..!..!
Success rate is 42 percent (42/100).
Contact your ATM provider and check these points if, after you run these tests, you conclude that you suffer from a policing issue:
Is the provider indeed discarding cells? The provider must be able to tell you this.
If so, for what specific reason? The answer is usually policing, but sometimes, its network is simply congested.
If the reason is policing, then what are the traffic parameters? Do they match the settings on the router?
If the router and the provider do use the same traffic parameters, then there is a real problem. Either the router is not shaping well or the provider is not policing accurately. Refer to the Bug Toolkit (registered customers only). No two traffic shaping implementations give exactly the same resulting traffic, so small variations can be accepted. However, an implementation should only generate a negligible amount of traffic loss.
Some traffic analyzers on the market can check the traffic compliance according to a given set of traffic parameters, for example, from GN Nettest and HP. These devices can tell if the traffic from the router is shaped accurately.
Open a case with Cisco Technical Support if you find that a Cisco router is not shaping accurately and you cannot find any documented bug or card limitation.
The previous section focused on a partial packet loss. This section focuses on total connectivity loss.
Table 1: Total Connectivity Loss Between Two ATM-attached Routers
Possible Problem | Solution |
---|---|
The PVC is broken inside the provider cloud. | This is the most common problem. If the provider has a serious problem inside its ATM cloud, the signal that comes from the equipment of the provider is still good. As a result, the interface of the router stays up, up. At the same time, any cell that the router sends is accepted by the provider but never reaches the destination. Usually, calling the provider gives a quick answer. However, because the interface does not go down, the Layer 3 route is not removed from the routing table, and alternative or backup routes cannot be used [5]. The best solution in this environment is to enable OAM management in order to automate the process. Refer to the Cisco WAN Manager Installation and Configuration Guides for more information. Use loopbacks in order to prove that the ATM card is okay. See the solution for the One of the interfaces is down, down table entry for more information. |
One of the interfaces is down, down. | |
There is a Layer 3 routing problem. | |
There is a mismatch in the mapping of the Layer 3 address of the peer router. | There is no automatic mapping between a PVC and the Layer 3 address of the router that is reachable through the PVC. Use the show atm map command in order to check this: Ema#show atm map Map list test: PERMANENT ip 164.48.227.142 maps to VC 140 |
This section explains the differences between the old syntax (show atm vc and atm pvc) and the new syntax (show atm pvc and pvc), which is available as of Cisco IOS® Software Release 11.3T.
Use the pvc interface configuration command in order to do one or more of these actions, whose full description can be found in the command reference:
Create an ATM PVC on a main interface or subinterface.
Assign a name to an ATM PVC.
Specify ILMI, QSAAL, or SMDS protocols to be used on this PVC.
Enter interface-atm-pvc configuration mode.
Interface configuration
Medina#show running-config interface atm 3/0.1
Building configuration...
Current configuration:
!
interface ATM3/0.1 multipoint
 ip address 10.2.1.1 255.255.255.252
 no ip directed-broadcast
 pvc 0/36
  protocol ip 10.2.1.1 broadcast
  protocol ip 10.2.1.2 broadcast
  vbr-nrt 2000 1000 32
  encapsulation aal5snap
 !
end
Use show atm pvc 0/36 in order to check its status as shown previously or check with the earlier command show atm vc:
Medina#show atm vc
           VCD /                                   Peak    Avg/Min  Burst
Interface  Name   VPI  VCI  Type  Encaps  SC       Kbps    Kbps     Cells  Sts
3/0        1      0    5    PVC   SAAL    UBR      149760                  UP
3/0        2      0    16   PVC   ILMI    UBR      149760                  UP
3/0.1      4      0    36   PVC   SNAP    VBR      2000    1000     32     UP
You can display the VC statistics once you have located the right VCD number:
Medina#show atm vc 4
ATM3/0.1: VCD: 4, VPI: 0, VCI: 36
VBR-NRT, PeakRate: 2000, Average Rate: 1000, Burst Cells: 32
AAL5-LLC/SNAP, etype:0x0, Flags: 0x20, VCmode: 0x0
OAM frequency: 0 second(s)
InARP frequency: 15 minutes(s)
Transmit priority 2
InPkts: 24972, OutPkts: 25137, InBytes: 6778670, OutBytes: 6985152
InPRoc: 24972, OutPRoc: 25419, Broadcasts: 0
InFast: 0, OutFast: 0, InAS: 0, OutAS: 0
InPktDrops: 0, OutPktDrops: 0
CrcErrors: 0, SarTimeOuts: 0, OverSizedSDUs: 0
OAM cells received: 0
OAM cells sent: 0
Status: UP
You can compare the new show atm pvc command and the old show atm vc command. It is recommended to use the new command.
The mapping has been configured since this is a point-to-multipoint interface, and can be checked with the show atm map command:
Medina#show atm map
Map list ATM3/0.1pvc4 : PERMANENT
ip 10.2.1.1 maps to VC 4, VPI 0, VCI 36, ATM3/0.1
        , broadcast
ip 10.2.1.2 maps to VC 4, VPI 0, VCI 36, ATM3/0.1
        , broadcast
The subinterface type is multipoint, and as such, a mapping is required. In the case of a point-to-point subinterface, the protocol line in the PVC config can be skipped since the router assumes that all IP packets with a destination in the same subnet need to be forwarded to the PVC. Inverse ARP can be configured in the PVC config as well, in order to automate the mapping process.
If you run Cisco IOS Software Release 11.3 (non T train) or earlier, the PVC config command is not yet available and the old syntax should then be used. The whole PVC configuration is done in only one line, which limits the configuration possibilities. The full description can be found in the command reference.
Interface configuration
Medina#show run interface atm 3/0.1
Building configuration...
Current configuration:
!
interface ATM3/0.1 multipoint
 no ip directed-broadcast
 map-group MyMap
 atm pvc 4 0 36 aal5snap 2000 1000 32
end
This is an example of a partial configuration of map-list definition matching the map-group name:
<snip>
!
map-list MyMap
 ip 10.2.1.1 atm-vc 4 broadcast
 ip 10.2.1.2 atm-vc 4 broadcast
<snip>
With the previous partial configuration in place, check the mapping with the same command as for the new syntax:
Medina#show atm map
Map list MyMap : PERMANENT
ip 10.2.1.1 maps to VC 4
        , broadcast
ip 10.2.1.2 maps to VC 4
        , broadcast
Again, you will see that the new syntax is easier and clearer.
Before you call Cisco Technical Support, read through this chapter and complete the actions suggested for the problem of your system.
Complete these steps and document the results in order for Cisco Technical Support to better assist you:
Issue a show tech command on both routers. This helps the Cisco Support Engineer (CSE) to understand the router behavior.
Issue a show atm pvc command on both routers and a show atm pvc vpi/vci of the PVC that causes problems. This helps the CSE to understand the problem.
Explain what the point of view of the ATM provider is on the problem and state whether the provider believes the problem is on the router.
Compare the configuration of PVCs on point-to-point and point-to-multipoint subinterfaces.
Configure a router and a switch with shaping and policing that mismatch. Verify, with a ping test, that the traffic sent by the router is indeed policed incorrectly.
Configure OAM management to have the subinterface go down upon PVC failure.
Compare the configuration of a PVC with the old syntax versus the new syntax. What are the main reasons for the move to the new syntax?
Compare checking the PVC status/statistics with the use of the old command show atm vc versus the new command show atm pvc. What enhancements does the new syntax offer?
[1] ATM can essentially segment any type of information into cells. This chapter often refers to packets or frames (Layer 3 or Layer 2 data units). The term "protocol data unit" would allow a general discussion whatever the layer, in line with the OSI specification. For the sake of clarity, this chapter refers to packets.
[2] In the example, the CRC error counter of show interface is equal to the number of input errors. On some end-systems (such as the LANE modules of the Catalyst 5000), only the input error counter increases. Therefore, you should focus on the input errors. As a rule of thumb, if you do not run a recent release, it is recommended to also check the output of show controller since it gives more physical details on the counters of the ATM card itself.
[3] The output of show atm pvc might vary, depending on the functionality of the card and the code features. The example shown uses the PA-A3 with Cisco IOS Software Release 12.1.
[4] SONET/SDH has approximately 3 percent overhead.
[5] This assumes that static routes have been used. If dynamic routing protocols are used over this ATM PVC, the protocol eventually converges. This process might be slow; see the Troubleshooting section of the corresponding routing protocol.
[6] show controller output is specific to each ATM card. Often, valuable information can be deduced from this output, but no generic description can be given.
Revision | Publish Date | Comments |
---|---|---|
1.0 | 11-Oct-2006 | Initial Release |