System Troubleshooting
Introduction
This chapter provides the information needed for monitoring and troubleshooting system events and alarms. This chapter is divided into the following sections:
• System Events and Alarms—Provides a brief overview of each system event and alarm
• Monitoring System Events—Provides the information needed for monitoring and correcting the system events
• Troubleshooting System Alarms—Provides the information needed for troubleshooting and correcting the system alarms
System Events and Alarms
This section provides a brief overview of all of the system events and alarms for the Cisco BTS 10200 Softswitch; the events and alarms are arranged in numerical order. Table 12-1 lists all of the system events and alarms by severity.
Note Refer to the "Obtaining Documentation and Submitting a Service Request" section for detailed instructions on contacting Cisco TAC and opening a service request.
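Severity | System Messages
---|---
Critical | System (13)
Major | System (8), System (12), System (15)
Minor | System (2), System (3), System (4), System (6), System (7), System (10), System (11), System (14)
Warning | System (5), System (9)
Information | System (1)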
System (1)
Table 12-2 lists the details of the System (1) information event. For additional information, refer to the "Test Report—System (1)" section.
Description | Test Report
---|---
Severity | Information
Threshold | 100
Throttle | 0
Primary Cause | This is a test report for the System category.
Primary Action | No action is required.
System (2)
Table 12-3 lists the details of the System (2) minor alarm. To troubleshoot and correct the cause of the alarm, refer to the "Inter-Process Communication Queue Read Failure—System (2)" section.
System (3)
Table 12-4 lists the details of the System (3) minor alarm. To troubleshoot and correct the cause of the alarm, refer to the "Inter-Process Communication Message Allocate Failure—System (3)" section.
System (4)
Table 12-5 lists the details of the System (4) minor alarm. To troubleshoot and correct the cause of the alarm, refer to the "Inter-Process Communication Message Send Failure—System (4)" section.
System (5)
Table 12-6 lists the details of the System (5) warning event. To monitor and correct the cause of the event, refer to the "Unexpected Inter-Process Communication Message Received—System (5)" section.
System (6)
Table 12-7 lists the details of the System (6) minor alarm. To troubleshoot and correct the cause of the alarm, refer to the "Index List Insert Error—System (6)" section.
System (7)
Table 12-8 lists the details of the System (7) minor alarm. To troubleshoot and correct the cause of the alarm, refer to the "Index List Remove Error—System (7)" section.
System (8)
Table 12-9 lists the details of the System (8) major alarm. To troubleshoot and correct the cause of the alarm, refer to the "Thread Creation Failure—System (8)" section.
System (9)
Table 12-10 lists the details of the System (9) warning event. To monitor and correct the cause of the event, refer to the "Timer Start Failure—System (9)" section.
System (10)
Table 12-11 lists the details of the System (10) minor alarm. To troubleshoot and correct the cause of the alarm, refer to the "Index Update Registration Error—System (10)" section.
System (11)
Table 12-12 lists the details of the System (11) minor alarm. To troubleshoot and correct the cause of the alarm, refer to the "Index Table Add Entry Error—System (11)" section.
System (12)
Table 12-13 lists the details of the System (12) major alarm. To troubleshoot and correct the cause of the alarm, refer to the "Software Error—System (12)" section.
System (13)
Table 12-14 lists the details of the System (13) critical alarm. To troubleshoot and correct the cause of the alarm, refer to the "Multiple Readers and Multiple Writers Maximum Q Depth Reached—System (13)" section.
System (14)
Table 12-15 lists the details of the System (14) minor alarm. To troubleshoot and correct the cause of the alarm, refer to the "Multiple Readers and Multiple Writers Queue Reached Low Queue Depth—System (14)" section.
System (15)
Table 12-16 lists the details of the System (15) major alarm. To troubleshoot and correct the cause of the alarm, refer to the "Multiple Readers and Multiple Writers Throttle Queue Depth Reached—System (15)" section.
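The severity assignments described in this section lend themselves to a simple lookup, for example when filtering an exported list of system messages. The following Python sketch is illustrative only. The names `SYSTEM_SEVERITY`, `is_alarm`, and `by_severity` are hypothetical and are not part of the Cisco BTS 10200 software. It assumes, as this chapter does, that information and warning messages are events and that minor, major, and critical messages are alarms.

```python
# Severity of each System message number, as stated in this section.
SYSTEM_SEVERITY = {
    1: "information",
    2: "minor",
    3: "minor",
    4: "minor",
    5: "warning",
    6: "minor",
    7: "minor",
    8: "major",
    9: "warning",
    10: "minor",
    11: "minor",
    12: "major",
    13: "critical",
    14: "minor",
    15: "major",
}

# Assumption: only these severities are treated as alarms.
ALARM_SEVERITIES = {"minor", "major", "critical"}


def is_alarm(number):
    """Return True if System (number) is an alarm rather than an event."""
    return SYSTEM_SEVERITY[number] in ALARM_SEVERITIES


def by_severity(severity):
    """Return the System message numbers that have the given severity."""
    return sorted(n for n, s in SYSTEM_SEVERITY.items() if s == severity)
```

For example, `by_severity("major")` returns the three major alarms, and filtering all numbers with `is_alarm` yields the same set of messages that the Troubleshooting System Alarms section covers.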
Monitoring System Events
This section provides the information you need for monitoring and correcting system events. Table 12-17 lists all of the system events in numerical order and provides cross-references to each subsection.
Note Refer to the "Obtaining Documentation and Submitting a Service Request" section for detailed instructions on contacting Cisco TAC and opening a service request.
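Event Type | Event Name | Event Severity
---|---|---
System (1) | Test Report—System (1) | Information
System (2) | Inter-Process Communication Queue Read Failure—System (2) | Minor
System (3) | Inter-Process Communication Message Allocate Failure—System (3) | Minor
System (4) | Inter-Process Communication Message Send Failure—System (4) | Minor
System (5) | Unexpected Inter-Process Communication Message Received—System (5) | Warning
System (6) | Index List Insert Error—System (6) | Minor
System (7) | Index List Remove Error—System (7) | Minor
System (8) | Thread Creation Failure—System (8) | Major
System (9) | Timer Start Failure—System (9) | Warning
System (10) | Index Update Registration Error—System (10) | Minor
System (11) | Index Table Add-Entry Error—System (11) | Minor
System (12) | Software Error—System (12) | Major
System (13) | Multiple Readers and Multiple Writers Maximum Q Depth Reached—System (13) | Critical
System (14) | Multiple Readers and Multiple Writers Queue Reached Low Queue Depth—System (14) | Minor
System (15) | Multiple Readers and Multiple Writers Throttle Queue Depth Reached—System (15) | Major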
Test Report—System (1)
The Test Report event is for testing the system event category. The event is informational and no further action is required.
Inter-Process Communication Queue Read Failure—System (2)
The Inter-Process Communication Queue Read Failure alarm (minor) indicates that the IPC queue read has failed. To troubleshoot and correct the cause of the Inter-Process Communication Queue Read Failure alarm, refer to the "Inter-Process Communication Queue Read Failure—System (2)" section.
Inter-Process Communication Message Allocate Failure—System (3)
The Inter-Process Communication Message Allocate Failure alarm (minor) indicates that the IPC message allocation has failed. To troubleshoot and correct the cause of the Inter-Process Communication Message Allocate Failure alarm, refer to the "Inter-Process Communication Message Allocate Failure—System (3)" section.
Inter-Process Communication Message Send Failure—System (4)
The Inter-Process Communication Message Send Failure alarm (minor) indicates that the IPC message send has failed. To troubleshoot and correct the cause of the Inter-Process Communication Message Send Failure alarm, refer to the "Inter-Process Communication Message Send Failure—System (4)" section.
Unexpected Inter-Process Communication Message Received—System (5)
The Unexpected Inter-Process Communication Message Received event serves as a warning that an unexpected IPC message was received. The primary cause of the event is that the IPC process is receiving messages it is not expecting. To correct the primary cause of the event, contact Cisco TAC.
Index List Insert Error—System (6)
The Index List Insert Error alarm (minor) indicates that an index list insert error has occurred. To troubleshoot and correct the cause of the Index List Insert Error alarm, refer to the "Index List Insert Error—System (6)" section.
Index List Remove Error—System (7)
The Index List Remove Error alarm (minor) indicates that an index list remove error has occurred. To troubleshoot and correct the cause of the Index List Remove Error alarm, refer to the "Index List Remove Error—System (7)" section.
Thread Creation Failure—System (8)
The Thread Creation Failure alarm (major) indicates that a thread creation has failed. To troubleshoot and correct the cause of the Thread Creation Failure alarm, refer to the "Thread Creation Failure—System (8)" section.
Timer Start Failure—System (9)
The Timer Start Failure event serves as a warning that a timer start failure has occurred. The primary cause of the event is that the process was unable to start a platform timer. To correct the primary cause of the event, check whether the problem persists. If it does, contact Cisco TAC.
Index Update Registration Error—System (10)
The Index Update Registration Error alarm (minor) indicates that an index update registration error has occurred. To troubleshoot and correct the cause of the Index Update Registration Error alarm, refer to the "Index Update Registration Error—System (10)" section.
Index Table Add-Entry Error—System (11)
The Index Table Add-Entry Error alarm (minor) indicates that an error occurred during the addition of an entry in the index table. To troubleshoot and correct the cause of the Index Table Add-Entry Error alarm, refer to the "Index Table Add Entry Error—System (11)" section.
Software Error—System (12)
The Software Error alarm (major) indicates that a software error has occurred. To troubleshoot and correct the cause of the Software Error alarm, refer to the "Software Error—System (12)" section.
Multiple Readers and Multiple Writers Maximum Q Depth Reached—System (13)
The Multiple Readers and Multiple Writers Maximum Q Depth Reached alarm (critical) indicates that the multiple readers and multiple writers (MRMW) maximum queue depth has been reached. To troubleshoot and correct the cause of the Multiple Readers and Multiple Writers Maximum Q Depth Reached alarm, refer to the "Multiple Readers and Multiple Writers Maximum Q Depth Reached—System (13)" section.
Multiple Readers and Multiple Writers Queue Reached Low Queue Depth—System (14)
The Multiple Readers and Multiple Writers Queue Reached Low Queue Depth alarm (minor) indicates that the MRMW queue has reached the low queue depth threshold. To troubleshoot and correct the cause of the Multiple Readers and Multiple Writers Queue Reached Low Queue Depth alarm, refer to the "Multiple Readers and Multiple Writers Queue Reached Low Queue Depth—System (14)" section.
Multiple Readers and Multiple Writers Throttle Queue Depth Reached—System (15)
The Multiple Readers and Multiple Writers Throttle Queue Depth Reached alarm (major) indicates that the MRMW queue has reached throttle depth. To troubleshoot and correct the cause of the Multiple Readers and Multiple Writers Throttle Queue Depth Reached alarm, refer to the "Multiple Readers and Multiple Writers Throttle Queue Depth Reached—System (15)" section.
Troubleshooting System Alarms
This section provides the information you need for troubleshooting and correcting system alarms. Table 12-18 lists all of the system alarms in numerical order and provides cross-references to each subsection.
Note Refer to the "Obtaining Documentation and Submitting a Service Request" section for detailed instructions on contacting Cisco TAC and opening a service request.
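Alarm Type | Alarm Name | Alarm Severity
---|---|---
System (2) | Inter-Process Communication Queue Read Failure—System (2) | Minor
System (3) | Inter-Process Communication Message Allocate Failure—System (3) | Minor
System (4) | Inter-Process Communication Message Send Failure—System (4) | Minor
System (6) | Index List Insert Error—System (6) | Minor
System (7) | Index List Remove Error—System (7) | Minor
System (8) | Thread Creation Failure—System (8) | Major
System (10) | Index Update Registration Error—System (10) | Minor
System (11) | Index Table Add Entry Error—System (11) | Minor
System (12) | Software Error—System (12) | Major
System (13) | Multiple Readers and Multiple Writers Maximum Q Depth Reached—System (13) | Critical
System (14) | Multiple Readers and Multiple Writers Queue Reached Low Queue Depth—System (14) | Minor
System (15) | Multiple Readers and Multiple Writers Throttle Queue Depth Reached—System (15) | Major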
Inter-Process Communication Queue Read Failure—System (2)
The Inter-Process Communication Queue Read Failure alarm (minor) indicates that the IPC queue read has failed. The primary cause of the alarm is that there is a problem with IPC communication. To correct the primary cause of the alarm, contact Cisco TAC.
Inter-Process Communication Message Allocate Failure—System (3)
The Inter-Process Communication Message Allocate Failure alarm (minor) indicates that the IPC message allocation has failed. The primary cause of the alarm is a system error or insufficient free memory to allocate a message buffer. The allocation can fail for any of the following reasons:
• The message size is too large.
• There is no free entry in the message pool.
• An internal error has occurred.
To correct the primary cause of the alarm, contact Cisco TAC.
Before contacting Cisco TAC, collect statistics for the message pool and message queue.
To collect the statistics, use the pdm.CAxxx script in the /opt/OptiCall/CAxxx/bin directory.
Example:
pdm.CA146 -> 1.IPC Controls -> 2.Message Pool Stats & 6.Message Queue Stats
Also, use the top command to collect the current CPU usage.
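The script name and directory both embed the instance name (CA146 in the example above). As a small illustration of that naming pattern, the following Python sketch builds the script path from an instance name. The function name `pdm_script_path` is hypothetical, and the check that an instance name is "CA" followed by digits is an assumption based only on the CAxxx placeholder and the CA146 example.

```python
from pathlib import PurePosixPath


def pdm_script_path(instance):
    """Return the pdm script path for an instance name such as 'CA146'.

    Assumption: instance names are 'CA' followed by digits only.
    """
    if not (instance.startswith("CA") and instance[2:].isdigit()):
        raise ValueError(f"unexpected instance name: {instance!r}")
    # Pattern from the text: /opt/OptiCall/CAxxx/bin/pdm.CAxxx
    return PurePosixPath("/opt/OptiCall") / instance / "bin" / f"pdm.{instance}"
```

Calling `str(pdm_script_path("CA146"))` gives `/opt/OptiCall/CA146/bin/pdm.CA146`. The menu selections shown in the example are still made interactively after the script starts.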
Inter-Process Communication Message Send Failure—System (4)
The Inter-Process Communication Message Send Failure alarm (minor) indicates that the IPC message send has failed. The primary cause of the alarm is that the process for which the message is intended is not running. To correct the primary cause of the alarm, check to ensure that all components and processes are running. Attempt to restart any component or process that is not running. The secondary cause of the alarm is that an internal error has occurred. To correct the secondary cause of the alarm, contact Cisco TAC.
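Checking that all processes are running amounts to comparing the processes that should be running with those that actually are. The following Python sketch shows that comparison in its simplest form. The function name and the process names in the usage note are placeholders, and how the running list is obtained is left open because this section does not specify a command for it.

```python
def missing_processes(expected, running):
    """Return the expected process names that are absent from the running list.

    The result keeps the order of the expected list so that it can be read
    as a restart checklist.
    """
    running_set = set(running)
    return [name for name in expected if name not in running_set]
```

For example, `missing_processes(["proc_a", "proc_b", "proc_c"], ["proc_c", "proc_a"])` returns `["proc_b"]`, which identifies the process to restart before checking whether the alarm clears.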
Index List Insert Error—System (6)
The Index List Insert Error alarm (minor) indicates that an index list insert error has occurred. The primary cause of the alarm is that an internal error has occurred. To correct the primary cause of the alarm, contact Cisco TAC.
Index List Remove Error—System (7)
The Index List Remove Error alarm (minor) indicates that an index list remove error has occurred. The primary cause of the alarm is that an internal error has occurred. To correct the primary cause of the alarm, contact Cisco TAC.
Thread Creation Failure—System (8)
The Thread Creation Failure alarm (major) indicates that a thread creation has failed. The primary cause of the alarm is that an internal error occurred. A process was unable to create one of its threads. To correct the primary cause of the alarm, attempt to restart the node on which the error occurred. If the same alarm occurs, contact Cisco TAC.
Index Update Registration Error—System (10)
The Index Update Registration Error alarm (minor) indicates that an index update registration error has occurred. The primary cause of the alarm is that an application unsuccessfully requested to be notified of table changes. To correct the primary cause of the alarm, contact Cisco TAC.
Index Table Add Entry Error—System (11)
The Index Table Add Entry Error alarm (minor) indicates that an error occurred during the addition of an entry in the index table. The primary cause of the alarm is that an internal error has occurred. To correct the primary cause of the alarm, contact Cisco TAC.
Software Error—System (12)
The Software Error alarm (major) indicates that a software error has occurred. The primary cause of the alarm is that a logic path is not handled by any algorithm in the code. To correct the primary cause of the alarm, save the trace log from around the time of occurrence and contact Cisco TAC.
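Saving the trace log from around the time of occurrence can be done by hand, but a short filter makes the extract repeatable. The following Python sketch keeps only the lines within a time window around the alarm. It is a sketch under stated assumptions: the timestamp layout (`YYYY-MM-DD HH:MM:SS` at the start of each line), the five-minute default window, and the function name `lines_near` are all hypothetical and must be adapted to the actual trace log format.

```python
from datetime import datetime, timedelta


def lines_near(log_lines, alarm_time, window_minutes=5):
    """Return the log lines stamped within window_minutes of alarm_time.

    Assumption: each line begins with a 19-character timestamp in the form
    'YYYY-MM-DD HH:MM:SS'. Lines without a parsable timestamp are skipped.
    """
    window = timedelta(minutes=window_minutes)
    selected = []
    for line in log_lines:
        try:
            stamp = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue  # not a timestamped line
        if abs(stamp - alarm_time) <= window:
            selected.append(line)
    return selected
```

The selected lines can then be written to a separate file and attached to the service request.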
Multiple Readers and Multiple Writers Maximum Q Depth Reached—System (13)
The Multiple Readers and Multiple Writers Maximum Q Depth Reached alarm (critical) indicates that the MRMW maximum queue depth has been reached. The primary cause of the alarm is message flooding from an erratic network element. To correct the primary cause of the alarm, check the messages to process. The secondary cause of the alarm is resource congestion or slow processing of messages from the queue. To correct the secondary cause of the alarm, check the process and system resources. The system may need to be failed over.
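When message flooding is suspected, counting messages per originating network element is a quick way to spot an erratic sender. The following Python sketch tallies a list of source identifiers and reports the most frequent ones. The function name `top_senders` and the identifiers in the usage note are hypothetical, and the sketch assumes that the source of each message has already been extracted from whatever capture or trace is available.

```python
from collections import Counter


def top_senders(sources, limit=3):
    """Return the most frequent message sources as (source, count) pairs."""
    return Counter(sources).most_common(limit)
```

For example, `top_senders(["mgw1", "mgw2", "mgw1", "mgw3", "mgw1", "mgw2"], limit=2)` returns `[("mgw1", 3), ("mgw2", 2)]`. A source whose count is far above the rest is a candidate for the erratic network element.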
Multiple Readers and Multiple Writers Queue Reached Low Queue Depth—System (14)
The Multiple Readers and Multiple Writers Queue Reached Low Queue Depth alarm (minor) indicates that the MRMW queue has reached the low queue depth threshold. The primary cause of the alarm is a high rate of messages from the network. To correct the primary cause of the alarm, check the messages to the system. The secondary cause of the alarm is system or processing thread congestion. To correct the secondary cause of the alarm, check process and system resources.
Multiple Readers and Multiple Writers Throttle Queue Depth Reached—System (15)
The Multiple Readers and Multiple Writers Throttle Queue Depth Reached alarm (major) indicates that the MRMW queue has reached the throttle depth. The primary cause of the alarm is that inbound network messages are arriving at a rate much higher than the processing capacity. To correct the primary cause of the alarm, determine the cause of the increase in inbound network traffic, and try to control the traffic externally. The secondary cause of the alarm is resource congestion that slows the processing of messages from the queue. To correct the secondary cause of the alarm, check the platform CPU utilization, IPC queue depths, and overall availability of system resources.
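The three MRMW alarms, System (14), System (15), and System (13), can be read as escalating stages of the same queue filling up. The following Python sketch shows that reading. It assumes that the low, throttle, and maximum depths are ordered from smallest to largest, which matches the minor, major, and critical severities but is not stated explicitly in this chapter. The function name and the threshold values in the usage note are hypothetical.

```python
def mrmw_alarm(depth, low, throttle, maximum):
    """Map a queue depth to the System alarm number it would raise, or None.

    Assumption: low < throttle < maximum.
    """
    if not (low < throttle < maximum):
        raise ValueError("expected low < throttle < maximum")
    if depth >= maximum:
        return 13  # Maximum Q Depth Reached (critical)
    if depth >= throttle:
        return 15  # Throttle Queue Depth Reached (major)
    if depth >= low:
        return 14  # Queue Reached Low Queue Depth (minor)
    return None
```

With illustrative thresholds of 50, 80, and 100, a depth of 85 maps to System (15), while a depth of 10 raises no alarm. Under this reading, acting on System (14) or System (15) early may prevent the queue from reaching the critical System (13) condition.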