System Troubleshooting


Revised: July 2010, OL-23033-01

Introduction

This chapter provides the information needed for monitoring and troubleshooting system events and alarms. This chapter is divided into the following sections:

System Events and Alarms—Provides a brief overview of each system event and alarm

Monitoring System Events—Provides the information needed for monitoring and correcting the system events

Troubleshooting System Alarms—Provides the information needed for troubleshooting and correcting the system alarms

System Events and Alarms

This section provides a brief overview of all of the system events and alarms for the Cisco BTS 10200 Softswitch; the events and alarms are arranged in numerical order. Table 12-1 lists all of the system events and alarms by severity.


Note Refer to the "Obtaining Documentation and Submitting a Service Request" section on page l for detailed instructions on contacting Cisco TAC and opening a service request.



Note Click the system message number in Table 12-1 to display information about the event or alarm.


Table 12-1 System Events and Alarms by Severity

Critical      Major         Minor         Warning       Information   Not Used
System (13)   System (8)    System (2)    System (5)    System (1)
              System (12)   System (3)    System (9)
              System (15)   System (4)
                            System (6)
                            System (7)
                            System (10)
                            System (11)
                            System (14)

System (1)

Table 12-2 lists the details of the System (1) information event. For additional information, refer to the "Test Report—System (1)" section.

Table 12-2 System (1) Details 

Description       Test Report
Severity          Information
Threshold         100
Throttle          0
Primary Cause     This is a test report for the System category.
Primary Action    No action is required.


System (2)

Table 12-3 lists the details of the System (2) minor alarm. To troubleshoot and correct the cause of the alarm, refer to the "Inter-Process Communication Queue Read Failure—System (2)" section.

Table 12-3 System (2) Details 

Description       Inter-Process Communication Queue Read Failure (IPC Queue Read Failure)
Severity          Minor
Threshold         100
Throttle          0
Datawords         Queue Name—STRING [20]
                  Location Tag—STRING [30]
Primary Cause     There is a problem with the inter-process communication (IPC) process.
Primary Action    If the problem persists, contact Cisco TAC.


System (3)

Table 12-4 lists the details of the System (3) minor alarm. To troubleshoot and correct the cause of the alarm, refer to the "Inter-Process Communication Message Allocate Failure—System (3)" section.

Table 12-4 System (3) Details 

Description       Inter-Process Communication Message Allocate Failure (IPC Message Allocate Failure)
Severity          Minor
Threshold         100
Throttle          0
Datawords         Requested Size—TWO_BYTES
                  Error Code—FOUR_BYTES
                  Location Tag—STRING [30]
Primary Cause     There is a system error or there is not enough free memory left to allocate a message buffer.
Primary Action    If the problem persists, contact Cisco TAC.


System (4)

Table 12-5 lists the details of the System (4) minor alarm. To troubleshoot and correct the cause of the alarm, refer to the "Inter-Process Communication Message Send Failure—System (4)" section.

Table 12-5 System (4) Details 

Description       Inter-Process Communication Message Send Failure (IPC Message Send Failure)
Severity          Minor
Threshold         50
Throttle          0
Datawords         Error Code—FOUR_BYTES
                  Destination Process—FOUR_BYTES
                  Message Number—FOUR_BYTES
                  Location Tag—STRING [30]
Primary Cause     The process for which the message is intended is not running.
Primary Action    Check to ensure that all components or processes are running. Attempt to restart any component or process that is not running.
Secondary Cause   An internal error has occurred.
Secondary Action  If the problem persists, contact Cisco TAC.


System (5)

Table 12-6 lists the details of the System (5) warning event. To monitor and correct the cause of the event, refer to the "Unexpected Inter-Process Communication Message Received—System (5)" section.

Table 12-6 System (5) Details 

Description       Unexpected Inter-Process Communication Message Received (Unexpected IPC Message Received)
Severity          Warning
Threshold         100
Throttle          0
Datawords         Source Process Type—ONE_BYTE
                  Source Thread Type—ONE_BYTE
                  Message Number—TWO_BYTES
                  Location Tag—STRING [30]
Primary Cause     The process reporting the event is receiving messages it is not expecting.
Primary Action    Contact Cisco TAC.


System (6)

Table 12-7 lists the details of the System (6) minor alarm. To troubleshoot and correct the cause of the alarm, refer to the "Index List Insert Error—System (6)" section.

Table 12-7 System (6) Details 

Description       Index List Insert Error (IDX List Insert Error)
Severity          Minor
Threshold         100
Throttle          0
Datawords         List Name—STRING [20]
                  Index of Entry Being—FOUR_BYTES
                  Location Tag—STRING [30]
Primary Cause     An internal error has occurred.
Primary Action    If the problem persists, contact Cisco TAC.


System (7)

Table 12-8 lists the details of the System (7) minor alarm. To troubleshoot and correct the cause of the alarm, refer to the "Index List Remove Error—System (7)" section.

Table 12-8 System (7) Details 

Description       Index List Remove Error (IDX List Remove Error)
Severity          Minor
Threshold         100
Throttle          0
Datawords         List Name—STRING [20]
                  Index of Entry Being—FOUR_BYTES
                  Location Tag—STRING [30]
Primary Cause     An internal error has occurred.
Primary Action    If the problem persists, contact Cisco TAC.


System (8)

Table 12-9 lists the details of the System (8) major alarm. To troubleshoot and correct the cause of the alarm, refer to the "Thread Creation Failure—System (8)" section.

Table 12-9 System (8) Details 

Description       Thread Creation Failure
Severity          Major
Threshold         100
Throttle          0
Datawords         Error Code—FOUR_BYTES
                  Thread Name—STRING [20]
                  Location Tag—STRING [30]
Primary Cause     An internal error has occurred. A process was unable to create one of its threads.
Primary Action    Attempt to restart the node on which the error occurred. If the same error occurs, contact Cisco TAC.


System (9)

Table 12-10 lists the details of the System (9) warning event. To monitor and correct the cause of the event, refer to the "Timer Start Failure—System (9)" section.

Table 12-10 System (9) Details 

Description       Timer Start Failure
Severity          Warning
Threshold         100
Throttle          0
Datawords         Timer Type—STRING [20]
                  Location Tag—STRING [30]
Primary Cause     A process was unable to start a platform timer.
Primary Action    If the problem persists, contact Cisco TAC.


System (10)

Table 12-11 lists the details of the System (10) minor alarm. To troubleshoot and correct the cause of the alarm, refer to the "Index Update Registration Error—System (10)" section.

Table 12-11 System (10) Details 

Description       Index Update Registration Error (IDX Update Registration Error)
Severity          Minor
Threshold         100
Throttle          0
Datawords         Error Code—FOUR_BYTES
                  Table Name—STRING [20]
                  Location Tag—STRING [30]
Primary Cause     An application unsuccessfully requested to be notified of table changes.
Primary Action    Contact Cisco TAC.


System (11)

Table 12-12 lists the details of the System (11) minor alarm. To troubleshoot and correct the cause of the alarm, refer to the "Index Table Add Entry Error—System (11)" section.

Table 12-12 System (11) Details 

Description       Index Table Add Entry Error (IDX Table Add Entry Error)
Severity          Minor
Threshold         100
Throttle          0
Datawords         Table Name—STRING [20]
                  Index of Entry Being—FOUR_BYTES
                  Error Code—FOUR_BYTES
                  Location Tag—STRING [30]
Primary Cause     An internal error has occurred.
Primary Action    If the problem persists, contact Cisco TAC.


System (12)

Table 12-13 lists the details of the System (12) major alarm. To troubleshoot and correct the cause of the alarm, refer to the "Software Error—System (12)" section.

Table 12-13 System (12) Details 

Description       Software Error
Severity          Major
Threshold         100
Throttle          0
Datawords         Context Description—STRING [80]
                  FileName—STRING [20]
                  Line Number of Code—TWO_BYTES
                  Error Specific Information—STRING [80]
Primary Cause     The logic path is not handled by an algorithm in the code.
Primary Action    Save a trace log from around the time of the occurrence and contact Cisco TAC.


System (13)

Table 12-14 lists the details of the System (13) critical alarm. To troubleshoot and correct the cause of the alarm, refer to the "Multiple Readers and Multiple Writers Maximum Q Depth Reached—System (13)" section.

Table 12-14 System (13) Details 

Description       Multiple Readers and Multiple Writers Maximum Q Depth Reached (MRMW Max Q Depth Reached)
Severity          Critical
Threshold         100
Throttle          0
Datawords         High Mark for Queue Depth—FOUR_BYTES
                  Low Mark for Queue Depth—FOUR_BYTES
Primary Cause     Messages are flooding from a malfunctioning network element.
Primary Action    Check the messages to process.
Secondary Cause   Resource congestion or slow processing of messages from the queue has occurred.
Secondary Action  Check the process and the system resources. You might need to fail over.


System (14)

Table 12-15 lists the details of the System (14) minor alarm. To troubleshoot and correct the cause of the alarm, refer to the "Multiple Readers and Multiple Writers Queue Reached Low Queue Depth—System (14)" section.

Table 12-15 System (14) Details 

Description       Multiple Readers and Multiple Writers Queue Reached Low Queue Depth (MRMW Queue Reached Low Queue Depth)
Severity          Minor
Threshold         100
Throttle          0
Datawords         Lower Queue Depth Limit—FOUR_BYTES
                  Higher Queue Depth Limit—FOUR_BYTES
Primary Cause     Messages are being received from the network at a high rate.
Primary Action    Check the messages to the system.
Secondary Cause   System or processing thread congestion has occurred.
Secondary Action  Check the process and the system resources.


System (15)

Table 12-16 lists the details of the System (15) major alarm. To troubleshoot and correct the cause of the alarm, refer to the "Multiple Readers and Multiple Writers Throttle Queue Depth Reached—System (15)" section.

Table 12-16 System (15) Details 

Description       Multiple Readers and Multiple Writers Throttle Queue Depth Reached (MRMW Throttle Queue Depth Reached)
Severity          Major
Threshold         100
Throttle          0
Datawords         Throttle Mark for Queue Depth—FOUR_BYTES
                  Throttle Clear Mark for Queue De—FOUR_BYTES
Primary Cause     Inbound network messages are arriving at a rate much higher than the processing capacity.
Primary Action    Determine the cause of the increase in inbound network traffic and try to control the traffic externally.
Secondary Cause   Resource congestion has slowed the processing of messages from the queue.
Secondary Action  Check the platform CPU utilization, the IPC queue depth, and the overall availability of system resources.


Monitoring System Events

This section provides the information you need for monitoring and correcting system events. Table 12-17 lists all of the system events in numerical order and provides cross-references to each subsection.


Note Refer to the "Obtaining Documentation and Submitting a Service Request" section on page l for detailed instructions on contacting Cisco TAC and opening a service request.


Table 12-17 Cisco BTS 10200 System Events 

Event Type    Event Name                                                                      Event Severity
System (1)    Test Report—System (1)                                                          Information
System (2)    Inter-Process Communication Queue Read Failure—System (2)                       Minor
System (3)    Inter-Process Communication Message Allocate Failure—System (3)                 Minor
System (4)    Inter-Process Communication Message Send Failure—System (4)                     Minor
System (5)    Unexpected Inter-Process Communication Message Received—System (5)              Warning
System (6)    Index List Insert Error—System (6)                                              Minor
System (7)    Index List Remove Error—System (7)                                              Minor
System (8)    Thread Creation Failure—System (8)                                              Major
System (9)    Timer Start Failure—System (9)                                                  Warning
System (10)   Index Update Registration Error—System (10)                                     Minor
System (11)   Index Table Add Entry Error—System (11)                                         Minor
System (12)   Software Error—System (12)                                                      Major
System (13)   Multiple Readers and Multiple Writers Maximum Q Depth Reached—System (13)       Critical
System (14)   Multiple Readers and Multiple Writers Queue Reached Low Queue Depth—System (14) Minor
System (15)   Multiple Readers and Multiple Writers Throttle Queue Depth Reached—System (15)  Major
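
The severity column also separates alarms from events: every entry listed later in this chapter as an alarm is Critical, Major, or Minor, while the Warning and Information entries are events only. As a minimal sketch, the table can be held in a small Python lookup for sorting or filtering log output. The names SYSTEM_EVENTS, group_by_severity, and is_alarm are illustrative and are not part of the Cisco BTS 10200 software.

```python
from collections import defaultdict

# (number, name, severity) copied from Table 12-17.
SYSTEM_EVENTS = [
    (1, "Test Report", "Information"),
    (2, "Inter-Process Communication Queue Read Failure", "Minor"),
    (3, "Inter-Process Communication Message Allocate Failure", "Minor"),
    (4, "Inter-Process Communication Message Send Failure", "Minor"),
    (5, "Unexpected Inter-Process Communication Message Received", "Warning"),
    (6, "Index List Insert Error", "Minor"),
    (7, "Index List Remove Error", "Minor"),
    (8, "Thread Creation Failure", "Major"),
    (9, "Timer Start Failure", "Warning"),
    (10, "Index Update Registration Error", "Minor"),
    (11, "Index Table Add Entry Error", "Minor"),
    (12, "Software Error", "Major"),
    (13, "Multiple Readers and Multiple Writers Maximum Q Depth Reached", "Critical"),
    (14, "Multiple Readers and Multiple Writers Queue Reached Low Queue Depth", "Minor"),
    (15, "Multiple Readers and Multiple Writers Throttle Queue Depth Reached", "Major"),
]

# Severities that this chapter treats as alarms rather than plain events.
ALARM_SEVERITIES = {"Critical", "Major", "Minor"}


def group_by_severity(events):
    """Map each severity to the list of System numbers that carry it."""
    groups = defaultdict(list)
    for number, _name, severity in events:
        groups[severity].append(number)
    return dict(groups)


def is_alarm(severity):
    """Return True if a severity corresponds to an alarm in this chapter."""
    return severity in ALARM_SEVERITIES


if __name__ == "__main__":
    for severity, numbers in group_by_severity(SYSTEM_EVENTS).items():
        print(severity, numbers)
```

Filtering SYSTEM_EVENTS with is_alarm yields the same twelve entries that appear in the alarm table of the troubleshooting section.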


Test Report—System (1)

The Test Report event is for testing the system event category. The event is informational and no further action is required.

Inter-Process Communication Queue Read Failure—System (2)

The Inter-Process Communication Queue Read Failure alarm (minor) indicates that the IPC queue read has failed. To troubleshoot and correct the cause of the Inter-Process Communication Queue Read Failure alarm, refer to the "Inter-Process Communication Queue Read Failure—System (2)" section.

Inter-Process Communication Message Allocate Failure—System (3)

The Inter-Process Communication Message Allocate Failure alarm (minor) indicates that the IPC message allocation has failed. To troubleshoot and correct the cause of the Inter-Process Communication Message Allocate Failure alarm, refer to the "Inter-Process Communication Message Allocate Failure—System (3)" section.

Inter-Process Communication Message Send Failure—System (4)

The Inter-Process Communication Message Send Failure alarm (minor) indicates that the IPC message send has failed. To troubleshoot and correct the cause of the Inter-Process Communication Message Send Failure alarm, refer to the "Inter-Process Communication Message Send Failure—System (4)" section.

Unexpected Inter-Process Communication Message Received—System (5)

The Unexpected Inter-Process Communication Message Received event serves as a warning that an unexpected IPC message was received. The primary cause of the event is that the process reporting the event is receiving messages it is not expecting. To correct the primary cause of the event, contact Cisco TAC.

Index List Insert Error—System (6)

The Index List Insert Error alarm (minor) indicates that an error occurred during an insert into the index list. To troubleshoot and correct the cause of the Index List Insert Error alarm, refer to the "Index List Insert Error—System (6)" section.

Index List Remove Error—System (7)

The Index List Remove Error alarm (minor) indicates that an index list remove error has occurred. To troubleshoot and correct the cause of the Index List Remove Error alarm, refer to the "Index List Remove Error—System (7)" section.

Thread Creation Failure—System (8)

The Thread Creation Failure alarm (major) indicates that a thread creation has failed. To troubleshoot and correct the cause of the Thread Creation Failure alarm, refer to the "Thread Creation Failure—System (8)" section.

Timer Start Failure—System (9)

The Timer Start Failure event serves as a warning that a timer start failure has occurred. The primary cause of the event is that the process was unable to start a platform timer. If the problem persists, contact Cisco TAC.

Index Update Registration Error—System (10)

The Index Update Registration Error alarm (minor) indicates that an index update registration error has occurred. To troubleshoot and correct the cause of the Index Update Registration Error alarm, refer to the "Index Update Registration Error—System (10)" section.

Index Table Add Entry Error—System (11)

The Index Table Add Entry Error alarm (minor) indicates that an error occurred during the addition of an entry in the index table. To troubleshoot and correct the cause of the Index Table Add Entry Error alarm, refer to the "Index Table Add Entry Error—System (11)" section.

Software Error—System (12)

The Software Error alarm (major) indicates that a software error has occurred. To troubleshoot and correct the cause of the Software Error alarm, refer to the "Software Error—System (12)" section.

Multiple Readers and Multiple Writers Maximum Q Depth Reached—System (13)

The Multiple Readers and Multiple Writers Maximum Q Depth Reached alarm (critical) indicates that the multiple readers and multiple writers (MRMW) maximum queue depth has been reached. To troubleshoot and correct the cause of the Multiple Readers and Multiple Writers Maximum Q Depth Reached alarm, refer to the "Multiple Readers and Multiple Writers Maximum Q Depth Reached—System (13)" section.

Multiple Readers and Multiple Writers Queue Reached Low Queue Depth—System (14)

The Multiple Readers and Multiple Writers Queue Reached Low Queue Depth alarm (minor) indicates that the MRMW queue has reached the low queue depth threshold. To troubleshoot and correct the cause of the Multiple Readers and Multiple Writers Queue Reached Low Queue Depth alarm, refer to the "Multiple Readers and Multiple Writers Queue Reached Low Queue Depth—System (14)" section.

Multiple Readers and Multiple Writers Throttle Queue Depth Reached—System (15)

The Multiple Readers and Multiple Writers Throttle Queue Depth Reached alarm (major) indicates that the MRMW queue has reached throttle depth. To troubleshoot and correct the cause of the Multiple Readers and Multiple Writers Throttle Queue Depth Reached alarm, refer to the "Multiple Readers and Multiple Writers Throttle Queue Depth Reached—System (15)" section.

Troubleshooting System Alarms

This section provides the information you need for troubleshooting and correcting system alarms. Table 12-18 lists all of the system alarms in numerical order and provides cross-references to each subsection.


Note Refer to the "Obtaining Documentation and Submitting a Service Request" section on page l for detailed instructions on contacting Cisco TAC and opening a service request.


Table 12-18 Cisco BTS 10200 System Alarms 

Alarm Type    Alarm Name                                                                      Alarm Severity
System (2)    Inter-Process Communication Queue Read Failure—System (2)                       Minor
System (3)    Inter-Process Communication Message Allocate Failure—System (3)                 Minor
System (4)    Inter-Process Communication Message Send Failure—System (4)                     Minor
System (6)    Index List Insert Error—System (6)                                              Minor
System (7)    Index List Remove Error—System (7)                                              Minor
System (8)    Thread Creation Failure—System (8)                                              Major
System (10)   Index Update Registration Error—System (10)                                     Minor
System (11)   Index Table Add Entry Error—System (11)                                         Minor
System (12)   Software Error—System (12)                                                      Major
System (13)   Multiple Readers and Multiple Writers Maximum Q Depth Reached—System (13)       Critical
System (14)   Multiple Readers and Multiple Writers Queue Reached Low Queue Depth—System (14) Minor
System (15)   Multiple Readers and Multiple Writers Throttle Queue Depth Reached—System (15)  Major


Inter-Process Communication Queue Read Failure—System (2)

The Inter-Process Communication Queue Read Failure alarm (minor) indicates that the IPC queue read has failed. The primary cause of the alarm is that there is a problem with the inter-process communication (IPC) process. If the problem persists, contact Cisco TAC.

Inter-Process Communication Message Allocate Failure—System (3)

The Inter-Process Communication Message Allocate Failure alarm (minor) indicates that the IPC message allocation has failed. The primary cause of the alarm is that there is a system error, or there is not enough free memory left to allocate a message buffer. The failure can have any of the following causes:

The message size is too big.

There is no free entry in the message pool.

An internal error has occurred.

To correct the primary cause of the alarm, contact Cisco TAC.

Before contacting Cisco TAC, collect statistics for the message pool and message queue.

To collect the statistics, use the pdm.CAxxx script in the /opt/OptiCall/CAxxx/bin directory.

Example:

pdm.CA146 -> 1.IPC Controls -> 2.Message Pool Stats & 6.Message Queue Stats

Also, use the top command to collect the current CPU usage.
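
When gathering this data for a service request, it can help to save each output to a timestamped file. The following Python sketch shows one way to do that under stated assumptions: capture_snapshot is a hypothetical helper, not a Cisco tool, and the command to run is supplied by the caller because the non-interactive options of top differ between platforms and are not given in this chapter. The pdm.CAxxx script is menu-driven, so this sketch does not attempt to automate it.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def capture_snapshot(command, out_dir, label):
    """Run a command once and save its output to a timestamped text file.

    command: argument list, for example a non-interactive invocation of top.
    out_dir: directory for the saved file (created if missing).
    label:   short prefix for the file name, such as "cpu".
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    out_path = out_dir / f"{label}-{stamp}.txt"
    # check=False: keep whatever output was produced even if the command fails.
    result = subprocess.run(command, capture_output=True, text=True, check=False)
    out_path.write_text(result.stdout + result.stderr)
    return out_path


# Example call (assumption: Linux procps top, where "-b -n 1" prints one
# batch-mode sample; other platforms use different options):
# capture_snapshot(["top", "-b", "-n", "1"], "/tmp/tac-data", "cpu")
```

The returned path can be noted in the service request so that the CPU sample is easy to match with the time of the alarm.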

Inter-Process Communication Message Send Failure—System (4)

The Inter-Process Communication Message Send Failure alarm (minor) indicates that the IPC message send has failed. The primary cause of the alarm is that the process for which the message is intended is not running. To correct the primary cause of the alarm, check to ensure that all components and processes are running. Attempt to restart any component or process that is not running. The secondary cause of the alarm is that an internal error has occurred. To correct the secondary cause of the alarm, contact Cisco TAC.

Index List Insert Error—System (6)

The Index List Insert Error alarm (minor) indicates that an error occurred during an insert into the index list. The primary cause of the alarm is that an internal error has occurred. To correct the primary cause of the alarm, contact Cisco TAC.

Index List Remove Error—System (7)

The Index List Remove Error alarm (minor) indicates that an index list remove error has occurred. The primary cause of the alarm is that an internal error has occurred. To correct the primary cause of the alarm, contact Cisco TAC.

Thread Creation Failure—System (8)

The Thread Creation Failure alarm (major) indicates that a thread creation has failed. The primary cause of the alarm is that an internal error occurred. A process was unable to create one of its threads. To correct the primary cause of the alarm, attempt to restart the node on which the error occurred. If the same alarm occurs, contact Cisco TAC.

Index Update Registration Error—System (10)

The Index Update Registration Error alarm (minor) indicates that an index update registration error has occurred. The primary cause of the alarm is that an application unsuccessfully requested to be notified of table changes. To correct the primary cause of the alarm, contact Cisco TAC.

Index Table Add Entry Error—System (11)

The Index Table Add Entry Error alarm (minor) indicates that an error occurred during the addition of an entry in the index table. The primary cause of the alarm is that an internal error has occurred. To correct the primary cause of the alarm, contact Cisco TAC.

Software Error—System (12)

The Software Error alarm (major) indicates that a software error has occurred. The primary cause of the alarm is that a logic path is not handled by any algorithm in the code. To correct the primary cause of the alarm, save the trace log from around the time of occurrence and contact Cisco TAC.

Multiple Readers and Multiple Writers Maximum Q Depth Reached—System (13)

The Multiple Readers and Multiple Writers Maximum Q Depth Reached alarm (critical) indicates that the MRMW maximum queue depth has been reached. The primary cause of the alarm is message flooding from a malfunctioning network element. To correct the primary cause of the alarm, check the messages to process. The secondary cause of the alarm is resource congestion or slow processing of messages from the queue. To correct the secondary cause of the alarm, check the process and system resources. The system may need to be failed over.

Multiple Readers and Multiple Writers Queue Reached Low Queue Depth—System (14)

The Multiple Readers and Multiple Writers Queue Reached Low Queue Depth alarm (minor) indicates that the MRMW queue has reached the low queue depth threshold. The primary cause of the alarm is a high rate of messages from the network. To correct the primary cause of the alarm, check the messages to the system. The secondary cause of the alarm is system or processing thread congestion. To correct the secondary cause of the alarm, check process and system resources.

Multiple Readers and Multiple Writers Throttle Queue Depth Reached—System (15)

The Multiple Readers and Multiple Writers Throttle Queue Depth Reached alarm (major) indicates that the MRMW queue has reached the throttle depth. The primary cause of the alarm is that inbound network messages are arriving at a rate much higher than the processing capacity. To correct the primary cause of the alarm, determine the cause of the increase in inbound network traffic, and try to control the traffic externally. The secondary cause of the alarm is that resource congestion has slowed the processing of messages from the queue. To correct the secondary cause of the alarm, check the platform CPU utilization, IPC queue depths, and overall availability of system resources.
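
The three MRMW alarms can be read as an escalation on a single queue depth. The following minimal Python sketch shows that reading. It assumes the marks are ordered low < throttle < maximum, which the severities (minor, major, critical) suggest but this chapter does not state. The function name and the mark values in the example are hypothetical; the actual marks are reported in the alarm datawords.

```python
def classify_queue_depth(depth, low_mark, throttle_mark, max_depth):
    """Return (system_number, severity) for a queue depth, or None if below all marks.

    Assumes low_mark < throttle_mark < max_depth (an assumption of this sketch).
    """
    if not 0 < low_mark < throttle_mark < max_depth:
        raise ValueError("marks must satisfy 0 < low < throttle < max")
    if depth >= max_depth:
        return (13, "Critical")   # Maximum Q Depth Reached
    if depth >= throttle_mark:
        return (15, "Major")      # Throttle Queue Depth Reached
    if depth >= low_mark:
        return (14, "Minor")      # Queue Reached Low Queue Depth
    return None


if __name__ == "__main__":
    # Hypothetical marks chosen only to exercise each branch.
    for depth in (10, 100, 750, 1000):
        print(depth, classify_queue_depth(depth, 100, 500, 1000))
```

The sketch omits clearing behavior; the Throttle Clear Mark dataword of System (15) indicates that the real queue uses a separate mark for clearing the alarm.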