Cisco BTS 10200 Softswitch Troubleshooting Guide, Release 7.0

Audit Troubleshooting

Revised: July 2010, OL-23033-01

Introduction

This chapter provides the information needed for monitoring and troubleshooting audit events and alarms. This chapter is divided into the following sections:

•Audit Events and Alarms—Provides a brief overview of each audit event and alarm

•Monitoring Audit Events—Provides the information needed for monitoring and correcting the audit events

•Troubleshooting Audit Alarms—Provides the information needed for troubleshooting and correcting the audit alarms

Audit Events and Alarms

This section provides a brief overview of the audit events and alarms for the Cisco BTS 10200 Softswitch; the events and alarms are arranged in numerical order. Table 2-1 lists all of the audit events and alarms by severity.

Note Refer to the "Obtaining Documentation and Submitting a Service Request" section for detailed instructions on contacting the Cisco Technical Assistance Center (TAC) and opening a service request.

Note Click the Audit message number in Table 2-1 to display information about the event or alarm.

Table 2-1 Audit Events and Alarms by Severity
Critical	Major	Minor	Warning	Information	Not Used
Audit (5)	Audit (6)	Audit (7)	Audit (3)	Audit (1)	Audit (9)
Audit (11)	Audit (12)	Audit (13)	Audit (4)	Audit (2)
Audit (15)	Audit (17)	Audit (16)	Audit (8)	Audit (10)
Audit (18)	Audit (20)		Audit (14)	Audit (21)
	Audit (25)		Audit (19)	Audit (22)
			Audit (23)
			Audit (24)

Audit (1)

Table 2-2 lists the details of the Audit (1) informational event. For additional information, refer to the "Test Report—Audit (1)" section.

Table 2-2 Audit (1) Details
Description	Test Report
Severity	Information
Threshold	100
Throttle	0
Primary Cause	This event is used for testing the new audit category.
Primary Action	No action is necessary.

Audit (2)

Table 2-3 lists the details of the Audit (2) informational event. For additional information, refer to the "Start or Stop of Signaling System 7—Circuit Identification Code Audit—Audit (2)" section.

Table 2-3 Audit (2) Details
Description	Start or Stop of Signaling System 7-Circuit Identification Code Audit (Start or Stop of SS7-CIC Audit)
Severity	Information
Threshold	100
Throttle	0
Datawords	Type of Audit—STRING [64]
Primary Cause	The Signaling System 7 (SS7) circuit identification code (CIC) audit has started or stopped.
Primary Action	No action required. This is normal operation.

Audit (3)

Table 2-4 lists the details of the Audit (3) warning event. To monitor and correct the cause of the event, refer to the "Signaling System 7 Circuit Identification Code Audit Terminated Before Successful Completion—Audit (3)" section.

Table 2-4 Audit (3) Details
Description	Signaling System 7 Circuit Identification Code Audit Terminated Before Successful Completion (SS7 CIC Audit Terminated Before Successful Completion)
Severity	Warning
Threshold	100
Throttle	0
Datawords	Type of Audit—STRING [64]
Primary Cause	A higher priority SS7 CIC audit interrupted and terminated a lower priority SS7 CIC audit.
Primary Action	Do not schedule an SS7 remote termination audit to occur while a periodic SS7 local termination audit is executing.

Audit (4)

Table 2-5 lists the details of the Audit (4) warning event. To monitor and correct the cause of the event, refer to the "Call Exceeds a Long-Duration Threshold—Audit (4)" section.

Table 2-5 Audit (4) Details
Description	Call Exceeds a Long-Duration Threshold
Severity	Warning
Threshold	100
Throttle	0
Datawords	Trunk group number—TWO_BYTES Trunk member number—TWO_BYTES Current long-duration threshold—TWO_BYTES
Primary Cause	A call exceeded the current system long-duration threshold.
Primary Action	If there is a reason to believe the call is no longer valid, release the associated trunk facility.

Audit (5)

Table 2-6 lists the details of the Audit (5) critical alarm. To troubleshoot and correct the cause of the alarm, refer to the "Critical Internal Audit Failure—Audit (5)" section.

Table 2-6 Audit (5) Details
Description	Critical Internal Audit Failure
Severity	Critical
Threshold	100
Throttle	0
Datawords	Failure Details—STRING [220] Probable Causes—STRING [80] Corrective Actions—STRING [80]
Primary Cause	See the data field.
Primary Action	See the data field.

Audit (6)

Table 2-7 lists the details of the Audit (6) major alarm. To troubleshoot and correct the cause of the alarm, refer to the "Major Internal Audit Failure—Audit (6)" section.

Table 2-7 Audit (6) Details
Description	Major Internal Audit Failure
Severity	Major
Threshold	100
Throttle	0
Datawords	Failure Details—STRING [220] Probable Causes—STRING [80] Corrective Actions—STRING [80]
Primary Cause	See the data field.
Primary Action	See the data field.

Audit (7)

Table 2-8 lists the details of the Audit (7) minor alarm. To troubleshoot and correct the cause of the alarm, refer to the "Minor Internal Audit Failure—Audit (7)" section.

Table 2-8 Audit (7) Details
Description	Minor Internal Audit Failure
Severity	Minor
Threshold	100
Throttle	0
Datawords	Failure Details—STRING [220] Probable Causes—STRING [80] Corrective Actions—STRING [80]
Primary Cause	See the data field.
Primary Action	See the data field.

Audit (8)

Table 2-9 lists the details of the Audit (8) warning event. To monitor and correct the cause of the event, refer to the "Warning From Internal Audit—Audit (8)" section.

Table 2-9 Audit (8) Details
Description	Warning From Internal Audit
Severity	Warning
Threshold	100
Throttle	0
Datawords	Failure Details—STRING [220] Probable Causes—STRING [80] Corrective Actions—STRING [80]
Primary Cause	See the data field.
Primary Action	See the data field.

Audit (9)

Audit (9) is not used.

Audit (10)

Table 2-10 lists the details of the Audit (10) informational event. For additional information, refer to the "Call Data Audit Complete—Audit (10)" section.

Table 2-10 Audit (10) Details
Description	Call Data Audit Complete
Severity	Information
Threshold	100
Throttle	0
Datawords	Audit Information—STRING [256]
Primary Cause	A memory audit has been completed.
Primary Action	Check if any call blocks are freed as a result of the audit. An investigation to determine the root cause may be useful.

Audit (11)

Table 2-11 lists the details of the Audit (11) critical alarm. To troubleshoot and correct the cause of the alarm, refer to the "Critical Network Time Protocol Service Failure—Audit (11)" section.

Table 2-11 Audit (11) Details
Description	Critical Network Time Protocol Service Failure (Critical NTP Service Failure)
Severity	Critical
Threshold	100
Throttle	0
Datawords	Failure Details—STRING [220] Probable Causes—STRING [80] Corrective Actions—STRING [80]
Primary Cause	See the data field.
Primary Action	See the data field.

Audit (12)

Table 2-12 lists the details of the Audit (12) major alarm. To troubleshoot and correct the cause of the alarm, refer to the "Major Network Time Protocol Service Failure—Audit (12)" section.

Table 2-12 Audit (12) Details
Description	Major Network Time Protocol Service Failure (Major NTP Service Failure)
Severity	Major
Threshold	100
Throttle	0
Datawords	Failure Details—STRING [220] Probable Causes—STRING [80] Corrective Actions—STRING [80]
Primary Cause	See the data field.
Primary Action	See the data field.

Audit (13)

Table 2-13 lists the details of the Audit (13) minor alarm. To troubleshoot and correct the cause of the alarm, refer to the "Minor Network Time Protocol Service Failure—Audit (13)" section.

Table 2-13 Audit (13) Details
Description	Minor Network Time Protocol Service Failure (Minor NTP Service Failure)
Severity	Minor
Threshold	100
Throttle	0
Datawords	Failure Details—STRING [220] Probable Causes—STRING [80] Corrective Actions—STRING [80]
Primary Cause	See the data field.
Primary Action	See the data field.

Audit (14)

Table 2-14 lists the details of the Audit (14) warning event. To monitor and correct the cause of the event, refer to the "Network Time Protocol Service Warning—Audit (14)" section.

Table 2-14 Audit (14) Details
Description	Network Time Protocol Service Warning (NTP Service Warning)
Severity	Warning
Threshold	100
Throttle	0
Datawords	Failure Details—STRING [220] Probable Causes—STRING [80] Corrective Actions—STRING [80]
Primary Cause	See the data field.
Primary Action	See the data field.

Audit (15)

Table 2-15 lists the details of the Audit (15) critical alarm. To troubleshoot and correct the cause of the alarm, refer to the "Critical Index Shared Memory Error—Audit (15)" section.

Table 2-15 Audit (15) Details
Description	Critical Index Shared Memory Error (Critical IDX Shared Memory Error)
Severity	Critical
Threshold	100
Throttle	0
Datawords	Failure Details—STRING [220] Probable Causes—STRING [80] Corrective Actions—STRING [80]
Primary Cause	See the data field.
Primary Action	See the data field.

Audit (16)

Table 2-16 lists the details of the Audit (16) minor alarm. To troubleshoot and correct the cause of the alarm, refer to the "Process Heap Memory Usage Exceeds Minor Threshold Level—Audit (16)" section.

Table 2-16 Audit (16) Details
Description	Process Heap Memory Usage Exceeds Minor Threshold Level
Severity	Minor
Threshold	100
Throttle	0
Datawords	Process Name—STRING [10] Heap Size in KB—FOUR_BYTES Heap Limit in KB—FOUR_BYTES Heap Usage Percentage—FOUR_BYTES Threshold Level Percentage—FOUR_BYTES
Primary Cause	Increase in heap usage has occurred due to high call traffic volume or maintenance operation.
Primary Action	Monitor the heap usage frequently and see whether it is approaching the major threshold level.

Audit (17)

Table 2-17 lists the details of the Audit (17) major alarm. To troubleshoot and correct the cause of the alarm, refer to the "Process Heap Memory Usage Exceeds Major Threshold Level—Audit (17)" section.

Table 2-17 Audit (17) Details
Description	Process Heap Memory Usage Exceeds Major Threshold Level
Severity	Major
Threshold	100
Throttle	0
Datawords	Process Name—STRING [10] Heap Size in KB—FOUR_BYTES Heap Limit in KB—FOUR_BYTES Heap Usage Percentage—FOUR_BYTES Threshold Level Percentage—FOUR_BYTES
Primary Cause	Increase in heap usage has occurred due to high call traffic volume, maintenance operation, or software problem.
Primary Action	Schedule a switchover during a maintenance window.

Audit (18)

Table 2-18 lists the details of the Audit (18) critical alarm. To troubleshoot and correct the cause of the alarm, refer to the "Process Heap Memory Usage Exceeds Critical Threshold Level—Audit (18)" section.

Table 2-18 Audit (18) Details
Description	Process Heap Memory Usage Exceeds Critical Threshold Level
Severity	Critical
Threshold	100
Throttle	0
Datawords	Process Name—STRING [10] Heap Size in KB—FOUR_BYTES Heap Limit in KB—FOUR_BYTES Heap Usage Percentage—FOUR_BYTES Threshold Level Percentage—FOUR_BYTES
Primary Cause	Increase in heap usage has occurred due to high call traffic volume, maintenance operation, or software problem.
Primary Action	Schedule a switchover during a maintenance window as soon as possible.
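
The heap datawords reported by Audit (16), Audit (17), and Audit (18) can be related to one another as in the following Python sketch. It assumes that the Heap Usage Percentage dataword is the heap size divided by the heap limit, which the tables suggest but do not state. The function names and the example threshold values are hypothetical; this guide does not give the actual minor, major, and critical threshold percentages.

```python
def heap_usage_percent(heap_size_kb, heap_limit_kb):
    """Heap usage as a whole-number percentage of the heap limit (assumed formula)."""
    return heap_size_kb * 100 // heap_limit_kb


def highest_threshold_exceeded(usage_percent, thresholds):
    """Return the name of the highest threshold that usage exceeds, or None.

    `thresholds` maps a level name to a percentage, for example
    {"minor": 70, "major": 80, "critical": 90}. These numbers are
    placeholders; the real threshold levels are not given in this guide.
    """
    exceeded = [(pct, name) for name, pct in thresholds.items() if usage_percent > pct]
    return max(exceeded)[1] if exceeded else None
```

Comparing the reported usage against the next higher threshold in this way gives a rough idea of how much headroom remains before the next alarm level is raised.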

Audit (19)

Table 2-19 lists the details of the Audit (19) warning event. To monitor and correct the cause of the event, refer to the "Recovered Memory of Stale Call—Audit (19)" section.

Table 2-19 Audit (19) Details
Description	Recovered Memory of Stale Call
Severity	Warning
Threshold	20
Throttle	0
Datawords	Stale Memory Release Info—STRING [128]
Primary Cause	A loss of communication with originating or terminating side has occurred.
Primary Action	Check to see if adjacent network element is up and has a proper communication link with the Cisco BTS 10200.
Secondary Cause	Adjacent network device protocol error has occurred.
Secondary Action	Check the adjacent network device protocol compatibility.
Ternary Cause	An internal software error has occurred.
Ternary Action	Contact Cisco TAC.

Audit (20)

Table 2-20 lists the details of the Audit (20) major alarm. To troubleshoot and correct the cause of the alarm, refer to the "Audit Found Lost Call Data Record—Audit (20)" section.

Table 2-20 Audit (20) Details
Description	Audit Found Lost Call Data Record
Severity	Major
Threshold	20
Throttle	0
Datawords	Error Text—STRING [200]
Primary Cause	A software error has occurred. However, the orphaned records are recovered on detection 2d.
Primary Action	Contact Cisco TAC.

Audit (21)

Table 2-21 lists the details of the Audit (21) informational event. To monitor and correct the cause of the event, refer to the "Quality of Service Gate Memory Audit Complete—Audit (21)" section.

Table 2-21 Audit (21) Details
Description	Quality of Service Gate Memory Audit Complete (QoS Gate Memory Audit Complete)
Severity	Information
Threshold	100
Throttle	0
Datawords	Num Records Audited—FOUR_BYTES Audit Start Time—STRING [64]
Primary Cause	A gate memory audit has been completed.
Primary Action	Check to see if any gate memory was freed as a result of the audit. An investigation to determine the root cause may be useful.

Audit (22)

Table 2-22 lists the details of the Audit (22) informational event. To monitor and correct the cause of the event, refer to the "Quality of Service Gate Status Audit Complete—Audit (22)" section.

Table 2-22 Audit (22) Details
Description	Quality of Service Gate Status Audit Complete (QoS Gate Status Audit Complete)
Severity	Information
Threshold	100
Throttle	0
Datawords	Num Records Audited—FOUR_BYTES Audit Start Time—STRING [64]
Primary Cause	A gate status audit has been completed.
Primary Action	Check to see if any gate is removed from the cable modem termination system (CMTS) before the connection is released.

Audit (23)

Table 2-23 lists the details of the Audit (23) warning event. To monitor and correct the cause of the event, refer to the "Recover Memory of Dangling Gate—Audit (23)" section.

Table 2-23 Audit (23) Details
Description	Recover Memory of Dangling Gate
Severity	Warning
Threshold	100
Throttle	0
Datawords	Recovered Gate IDX—EIGHT_BYTES
Primary Cause	A software error has occurred.
Primary Action	If situation persists, contact Cisco TAC.

Audit (24)

Table 2-24 lists the details of the Audit (24) warning event. To monitor and correct the cause of the event, refer to the "No Gate in the Cable Modem Termination System for Active Connection—Audit (24)" section.

Table 2-24 Audit (24) Details
Description	No Gate in the Cable Modem Termination System for Active Connection (No Gate in CMTS for Active Connection)
Severity	Warning
Threshold	100
Throttle	0
Datawords	AGGR ID—STRING [16] Subscriber IP Address—STRING [32] Gate Direction—STRING [16]
Primary Cause	A communication error between the packet cable network components has occurred.
Primary Action	If situation persists, contact Cisco TAC.

Audit (25)

Table 2-25 lists the details of the Audit (25) major alarm. To troubleshoot and correct the cause of the alarm, refer to the "Core File Present—Audit (25)" section.

Table 2-25 Audit (25) Details
Description	Core File Present
Severity	Major
Threshold	100
Throttle	0
Datawords	Name of Host Machine—STRING [32] Directory Containing Core Files—STRING [128] Number of Core Files From 0 to 1—FOUR_BYTES Number of Core Files From 1 to 2—FOUR_BYTES Number of Core Files Greater Than 2—FOUR_BYTES Remaining Free File Space in MB—FOUR_BYTES
Primary Cause	A network element process has crashed.
Primary Action	Move the core file to a file server.

Monitoring Audit Events

This section provides the information you need to monitor and correct audit events. Table 2-26 lists all of the audit events in numerical order and provides cross-references to the subsections in this section.

Note Refer to the "Obtaining Documentation and Submitting a Service Request" section for detailed instructions on contacting Cisco TAC and opening a service request.

Table 2-26 Cisco BTS 10200 Audit Events
Event Type	Event Name	Event Severity
Audit (1)	Test Report—Audit (1)	Information
Audit (2)	Start or Stop of Signaling System 7—Circuit Identification Code Audit—Audit (2)	Information
Audit (3)	Signaling System 7 Circuit Identification Code Audit Terminated Before Successful Completion—Audit (3)	Warning
Audit (4)	Call Exceeds a Long-Duration Threshold—Audit (4)	Warning
Audit (5)	Critical Internal Audit Failure—Audit (5)	Critical
Audit (6)	Major Internal Audit Failure—Audit (6)	Major
Audit (7)	Minor Internal Audit Failure—Audit (7)	Minor
Audit (8)	Warning From Internal Audit—Audit (8)	Warning
Audit (10)	Call Data Audit Complete—Audit (10)	Information
Audit (11)	Critical Network Time Protocol Service Failure—Audit (11)	Critical
Audit (12)	Major Network Time Protocol Service Failure—Audit (12)	Major
Audit (13)	Minor Network Time Protocol Service Failure—Audit (13)	Minor
Audit (14)	Network Time Protocol Service Warning—Audit (14)	Warning
Audit (15)	Critical Index Shared Memory Error—Audit (15)	Critical
Audit (16)	Process Heap Memory Usage Exceeds Minor Threshold Level—Audit (16)	Minor
Audit (17)	Process Heap Memory Usage Exceeds Major Threshold Level—Audit (17)	Major
Audit (18)	Process Heap Memory Usage Exceeds Critical Threshold Level—Audit (18)	Critical
Audit (19)	Recovered Memory of Stale Call—Audit (19)	Warning
Audit (20)	Audit Found Lost Call Data Record—Audit (20)	Major
Audit (21)	Quality of Service Gate Memory Audit Complete—Audit (21)	Information
Audit (22)	Quality of Service Gate Status Audit Complete—Audit (22)	Information
Audit (23)	Recover Memory of Dangling Gate—Audit (23)	Warning
Audit (24)	No Gate in the Cable Modem Termination System for Active Connection—Audit (24)	Warning
Audit (25)	Core File Present—Audit (25)	Major

Test Report—Audit (1)

The Test Report event is used for testing the audit event category. The event is informational and no further action is required.

Start or Stop of Signaling System 7—Circuit Identification Code Audit—Audit (2)

The Start or Stop of Signaling System 7—Circuit Identification Code Audit event occurs as part of normal Cisco BTS 10200 operation. The event is informational and no further action is required.

Signaling System 7 Circuit Identification Code Audit Terminated Before Successful Completion—Audit (3)

The Signaling System 7 Circuit Identification Code Audit Terminated Before Successful Completion event serves as a warning that a higher priority SS7 CIC audit has interrupted and terminated a lower priority SS7 CIC audit. To control the occurrence of this event, an SS7 remote termination audit should not be scheduled to occur while a periodic SS7 local termination audit is executing.
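
When planning audit schedules, the check described above amounts to testing whether two time windows overlap. The following Python sketch shows one way to perform that test. The function name is hypothetical, and the audit durations must be estimated by the operator because this guide does not state how long an SS7 CIC audit runs.

```python
from datetime import datetime, timedelta


def windows_overlap(start_a, duration_a, start_b, duration_b):
    """Return True if two audit windows share any instant."""
    end_a = start_a + duration_a
    end_b = start_b + duration_b
    # Two half-open intervals overlap when each starts before the other ends.
    return start_a < end_b and start_b < end_a
```

A remote termination audit whose planned window overlaps a periodic local termination audit window should be moved to a later start time.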

Call Exceeds a Long-Duration Threshold—Audit (4)

The Call Exceeds a Long-Duration Threshold event serves as a warning that a call has exceeded the current Cisco BTS 10200 system long-duration threshold. If there is reason to believe the call is no longer valid, the associated trunk facility should be released.

To check the current threshold setting for Long Duration calls, proceed as follows:

Step 1 As root user from the Call Agent (CA), execute the following command:

# grep LongDuration /opt/OptiCall/`ls /opt/OptiCall | grep CA`/bin/platform.cfg

Step 2 Review the command results.

Sample results:

Args=-ems_pri_dn blg-asys07EMS.mssol.cisco.com -ems_sec_dn blg-asys07EMS.mssol.cisco.com -port 15260 -QCheckInterval1 1000 -QCheckInterval2 4500 -RecordGenTime 00:00:00 -LongDurationAllowance 1440 -QCheckInterval3 60 -MyCaBillingDn blg-asys07CA.mssol.cisco.com

Note Following a platform start, the billing system will wait until the present time (current time on the call agent) is equal to RecordGenTime before auditing for Long Duration Calls. After the initial audit, the billing system will perform a Long Duration Call audit every LongDurationAllowance minutes. During an audit, the billing system will generate a Long Duration CDR for calls that have been active for LongDurationAllowance minutes.
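
The timing described in the preceding note can be worked through with a short Python sketch. It parses an Args line like the sample result and lists the first few Long Duration Call audit times after a platform start. The function names are hypothetical, and the parser assumes that every option on the Args line is followed by exactly one value, as in the sample.

```python
from datetime import datetime, timedelta


def parse_args(line):
    """Split an 'Args=-name value -name value ...' line into a dict."""
    tokens = line.split("=", 1)[1].split()
    # Options and values alternate: '-name' followed by its value.
    return {tokens[i].lstrip("-"): tokens[i + 1] for i in range(0, len(tokens) - 1, 2)}


def audit_times(start, record_gen_time, allowance_minutes, count):
    """Return the first `count` audit times after a platform start."""
    h, m, s = (int(x) for x in record_gen_time.split(":"))
    first = start.replace(hour=h, minute=m, second=s, microsecond=0)
    if first < start:
        first += timedelta(days=1)  # RecordGenTime has already passed today
    step = timedelta(minutes=allowance_minutes)
    return [first + i * step for i in range(count)]
```

With the sample values (RecordGenTime 00:00:00 and LongDurationAllowance 1440), a platform started at 09:30 would run its first audit at the following midnight and then once every 24 hours.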

Critical Internal Audit Failure—Audit (5)

The Critical Internal Audit Failure alarm (critical) indicates that a critical internal audit failure has occurred. To troubleshoot and correct the cause of the Critical Internal Audit Failure alarm, refer to the "Critical Internal Audit Failure—Audit (5)" section.

Major Internal Audit Failure—Audit (6)

The Major Internal Audit Failure alarm (major) indicates that a major internal audit failure has occurred. To troubleshoot and correct the cause of the Major Internal Audit Failure alarm, refer to the "Major Internal Audit Failure—Audit (6)" section.

Minor Internal Audit Failure—Audit (7)

The Minor Internal Audit Failure alarm (minor) indicates that a minor internal audit failure has occurred. To troubleshoot and correct the cause of the Minor Internal Audit Failure alarm, refer to the "Minor Internal Audit Failure—Audit (7)" section.

Warning From Internal Audit—Audit (8)

The Warning From Internal Audit event serves as a warning that a problem with an internal audit has occurred. To correct the internal audit problem, refer to the Failure Details, Probable Causes, and Corrective Actions datawords listed in the data field. Additionally, refer to the previous critical, major, and minor internal audit failure sections.

Call Data Audit Complete—Audit (10)

The Call Data Audit Complete event serves as an informational alert that a call data memory audit has been completed. The call data memory audit information should be checked to see if any call blocks have been cleared as a result of the audit.

Critical Network Time Protocol Service Failure—Audit (11)

The Critical Network Time Protocol Service Failure alarm (critical) indicates that a critical Network Time Protocol (NTP) service failure has occurred. To troubleshoot and correct the cause of the Critical Network Time Protocol Service Failure alarm, refer to the "Critical Network Time Protocol Service Failure—Audit (11)" section.

Major Network Time Protocol Service Failure—Audit (12)

The Major Network Time Protocol Service Failure alarm (major) indicates that a major NTP service failure has occurred. To troubleshoot and correct the cause of the Major Network Time Protocol Service Failure alarm, refer to the "Major Network Time Protocol Service Failure—Audit (12)" section.

Minor Network Time Protocol Service Failure—Audit (13)

The Minor Network Time Protocol Service Failure alarm (minor) indicates that a minor NTP service failure has occurred. To troubleshoot and correct the cause of the Minor Network Time Protocol Service Failure alarm, refer to the "Minor Network Time Protocol Service Failure—Audit (13)" section.

Network Time Protocol Service Warning—Audit (14)

The Network Time Protocol Service Warning event serves as a warning that a problem with an NTP service has occurred. To correct the NTP service problem, refer to the Failure Details, Probable Causes, and Corrective Actions datawords listed in the data field. Additionally, refer to the previous critical, major, and minor NTP service failure sections.

To gather additional troubleshooting information, use the following:

From EMS:

show ems

From UNIX prompt:

/opt/BTSxntp/bin/ntpq -c peers

/opt/BTSxntp/bin/ntpq -c lpeers

/opt/BTSxntp/bin/ntpq -c lopeers

/opt/BTSxntp/bin/ntpq -c opeers

cat /etc/inet/ntp.conf
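
The peers listing collected above can be summarized with a short script before it is sent to Cisco TAC. The following Python sketch assumes the standard ntpq peers layout: two header lines, then one line per peer whose first character is a tally code ('*' marks the selected system peer), followed by the remote, refid, st, t, when, poll, reach, delay, offset, and a final jitter or dispersion column. The function name is hypothetical, and the output of the ntpq build shipped under /opt/BTSxntp should be checked against this layout before relying on the result.

```python
def parse_peers(text):
    """Parse `ntpq -c peers` style text into a list of dicts."""
    peers = []
    for line in text.splitlines()[2:]:      # skip the header and '====' rule
        if not line.strip():
            continue
        fields = line[1:].split()           # drop the one-character tally code
        peers.append({
            "tally": line[0],                # '*' marks the selected system peer
            "remote": fields[0],
            "reach": int(fields[6], 8),      # reach is printed in octal
            "offset_ms": float(fields[8]),   # offset is reported in milliseconds
        })
    return peers
```

A reach value of 0 means that none of the last eight polls of that peer succeeded, and the absence of any peer with a '*' tally means that no time source is currently selected.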

Critical Index Shared Memory Error—Audit (15)

The Critical Index Shared Memory Error alarm (critical) indicates that a critical shared memory index (IDX) error has occurred. To troubleshoot and correct the cause of the Critical Index Shared Memory Error alarm, refer to the "Critical Index Shared Memory Error—Audit (15)" section.

Process Heap Memory Usage Exceeds Minor Threshold Level—Audit (16)

The Process Heap Memory Usage Exceeds Minor Threshold Level alarm (minor) indicates that a process heap memory usage minor threshold level crossing has occurred. To troubleshoot and correct the cause of the Process Heap Memory Usage Exceeds Minor Threshold Level alarm, refer to the "Process Heap Memory Usage Exceeds Minor Threshold Level—Audit (16)" section.

Process Heap Memory Usage Exceeds Major Threshold Level—Audit (17)

The Process Heap Memory Usage Exceeds Major Threshold Level alarm (major) indicates that a process heap memory usage major threshold level crossing has occurred. To troubleshoot and correct the cause of the Process Heap Memory Usage Exceeds Major Threshold Level alarm, refer to the "Process Heap Memory Usage Exceeds Major Threshold Level—Audit (17)" section.

Process Heap Memory Usage Exceeds Critical Threshold Level—Audit (18)

The Process Heap Memory Usage Exceeds Critical Threshold Level alarm (critical) indicates that a process heap memory usage critical threshold level crossing has occurred. To troubleshoot and correct the cause of the Process Heap Memory Usage Exceeds Critical Threshold Level alarm, refer to the "Process Heap Memory Usage Exceeds Critical Threshold Level—Audit (18)" section.

Recovered Memory of Stale Call—Audit (19)

The Recovered Memory of Stale Call event serves as a warning that recovery of memory from a stale call has occurred. The primary cause of the warning is that a loss of communication with the originating or terminating side occurred. To correct the primary cause of the warning, check to see if the adjacent network element link is up and that the adjacent network element is properly communicating with the Cisco BTS 10200. The secondary cause of the warning is that an adjacent network device has a protocol error. To correct the secondary cause of the warning, check the adjacent network device protocol compatibility with the Cisco BTS 10200. The ternary cause of the warning is an internal software error. If an internal software error has occurred, contact Cisco TAC to obtain technical assistance. Prior to contacting Cisco TAC, collect a trace log corresponding to the time of the alarm.

Audit Found Lost Call Data Record—Audit (20)

The Audit Found Lost Call Data Record alarm (major) indicates that an audit process has found a lost call data record. To troubleshoot and correct the cause of the Audit Found Lost Call Data Record alarm, refer to the "Audit Found Lost Call Data Record—Audit (20)" section.

Quality of Service Gate Memory Audit Complete—Audit (21)

The Quality of Service Gate Memory Audit Complete event serves as an informational alert that a quality of service gate memory audit has been completed. To correct the primary cause of the event, check to see if any gate memory was freed as a result of the audit. An investigation to determine the root cause may be useful.

Quality of Service Gate Status Audit Complete—Audit (22)

The Quality of Service Gate Status Audit Complete event serves as an informational alert that a quality of service gate status audit has been completed. To correct the primary cause of the event, check to see if any gate was removed in the CMTS before the connection is released.

Recover Memory of Dangling Gate—Audit (23)

The Recover Memory of Dangling Gate event serves as a warning that the memory recovery of a dangling gate has occurred. The primary cause of the warning is that a software error has occurred. If the situation persists, contact Cisco TAC to obtain technical assistance.

Prior to contacting Cisco TAC, collect a Cisco BTS 10200 trace log corresponding to the time of the event and collect the following additional information from the CMTS.

show packetcable gate summary

show packetcable gate <gate id>

No Gate in the Cable Modem Termination System for Active Connection—Audit (24)

The No Gate in the Cable Modem Termination System for Active Connection event serves as a warning that no gate in the CMTS is available for the active connection. The primary cause of the warning is that a communication error has occurred between packet cable network components. If the situation persists, contact Cisco TAC to obtain technical assistance.

Prior to contacting Cisco TAC, collect a Cisco BTS 10200 trace log corresponding to the time of the event and collect the following additional information from the CMTS.

show packetcable global

show packetcable gate summary

Core File Present—Audit (25)

The Core File Present alarm (major) indicates that a network element process has crashed. To troubleshoot and correct the cause of the Core File Present alarm, refer to the "Core File Present—Audit (25)" section.

Troubleshooting Audit Alarms

This section provides the information you need to troubleshoot and correct audit alarms. Table 2-27 lists all of the audit alarms in numerical order and provides cross-references to the subsections in this section.

Note Refer to the "Obtaining Documentation and Submitting a Service Request" section for detailed instructions on contacting Cisco TAC and opening a service request.

Table 2-27 Cisco BTS 10200 Audit Alarms
Alarm Type	Alarm Name	Alarm Severity
Audit (5)	Critical Internal Audit Failure—Audit (5)	Critical
Audit (6)	Major Internal Audit Failure—Audit (6)	Major
Audit (7)	Minor Internal Audit Failure—Audit (7)	Minor
Audit (11)	Critical Network Time Protocol Service Failure—Audit (11)	Critical
Audit (12)	Major Network Time Protocol Service Failure—Audit (12)	Major
Audit (13)	Minor Network Time Protocol Service Failure—Audit (13)	Minor
Audit (15)	Critical Index Shared Memory Error—Audit (15)	Critical
Audit (16)	Process Heap Memory Usage Exceeds Minor Threshold Level—Audit (16)	Minor
Audit (17)	Process Heap Memory Usage Exceeds Major Threshold Level—Audit (17)	Major
Audit (18)	Process Heap Memory Usage Exceeds Critical Threshold Level—Audit (18)	Critical
Audit (20)	Audit Found Lost Call Data Record—Audit (20)	Major
Audit (25)	Core File Present—Audit (25)	Major

Critical Internal Audit Failure—Audit (5)

The Critical Internal Audit Failure alarm (critical) indicates that a critical internal audit failure has occurred. To find the probable causes of the alarm, review the Failure Details and Probable Causes datawords in the data field. For the corrective actions for the alarm, review the Corrective Actions dataword and take the corrective actions listed.
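
The three datawords carried by the internal audit failure alarms can be modeled as a small record when alarm text is processed by a script. The following Python sketch is illustrative only. The class and field names are hypothetical; the length limits come from the Datawords rows of the Audit (5), Audit (6), and Audit (7) detail tables, on the assumption that STRING [n] denotes a maximum of n characters.

```python
from dataclasses import dataclass

# Maximum string lengths taken from the Datawords rows of the Audit (5),
# Audit (6), and Audit (7) detail tables (assumed to be character counts).
MAX_LENGTHS = {"failure_details": 220, "probable_causes": 80, "corrective_actions": 80}


@dataclass
class InternalAuditDatawords:
    failure_details: str
    probable_causes: str
    corrective_actions: str

    def oversized_fields(self):
        """Return the names of fields longer than their documented limit."""
        return [name for name, limit in MAX_LENGTHS.items()
                if len(getattr(self, name)) > limit]
```

A non-empty result from oversized_fields would suggest that the text was not copied from a single alarm report as described in the tables.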

Major Internal Audit Failure—Audit (6)

The Major Internal Audit Failure alarm (major) indicates that a major internal audit failure has occurred. To find the probable causes of the alarm, review the data field Failure Details and Probable Causes datawords. For the corrective actions for the alarm, review the data field Corrective Actions dataword and proceed with the corrective actions listed.

Minor Internal Audit Failure—Audit (7)

The Minor Internal Audit Failure alarm (minor) indicates that a minor internal audit failure has occurred. To find the probable causes of the alarm, review the data field Failure Details and Probable Causes datawords. For the corrective actions for the alarm, review the data field Corrective Actions dataword. Once the Corrective Actions dataword has been reviewed, proceed with the corrective actions listed.

Critical Network Time Protocol Service Failure—Audit (11)

The Critical Network Time Protocol Service Failure alarm (critical) indicates that a critical NTP service failure has occurred. To find the probable causes of the alarm, review the data field Failure Details and Probable Causes datawords. For the corrective actions for the alarm, review the data field Corrective Actions dataword and take the corrective actions listed.

Use the following command to gather additional troubleshooting information from the EMS.

show ems

From the UNIX prompt, use the following commands to gather additional troubleshooting information from the EMS.

/opt/BTSxntp/bin/ntpq -c peers

/opt/BTSxntp/bin/ntpq -c lpeers

/opt/BTSxntp/bin/ntpq -c lopeers

/opt/BTSxntp/bin/ntpq -c opeers

cat /etc/inet/ntp.conf
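When a service request is opened, it can help to have the output of the commands above in a single file. The following is a minimal sketch of one way to collect it, not a Cisco-supplied tool. The function name collect_ntp_info, the NTPQ and NTP_CONF variables, and the output file are assumptions made for illustration; the ntpq path, the four ntpq queries, and the ntp.conf path are the ones listed in this section.

```shell
#!/bin/sh
# Sketch only: gather the NTP diagnostics listed in this section into one file.
# NTPQ and NTP_CONF default to the paths shown above; override them if your
# installation differs. The function name and output file are hypothetical.
NTPQ="${NTPQ:-/opt/BTSxntp/bin/ntpq}"
NTP_CONF="${NTP_CONF:-/etc/inet/ntp.conf}"

collect_ntp_info() {
    out="$1"                                # file that receives the report
    {
        echo "=== collected: $(date) ==="
        # Run each ntpq query named in this section, labeling its output.
        for query in peers lpeers lopeers opeers; do
            echo "=== ntpq -c $query ==="
            "$NTPQ" -c "$query" 2>&1
        done
        echo "=== $NTP_CONF ==="
        cat "$NTP_CONF" 2>&1
    } > "$out"
}
```

For example, after the functions are loaded into the shell, collect_ntp_info /tmp/ntp_report.txt writes the report to /tmp/ntp_report.txt (a path chosen here only as an example).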

Major Network Time Protocol Service Failure—Audit (12)

The Major Network Time Protocol Service Failure alarm (major) indicates that a major NTP service failure has occurred. To find the probable causes of the alarm, review the Failure Details and Probable Causes datawords. For corrective actions, review the Corrective Actions dataword and take the actions listed.

Use the following command to gather additional troubleshooting information from the EMS.

show ems

From the UNIX prompt, use the following commands to gather additional troubleshooting information from the EMS.

/opt/BTSxntp/bin/ntpq -c peers

/opt/BTSxntp/bin/ntpq -c lpeers

/opt/BTSxntp/bin/ntpq -c lopeers

/opt/BTSxntp/bin/ntpq -c opeers

cat /etc/inet/ntp.conf

Minor Network Time Protocol Service Failure—Audit (13)

The Minor Network Time Protocol Service Failure alarm (minor) indicates that a minor NTP service failure has occurred. To find the probable causes of the alarm, review the Failure Details and Probable Causes datawords. For corrective actions, review the Corrective Actions dataword and take the actions listed.

Use the following command to gather additional troubleshooting information from the EMS.

show ems

From the UNIX prompt, use the following commands to gather additional troubleshooting information from the EMS.

/opt/BTSxntp/bin/ntpq -c peers

/opt/BTSxntp/bin/ntpq -c lpeers

/opt/BTSxntp/bin/ntpq -c lopeers

/opt/BTSxntp/bin/ntpq -c opeers

cat /etc/inet/ntp.conf

Critical Index Shared Memory Error—Audit (15)

The Critical Index Shared Memory Error alarm (critical) indicates that a critical shared memory index error has occurred. To find the probable causes of the alarm, review the Failure Details and Probable Causes datawords. For corrective actions, review the Corrective Actions dataword and take the actions listed.

Process Heap Memory Usage Exceeds Minor Threshold Level—Audit (16)

The Process Heap Memory Usage Exceeds Minor Threshold Level alarm (minor) indicates that process heap memory usage has crossed the minor threshold level. The primary cause of the alarm is an increase in heap usage due to a high call volume or a maintenance operation. Monitor the heap usage to see whether it is approaching the major threshold level.

From the UNIX prompt, use the following commands to show the process heap memory usage.

date

show_heap <pid of mga> > /opt/mga_heap.txt

The show_heap command is not on the system by default. To obtain the show_heap tool, contact Cisco TAC.

Note Heap memory usage is automatically monitored once per hour.
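Because the redirection shown above overwrites /opt/mga_heap.txt on every run, successive readings cannot be compared. The following is a minimal sketch that appends each reading with a timestamp instead, so that growth toward the major threshold can be tracked. The function name snapshot_heap and the SHOW_HEAP variable are assumptions made for illustration, and the sketch assumes that the show_heap tool has already been obtained from Cisco TAC and accepts a process ID as shown in this section.

```shell
#!/bin/sh
# Sketch only: append timestamped show_heap readings to one file.
# SHOW_HEAP names the tool obtained from Cisco TAC (assumed to be on PATH);
# the function name is hypothetical.
SHOW_HEAP="${SHOW_HEAP:-show_heap}"

snapshot_heap() {
    pid="$1"                                # process ID of the monitored process
    out="$2"                                # file that accumulates the readings
    {
        # Header line records when the reading was taken and for which process.
        echo "=== $(date) pid=$pid ==="
        "$SHOW_HEAP" "$pid" 2>&1
    } >> "$out"                             # append, do not overwrite
}
```

For example, snapshot_heap <pid of mga> /opt/mga_heap.txt adds one reading to the file each time it is run.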

Process Heap Memory Usage Exceeds Major Threshold Level—Audit (17)

The Process Heap Memory Usage Exceeds Major Threshold Level alarm (major) indicates that process heap memory usage has crossed the major threshold level. The primary cause of the increase in heap usage is a high call volume, a maintenance operation, or a software problem. To isolate and correct the primary cause of the alarm, schedule a switchover during a maintenance window.

From the UNIX prompt, use the following commands to show the process heap memory usage.

date

show_heap <pid of mga> > /opt/mga_heap.txt

The show_heap command is not on the system by default. To obtain the show_heap tool, contact Cisco TAC.

Note Heap memory usage is automatically monitored once per hour.

Process Heap Memory Usage Exceeds Critical Threshold Level—Audit (18)

The Process Heap Memory Usage Exceeds Critical Threshold Level alarm (critical) indicates that process heap memory usage has crossed the critical threshold level. The primary cause of the alarm is an increase in heap usage due to a high call volume, a maintenance operation, or a software problem. To isolate and correct the primary cause of the alarm, schedule a switchover during a maintenance window as soon as possible.

From the UNIX prompt, use the following commands to show the process heap memory usage.

date

show_heap <pid of mga> > /opt/mga_heap.txt

The show_heap command is not on the system by default. To obtain the show_heap tool, contact Cisco TAC.

Note Heap memory usage is automatically monitored once per hour.

Audit Found Lost Call Data Record—Audit (20)

The Audit Found Lost Call Data Record alarm (major) indicates that an audit process has found a lost call data record. The primary cause of the alarm is that a software error has occurred. However, the orphaned records are recovered on detection. To correct the primary cause of the alarm, contact Cisco TAC. Before contacting Cisco TAC, collect a trace log corresponding to the time of the alarm.

Core File Present—Audit (25)

The Core File Present alarm (major) indicates that a network element process has crashed and that the Cisco BTS 10200 system has created a core file to assist in determining the root cause of the network element process crash. To correct the primary cause of the alarm, move the core file to a file server.

Note Although the Cisco BTS 10200 software directs its core files to the directory /bin, core files generated by software that is not Cisco BTS 10200 software (such as the platform operating system) are stored in the /opt/core directory. When a core dump occurs that is not generated by the Cisco BTS 10200, the Cisco BTS 10200 still issues an Audit 25 alarm, as it does for its own core dumps, to indicate that the incident occurred.

To capture the stack trace and move the core file to an FTP server, do the following:

pstack <corefile>

Use FTP to transfer the core file to ftp-sj.cisco.com/incoming.

Once the core file is moved to the file server (FTP server), contact Cisco TAC.
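The pstack command prints the stack trace to the terminal, so the output is lost unless it is saved. The following is a minimal sketch that writes the trace to a file next to the core file, so that both can be transferred together. The function name save_core_stack, the PSTACK variable, and the .pstack.txt suffix are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only: save the pstack output for a core file alongside the core.
# PSTACK defaults to the pstack command used in this section; the function
# name and the output suffix are hypothetical.
PSTACK="${PSTACK:-pstack}"

save_core_stack() {
    core="$1"                               # path to the core file
    # Refuse to continue if the core file does not exist.
    [ -f "$core" ] || { echo "no such core file: $core" >&2; return 1; }
    # Write the stack trace (and any error text) next to the core file.
    "$PSTACK" "$core" > "$core.pstack.txt" 2>&1
}
```

For example, save_core_stack <corefile> creates <corefile>.pstack.txt, which can then be transferred with the core file.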
