Runs a set of diagnostics and displays the current state of the system. If any components are not running, red failure messages are displayed.
Note
|
RADIUS-based policy control is no longer supported in CPS 14.0.0 and later releases because the 3GPP Gx Diameter interface has become the industry-standard policy control interface.
|
Syntax
/var/qps/bin/diag/diagnostics.sh -h
Usage: /var/qps/bin/diag/diagnostics.sh [options]
This script runs checks (i.e. diagnostics) against the various access, monitoring, and configuration points of a running CPS system.
In HA/GR environments, the script always does a ping check for all VMs prior to any other checks and adds any that fail the ping test to the IGNORED_HOSTS variable. This helps reduce the possibility for script function errors.
NOTE: See /var/qps/bin/diag/diagnostics.ini to disable certain checks for the HA/GR env persistently. The use of a flag will override the diagnostics.ini value.
Examples:
/var/qps/bin/diag/diagnostics.sh -q
/var/qps/bin/diag/diagnostics.sh --basic_ports --clock_skew -v --ignored_hosts='portal01,portal02'
Options:
--basic_ports : Run basic port checks
For HA/GR: 80, 11211, 7070, 8080, 8081, 8090, 8182, 9091, 9092, and Mongo DB ports based on /etc/broadhop/mongoConfig.cfg
--clock_skew : Check clock skew between lb01 and all vms (Multi-Node Environment only)
--diskspace : Check diskspace
--get_active_alarms : Get the active alarms in the CPS
--get_frag_status : Get fragmentation status for Primary members of DBs viz. session_cache, sk_cache, diameter, spr, and balance_mgmt.
--get_replica_status : Get the status of the replica-sets present in environment. (Multi-Node Environment only)
--get_shard_health : Get the status of the sharded database information present in environment. (Multi-Node Environment only)
--get_sharding_status : Get the status of the sharding information present in environment. (Multi-Node Environment only)
--get_session_shard_health : Get the session shard health status information present in environment. (Multi-Node Environment only).
--get_peer_status : Get the diameter peers present in the environment.
--get_sharded_replica_status : Get the status of the shards present in environment. (Multi-Node Environment only)
--ha_proxy : Connect to HAProxy to check operation and performance statistics, and ports (Multi-Node Environment only)
http://lbvip01:5540/haproxy?stats
http://lbvip01:5540//haproxy-diam?stats
--help -h : Help - displays this help
--ignored_hosts : Ignore the comma separated list of hosts. For example --ignored_hosts='portal01,portal02'
Default is 'portal01,portal02,portallb01,portallb02' (Multi-Node Environment only)
--ping_check : Check ping status for all VM
--policy_revision_status : Check the policy revision status on all QNS,LB,UDC VMs.
--lwr_diagnostics : Retrieve diagnostics from CPS LWR kafka processes
--qns_diagnostics : Retrieve diagnostics from CPS java processes
--qns_login : Check qns user passwordless login
--quiet -q : Quiet output - display only failed diagnostics
--radius : Run radius specific checks
--redis : Run redis specific checks
--whisper : Run whisper specific checks
--aido : Run Aido specific checks
--svn : Check svn sync status between pcrfclient01 & pcrfclient02 (Multi-Node Environment only)
--tacacs : Check Tacacs server reachability
--swapspace : Check swap space
--verbose -v : Verbose output - display *all* diagnostics (by default, some are grouped for readability)
--virtual_ips : Ensure Virtual IP Addresses are operational (Multi-Node Environment only)
--vm_allocation : Ensure VM Memory and CPUs have been allocated according to recommendations
Note
|
If an IPv6 address is longer than 23 characters, only the first 23 characters are displayed in the diagnostics.sh --get_replica_status output because of the column-width restriction. Use the host name shown in the diagnostics.sh --get_replica_status output to identify the IP address.
|
The swap memory usage test uses the following criteria (a sketch of the threshold logic follows the list):
-
The test passes if the swap space used is less than 200 MB.
-
The script issues a warning if the swap space used is between 200 MB and 1000 MB.
-
The status fails if the swap memory used exceeds 1000 MB.
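The following shell sketch only illustrates that threshold logic; the use of free and the variable name are assumptions for illustration, not the actual implementation in diagnostics.sh:
# Illustrative sketch only -- not the actual diagnostics.sh code.
# Thresholds follow the criteria above: <200 MB pass, 200-1000 MB warn, >1000 MB fail.
used_swap_mb=$(free -m | awk '/^Swap:/ {print $3}')   # swap currently in use, in MB
if [ "$used_swap_mb" -lt 200 ]; then
    echo "Checking swap space...[PASS]"
elif [ "$used_swap_mb" -le 1000 ]; then
    echo "Checking swap space...[WARN] ${used_swap_mb} MB of swap in use"
else
    echo "Checking swap space...[FAIL] ${used_swap_mb} MB of swap in use"
fi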
Executable on VMs
Cluster Manager and OAM (pcrfclient) nodes
Example
[root@pcrfclient01 ~]# diagnostics.sh
QNS Diagnostics
Checking basic ports (80, 7070, 27017, 27717-27720, 27749, 8080, 9091)...[PASS]
Checking qns passwordless logins on all boxes...[PASS]
Validating hostnames...[PASS]
Checking disk space for all VMs...[PASS]
Checking swap space for all VMs...[PASS]
Checking for clock skew...[PASS]
Retrieving QNS diagnostics from qns01:9045...[PASS]
Retrieving QNS diagnostics from qns02:9045...[PASS]
Checking HAProxy status...[PASS]
Checking VM CPU and memory allocation for all VMs...[PASS]
Checking Virtual IPs are up...[PASS]
[root@pcrfclient01 ~]#
List of Active Alarms
To get the list of active alarms, execute the diagnostics.sh --get_active_alarms command. Here is a sample output:
#diagnostics.sh --get_active_alarms
CPS Diagnostics HA Multi-Node Environment
---------------------------
Active Application Alarm Status
---------------------------------------------------------------------------------
id=1000 sub_id=3001 event_host=lb02 status=down date=2017-11-22,
10:47:34,051+0000 msg="3001:Host: site-host-gx Realm: site-gx-client.com is down"
id=1000 sub_id=3001 event_host=lb02 status=down date=2017-11-22,
10:47:34,048+0000 msg="3001:Host: site-host-sd Realm: site-sd-client.com is down"
id=1000 sub_id=3001 event_host=lb01 status=down date=2017-11-22,
10:45:17,927+0000 msg="3001:Host: site-server Realm: site-server.com is down"
id=1000 sub_id=3001 event_host=lb02 status=down date=2017-11-22,
10:47:34,091+0000 msg="3001:Host: site-host-rx Realm: site-rx-client.com is down"
id=1000 sub_id=3002 event_host=lb02 status=down date=2017-11-22,
10:47:34,111+0000 msg="3002:Realm: site-server.com:applicationId: 7:all peers are down"
Active Component Alarm Status
---------------------------------------------------------------------------------
event_host=lb02 name=ProcessDown severity=critical facility=operatingsystem
date=2017-22-11,10:13:49,310329511,+00:00 info=corosync process is down
Attention
|
-
Due to architectural limitations of the CPS SNMP implementation, if the SNMP daemon or the Policy Server (QNS) process on the pcrfclient VM restarts, there can be a gap between the active alarms displayed by diagnostics.sh and the active alarms in the NMS.
-
The date printed for an application alarm is the time at which the alarm was seen on the pcrfclient VM, whereas the time recorded at the NMS is the time at which the alarm is received from the Policy Director (LB) VM. As a result, the dates for the same alarm can differ between diagnostics.sh and the NMS.
|
The following table lists the types of SNMP alarms:
Table 1. IDs - Type of SNMP Alarms

Alarm ID | Type
1000     | Application Alarm
7100     | Database Alarm
7200     | Failover Alarm
7300     | Process Alarm
7400     | VM Alarm
7700     | GR Alarm
For more information on SNMP alarms, refer to the CPS SNMP, Alarms and Clearing Procedures Guide.
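To display only one alarm type, the --get_active_alarms output can be filtered on the id field shown in Table 1. The following is an illustrative example only; it assumes the line format shown in the sample output above and uses standard grep:
diagnostics.sh --get_active_alarms | grep '^id=1000 '     # show only Application Alarms (id=1000)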
Sample output of --get_sharding_status
diagnostics.sh --get_sharding_status
CPS Diagnostics HA Multi-Node Environment
---------------------------
|---------------------------------------------------------------------------------------------------------------------------|
| SHARDING STATUS INFORMATION Date : 2019-01-08 01:01:05 |
|---------------------------------------------------------------------------------------------------------------------------|
Shard Id Mongo DB State Backup DB Removed Session Count
1 sessionmgr01:27717/session_cache online false false 0
Rebalance Status: Rebalanced
|---------------------------------------------------------------------------------------------------------------------------|
Sample output of --policy_revision_status
diagnostics.sh --policy_revision_status
CPS Diagnostics HA Multi-Node Environment
---------------------------
Checking SVN revision status in qns nodes: [ qns01 qns02 qns03 qns04 qns05 qns06 qns07 qns08 qns09 qns10 ]
qns01(instance=1, head_revision=41, local_revision=41)...[PASS]
qns02(instance=1, head_revision=41, local_revision=41)...[PASS]
qns03(instance=1, head_revision=41, local_revision=41)...[PASS]
qns04(instance=1, head_revision=41, local_revision=41)...[PASS]
qns05(instance=1, head_revision=41, local_revision=41)...[PASS]
qns06(instance=1, head_revision=41, local_revision=41)...[PASS]
qns07(instance=1, head_revision=41, local_revision=41)...[PASS]
qns08(instance=1, head_revision=41, local_revision=41)...[PASS]
qns09(instance=1, head_revision=41, local_revision=41)...[PASS]
qns10(instance=1, head_revision=41, local_revision=41)...[PASS]
SVN revison status complete on: [ qns01 qns02 qns03 qns04 qns05 qns06 qns07 qns08 qns09 qns10 ]...[PASS]
Checking SVN revision status in lb nodes: [ lb01 lb02 ]
lb01(instance=1, head_revision=41, local_revision=41)...[PASS]
lb01(instance=2, head_revision=41, local_revision=41)...[PASS]
lb01(instance=3, head_revision=41, local_revision=41)...[PASS]
lb01(instance=4, head_revision=41, local_revision=41)...[PASS]
lb02(instance=1, head_revision=41, local_revision=41)...[PASS]
lb02(instance=2, head_revision=41, local_revision=41)...[PASS]
lb02(instance=3, head_revision=41, local_revision=41)...[PASS]
lb02(instance=4, head_revision=41, local_revision=41)...[PASS]
SVN revison status complete on: [ lb01 lb02 ]...[PASS]
Sample output of --get_session_shard_health
diagnostics.sh output on an HA setup without any shard configuration.
diagnostics.sh --get_session_shard_health
CPS Diagnostics HA Multi-Node Environment
---------------------------
|----------------------------------------------------------------------------------------------------------------------------------------|
| Mongo:v3.4.16 SESSION SHARD HEALTH INFORMATION - SET TYPE : HA Date : 2019-01-11 03:15:26 |
|----------------------------------------------------------------------------------------------------------------------------------------|
| Total# of Session Cache Replica Set Found in mongoConfig.cfg : 2 |
| Total# of Shard Configured : 1 |
| Total# of Active Shards (replica-sets): 1 |
| Total# of HotStandby Shards ( replica-sets): 0 |
| Default shard Configured: 1 |
| Replica-sets not part of shard configuration: set07, ( STATUS : ERR ) |
|----------------------------------------------------------------------------------------------------------------------------------------|
| setname seed1 seed2 port shard# vm1_hostname vm2_hostname status |
|----------------------------------------------------------------------------------------------------------------------------------------|
| set01 sessionmgr01 sessionmgr02 27717 session_cache sessionmgr01:27717 sessionmgr02:27717 OK |
|----------------------------------------------------------------------------------------------------------------------------------------|
| Mongo:v3.4.16 SESSION SHARD BUCKET INFORMATION |
|----------------------------------------------------------------------------------------------------------------------------------------|
| { "_id" : { "shard" : 1 }, "count" : 8192 } { Status : OK } |
|----------------------------------------------------------------------------------------------------------------------------------------|
| Mongo:v3.4.16 SESSION SHARD INSTANCE VERSION INFORMATION |
|----------------------------------------------------------------------------------------------------------------------------------------|
| { "_id" : "qns02-1", "version" : 10 } { Status : OK } |
| { "_id" : "qns01-1", "version" : 10 } { Status : OK } |
| { "_id" : "qns05-1", "version" : 10 } { Status : OK } |
| { "_id" : "qns04-1", "version" : 10 } { Status : OK } |
| { "_id" : "qns08-1", "version" : 10 } { Status : OK } |
| { "_id" : "qns09-1", "version" : 10 } { Status : OK } |
| { "_id" : "qns10-1", "version" : 10 } { Status : OK } |
| { "_id" : "qns07-1", "version" : 10 } { Status : OK } |
| { "_id" : "qns06-1", "version" : 10 } { Status : OK } |
| { "_id" : "qns03-1", "version" : 10 } { Status : OK } |
|----------------------------------------------------------------------------------------------------------------------------------------|
diagnostics.sh output on a GR/dual cluster setup with shard configuration.
diagnostics.sh --get_session_shard_health
CPS Diagnostics HA Multi-Node Environment
---------------------------
|----------------------------------------------------------------------------------------------------------------------------------------|
| Mongo:v3.4.16 SESSION SHARD HEALTH INFORMATION - SET TYPE : Geo - SITE_ID : NOT_FOUND Date : 2019-01-11 05:50:38 |
|----------------------------------------------------------------------------------------------------------------------------------------|
| Total# of Session Cache Replica Set Found in mongoConfig.cfg : 4 |
| Total# of Shard Configured : 16 |
| Total# of Active Shards (replica-sets): 12 |
| Total# of HotStandby Shards ( replica-sets): 4 |
| Default shard Configured: 1 |
| Replica-sets not part of shard configuration: set10, ( STATUS : ERR ) |
|----------------------------------------------------------------------------------------------------------------------------------------|
| setname seed1 seed2 port shard# vm1_hostname vm2_hostname status |
|----------------------------------------------------------------------------------------------------------------------------------------|
| set01 sessionmgr01 sessionmgr02 27717 session_cache sessionmgr01-clusterA:27717 sessionmgr02-clusterA:27717 OK |
| set01 sessionmgr01 sessionmgr02 27717 session_cache_2 sessionmgr01-clusterA:27717 sessionmgr02-clusterA:27717 OK |
| set01 sessionmgr01 sessionmgr02 27717 session_cache_3 sessionmgr01-clusterA:27717 sessionmgr02-clusterA:27717 OK |
| set01 sessionmgr01 sessionmgr02 27717 session_cache_4 sessionmgr01-clusterA:27717 sessionmgr02-clusterA:27717 OK |
| set07 sessionmgr01 sessionmgr02 27727 session_cache sessionmgr01-clusterA:27727 sessionmgr02-clusterA:27727 OK |
| set07 sessionmgr01 sessionmgr02 27727 session_cache_2 sessionmgr01-clusterA:27727 sessionmgr02-clusterA:27727 OK |
| set07 sessionmgr01 sessionmgr02 27727 session_cache_3 sessionmgr01-clusterA:27727 sessionmgr02-clusterA:27727 OK |
| set07 sessionmgr01 sessionmgr02 27727 session_cache_4 sessionmgr01-clusterA:27727 sessionmgr02-clusterA:27727 ERR|
| Error : Mis-match found either in hostname or port# for sessionmgr01:27737 and sessionmgr01-clusterA:MultiplePorts - shard#: session_ca|he
| Error : Mis-match found either in hostname or port# for sessionmgr01:27737 and sessionmgr01-clusterA:MultiplePorts - shard#: session_ca|he_2
| Error : Mis-match found either in hostname or port# for sessionmgr01:27737 and sessionmgr01-clusterA:MultiplePorts - shard#: session_ca|he_3
| Error : Mis-match found either in hostname or port# for sessionmgr01:27737 and sessionmgr01-clusterA:MultiplePorts - shard#: session_ca|he_4
| Error : Mis-match found either in hostname or port# for sessionmgr01-site2:27727 and :MultiplePorts - shard#: session_cache_4 |
|----------------------------------------------------------------------------------------------------------------------------------------|
| Mongo:v3.4.16 SESSION SHARD BUCKET INFORMATION |
|----------------------------------------------------------------------------------------------------------------------------------------|
| { "_id" : { "shard" : 1 }, "count" : 1024 } { Status : OK } |
| { "_id" : { "shard" : 2 }, "count" : 1024 } { Status : OK } |
| { "_id" : { "shard" : 3 }, "count" : 1024 } { Status : OK } |
| { "_id" : { "shard" : 4 }, "count" : 1024 } { Status : OK } |
| { "_id" : { "shard" : 5 }, "count" : 1024 } { Status : OK } |
| { "_id" : { "shard" : 6 }, "count" : 1024 } { Status : OK } |
| { "_id" : { "shard" : 7 }, "count" : 1024 } { Status : OK } |
| { "_id" : { "shard" : 8 }, "count" : 1024 } { Status : OK } |
|----------------------------------------------------------------------------------------------------------------------------------------|
| Mongo:v3.4.16 SESSION SHARD INSTANCE VERSION INFORMATION |
|----------------------------------------------------------------------------------------------------------------------------------------|
| { "_id" : "qns02-1", "version" : 24 } { Status : OK } |
| { "_id" : "qns01-1", "version" : 24 } { Status : OK } |
|----------------------------------------------------------------------------------------------------------------------------------------|
Sample output of --get_replica_status
-
If there is no connectivity or network issue, the replica-set status looks like this:
|----------------------------------------------------------------------------------|
|----------------------------------------------------------------------------------|
| SESSION:set01 |
| Status via arbitervip:27717 sessionmgr01:27717 sessionmgr02:27717 |
| Member-1 - 27717 : 221.1.1.38 - SECONDARY - sessionmgr02 - ON-LINE - 0 sec - 2 |
| Member-2 - 27717 : 221.1.1.37 - PRIMARY - sessionmgr01 - ON-LINE - ------- - 3 |
| Member-3 - 27717 : 221.1.1.40 - ARBITER - arbitervip - ON-LINE - ------- - 0 |
|----------------------------------------------------------------------------------|
|----------------------------------------------------------------------------------|
Note
|
Two horizontal line separators are added between different replica sets.
|
-
If there is a connectivity or network issue, the replica-set status looks like this:
|---------------------------------------------------------------------------------------|
|---------------------------------------------------------------------------------------|
| SESSION:set07 |
| Status via arbitervip:27727 sessionmgr01:27727 |
| Member-1 - 27727 : 221.1.1.37 - SECONDARY - sessionmgr01 - ON-LINE - 0 sec - 2 |
| Member-2 - 27727 : 221.1.1.38 - PRIMARY - sessionmgr02 - ON-LINE - ------- - 3 |
| Member-3 - 27727 : 221.1.1.40 - ARBITER - arbitervip - ON-LINE - ------- - 0 |
| Member-4 - 27727 : 221.1.1.59 - UNKNOWN - sessionmgr03 - OFF-LINE - 0 sec - 1 |
| Member-5 - 27727 : 221.1.1.60 - UNKNOWN - sessionmgr04 - OFF-LINE - 18015 days - 1 |
|---------------------------------------------------------------------------------------|
| Status via sessionmgr02:27727 sessionmgr03:27727 |
| Member-1 - 27727 : 221.1.1.37 - SECONDARY - sessionmgr01 - ON-LINE - 0 sec - 2 |
| Member-2 - 27727 : 221.1.1.38 - PRIMARY - sessionmgr02 - ON-LINE - ------- - 3 |
| Member-3 - 27727 : 221.1.1.40 - ARBITER - arbitervip - ON-LINE - ------- - 0 |
| Member-4 - 27727 : 221.1.1.59 - SECONDARY - sessionmgr03 - ON-LINE - 0 sec - 1 |
| Member-5 - 27727 : 221.1.1.60 - UNKNOWN - sessionmgr04 - OFF-LINE - 18015 days- 1 |
|---------------------------------------------------------------------------------------|
| Status via sessionmgr04:27727 sessionmgr05:27727 |
| Mongodb Daemon or Host is down |
|---------------------------------------------------------------------------------------|
|---------------------------------------------------------------------------------------|
Note
|
One horizontal line separator is added between different members of a replica set.
|
Sample output of --get_frag_status
diagnostics.sh --get_frag_status
CPS Diagnostics HA Multi-Node Environment
---------------------------
|----------------------------------------------------------------------------------------------------------------------------------------|
| Mongo:v3.6.9 DATABASE LEVEL FRAGMENTATION STATUS INFORMATION Date : 2020-05-28 12:17:58 |
| SET TYPE : HA [MEMBER_ROLE : PRIMARY] |
|----------------------------------------------------------------------------------------------------------------------------------------|
| setname dbName storageSize(MB) datasize(MB) indexSize(MB) fileSize(MB) derivedFS(MB) frag% |
|----------------------------------------------------------------------------------------------------------------------------------------|
| ADMIN:set06 |
| Status via sessionmgr01:27721 |
| set06 diameter db not found - - - - - - |
|----------------------------------------------------------------------------------------------------------------------------------------|
| BALANCE:set02 |
| Status via sessionmgr01:27718 |
| set02 balance_mgmt 2.58 1.88 0.01 64.00 0 NoFrag |
|----------------------------------------------------------------------------------------------------------------------------------------|
| SESSION:set01 |
| Status via sessionmgr01:27717 |
| set01 session_cache 0.33 0.05 0.02 16.00 0 NoFrag |
|----------------------------------------------------------------------------------------------------------------------------------------|
| SESSION:set01 |
| Status via sessionmgr01:27717 |
| set01 sk_cache 0.02 0.00 0.01 16.00 0 NoFrag |
|----------------------------------------------------------------------------------------------------------------------------------------|
| SPR:set04 |
| Status via sessionmgr01:27720 |
| set04 spr 0.07 0.01 0.13 64.00 0 NoFrag |
|----------------------------------------------------------------------------------------------------------------------------------------|
|----------------------------------------------------------------------------------------------------------------------------------------|
| SESSION:set07 |
| Status via sessionmgr02:27727 |
| set07 session_cache and sk_cache db not found - - - - - |
|----------------------------------------------------------------------------------------------------------------------------------------|
|----------------------------------------------------------------------------------------------------------------------------------------|
[root@localhost ~]#
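In larger deployments, the frag% column of this output can be scanned quickly for any value other than NoFrag. The following filter is illustrative only and assumes the output layout shown above:
diagnostics.sh --get_frag_status | grep -v 'NoFrag'     # hide healthy (NoFrag) rows; fragmented databases remain visible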