Monitoring ESC Health

You can monitor the health of ESC and its services using one of the following:

  • Monitoring Health API

  • SNMP Trap

Monitoring the Health of ESC Using REST API

ESC provides a REST API that third-party software can use to monitor the health of ESC and its services. Using the API, third-party software can periodically query the health condition of ESC to check whether ESC is in service. In response to the query, the API returns a status code and a message; see Table 1 for details. In an HA setup, the virtual IP (VIP) must be used as the monitoring IP. The return value provides the overall condition of the ESC HA pair. See Table 2 for details.

The REST API to monitor the health of ESC is as follows:

GET to https://<esc_vm_ip>:60000/esc/health


Note

The monitoring health API is secured using the existing REST basic HTTP authentication. The user can retrieve the report by using the ESC REST API credentials.
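
The request described above can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the function names, the example IP address, the credentials, and the verify_tls switch are assumptions made for this sketch. Only the URL pattern and the use of HTTP basic authentication come from this section.

```python
import base64
import ssl
import urllib.request


def build_health_request(esc_ip: str, username: str, password: str) -> urllib.request.Request:
    """Build a GET request for the ESC health endpoint with HTTP basic auth."""
    url = f"https://{esc_ip}:60000/esc/health"
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Basic {token}")
    return request


def fetch_health_report(esc_ip: str, username: str, password: str,
                        verify_tls: bool = True, timeout: float = 10.0) -> str:
    """Send the request and return the raw response body as text."""
    context = ssl.create_default_context()
    if not verify_tls:
        # Assumption: only for lab setups with untrusted self-signed certificates.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    request = build_health_request(esc_ip, username, password)
    with urllib.request.urlopen(request, timeout=timeout, context=context) as response:
        return response.read().decode("utf-8")
```

In an HA setup, esc_ip would be the VIP. A monitoring tool would typically call fetch_health_report on a fixed interval and treat network errors as "no response".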


The monitoring health API response with error conditions is as follows:


<?xml version="1.0" encoding="UTF-8" ?>
<esc_health_report>
<status_code>{error status code}</status_code>
<message>{error message}</message>
</esc_health_report>

The monitoring health API supports both XML and JSON responses.

If the API response is successful, the report includes an additional field called stage.


<?xml version="1.0" encoding="UTF-8" ?>
<esc_health_report>
<status_code>{success status code}</status_code>
<stage>{Either INIT or READY}</stage>
<message>{success message}</message>
</esc_health_report>

The stage field takes one of two values, INIT or READY.

INIT: The initial stage, in which ESC accepts pre-provisioning requests such as configuring the config parameters or registering a VIM connector.

READY: ESC is ready for any kind of provisioning request, such as deploying and undeploying.
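
The XML reports shown above can be parsed as follows. This is a minimal sketch: the HealthReport class and the parse_health_report function are hypothetical names introduced here, and the element names are taken from the sample responses in this section. The stage field is treated as optional because error responses do not include it.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


@dataclass
class HealthReport:
    status_code: int
    message: str
    stage: Optional[str]  # "INIT" or "READY"; None when the report has no stage


def parse_health_report(xml_text: str) -> HealthReport:
    """Parse an <esc_health_report> document into a HealthReport."""
    root = ET.fromstring(xml_text.strip())
    if root.tag != "esc_health_report":
        raise ValueError(f"unexpected root element: {root.tag}")
    code_text = root.findtext("status_code")
    if code_text is None:
        raise ValueError("missing status_code element")
    return HealthReport(
        status_code=int(code_text.strip()),
        message=(root.findtext("message") or "").strip(),
        stage=(root.findtext("stage") or "").strip() or None,
    )
```

A caller could, for example, hold back provisioning requests until the parsed stage is READY.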

The status codes and messages below describe the health condition of ESC. Status codes in the 2000 series imply that ESC is operational. Status codes in the 5000 series imply that at least one ESC component is not in service.

Note 

The ESC Health API does not check the VIM status because of multi VIM deployment introduced in ESC Release 3.0.

Table 1. ESC Health API Status Code and Messages

| Status Code | Message |
|---|---|
| 2000 | ESC services are running. |
| 2010 | ESC services are running, but ESC High Availability node is not reachable. |
| 2020 | ESC services are running. One or more VIM services (for example, keystone and nova) are not reachable. Note: Not supported from ESC Release 3.0. |
| 2030 | ESC services are running, but VIM credentials are not provided. Note: Not supported from ESC Release 3.0. |
| 2040 | ESC services are running. VIM is configured, ESC initializing connection to VIM. |
| 2100 | ESC services are running, but ESC High Availability node is not reachable. One or more VIM services (for example, nova) are not reachable. Note: Not supported from ESC Release 3.0. |
| 5010 | ESC service, ESC_MANAGER is not running. |
| 5020 | ESC service, CONFD is not running. |
| 5030 | ESC service, MONA is not running. |
| 5040 | ESC service, VIM_MANAGER is not running. |
| 5090 | More than one ESC service (for example, confd and mona) is not running. |


Note

ESC HA mode refers to ESC HA in DRBD setup only. For more information on the ESC HA setup, see the Cisco Elastic Services Controller Install Guide.


The table below describes the status message for standalone ESC and HA with success and failure scenarios. For more information on ESC standalone and HA setup, see the Cisco Elastic Services Controller Install Guide.

Table 2. Health API Status Messages for Standalone ESC and HA

| Deployment | Success | Partial Success | Failure |
|---|---|---|---|
| Standalone ESC | The response is collected from the monitoring health API and the status code is 2000. | NA | Either the monitor cannot get a response from the monitoring health API, or the response is collected and the status code returned is in the 5000 series. |
| ESC in HA (Active-Standby) | The response is collected from the monitoring health API and the status code is 2000. | The response is collected from the monitoring health API and the status code is 2010. This indicates that the ESC standby node cannot connect to the ESC master node. However, this does not impact the service that ESC provides northbound. | Either the monitor cannot get a response from the monitoring health API for more than two minutes, or the response is collected and the status code returned is in the 5000 series. |

Note

The ESC monitoring health API may be unavailable for a period during an HA switchover. The monitoring software must set an appropriate threshold before reporting a service failure in this scenario.
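
The decision logic in Table 2 can be sketched as a small function. This is an illustration under stated assumptions, not part of ESC: the function name, the "pending" and "unclassified" labels, and the seconds_without_response parameter are introduced here. The 120-second grace period reflects the "more than two minutes" threshold in the table; codes that Table 2 does not mention are deliberately left unclassified.

```python
from typing import Optional

HA_UNREACHABLE_GRACE_SECONDS = 120  # "more than two minutes" in Table 2


def evaluate_health(status_code: Optional[int], ha: bool,
                    seconds_without_response: float = 0.0) -> str:
    """Map a health API result to a Table 2 outcome.

    status_code is None when the monitor received no response.
    """
    if status_code is None:
        if ha and seconds_without_response <= HA_UNREACHABLE_GRACE_SECONDS:
            # Assumption: treat a short outage in HA as a possible switchover.
            return "pending"
        return "failure"
    if 5000 <= status_code < 6000:
        return "failure"
    if status_code == 2000:
        return "success"
    if ha and status_code == 2010:
        return "partial success"
    # Other codes are not covered by Table 2.
    return "unclassified"
```

For example, a monitor polling an HA pair through the VIP would keep reporting "pending" during a switchover and escalate to "failure" only once the outage exceeds the grace period.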

Monitoring the Health of ESC Using SNMP Trap Notifications

You can also configure notifications on the health of various ESC components via SNMP traps using an SNMP agent. This agent is installed as part of the standard ESC installation and supports the SNMP version 2c protocol. The SNMP traps currently report only the state of the ESC product, not of the VNFs managed by ESC. This section describes the steps required to configure the ESC SNMP agent and covers the events that are triggered as part of the notifications.

Before you begin

  • Ensure the CISCO-ESC-MIB and CISCO-SMI MIB files are available on your system. These are located in the /opt/cisco/esc/snmp/mibs directory. Download these to your SNMP Manager machine and place them in the $HOME/.snmp/mibs directory.

  • Configure the SNMP agent. There are three methods to configure the SNMP agent. These methods are discussed in detail in the section below.

Configuring SNMP Agent

In order to receive the SNMP traps, configure the SNMP Agent parameters. The agent can be configured using three different methods described in this section. The best or most applicable method to use depends on your use case.

Procedure

  • Configuring SNMP Agent during the installation of ESC via BootVM: While installing ESC, use the following additional parameters to configure SNMP agent:

    % bootvm.py <esc_vm_name> --image <image-name> --net <net-name> --enable-http-rest --ignore-ssl-errors 
    --managers "udp:ipv4/port" or "udp:[ipv6]/port"
    
  • Configuring via ESCADM: Using the escadm tool, you can modify the SNMP agent configuration parameters such as managers and ignoreSslErrors properties.

    sudo escadm snmp set --ignore_ssl_errors=true --managers="udp:ipv4/port" or "udp:[ipv6]/port"
  • Updating the configuration file: The configuration is in the file /opt/cisco/esc/esc_database/snmp.conf. This file is in JSON format. The following is an example:

    
    
    {"sysDescr": "ESC SNMP Agent",
        "listeningPort": "2001",
        "managers": [
            "udp:ipv4/port",
            "udp:[ipv6]/port"
        ],
        "ignoreSslErrors": "yes",
        "logLevel": "INFO",
        "sysLocation": "Unspecified",
        "sysName": "system name",
        "pollSeconds": "15",
        "listeningAddress": "0.0.0.0",
        "healthUrl": "https://<esc_vm_ip>:60000/esc/health",
        "sysContact": "root@localhost"}
    
    
    The table below describes the properties that can be configured.

    Note

    Using the bootvm and escadm tools, you can configure only the ignoreSslErrors and managers parameters.


    Table 3. Agent Configuration Parameters

    | Variable | Description |
    |---|---|
    | listeningAddress | Specify the network interface to listen on. 0.0.0.0 means all interfaces. |
    | listeningPort | Specify the UDP port to listen on. |
    | ignoreSslErrors | Specify no if you want certificates to be validated for SSL connections. For untrusted self-signed certificates, set to yes. |
    | healthUrl | Specify the URL of the Health Monitor API. Note: The agent may be used outside the ESC machine, for example on a laptop with Java 8 installed. In that case, the health URL must point to the ESC machine, and you must provide an ESC username and password (authUser/authPass) to authenticate with the ESC Monitor. |
    | authUser | Specify the HTTP-BASIC username for the Health Monitor API. Use this parameter only if the agent is external to the ESC machine. |
    | authPass | Specify the HTTP-BASIC password for the Health Monitor API. Use this parameter only if the agent is external to the ESC machine. |
    | managers | Specify an array of strings in "udp:ipv4/port" or "udp:[ipv6]/port" format identifying where SNMP traps are to be delivered. If these are changed while the agent is running, the configuration is reloaded and the new managers are used for future traps. |
    | pollSeconds | Used by the scheduler. Specify the number of seconds between polls to the Health Monitor API. |
    | logLevel | Specify the level as INFO, ERROR, WARN, DEBUG, TRACE, ALL, or OFF. This parameter can only be changed at runtime. |
    | sysName | (SNMPv2-MIB) Optional value indicating the name of this node. The hostname is used by default. |
    | sysDescr | (SNMPv2-MIB) Optional value describing this agent. |
    | sysLocation | (SNMPv2-MIB) Optional value giving the physical location of this node. |
    | sysContact | (SNMPv2-MIB) Optional value giving a contact name and/or email address for enquiries regarding this node. |
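
Because a malformed managers entry means traps are not delivered where you expect, it can help to check the entries before editing snmp.conf. The following sketch parses the "udp:ipv4/port" and "udp:[ipv6]/port" formats described above. The function names are hypothetical, and the sketch assumes that the host part is a literal IP address; it does not reflect how the ESC agent itself parses the file.

```python
import ipaddress
import json
import re

# udp:<ipv4>/<port> or udp:[<ipv6>]/<port>
_MANAGER_RE = re.compile(
    r"^udp:(?:\[(?P<v6>[^\]]+)\]|(?P<v4>[^/\[\]]+))/(?P<port>\d+)$"
)


def parse_manager(entry: str):
    """Split a manager string into (ip_address, port); raise ValueError if invalid."""
    match = _MANAGER_RE.match(entry)
    if match is None:
        raise ValueError(f"not in udp:ipv4/port or udp:[ipv6]/port format: {entry!r}")
    if match.group("v6") is not None:
        address = ipaddress.IPv6Address(match.group("v6"))
    else:
        address = ipaddress.IPv4Address(match.group("v4"))
    port = int(match.group("port"))
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {entry!r}")
    return address, port


def load_managers(conf_text: str):
    """Return the parsed managers from the JSON text of an snmp.conf file."""
    config = json.loads(conf_text)
    return [parse_manager(entry) for entry in config.get("managers", [])]
```

Invalid addresses raise ValueError as well, because the ipaddress module's AddressValueError is a subclass of ValueError.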

Defining ESC SNMP MIBs

The following table describes the content of the ESC MIB. These values are configurable in the snmp.conf file.

| Variable | Simple OID | Description |
|---|---|---|
| sysName | SNMPv2-MIB::sysName.0 | Specify the name of the ESC machine. The host name is used by default. |
| sysDescr | SNMPv2-MIB::sysDescr.0 | Specify the name of the SNMP Agent. |
| sysLocation | SNMPv2-MIB::sysLocation.0 | Specify where the ESC machine is located. |
| sysContact | SNMPv2-MIB::sysContact.0 | Specify the Admin contact. |

Enabling SNMP Trap Notifications

Use the escadm tool to start the SNMP services.
sudo escadm snmp start
You can also use the escadm tool to stop the SNMP agent, get its status, and restart it.


sudo escadm snmp stop
sudo escadm snmp status
sudo escadm snmp restart

Managing SNMP Traps in ESC

This section covers:
  • Understanding the SNMP Notification Types in ESC

  • Receiving SNMP Trap Messages Directly From the Network

  • Managing Trap Endpoints (SNMP Managers)

  • Managing ESC SNMP in an HA Environment

  • Managing Self-Signed Certificates in ESC

Procedure

  • Understanding the SNMP Notification Types in ESC: The following table lists all the events supported by this version of the SNMP agent. These status codes and messages are returned via an SNMP trap to a registered manager only when the state of ESC changes. Status codes in the 2000 series imply that ESC is operational. Status codes in the 5000 series imply that at least one ESC component is not in service. For more details on status codes in the 2000 and 5000 series, see the section Monitoring the Health of ESC Using REST API.

    | Status Code | SNMP Agent-specific Message |
    |---|---|
    | 5100 | An HTTP error was received when using the ESC Monitor API. |
    | 5101 | The ESC Monitor replied, but the data could not be understood. |
    | 5102 | The Agent could not create a network connection to the ESC Monitor API. |
    | 5199 | An unhandled error occurred (details will be included in the message). |
    | 5200 | In an HA environment, the agent sends this notification when the HA master node changes. |

  • Receiving SNMP trap messages directly from the network: You can receive SNMP trap messages directly from the network by using basic SNMP UNIX tools such as snmpget, snmpwalk, and snmptrapd. An example usage:

    snmptrapd -m ALL -f -Lo -c snmptrapd.conf <port>
    This starts an SNMP trap daemon on the port given as <port>, for example 12113. Make sure the Cisco and ESC MIBs are present in ~/.snmp/mibs. The referenced snmptrapd.conf looks like this:
    
    disableAuthorization yes
    authCommunity   log,execute,net public
    # traphandle default /Users/ahanniga/bin/notify.sh esc
     
    createUser myuser MD5 mypassword DES myotherpassword
     
    format2 %V\n% Agent Address: %A \n Agent Hostname: %B \n Enterprise OID: %N \n Trap Sub-Type: %q \n Community/Infosec Context: %P \n Uptime: %T \n PDU Attribute/Value Pair Array:\n%v \n -------------- \n
    
    The trap contains two entries: statusCode and statusMessage. The trap is sent when the status changes.
    
    DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (3971) 0:00:39.71
    SNMPv2-MIB::snmpTrapOID.0 = OID: CISCO-ESC-MIB::statusNotif
    SNMPv2-MIB::sysDescr.0 = STRING: ESC SNMP Server
    CISCO-ESC-MIB::escStatusCode.0 = STRING: "2000"
    CISCO-ESC-MIB::escStatusMessage.0 = STRING: "ESC services are running."
  • Managing Trap Endpoints (SNMP Managers): The SNMP Agent monitors its configuration file for changes and reloads it when a change is made. Add manager endpoints to, or remove them from, the configuration file, and the new configuration is used for future traps.

  • Managing ESC SNMP Agent in an HA Environment: Two or more ESC nodes can be deployed in an HA configuration, and the SNMP agent supports this configuration. However, consider the following points in an HA deployment:

    • Only one ESC node (the master node) can send SNMP traps.

    • The SNMP Agent must be up if the backup node becomes the master.

    • Any changes made to the master configuration must also be applied on backup nodes.

    • If a node becomes the master node due to failover, this will generate a trap.

  • Managing Self-Signed Certificates: When ESC is deployed and the SNMP agent uses the ESC Health API, it is recommended that a root trusted certificate is installed on the server. If the environment is known and trusted, you can ignore certificate errors by using the configuration parameter "ignoreSslErrors". However, if you want to keep this setting at its more secure default, you can install a self-signed certificate by importing the ESC certificate into the JVM trust store. The following procedure describes how to do so.

    1. Add esc as an alternative name for localhost. In the file "/etc/hosts", add the following (or ensure that "esc" is added to the end):

      127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4 esc
    2. In the SNMP Agent configuration file "/opt/cisco/esc/esc_database/snmp.conf" the healthUrl must point to esc.

      "healthUrl": "https://esc:60000/esc/health"
    3. Import the certificate into the truststore. The following is an example of importing the certificate, assuming $JAVA_HOME is /usr/lib/jvm/jre-1.8.0-openjdk.x86_64:

      
      cd /opt/cisco/esc/esc-config
      sudo openssl x509 -inform PEM -in server.pem -outform DER -out server.cer
      sudo keytool -importcert -alias esc -keystore $JAVA_HOME/lib/security/cacerts -storepass changeit -file server.cer
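
The change-of-state behavior described under the SNMP notification types (a trap is sent only when the status changes) can be sketched as follows. This is an illustration of the idea, not the agent's actual implementation: the function name and the send_trap callback are hypothetical, and the sketch assumes that the first observed status also counts as a change.

```python
from typing import Callable, Iterable, Optional, Tuple

Status = Tuple[str, str]  # (status code, status message)


def notify_on_change(samples: Iterable[Status],
                     send_trap: Callable[[str, str], None]) -> None:
    """Call send_trap only when the status code differs from the previous sample."""
    previous_code: Optional[str] = None
    for code, message in samples:
        if code != previous_code:
            send_trap(code, message)
            previous_code = code


# Usage: four polls, three state changes (including the initial state).
sent = []
notify_on_change(
    [
        ("2000", "ESC services are running."),
        ("2000", "ESC services are running."),
        ("5030", "ESC service, MONA is not running."),
        ("2000", "ESC services are running."),
    ],
    lambda code, message: sent.append(code),
)
```

A manager that receives no trap for a long time therefore cannot tell a stable system from a stopped agent, which is one reason to combine traps with periodic polling of the health API.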