The Ultra M Manager Node can be configured to aggregate events received from different Ultra M components as identified in Table 1.
Important:
This functionality is currently supported only with Ultra M deployments that are based on OSP 10 and that leverage the Hyper-Converged architecture.
Table 1 Component Event Sources

Solution Component | Event Source Type | Details
UCS server hardware | CIMC | Reports on events collected from UCS C-series hardware through a CIMC-based subscription. These events are monitored in real time.
VIM (Overcloud) | OpenStack service health | Reports on OpenStack service fault events pertaining to: failures (stopped, restarted); high availability; Ceph/storage; Neutron/compute host and network agent; and the Nova scheduler (VIM instances). Refer to Table 2 for a complete list of services. Important: To ensure optimal performance, it is strongly recommended that you do not change the default polling interval.
UAS (AutoVNF, UEM, and ESC) | UAS cluster/USP management component events | Reports on UAS service fault events pertaining to AutoVNF, UEM, and ESC. Important: To ensure optimal performance, it is strongly recommended that you do not change the default polling interval.
VNF VM Status | ESC (VNFM) event notifications | Reports on VNF VM deployment state events generated by ESC (the VNFM). The following events are supported: VM_DEPLOYED, VM_ALIVE, VM_UNDEPLOYED, VM_REBOOTED, VM_RECOVERY_REBOOT, VM_RECOVERY_UNDEPLOYED, VM_RECOVERY_DEPLOYED, VM_RECOVERY_COMPLETE, and VM_STOPPED. Important: This feature is not fully qualified in this release and is available only for testing purposes. AutoVNF monitors for event notifications from ESC in real time. Although AutoVNF updates the VNFR for the VNF and VNFC to which an event pertains upon receipt of the event, it does not generate a corresponding SNMP trap.
Table 2 Monitored OpenStack Services

Node Type | OpenStack Module | OpenStack Services
Controller | aodh | openstack-aodh-evaluator.service, openstack-aodh-listener.service, openstack-aodh-notifier.service
Controller | ceilometer | openstack-ceilometer-central.service, openstack-ceilometer-collector.service, openstack-ceilometer-notification.service
Controller | cinder |
Controller | glance |
Controller | gnocchi |
Controller | heat-engine | openstack-heat-engine.service
Controller | heat-api | openstack-heat-api-cfn.service, openstack-heat-api-cloudwatch.service, openstack-heat-api.service
Controller | heat | openstack-heat-api-cfn.service, openstack-heat-api-cloudwatch.service, openstack-heat-api.service
Controller | nova | openstack-nova-api.service, openstack-nova-conductor.service, openstack-nova-consoleauth.service, openstack-nova-novncproxy.service, openstack-nova-scheduler.service
Controller | swift-object | openstack-swift-object-auditor.service, openstack-swift-object-replicator.service, openstack-swift-object-updater.service, openstack-swift-object.service
Controller | swift-account | openstack-swift-account-auditor.service, openstack-swift-account-reaper.service, openstack-swift-account-replicator.service, openstack-swift-account.service
Controller | swift-container | openstack-swift-container-auditor.service, openstack-swift-container-replicator.service, openstack-swift-container-updater.service, openstack-swift-container.service
Controller | swift-proxy | openstack-swift-proxy.service
Controller | swift | All of the above swift services
Controller | ntpd | ntpd.service
Controller | mongod | mongod.service
Controller | memcached | memcached
Controller | neutron-dhcp-agent | neutron-dhcp-agent.service
Controller | neutron-l3-agent | neutron-l3-agent.service
Controller | neutron-metadata-agent | neutron-metadata-agent.service
Controller | neutron-openvswitch-agent | neutron-openvswitch-agent.service
Controller | neutron-server | neutron-server.service
Controller | httpd | httpd.service
OSD Compute | ceph-mon.target | ceph-mon.target
OSD Compute | ceph-radosgw.target | ceph-radosgw.target
OSD Compute | ceph.target | ceph.target
OSD Compute | openvswitch.service | openvswitch.service
OSD Compute | neutron-sriov-nic-agent | neutron-sriov-nic-agent.service
OSD Compute | neutron-openvswitch-agent | neutron-openvswitch-agent.service
OSD Compute | ntpd | ntpd.service
OSD Compute | nova-compute | openstack-nova-compute.service
OSD Compute | libvirtd | libvirtd.service
Compute | ceph-mon.target | ceph-mon.target
Compute | ceph-radosgw.target | ceph-radosgw.target
Compute | ceph.target | ceph.target
Compute | openvswitch.service | openvswitch.service
Compute | neutron-sriov-nic-agent | neutron-sriov-nic-agent.service
Compute | neutron-openvswitch-agent | neutron-openvswitch-agent.service
Compute | ntpd | ntpd.service
Compute | nova-compute | openstack-nova-compute.service
Compute | libvirtd | libvirtd.service
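Each OpenStack entry in Table 2 is a systemd unit whose state the Ultra M Manager checks at the configured polling interval. As a purely illustrative aside (this is not the Ultra M Manager implementation), the kind of per-unit status that such a check reports can be inspected as follows:

import subprocess

def service_state(unit):
    """Return a unit's state (e.g. "active" or "inactive") via systemctl."""
    result = subprocess.run(["systemctl", "is-active", unit],
                            capture_output=True, text=True)
    return result.stdout.strip()

print(service_state("ntpd.service"))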
Events received from the solution components, regardless of the source type, are mapped against the Ultra M SNMP MIB (CISCO-ULTRAM-MIB.my, refer to Ultra M MIB). The event data is parsed and categorized against the following conventions:
- Fault code: Identifies the area in which the fault occurred for the given component. Refer to the “CFaultCode” convention within the Ultra M MIB for more information.
- Severity: The severity level associated with the fault. Refer to the “CFaultSeverity” convention within the Ultra M MIB for more information. Since the Ultra M Manager Node aggregates events from different components within the solution, the severities supported within the Ultra M Manager Node MIB map to those for the specific components. Refer to Ultra M Component Event Severity and Fault Code Mappings for details.
- Domain: The component in which the fault occurred (for example, UCS hardware, VIM, or UEM). Refer to the “CFaultDomain” convention within the Ultra M MIB for more information.
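As a minimal sketch of this categorization, the example below maps a raw event onto the domain, severity, and fault-code enumerations defined by the MIB (the numeric values match the cultramFaultDomain, cultramFaultSeverity, and cultramFaultCode objects described in Table 3); the input event fields and the mapping logic are hypothetical and are not part of the Ultra M Manager code:

# Enumerations from the Ultra M MIB (see Table 3).
DOMAINS = {"hardware": 1, "vim": 3, "uas": 4}
SEVERITIES = {"emergency": 1, "critical": 2, "major": 3, "alert": 4, "informational": 5}
FAULT_CODES = {"other": 1, "networkConnectivity": 2, "resourceUsage": 3,
               "resourceThreshold": 4, "hardwareFailure": 5, "securityViolation": 6,
               "configuration": 7, "serviceFailure": 8}

def categorize(event):
    """Map a hypothetical raw event to (domain, severity, code, description)."""
    domain = DOMAINS[event["component"]]
    severity = SEVERITIES[event["severity"]]
    code = FAULT_CODES.get(event["fault_type"], FAULT_CODES["other"])
    return domain, severity, code, event["description"]

print(categorize({"component": "vim", "severity": "critical",
                  "fault_type": "serviceFailure",
                  "description": "ntpd: Service is not active state: inactive"}))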
UAS and OpenStack events are monitored at the configured polling interval. At each polling interval, the Ultra M Manager Node:
1. Collects data from UAS and OpenStack.
2. Generates/updates .log and .report files, as well as an SNMP-based fault table, with this information. The fault table also includes related data about each fault, such as the specific source, creation time, and description.
3. Processes any events that occurred:
- If an error or fault event is identified, a .error file is created and an SNMP trap is sent.
- If the event received is a clear condition, an informational SNMP trap is sent to “clear” an active fault.
- If no event occurred, no further action is taken beyond Step 2.
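The following sketch outlines that polling cycle. The collect_status, write_report, and send_trap helpers are hypothetical placeholders used only to show the control flow; they are not functions provided by the Ultra M Manager.

def poll_once(collect_status, write_report, send_trap, active_faults):
    """One illustrative polling pass over UAS and OpenStack status."""
    events = collect_status()           # Step 1: collect data from UAS and OpenStack
    write_report(events)                # Step 2: update .log/.report files and the fault table
    for event in events:                # Step 3: process any events that occurred
        if event["is_fault"] and event["id"] not in active_faults:
            active_faults.add(event["id"])
            send_trap(event, clear=False)   # a .error file would also be created here;
                                            # the fault trap is sent once, not on every poll
        elif not event["is_fault"] and event["id"] in active_faults:
            active_faults.remove(event["id"])
            send_trap(event, clear=True)    # informational trap "clears" the active fault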
UCS and ESC VM events are monitored and acted upon in real time. When events occur, the Ultra M Manager generates a .log file and updates the SNMP fault table. In the case of VM events reported by ESC, upon receipt of an event, AutoVNF updates the VNFR for the VNF and VNFC to which the event pertains. In parallel, it passes the event information to the Ultra M Manager functionality within AutoIT. The Ultra M Manager then generates corresponding SNMP traps for each event.
Active faults are reported only once, not on every polling interval. As a result, only one trap is sent for as long as the fault remains active. Once the fault is cleared, an informational trap is sent.
Important:
UCS events are considered to be the “same” if a previously received fault has the same distinguished name (DN), severity, and lastTransition time. UCS events are considered “new” only if any of these elements changes.
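The sketch below illustrates that comparison by building a deduplication key from those three fields; the field names are hypothetical and do not correspond to actual CIMC attribute names.

def ucs_fault_key(fault):
    """Key identifying a CIMC fault: the same DN, severity, and lastTransition
    time means the same fault; a change in any of them makes it new."""
    return (fault["dn"], fault["severity"], fault["last_transition"])

seen_faults = set()

def is_new_fault(fault):
    """Return True (and remember the fault) only the first time it is seen."""
    key = ucs_fault_key(fault)
    if key in seen_faults:
        return False
    seen_faults.add(key)
    return True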
These processes are illustrated in Figure 2. Refer to About Ultra M Manager Log Files for more information.
Figure 2. Ultra M Manager Node Event Aggregation Operation
An example of the snmp_faults_table file is shown below; the entry syntax is described in Figure 3:
"0": [3 "neutonoc-osd-compute-0: neutron-sriov-nic-agent.service" 1 8 "status known"]
"1": [3 "neutonoc-osd-compute-0: ntpd" 1 8 "Service is not active state: inactive"]
"2": [3 "neutonoc-osd-compute-1: neutron-sriov-nic-agent.service" 1 8 "status known"]
"3": [3 "neutonoc-osd-compute-1: ntpd" 1 8 "Service is not active state: inactive"]
"4": [3 "neutonoc-osd-compute-2: neutron-sriov-nic-agent.service" 1 8 "status known"]
"5": [3 "neutonoc-osd-compute-2: ntpd" 1 8 "Service is not active state: inactive"]
Refer to About Ultra M Manager Log Files for more information.
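Each entry in the example above takes the form "<index>": [<domain> "<source>" <severity> <code> "<description>"] (see Figure 3 for the authoritative description). The following sketch, which assumes the quoting shown in the example and is not part of the Ultra M Manager tooling, parses one such entry into its fields:

import re

# "<index>": [<domain> "<source>" <severity> <code> "<description>"]
ENTRY_RE = re.compile(
    r'"(?P<index>\d+)":\s*\[(?P<domain>\d+)\s+"(?P<source>[^"]*)"\s+'
    r'(?P<severity>\d+)\s+(?P<code>\d+)\s+"(?P<description>[^"]*)"\]'
)

line = '"1": [3 "neutonoc-osd-compute-0: ntpd" 1 8 "Service is not active state: inactive"]'
entry = ENTRY_RE.match(line).groupdict()
print(entry["domain"], entry["severity"], entry["code"], entry["source"])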
Figure 3. SNMP Fault Table Entry Description
Each element in the SNMP Fault Table Entry corresponds to an object defined in the Ultra M SNMP MIB as described in Table 3. (Refer also to Ultra M MIB.)
Table 3 SNMP Fault Entry Table Element Descriptions

SNMP Fault Table Entry Element | MIB Object | Additional Details
Entry ID | cultramFaultIndex | A unique identifier for the entry.
Fault Domain | cultramFaultDomain | The component area in which the fault occurred. The following domains are supported in this release: hardware(1): hardware, including UCS servers; vim(3): OpenStack VIM manager; uas(4): Ultra Automation Services modules.
Fault Source | cultramFaultSource | Information identifying the specific component within the Fault Domain that generated the event. The format of this information differs based on the Fault Domain. Refer to Table 4 for details.
Fault Severity | cultramFaultSeverity | The severity associated with the fault, as one of the following: emergency(1): system-level fault impacting multiple VNFs/services; critical(2): critical fault specific to a VNF/service; major(3): component-level failure within a VNF/service; alert(4): warning condition for a service/VNF that may eventually impact service; informational(5): informational only, does not impact service. Refer to Ultra M Component Event Severity and Fault Code Mappings for details on how these severities map to events generated by the various Ultra M components.
Fault Code | cultramFaultCode | A unique ID representing the type of fault, as one of the following: other(1): other events; networkConnectivity(2): network connectivity failure events; resourceUsage(3): resource usage exhausted events; resourceThreshold(4): resource threshold crossing alarms; hardwareFailure(5): hardware failure events; securityViolation(6): security alerts; configuration(7): configuration error events; serviceFailure(8): process/service failures. Refer to Ultra M Component Event Severity and Fault Code Mappings for details on how these fault codes map to events generated by the various Ultra M components.
Fault Description | cultramFaultDescription | A message containing details about the fault.
Table 4 cultramFaultSource Format Values

Fault Domain | Format of the cultramFaultSource Value
Hardware (UCS servers) | Node: <UCS-SERVER-IP-ADDRESS>, affectedDN: <FAULT-OBJECT-DISTINGUISHED-NAME>, where <UCS-SERVER-IP-ADDRESS> is the management IP address of the UCS server that generated the fault and <FAULT-OBJECT-DISTINGUISHED-NAME> is the distinguished name of the affected UCS object.
UAS | Node: <UAS-MANAGEMENT-IP>, where <UAS-MANAGEMENT-IP> is the management IP address of the UAS instance.
VIM (OpenStack) | <OS-HOSTNAME>: <SERVICE-NAME>, where <OS-HOSTNAME> is the hostname of the OpenStack node that generated the fault and <SERVICE-NAME> is the name of the OpenStack service that generated the fault.
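As an illustration of these formats, the hypothetical helper below composes a cultramFaultSource string for each domain; it is only a sketch of the string layouts in Table 4, not Ultra M Manager code:

def fault_source(domain, **info):
    """Compose a cultramFaultSource string per the formats in Table 4."""
    if domain == "hardware":      # UCS servers
        return f'Node: {info["ucs_ip"]}, affectedDN: {info["affected_dn"]}'
    if domain == "uas":
        return f'Node: {info["uas_ip"]}'
    if domain == "vim":           # OpenStack
        return f'{info["hostname"]}: {info["service"]}'
    raise ValueError(f"unsupported domain: {domain}")

print(fault_source("vim", hostname="neutonoc-osd-compute-0", service="ntpd"))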
Fault and alarm collection and aggregation functionality within the Hyper-Converged Ultra M solution is configured and enabled through the ultram_cfg.yaml file. (An example of this file is located in Example ultram_cfg.yaml File.) Parameters in this file dictate feature operation and enable SNMP on the UCS servers and event collection from the other Ultra M solution components.
To enable this functionality on the Ultra M solution:
1. Install the Ultra M Manager bundle RPM using the instructions in Install the Ultra M Manager RPM.
Important:
This step is not needed if the Ultra M Manager bundle was previously installed.
2. Become the root user.
sudo -i
3. Navigate to /etc.
cd /etc
4. Create and/or edit the ultram_cfg.yaml file based on your deployment scenario.
Important:
The ultram_cfg.yaml file pertains to both the syslog proxy and event aggregation functionality. Some parts of this file’s configuration overlap and may already have been configured in relation to the other function.
5. Navigate to /opt/cisco/usp/ultram-manager.
cd /opt/cisco/usp/ultram-manager
6. Encrypt the clear-text passwords in the ultram_cfg.yaml file.
utils.py --secure-cfg /etc/ultram_cfg.yaml
Important:
Executing this script encrypts the passwords in the configuration file and appends “encrypted: true” to the end of the file (for example, the last line of ultram_cfg.yaml reads encrypted: true). Refer to Encrypting Passwords in the ultram_cfg.yaml File for more information.
7. Start the Ultra M Manager Service.
8. Verify the configuration by checking the ultram_health.log file.
cat /var/log/cisco/ultram_health.log