Introduction
This document explains the high-level process of Application Centric Infrastructure (ACI) fault generation and how to prevent a specific fault from being generated. Two examples demonstrate the procedure.
How a Fault Gets Generated and How to Selectively Prevent Faults from Being Generated
High Level Mechanism
- Each fault is a Managed Object (MO) of class faultInst (or faultDelegate). This fault MO is generated by another MO, usually its parent, because some rules are violated.
- Each MO in the tree that can generate faults has an attribute, monPolDn, that points to another MO, which is a monitoring policy object. This policy object is where you modify fault properties and the conditions under which faults are generated. There are multiple classes of monitoring policy object, such as:
- monInfraPol - deals with infra policy (VMM manager, access port policy, physical ports, and so on) - located in Fabric > Access Policies > Monitoring Policies
- monFabricPol - deals with fabric monitoring - located in Fabric > Fabric Policies > Monitoring Policies
- monEPGPol - deals with tenant monitoring - located in Tenant > Monitoring Policy menu
- Usually monPolDn points to the default monitoring policy. However, in the corresponding area of the object model you can create a user-defined monitoring policy for any of those monitoring policy classes.
- You can modify many properties of those monitoring policies. The examples show how to prevent a given fault from being generated for all objects to which the monitoring policy is applied. You can also modify the fault lifecycle timers (retention time, soaking time, and so on).
- In order to modify fault severity or prevent a fault from being generated, select, within the monitoring policy, the monitoring object that corresponds to the class of the MO that generated the fault (that is, the parent of the fault).
- Then under this class, choose the fault code that you want to modify and choose an initial severity of value “squelched”.
This prevents any fault with that code from being generated by the MO that is assigned to this specific monitoring policy.
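The mechanism described above can be pictured with a small toy model. This is only an illustrative sketch and not APIC code: the class names `MonitoringPolicy` and `ManagedObject` and the function `raise_fault` are hypothetical, and the only behavior modeled is the lookup from an MO, through its monPolDn, to an initial severity that may be "squelched".

```python
from dataclasses import dataclass, field


@dataclass
class MonitoringPolicy:
    """Toy stand-in for a monitoring policy such as monEPGPol (hypothetical)."""
    dn: str
    # (class of the MO that raises the fault, fault code) -> initial severity
    initial_severity: dict = field(default_factory=dict)


@dataclass
class ManagedObject:
    """Toy stand-in for an MO that can generate faults (hypothetical)."""
    dn: str
    mo_class: str
    mon_pol_dn: str


def raise_fault(mo, code, default_severity, policies):
    """Return (fault DN, severity), or None when the policy squelches the fault."""
    policy = policies[mo.mon_pol_dn]  # follow monPolDn to the monitoring policy
    severity = policy.initial_severity.get((mo.mo_class, code), default_severity)
    if severity == "squelched":
        return None  # no fault MO is created
    return (f"{mo.dn}/fault-{code}", severity)


# Values taken from Example 1 of this document.
policy = MonitoringPolicy(dn="uni/tn-RD/monepg-RD_Monitoring")
policies = {policy.dn: policy}
parent = ManagedObject(
    dn="uni/tn-RD/ipToEpg-Ext_10.200.1.101/rstoEpg-[uni/tn-RD/ap-App_RD1/epg-EPG_RD11]",
    mo_class="dbgac.RsToEpg",
    mon_pol_dn=policy.dn,
)

before = raise_fault(parent, "F0879", "warning", policies)
policy.initial_severity[("dbgac.RsToEpg", "F0879")] = "squelched"
after = raise_fault(parent, "F0879", "warning", policies)
```

In this sketch `before` holds a fault DN and severity, while `after` is `None` because the policy now squelches code F0879 for that class.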
Example 1 - Fault in a Tenant
Each fault is associated with an object.
admin@apic:~> moquery -d "uni/tn-RD/ipToEpg-Ext_10.200.1.101/rstoEpg-[uni/tn-RD/ap-App_RD1/epg-EPG_RD11]/fault-F0879"
Total Objects shown: 1
# fault.Inst
code : F0879
ack : no
cause : resolution-failed
changeSet :
childAction :
created : 2015-01-22T00:05:00.286+01:00
descr : Failed to form relation to MO uni/tn-RD/ap-App_RD1/epg-EPG_RD11 of class fvAEPg
dn : uni/tn-RD/ipToEpg-Ext_10.200.1.101/rstoEpg-[uni/tn-RD/ap-App_RD1/epg-EPG_RD11]/fault-F0879
domain : infra
highestSeverity : warning
lastTransition : 2015-01-22T00:05:00.286+01:00
lc : raised
modTs : never
occur : 1
origSeverity : warning
prevSeverity : warning
rn : fault-F0879
rule : dbgac-rs-to-epg-resolve-fail
The previous fault is an MO of class fault.Inst and with code F0879.
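If you save moquery output to a file, a short script can turn it into dictionaries for further processing. The following is a minimal sketch that assumes the "attribute : value" text layout shown above; the function name `parse_moquery` and the `_class` key are hypothetical, and the sample is an excerpt of the output above.

```python
import re

# An attribute line is a name, whitespace, a colon, then an optional value.
ATTR_LINE = re.compile(r"^(\w+)\s+:\s?(.*)$")


def parse_moquery(text):
    """Parse moquery text output into a list of {attribute: value} dicts."""
    objects, current, last_key = [], None, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("# "):
            # A "# class.Name" header starts a new object.
            current = {"_class": line[2:]}
            objects.append(current)
            last_key = None
            continue
        if current is None:
            continue  # skip the "Total Objects shown" banner
        match = ATTR_LINE.match(line)
        if match:
            last_key = match.group(1)
            current[last_key] = match.group(2).strip()
        elif last_key is not None:
            # A wrapped line continues the previous attribute value.
            current[last_key] = (current[last_key] + " " + line).strip()
    return objects


SAMPLE = """\
Total Objects shown: 1
# fault.Inst
code : F0879
ack : no
changeSet :
created : 2015-01-22T00:05:00.286+01:00
dn : uni/tn-RD/ipToEpg-Ext_10.200.1.101/rstoEpg-[uni/tn-RD/ap-App_RD1/epg-EPG_RD11]/fault-F0879
"""

faults = parse_moquery(SAMPLE)
```

Timestamps contain colons, which is why the pattern requires whitespace before the separating colon instead of splitting on every colon.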
The fault is attached to a relation object that points to an Endpoint Group (EPG), as shown next.
The distinguished name (DN) of this object is the DN of the fault without the trailing fault-F0879 component. It is the parent of the fault and is of class dbgac.RsToEpg.
admin@apic:~> moquery -d uni/tn-RD/ipToEpg-Ext_10.200.1.101/rstoEpg-[uni/tn-RD/ap-App_RD1/epg-EPG_RD11]
Total Objects shown: 1
# dbgac.RsToEpg
tDn : uni/tn-RD/ap-App_RD1/epg-EPG_RD11
childAction :
dn : uni/tn-RD/ipToEpg-Ext_10.200.1.101/rstoEpg-[uni/tn-RD/ap-App_RD1/epg-EPG_RD11]
forceResolve : no
lcOwn : local
modTs : 2014-12-05T12:56:29.340+01:00
monPolDn : uni/tn-RD/monepg-RD_Monitoring
rType : mo
rn : rstoEpg-[uni/tn-RD/ap-App_RD1/epg-EPG_RD11]
state : missing-target
stateQual : none
status :
tCl : fvAEPg
tType : mo
uid : 15374
You can see that this object has a monPolDn attribute, which points to its monitoring policy. Most objects in the tree are monitored by a monitoring policy in this way.
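The step from a fault to its parent can be done mechanically, because the fault's rn (fault-F0879 here) is the last component of its DN. The sketch below assumes only the DN shape shown in this document; the function name `split_fault_dn` is hypothetical, and other fault classes such as faultDelegate are not handled.

```python
import re

# The fault rn is the final DN component, for example "/fault-F0879".
FAULT_RN = re.compile(r"/fault-(F\d+)$")


def split_fault_dn(fault_dn):
    """Return (parent DN, fault code) for a DN that ends in /fault-Fnnnn."""
    match = FAULT_RN.search(fault_dn)
    if match is None:
        raise ValueError(f"not a fault DN: {fault_dn!r}")
    return fault_dn[:match.start()], match.group(1)


parent_dn, code = split_fault_dn(
    "uni/tn-RD/ipToEpg-Ext_10.200.1.101/rstoEpg-[uni/tn-RD/ap-App_RD1/epg-EPG_RD11]/fault-F0879"
)
```

The pattern is anchored at the end of the string, so slashes inside the bracketed part of the DN do not confuse it. The resulting `parent_dn` is the value you pass to `moquery -d` in order to read monPolDn.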
Here it is a user-defined monitoring policy of class monEPGPol, with this DN:
uni/tn-RD/monepg-RD_Monitoring
Here is the complete object used for monitoring.
admin@apic:~> moquery -d uni/tn-RD/monepg-RD_Monitoring
Total Objects shown: 1
# mon.EPGPol
name : RD_Monitoring
childAction :
descr :
dn : uni/tn-RD/monepg-RD_Monitoring
lcOwn : local
modTs : 2014-11-13T15:41:45.326+01:00
monPolDn : uni/tn-RD/monepg-RD_Monitoring
ownerKey :
ownerTag :
rn : monepg-RD_Monitoring
status :
uid : 10673
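The last component of a monPolDn value indicates which monitoring policy class is in use, and therefore which GUI menu to open. The sketch below maps the two rn prefixes that appear in this document (monepg- and moninfra-). The monfab- prefix for monFabricPol does not appear in this document and is an assumption, and the function name is hypothetical.

```python
# rn prefix -> monitoring policy class
PREFIX_TO_CLASS = {
    "monepg-": "monEPGPol",      # tenant monitoring policy (seen in Example 1)
    "moninfra-": "monInfraPol",  # access/infra monitoring policy (seen in Example 2)
    "monfab-": "monFabricPol",   # assumption: prefix not shown in this document
}


def monitoring_policy_class(mon_pol_dn):
    """Return (policy class, policy name) for a monPolDn value."""
    rn = mon_pol_dn.rsplit("/", 1)[-1]  # last DN component
    for prefix, policy_class in PREFIX_TO_CLASS.items():
        if rn.startswith(prefix):
            return policy_class, rn[len(prefix):]
    raise ValueError(f"unrecognized monitoring policy DN: {mon_pol_dn!r}")


tenant_policy = monitoring_policy_class("uni/tn-RD/monepg-RD_Monitoring")
infra_policy = monitoring_policy_class("uni/infra/moninfra-default")
```

With the DNs from this document, the first call identifies a monEPGPol named RD_Monitoring and the second a monInfraPol named default.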
The monEPGPol object is configured under the tenant Monitoring Policy, where you can either create a new policy or modify the default one. Here is an example of the monEPGPol named RD_Monitoring.
Choose Fault Severity Assignment Policies and click the pencil icon (next to the Monitoring Object list).
Then, in the monitoring object list of that monitoring policy, choose the class for which the fault was created (here dbgac.RsToEpg).
You can then see all faults associated with that specific class (the only one shown here is F0879).
F0879 is the code of the fault shown at the beginning of this example.
Choose this fault and set Initial Severity to squelched (you can leave Target Severity at inherit). This prevents the fault from being generated in the future by any object that is linked to the monitoring policy you just modified.
However, this does not clear existing faults; it only affects new ones.
Example 2 - Physical Fault
In this example, the fault is generated because port 1/25 on the leaf is administratively up but has no SFP inserted.
admin@apic:~> moquery -c faultInst -f 'fault.Inst.code == "F1678"'
Total Objects shown: 2
# fault.Inst
code : F1678
ack : no
cause : port-failure
changeSet : usage (New: epg)
childAction :
created : 2015-01-19T14:26:13.862+01:00
descr : TEST FAULT -- Port is down, reason:sfpAbsent(connected), used by:EPG, lastLinkStChg:1970-01-01T01:00:00.000+01:00, operSt:down
dn : topology/pod-1/node-101/sys/phys-[eth1/25]/phys/fault-F1678
domain : access
highestSeverity : critical
lastTransition : 2015-01-19T14:28:41.668+01:00
lc : raised
modTs : never
occur : 1
origSeverity : critical
prevSeverity : critical
rn : fault-F1678
rule : ethpm-if-port-down-infra-epg-test
severity : critical
status :
subject : port-down
type : communications
uid :
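When you need the same query for several fault codes, the command line can be assembled by a small script. The following sketch reproduces only the command form used at the start of this example; the function name `fault_query_command` is hypothetical.

```python
import shlex


def fault_query_command(code):
    """Build the moquery argument list that selects faultInst MOs by fault code."""
    if not (code.startswith("F") and code[1:].isdigit()):
        raise ValueError(f"unexpected fault code: {code!r}")
    return ["moquery", "-c", "faultInst", "-f", f'fault.Inst.code == "{code}"']


# shlex.join quotes the filter so it can be pasted into a shell.
command_line = shlex.join(fault_query_command("F1678"))
```

For F1678, `command_line` is the same text as the command typed at the start of this example. Validating the code before it is placed in the filter string keeps stray quotes out of the query.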
This fault is associated with a physical port. Here is the parent MO that generated it.
admin@apic:~> moquery -d topology/pod-1/node-101/sys/phys-[eth1/25]/phys
Total Objects shown: 1
# ethpm.PhysIf
accessVlan : vlan-1
allowedVlans :
backplaneMac : 50:87:89:A2:2A:C1
bundleBupId : 1
bundleIndex : unspecified
cfgAccessVlan : vlan-1
cfgNativeVlan : vlan-1
childAction :
currErrIndex : 4294967295
diags : none
dn : topology/pod-1/node-101/sys/phys-[eth1/25]/phys
encap : 3
errDisTimerRunning : no
errVlanStatusHt : 0
errVlans :
hwBdId : 0
intfT : phy
iod : 29
lastErrors : 0
lastLinkStChg : 1970-01-01T01:00:00.000+01:00
media : 2
modTs : never
monPolDn : uni/infra/moninfra-default
nativeVlan : vlan-1
Through its monPolDn attribute, this MO is associated with the default monInfraPol object shown here.
admin@apic:~> moquery -c monInfraPol
Total Objects shown: 4
# mon.InfraPol
name : default
childAction :
descr :
dn : uni/infra/moninfra-default
lcOwn : local
modTs : 2014-08-06T07:58:19.494+01:00
monPolDn : uni/infra/moninfra-default
ownerKey :
ownerTag :
rn : moninfra-default
status :
uid : 0
Under the Fault Severity Assignment policy, click the pencil icon in the work pane, next to the monitoring object drop-down list, and add the class whose monitoring properties you want to modify. Choose the class of the object that generated the fault, that is, ethpmPhysIf.
Choose this class and click the + icon in order to see each fault generated for that object.
In this example, you can see fault F1678, and its properties can be modified. Setting Initial Severity to squelched and Target Severity to inherit prevents new faults with that code from being generated by any object that has this monitoring policy applied.
After you make the change, enabling port 1/25 with no SFP in it no longer generates this fault.
Note: In versions earlier than Software Version 2.2, existing faults (even those in Clearing or Retaining mode) are not cleared.
Note: In Software Version 2.2 and later, existing faults are also affected by the new policy.
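The two notes above amount to a simple version check. The sketch below assumes version strings of the form major.minor followed by a build suffix, such as 2.2(1n); the function name is hypothetical and the sample version strings are illustrative only.

```python
import re


def existing_faults_affected(version):
    """Return True if squelching also affects existing faults (2.2 and later)."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # Tuple comparison orders by major version first, then by minor version.
    return (major, minor) >= (2, 2)
```

Comparing the numbers as a tuple avoids the ordering mistakes that plain string comparison would make, for example with a two-digit major version.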