General Failure Considerations
Note: Before proceeding, review the list of operations to avoid in Guidelines and Limitations for Upgrading or Downgrading to ensure the stability of the system when troubleshooting an upgrade failure.
For ACI switch upgrades, there is one scheduler per maintenance policy. By default, when an upgrade or downgrade failure is detected, the scheduler pauses and no more nodes in that group begin to upgrade. The scheduler then waits for manual intervention to debug the upgrade failure. After manual intervention is complete, you must resume the paused scheduler.
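The pause-and-resume behavior described above can be modeled as a small state machine. The following Python sketch is a hypothetical illustration, not APIC's implementation: it starts nodes one at a time (the real scheduler may upgrade several nodes of a group concurrently), and the class name, node names, and `upgrade` callback are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class MaintenanceGroupScheduler:
    """Hypothetical model of one scheduler per maintenance policy (not APIC code)."""
    nodes: list[str]
    states: dict[str, str] = field(init=False)
    pending: list[str] = field(init=False)
    paused: bool = False

    def __post_init__(self) -> None:
        # Every node starts out waiting in the "queued" state.
        self.states = {node: "queued" for node in self.nodes}
        self.pending = list(self.nodes)

    def run(self, upgrade: Callable[[str], bool]) -> None:
        # Start queued nodes one by one until none remain or the scheduler pauses.
        while self.pending and not self.paused:
            node = self.pending.pop(0)
            self.states[node] = "upgrading"
            if upgrade(node):
                self.states[node] = "completed"
            else:
                self.states[node] = "failed"
                # Default behavior: pause so no further nodes in the group begin.
                self.paused = True

    def resume(self, upgrade: Callable[[str], bool]) -> None:
        # Called after manual intervention; continues with the still-queued nodes.
        self.paused = False
        self.run(upgrade)


sched = MaintenanceGroupScheduler(["leaf101", "leaf102", "leaf103"])
sched.run(lambda node: node != "leaf102")  # simulate leaf102 failing its upgrade
print(sched.states)  # {'leaf101': 'completed', 'leaf102': 'failed', 'leaf103': 'queued'}
sched.resume(lambda node: True)
print(sched.states["leaf103"])  # completed
```

In this model, `resume()` continues with the nodes that were still queued and does not retry the failed node, which reflects the expectation that the failure is debugged manually before the scheduler is resumed.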
If you notice that switches are in the “queued” state, check the following:
- Is the controller cluster healthy? The APIC controller cluster must be healthy. If you see “waitingForClusterHealth = yes” in the API or “Waiting for Cluster Convergence” showing “Yes” in the GUI, the controller cluster is not healthy. Until it is healthy, switches that have not already started their upgrade remain in the “queued” state.
- Is the switch maintenance group paused? The group is paused if any switch fails its upgrade.
- Navigate to
  to check the event logs for each maintenance group. The event logs provide more detailed information about why the upgrade is not progressing.
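The checks above can be expressed as a small triage function. This Python sketch assumes the upgrade status has already been retrieved into a dictionary; only the `waitingForClusterHealth` name comes from the text above, while the dictionary shape and the `groupPaused` key are hypothetical placeholders for whatever the API or GUI actually reports.

```python
def queued_reasons(status: dict) -> list[str]:
    """Return likely reasons a switch is stuck in the "queued" state.

    `status` is a hypothetical, pre-fetched dictionary; only the
    waitingForClusterHealth name is taken from the APIC text.
    """
    reasons = []
    # Check 1: the controller cluster must be healthy before queued switches start.
    if status.get("waitingForClusterHealth") == "yes":
        reasons.append("controller cluster is not healthy; queued switches wait for it to converge")
    # Check 2: a failed switch upgrade pauses the maintenance group (hypothetical key).
    if status.get("groupPaused", False):
        reasons.append("maintenance group is paused after a switch failed its upgrade; resume it after debugging")
    # Fall back to the event logs when neither check explains the state.
    if not reasons:
        reasons.append("no cause found in status; check the maintenance group event logs")
    return reasons


# Example with assumed values: unhealthy cluster, group not paused.
print(queued_reasons({"waitingForClusterHealth": "yes", "groupPaused": False}))
```

Both checks are evaluated independently, so a switch can be queued for more than one reason at the same time; the event logs remain the authoritative source when neither check applies.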