ACI Upgrade/Downgrade Architecture

High Level Summary of APIC Upgrades and Downgrades

When you upgrade or downgrade an APIC cluster, a specific sequence of events occurs. This sequence lets each APIC be upgraded or downgraded separately, and it ensures that the data on each upgraded or downgraded APIC is compatible with the target image. Most of these events happen in the background, so it’s important to understand what you should expect to see when you trigger an upgrade or downgrade of the APIC cluster.

  1. The image is uploaded to the firmware repository and synced to all APIC cluster members.

  2. The upgrade or downgrade is triggered to a specific target version.

  3. Each APIC in the cluster installs the new image in the first grub partition. The APICs do this in parallel to speed up the upgrade or downgrade process.

  4. After the image installation completes, the APICs convert their database files one at a time, in sequential order. During each APIC's conversion, the following events occur:

    1. The Data Management Engine (DME) processes shut down, including the nginx web server that services all API requests. As a result, you lose access to the UI and API, as well as to any other backend application that runs on that APIC.

    2. The database files are converted from the source version to the target version. The time this takes depends on the size of the configuration deployed on the ACI fabric, so the total conversion time varies between deployments.

      When your source version is APIC release 6.0(3) or later, the database conversion process is enhanced, and you may notice a shorter conversion time than with previous releases.


      Note


      It’s critical that you take no disruptive action on the APIC at this stage, because if this stage does not complete successfully, the result could be data loss or a partial configuration. See Guidelines and Limitations for Upgrading or Downgrading for more information.


    3. After the database conversion process completes successfully, the APIC reloads and boots up on the target software version.

  5. After the reloaded APIC comes back online, the sequence of events described in step 4 repeats on the next APIC in the cluster. Meanwhile, the APIC that came back online starts the post-upgrade activities as the final check of its database. This process repeats until all members of the cluster have been upgraded or downgraded.

  6. Prior to Cisco APIC 6.0(6), the APIC cluster upgrade was considered complete when all APICs came back online and were Fully-Fit, regardless of the state of the post-upgrade activities. Starting with Cisco APIC 6.0(6), the status of the APIC cluster upgrade transitions to “Post Upgrade Pending” until the post-upgrade activities are complete on each APIC node, and then finally becomes “Completed”.


    Note


    When you upgrade to Cisco APIC release 6.0(6) or 6.0(7), the upgrade takes around 30 minutes longer. (Typically, the upgrade of an APIC cluster is in the range of 120 to 150 minutes.)



    Note


    The post-upgrade activities on the APICs must complete successfully before you proceed with the switch upgrades. Because the APIC cluster upgrade and the switch upgrades may not take place in the same maintenance window, it is generally recommended to run the pre-upgrade validation script twice: once before the APIC cluster upgrade and again before the switch upgrades. Prior to Cisco APIC 6.0(6), it is highly encouraged to run the script before the switch upgrades even when they take place in the same maintenance window, because the script checks not only the pre-upgrade validations but also the status of the post-upgrade activities, making sure that the fabric is ready for the switch upgrades after the APIC cluster upgrade.



    Note


    Starting with Cisco APIC 6.1(1), the post-upgrade activities take longer to complete because of a performance issue in the latest OpenSSH library (CSCwk67958). For reference, an upgrade of an APIC cluster with 3 APICs from release 5.2 to 6.0 typically took 90 to 120 minutes; upgrades from a source version of 6.0(3) or later can be much shorter because of the enhanced database conversion process mentioned above.

    However, when the target version is APIC release 6.1(1) or later, the upgrade takes around 60 minutes longer. When the source version is 6.0(3) or later, the increase caused by the OpenSSH library and the reduction from the enhanced database conversion may cancel each other out, so your upgrade may be shorter or longer than your previous APIC cluster upgrade.

    The time an upgrade or downgrade takes depends on multiple factors, such as the APIC cluster size and configuration, and is an area of constant improvement.
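The rolling sequence in steps 3 through 6 can be sketched as a small Python model. This is a hypothetical illustration, not an APIC tool. The function names `version_key` and `cluster_upgrade_events` and the event strings are invented for this example. The parallel install in step 3 is collapsed into a single event, and the status logic reflects only the 6.0(6) behavior change described in step 6.

```python
import re

def version_key(version: str) -> tuple[int, int, int, str]:
    """Parse an APIC version string such as "6.0(6)" or "5.2(1g)" into a sortable tuple."""
    m = re.fullmatch(r"(\d+)\.(\d+)\((\d+)([a-z]*)\)", version)
    if m is None:
        raise ValueError(f"unrecognized APIC version: {version!r}")
    return int(m[1]), int(m[2]), int(m[3]), m[4]

def cluster_upgrade_events(apics: list[str], target: str) -> list[str]:
    """Return the ordered events of a rolling APIC cluster upgrade (hypothetical model)."""
    # Step 3: every APIC installs the image in parallel, modeled here as one event.
    events = [f"install {target} on {', '.join(apics)} in parallel"]
    # Steps 4 and 5: conversion and reload happen on one APIC at a time.
    for apic in apics:
        events.append(f"{apic}: DME processes stop (UI/API unavailable on this APIC)")
        events.append(f"{apic}: convert database files to {target}")
        events.append(f"{apic}: reload and boot {target}")
        # The reloaded APIC runs its post-upgrade activities while the next one converts.
        events.append(f"{apic}: start post-upgrade activities")
    # Step 6: from 6.0(6), the cluster reports "Post Upgrade Pending" before "Completed".
    if version_key(target) >= version_key("6.0(6)"):
        events.append("status: Post Upgrade Pending")
        events.append("status: Completed")
    else:
        events.append("status: complete (all APICs back online and Fully-Fit)")
    return events

for line in cluster_upgrade_events(["apic1", "apic2", "apic3"], "6.1(1)"):
    print(line)
```

The model makes the key design point visible: only one APIC at a time is without its DME processes, so the rest of the cluster keeps serving requests while each member converts and reloads.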


Default Interface Policies in the 5.2(4) release and later

When you upgrade to the 5.2(4) or later release, the Cisco Application Policy Infrastructure Controller (APIC) creates the following default interface policies automatically:

  • CDP (cdpIfPol)

    • system-cdp-disabled

    • system-cdp-enabled

  • LLDP (lldpIfPol)

    • system-lldp-disabled

    • system-lldp-enabled

  • LACP (lacpLagPol)

    • system-static-on

    • system-lacp-passive

    • system-lacp-active

  • Link Level (fabricHIfPol)

    • system-link-level-100M-auto

    • system-link-level-1G-auto

    • system-link-level-10G-auto

    • system-link-level-25G-auto

    • system-link-level-40G-auto

    • system-link-level-100G-auto

    • system-link-level-400G-auto

  • Breakout Port Group Map (infraBrkoutPortGrp)

    • system-breakout-10g-4x

    • system-breakout-25g-4x

    • system-breakout-100g-4x

During the upgrade, if a policy already exists with exactly the same name and exactly the same parameters as one of these policies, the system takes ownership of that policy, and the policy becomes read-only. If instead the parameters are different, for example, if your existing system-cdp-disabled policy has its state set to "enabled," the policy remains a user policy; that is, a user can still modify it.
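The ownership rule above amounts to an exact-match comparison on name and parameters. The sketch below is a hypothetical illustration. It assumes that each policy can be represented as a dictionary of its parameters, and it lists only the two CDP defaults with an assumed `adminSt` (admin state) attribute. Real policy objects carry more attributes than this, and the actual comparison is performed by the APIC during the upgrade.

```python
# Hypothetical subset of the system default policies, keyed by (class, name).
# Only the CDP admin state is modeled; real objects have more attributes.
SYSTEM_DEFAULTS: dict[tuple[str, str], dict[str, str]] = {
    ("cdpIfPol", "system-cdp-disabled"): {"adminSt": "disabled"},
    ("cdpIfPol", "system-cdp-enabled"): {"adminSt": "enabled"},
}

def ownership_after_upgrade(cls: str, name: str, params: dict[str, str]) -> str:
    """Return who owns an existing policy after upgrading to 5.2(4) or later."""
    default = SYSTEM_DEFAULTS.get((cls, name))
    if default is None:
        return "user"  # name does not match any system default
    if params == default:
        return "system (read-only)"  # same name and same parameters
    return "user"  # same name but different parameters: still user-modifiable

print(ownership_after_upgrade("cdpIfPol", "system-cdp-disabled", {"adminSt": "enabled"}))
```

The usage line mirrors the example in the text: a policy named system-cdp-disabled that actually enables CDP stays a user policy, so it prints `user`.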

High Level Summary of Switch Upgrade and Downgrade

When you upgrade or downgrade an ACI switch node, a specific sequence of events occurs on the device(s) being upgraded or downgraded. Most of these events happen in the background, so it’s important to understand what you should expect to see when you trigger an upgrade or downgrade of an ACI switch node.

  1. The image is pushed from the APIC to the switch.

  2. The filesystem and bootflash of the switch are checked to ensure that there is enough space to extract the image.

  3. The image is extracted, and the primary grub partition is updated to the target version. The older version is moved into the recovery partition.

  4. The BIOS and EPLD images are upgraded, if applicable.

  5. The switch performs a clean reload and rejoins the ACI fabric running the target version of software.

Starting with release 2.1(4), support was added for the third-party Micron Solid State Drive (SSD) firmware auto update. As part of the standard Cisco APIC software upgrade process, the switches will reboot when they upgrade. During that boot-time process, the system will also check the current SSD firmware and will automatically perform an upgrade to the SSD firmware, if necessary. If the system performs an SSD firmware upgrade, the switches will then go through another clean reboot afterward.
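The switch-side steps, including the extra reboot for an SSD firmware update, can be modeled as follows. This is a hypothetical sketch: the real space check and the SSD firmware decision are made by the switch itself, and the `required_mb` threshold and the boolean flags are caller-supplied assumptions, not values read from the installer.

```python
def switch_upgrade_steps(free_mb: int, required_mb: int,
                         bios_epld_needed: bool, ssd_fw_needed: bool) -> list[str]:
    """Return the ordered steps of a switch upgrade (hypothetical model)."""
    steps = ["push image from APIC to switch"]
    # Step 2: stop early if the bootflash cannot hold the extracted image.
    if free_mb < required_mb:
        raise RuntimeError("not enough bootflash space to extract the image")
    steps.append("bootflash space check passed")
    # Step 3: target goes into the primary grub partition, old version into recovery.
    steps.append("extract image; update primary grub partition; move old version to recovery")
    # Step 4: BIOS and EPLD are upgraded only when applicable.
    if bios_epld_needed:
        steps.append("upgrade BIOS and EPLD")
    steps.append("clean reload and rejoin fabric")
    # Release 2.1(4) and later: a boot-time SSD firmware update adds another clean reboot.
    if ssd_fw_needed:
        steps.append("update SSD firmware, then clean reload again")
    return steps

def reload_count(steps: list[str]) -> int:
    """Count the steps that reload the switch."""
    return sum("reload" in step for step in steps)

print(reload_count(switch_upgrade_steps(4000, 2000, True, True)))  # prints 2
```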

Detailed Summary of Switch Upgrade

The following sections provide a detailed summary of switch upgrades.

Understanding Switch Upgrade and Downgrade Stages

During an ACI switch node upgrade or downgrade, the reported progress advances as each stage completes.

The following table provides more details on what happens at each stage of this upgrade or downgrade process:

Upgrade Progress    Install Stage                   Description
----------------    ----------------------------    -----------
0%                  Firmware upgrade queued         Displayed when firmware is being downloaded to the switch from the APIC.
5%                  Firmware upgrade in progress    Displayed when the upgrade installer is initiated, and the upgrade process has started.
45%                 Firmware upgrade in progress    Displayed after the bootflash check has completed and the image extraction stage has begun.
60%                 Firmware upgrade in progress    Image Extraction stage has completed and the grub partition is being updated with the new software information.
70%                 Firmware upgrade in progress    The software has been updated on the switch.
80%                 Firmware upgrade in progress    The EPLD and BIOS upgrade has begun.
95%                 Firmware upgrade in progress    The EPLD and BIOS upgrade has completed, and switch reboot has been initiated.
100%                Upgraded Successfully           The switch has re-joined the fabric after the clean reload running target version of software.
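Because each reported percentage corresponds to the last stage reached, a status monitor can translate a progress value back into the table's stage text. The sketch below is a hypothetical helper with descriptions condensed from the table. It assumes that any value between two listed thresholds should be reported as the lower stage, which the table itself does not state.

```python
import bisect

# (threshold %, install stage, description) condensed from the table above.
STAGES: list[tuple[int, str, str]] = [
    (0, "Firmware upgrade queued", "Firmware is being downloaded to the switch from the APIC."),
    (5, "Firmware upgrade in progress", "The upgrade installer has started."),
    (45, "Firmware upgrade in progress", "Bootflash check completed; image extraction has begun."),
    (60, "Firmware upgrade in progress", "Image extraction completed; grub partition being updated."),
    (70, "Firmware upgrade in progress", "The software has been updated on the switch."),
    (80, "Firmware upgrade in progress", "The EPLD and BIOS upgrade has begun."),
    (95, "Firmware upgrade in progress", "EPLD and BIOS upgrade completed; switch reboot initiated."),
    (100, "Upgraded Successfully", "The switch has rejoined the fabric on the target version."),
]
THRESHOLDS = [threshold for threshold, _, _ in STAGES]

def stage_for_progress(progress: int) -> tuple[str, str]:
    """Map a reported progress percentage to (install stage, description)."""
    if not 0 <= progress <= 100:
        raise ValueError("progress must be between 0 and 100")
    # Pick the highest threshold that does not exceed the reported value.
    index = bisect.bisect_right(THRESHOLDS, progress) - 1
    _, stage, description = STAGES[index]
    return stage, description
```

Using `bisect_right` on the sorted thresholds keeps the lookup correct for both exact table values (such as 60) and any intermediate value a monitor might see.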

Guidelines and Limitations for Upgrading or Downgrading

  • If at any point you believe that the upgrade or downgrade has stalled or failed, it is critical that you do not take any of the following actions:

    • Do not reload any Application Policy Infrastructure Controller (APIC) in the cluster.

    • Do not decommission any Cisco APIC in the cluster.

    • Do not change the firmware target version back to the original version.

    Instead, follow these guidelines:

    1. If applicable, view the installer log files described in the Troubleshooting section (see APIC Installer Log Files and ACI Switch Installer Log Files). These logs help you determine whether there is still ongoing activity on the devices being upgraded or downgraded.

    2. Collect the tech-support files outlined in the Troubleshooting section (see Collecting Tech-Support Files).

    3. Contact Cisco TAC if the upgrade or downgrade does not complete successfully and upload the tech-support files to the TAC case after it has been created.

  • If you are upgrading to Cisco APIC release 4.2(6o), 4.2(7l), 5.2(1g), or later, ensure that any VLAN encapsulation blocks that you are explicitly using for leaf switch front panel VLAN programming are set as "external (on the wire)." If these VLAN encapsulation blocks are instead set to "internal," the upgrade causes the front panel port VLAN to be removed, which can result in a datapath outage.

  • Log record objects are stored in only one shard of the database, on one of the Cisco APICs. As a result, log records are not accessible while that Cisco APIC is rebooting for an upgrade or downgrade, unlike other objects, which can still be read through another Cisco APIC.

  • When Cisco APIC is upgraded from Cisco APIC 4.0 or earlier to 5.1(1) or later, the Service Graph is re-rendered. If a vzAny-to-vzAny contract with a service graph and another contract with a service graph use the same service EPG, traffic is disrupted until the re-rendering is complete.

  • Prior to APIC release 5.0, when you have the following configuration, traffic in the provider-to-consumer direction was incorrectly allowed. This is fixed starting with APIC 5.0, so the incorrectly allowed traffic stops working after an upgrade from a pre-5.0 version to version 5.0 or later. If traffic in the provider-to-consumer direction must continue to be allowed, configure a contract filter for that direction.

    • Unidirectional contract (only a consumer-to-provider filter, without the “Apply Both Directions” flag)

    • Shared Service (contract provider and consumer are in different VRFs)

    • L3Out EPG is the provider

    • L3Out EPG has 0.0.0.0/0 with “Shared Security Import Subnet”

    • The provider IP of the traffic is classified to the L3Out subnet 0.0.0.0/0

    • The consumer VRF has Preferred Group (PG) enabled

    • The consumer EPG is included in PG

    • The provider VRF is on the consumer leaf

  • Prior to APIC release 5.0, when you have the following configuration, traffic in the provider-to-consumer direction was incorrectly allowed even though the configuration is invalid, because the “Shared Security Import Subnet” scope is mandatory. This is fixed starting with APIC 5.0, so the incorrectly allowed traffic stops working after an upgrade from a pre-5.0 version to version 5.0 or later.

    • Shared Service (contract provider and consumer are in different VRFs)

    • L3Out EPG is the provider

    • The “Shared Security Import Subnet” scope is missing on all non-0.0.0.0/0 subnets in the L3Out EPG

    • Those non-0.0.0.0/0 subnets have the “External Subnet for the External EPG” scope

    • The provider IP of the traffic is classified to one of those non-0.0.0.0/0 subnets in the L3Out EPG

  • Global AES Encryption must be enabled to upgrade to APIC release 6.1(2) or newer.

  • To upgrade to the Cisco APIC 6.0(2) release or later, you must perform the following procedure:

    1. Download the Cisco APIC 6.0(2) or later image and upgrade the APIC cluster to that release. Do not download the Cisco Application Centric Infrastructure (ACI)-mode switch images to the Cisco APIC until this step is complete. The 6.0(2) release has both 32-bit and 64-bit switch images, but releases prior to 6.0(2) do not support 64-bit images, so downloading the 64-bit images before the APIC cluster upgrade might cause errors or unexpected results. However, if your Cisco APICs are running release 5.2(8) or later, except for the 6.0(1) release, you can download the switch images to the Cisco APIC before this step, as you would with any upgrade procedure prior to 6.0(2).

    2. Download both the 32-bit and 64-bit Cisco ACI-mode switch images to the Cisco APIC. Downloading only one of the images may result in errors during the upgrade process.

      Beginning with the 6.0(3) release, the switch determines which image to install from the Cisco APIC based on its available memory instead of a static mapping. If the available memory of the switch is less than or equal to 24 GB, the switch installs the 32-bit image. If the available memory of the switch is greater than or equal to 32 GB, the switch ends up on the 64-bit image, but it may first be upgraded to the 32-bit image and then upgraded again to the 64-bit image, which results in two reboots during the upgrade process.

      Modular spine switches install the 64-bit image regardless of the switch's available memory.

    3. Create the maintenance groups and trigger the upgrade procedure as usual. Cisco APIC automatically deploys the correct image to the respective switch during the upgrade process.
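Several of the guidelines above depend only on the source and target versions. The following hypothetical helper shows one way to flag them while planning an upgrade. It parses version strings in the `major.minor(maintenance[letter])` form used in this document. It encodes only the guidelines with simple version ranges: the configuration-specific conditions, such as the contract and VLAN encapsulation checks, still need to be verified by hand. It is not a replacement for the pre-upgrade validation script, and the note labels it returns are invented for this example.

```python
import re

def version_key(version: str) -> tuple[int, int, int, str]:
    """Parse a version such as "6.1(2)" or "4.2(7l)" into a sortable tuple."""
    m = re.fullmatch(r"(\d+)\.(\d+)\((\d+)([a-z]*)\)", version)
    if m is None:
        raise ValueError(f"unrecognized APIC version: {version!r}")
    return int(m[1]), int(m[2]), int(m[3]), m[4]

def version_guidelines(source: str, target: str) -> list[str]:
    """Return the version-gated guidelines that apply to a source -> target upgrade."""
    src, tgt = version_key(source), version_key(target)
    notes = []
    # 4.0 or earlier -> 5.1(1) or later: service graphs are re-rendered.
    if src[:2] <= (4, 0) and tgt >= version_key("5.1(1)"):
        notes.append("SERVICE_GRAPH: re-rendered; check vzAny-to-vzAny contracts sharing a service EPG")
    # Pre-5.0 -> 5.0 or later: incorrectly allowed provider-to-consumer traffic stops.
    if src[:2] < (5, 0) and tgt[:2] >= (5, 0):
        notes.append("CONTRACTS: review shared-service contracts that use an L3Out EPG as provider")
    # 6.1(2) or later requires Global AES Encryption.
    if tgt >= version_key("6.1(2)"):
        notes.append("AES: enable Global AES Encryption before upgrading")
    # 6.0(2) or later: both 32-bit and 64-bit switch images are needed.
    if tgt >= version_key("6.0(2)"):
        notes.append("IMAGES: download both 32-bit and 64-bit switch images")
        # Early download is allowed only from 5.2(8) or later, excluding 6.0(1).
        early_ok = src >= version_key("5.2(8)") and src[:3] != (6, 0, 1)
        if not early_ok:
            notes.append("IMAGES: download switch images only after the APIC cluster upgrade")
    return notes

for note in version_guidelines("5.2(8)", "6.1(2)"):
    print(note)
```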
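The memory-based image selection described in step 2 (release 6.0(3) and later) reduces to a few comparisons. This hypothetical function only restates the documented thresholds. The text does not say what happens between 24 GB and 32 GB, so the sketch returns `None` for that range rather than guessing. The actual decision is made by the switch.

```python
from typing import Optional

def expected_switch_image(available_memory_gb: float, modular_spine: bool = False) -> Optional[str]:
    """Return the image a switch ends up running under the 6.0(3)+ rules (hypothetical)."""
    if modular_spine:
        return "64-bit"  # modular spines use the 64-bit image regardless of memory
    if available_memory_gb <= 24:
        return "32-bit"
    if available_memory_gb >= 32:
        # The switch may pass through the 32-bit image first, causing two reboots.
        return "64-bit"
    return None  # between 24 and 32 GB: not covered by the documented rules

print(expected_switch_image(16), expected_switch_image(32), expected_switch_image(8, modular_spine=True))
```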
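When an upgrade appears stalled, the guidance above is to check the installer logs for ongoing activity rather than reloading or decommissioning an APIC. A minimal sketch of that check compares a log file's last-modification time with the current time. The function name, the file path, and the 10-minute window are assumptions chosen for illustration. Use the log locations given in the Troubleshooting section, and treat a quiet log as a reason to collect tech-support files and contact TAC, not as proof of failure.

```python
import os
import time
from typing import Optional

def log_recently_active(path: str, window_s: float = 600.0,
                        now: Optional[float] = None) -> bool:
    """Return True if the file at `path` was modified within the last `window_s` seconds."""
    if now is None:
        now = time.time()
    # os.path.getmtime returns the last-modification time in seconds since the epoch.
    return now - os.path.getmtime(path) <= window_s

# Hypothetical path; substitute an installer log file from the Troubleshooting section.
# print(log_recently_active("/path/to/installer.log"))
```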