Disaster Recovery

This chapter describes the disaster recovery process and the health check feature.

Overview

NCS 1014 has two partitions: the RP SSD (CPU partition) and the chassis SSD (Disaster Recovery partition). The Disaster Recovery partition holds the backup content: ISO images, RPMs, and system configuration files. If the node becomes corrupted, the Disaster Recovery feature lets you replace the CPU and keep the existing configuration. After the CPU is replaced, the node reboots and comes up by restoring the software and configuration files from the chassis SSD without traffic loss.


Note


When the chassis SSD is corrupted and replaced, the new chassis SSD takes a backup of the running software and configuration files from the RP SSD without traffic loss.


CPU Replacement Considerations

Consider the following points when you replace the CPU:

  • When the CPU is removed from the chassis, the NCS 1014 chassis runs in headless mode, which does not affect traffic.

  • When the CPU is replaced with another CPU that has the same software and RPMs as the chassis SSD, the configuration is restored from the chassis SSD.

  • When the CPU is replaced with another CPU that has different software and RPMs from the chassis SSD, the disaster recovery process starts. In this case, the node boots with the software from the chassis SSD, and the configuration is also restored from the chassis SSD.
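
The considerations above amount to a small decision rule. The following Python sketch is illustrative only: the `Inventory` type, the `plan_cpu_replacement` function, the version strings, and the RPM names are hypothetical and are not part of the NCS 1014 software or any Cisco API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Inventory:
    """Hypothetical summary of the software on an SSD: ISO version plus RPM set."""
    iso_version: str
    rpms: frozenset


def plan_cpu_replacement(new_cpu: Optional[Inventory], chassis_ssd: Inventory) -> List[str]:
    """Return the recovery steps implied by the CPU replacement considerations."""
    if new_cpu is None:
        # CPU removed: the chassis keeps running headless; traffic is unaffected.
        return ["run in headless mode"]
    if new_cpu == chassis_ssd:
        # Same software and RPMs: only the configuration is restored.
        return ["restore configuration from chassis SSD"]
    # Different software or RPMs: disaster recovery boots from the chassis SSD.
    return [
        "boot software from chassis SSD",
        "restore configuration from chassis SSD",
    ]


# Placeholder version strings and RPM names, chosen only for the example.
backup = Inventory("1.0.0", frozenset({"rpm-a", "rpm-b"}))
same = Inventory("1.0.0", frozenset({"rpm-a", "rpm-b"}))
different = Inventory("2.0.0", frozenset({"rpm-a"}))

for cpu in (None, same, different):
    print(plan_cpu_replacement(cpu, backup))
```

In all three cases the chassis SSD is the source of truth; the replacement CPU's own contents only determine whether the software must be reinstalled as well.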

How the Node Recovers After a Chassis SSD Replacement

The chassis SSD (NCS1K14-SSD) in NCS 1014 is a hot-swappable module, meaning you can replace it without shutting down the system.


Attention


When you remove the chassis SSD while the controller is in operation, the system raises the Disaster Recovery Unavailable alarm. This alarm clears automatically after you install the new chassis SSD.


These scenarios describe how the node recovers after a chassis SSD replacement.

Table 1. Replacing a Chassis SSD in an Operational Node: How the Node Recovers in a Live Network

| When you replace the chassis SSD with... | Then the system loads only... | And then the new chassis SSD takes a backup of... |
|---|---|---|
| another that has the same software and RPMs as the controller SSD | the configuration from the controller SSD | the system configuration from the controller SSD. |
| another that has different software and RPMs from the controller SSD | the software, RPMs, and the configuration from the controller SSD | all the contents from the controller SSD. |
| another received from Cisco manufacturing as a spare or through an RMA process, which comes with a blank configuration | the software, RPMs, and the configuration from the controller SSD, after you insert this SSD into the system | all the contents from the controller SSD. |
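
The three live-network cases can be read as a lookup from the state of the replacement SSD to the resulting actions. The sketch below restates the table as data; the `SpareSsd` enum, the `LIVE_REPLACEMENT` mapping, and the `describe` helper are hypothetical names used only for illustration and do not correspond to any NCS 1014 command or API.

```python
from enum import Enum


class SpareSsd(Enum):
    """Hypothetical labels for the three replacement cases in the table."""
    SAME_SOFTWARE = "same software and RPMs as the controller SSD"
    DIFFERENT_SOFTWARE = "different software and RPMs from the controller SSD"
    BLANK = "blank spare or RMA unit"


# For each case: what the system loads, and what the new chassis SSD
# then backs up. Everything is taken from the controller SSD.
LIVE_REPLACEMENT = {
    SpareSsd.SAME_SOFTWARE: (["configuration"], "system configuration"),
    SpareSsd.DIFFERENT_SOFTWARE: (["software", "RPMs", "configuration"], "all contents"),
    SpareSsd.BLANK: (["software", "RPMs", "configuration"], "all contents"),
}


def describe(case: SpareSsd) -> str:
    """Summarize one table row as a single sentence fragment."""
    loaded, backed_up = LIVE_REPLACEMENT[case]
    return f"load {', '.join(loaded)}; back up {backed_up} from the controller SSD"


for case in SpareSsd:
    print(f"{case.name}: {describe(case)}")
```

Note that in a live node the direction of the copy is from the controller SSD to the chassis SSD, the reverse of the CPU replacement flow.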

Table 2. Replacing a Chassis SSD in a Shutdown Node: How the Node Recovers after Boot Up

| When you replace the chassis SSD with... | Then upon powering on... | And then the system... |
|---|---|---|
| another that has software and RPMs different from the contents of the controller SSD | the system triggers the disaster recovery workflow | installs the software, RPMs, and configurations from the chassis SSD. |

Health Check of Backup ISO Image

The Health Check feature ensures error-free booting of the NCS 1014 chassis during disaster recovery operations. The backup ISO image is stored in the disaster recovery partition on the chassis SSD.

The install process audits the chassis SSD content against the running software in the background every 12 hours to detect corruption. If the ISO image is corrupted, the software recovers it by copying it from the backup location. If the software fails to synchronize with the chassis SSD, the Disaster Recovery ISO Image Corruption alarm is raised. To clear the alarm, see the Troubleshooting Guide for Cisco NCS 1014.
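
The audit, recover, and alarm sequence can be sketched in a few lines of Python. This is a conceptual illustration, not the actual install process: it assumes corruption is detected by comparing SHA-256 checksums, which the source does not state, and the function names, file names, and return values are all hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

AUDIT_INTERVAL_SECONDS = 12 * 60 * 60  # the 12-hour audit period


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large ISO images are not read at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_backup_iso(chassis_iso: Path, recovery_source: Path) -> str:
    """Run one audit pass and return 'ok', 'recovered', or 'alarm'."""
    expected = sha256_of(recovery_source)
    if chassis_iso.exists() and sha256_of(chassis_iso) == expected:
        return "ok"
    try:
        # Corruption detected: recover by copying the image again.
        shutil.copyfile(recovery_source, chassis_iso)
    except OSError:
        # Synchronization failed: this is where the alarm would be raised.
        return "alarm"
    return "recovered" if sha256_of(chassis_iso) == expected else "alarm"


# Demonstration with throwaway files standing in for the ISO images.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "running.iso"
    backup_iso = Path(tmp) / "backup.iso"
    source.write_bytes(b"golden image")
    backup_iso.write_bytes(b"corrupted")
    first_pass = audit_backup_iso(backup_iso, source)
    second_pass = audit_backup_iso(backup_iso, source)

print(first_pass, second_pass)  # prints "recovered ok"
```

A real scheduler would repeat the pass every `AUDIT_INTERVAL_SECONDS`; the sketch runs it twice to show that a repaired image passes the next audit.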