Disaster Recovery

This chapter describes the disaster recovery process and the health check feature.

Overview

There are two partitions in NCS 1020: The controller SSD (CPU partition) and chassis SSD (Disaster Recovery partition). The Disaster Recovery partition contains all the backup configurations such as ISO images, RPMs, and system configuration files. When the node is corrupted, the Disaster Recovery feature allows the CPU to be replaced with the existing configuration. After replacing the CPU, the node reboots and comes up by restoring the software and configuration files from the chassis SSD without traffic loss.


Note


When Chassis SSD is corrupted and replaced, the new chassis SSD takes backup of the running software and configuration files from the controller SSD without traffic loss.


SSD OIR

The chassis Solid State Drive (SSD) is a semiconductor-based storage device. It is used to store a backup of the software and configuration which will be used by the CPU to re-boot in case of any node failure. Online Insertion and Removal (OIR) is an operation that allows you to replace any faulty module in a system. In NCS 1020, the SSD OIR feature allows the chassis SSD to be removed and replaced anytime.

CPU Controller Replacement Considerations

You must consider the following points for CPU replacement.

Table 1. CPU Controller Replacement

Type of Replacement

Result

When the CPU is removed from the chassis.

NCS 1020 chassis runs in a headless mode which is non-traffic impacting.

When the CPU is replaced with another CPU having the same software and RPMs as in the chassis SSD.

The configuration is restored from the chassis SSD.

When the CPU is replaced with another CPU having different software and RPMs as in the chassis SSD.

The Disaster recovery process starts. In this case, the node boots with the software from the chassis SSD and the configuration is also restored from the chassis SSD.

Chassis SSD Replacement Considerations

The chassis SSD can be removed in NCS 1020.

You must consider the following points for chassis SSD replacement.

Table 2. Chassis SSD Replacement

Type of Replacement

Result

When the chassis SSD is removed while the RP is running.

Disaster Recovery boot is disabled and an alarm is raised.

When the chassis SSD is removed.

The Disaster Recovery Unavailable alarm is raised.

When the chassis SSD is replaced with another chassis SSD having the same software and RPMs as in the CPU SSD.

The configuration is backed up from the CPU SSD.

When the Chassis SSD is replaced with another chassis SSD having different software and RPMs as in the CPU SSD.

Software, RPMs, and the configuration are backed up from the CPU SSD.

When the Chassis SSD is replaced with a spare received from Cisco manufacturing or RMA process.

Software, RPMs, and the configuration are backed up from the CPU SSD.

Health Check of Backup ISO Image

The Health Check feature ensures error-free booting of NCS 1020 chassis during disaster recovery operations. NCS 1020 has a partition for disaster recovery where the backup ISO image is stored. The backup ISO image is stored in the chassis SSD.

The chassis SSD content is audited against the running software by the install process in the background every 12 hours to detect corruption. If the ISO image is corrupted, the software will recover it by copying from the backup location. If the software fails to synchronize with the chassis SSD, then the Disaster Recovery ISO Image Corruption alarm is raised. See the Troubleshooting Guide for Cisco NCS 1020 to clear the alarm.

Automated File Management System

Table 3. Feature History

Feature Name

Release Information

Feature Description

Automated File Management System

Cisco IOS XR Release 24.4.1

The new Automated File Management System is designed for file handling on NCS 1010, NCS 1014, and NCS 1020 nodes. This feature automatically archives older files to a user-specified external location, freeing up valuable SSD space by deleting them from the local nodes. You have the flexibility to define how often deletion tasks can occur, while archiving is effortlessly managed.