Introduction
This document describes how to troubleshoot and fix an NFS "Stale file handle" error when running the command df -h in Cisco DNA Center.
Prerequisites
Requirements
- Linux Filesystem Management knowledge
- NFS v3 or v4 knowledge
- Access to the maglev CLI full bash shell
- NFS IP address or Hostname and NFS Directory Path
Components Used
- Cisco DNA Center 2.3.3 maglev CLI
- NFS v4
The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, ensure that you understand the potential impact of any command.
Problem
Cisco DNA Center full (Assurance) backups can fail because the NFS is not properly mounted, even when it appears to be configured successfully in the Cisco DNA Center Backup settings. When you check the filesystems from the Cisco DNA Center bash shell with the df -h command, an error line appears at the beginning of the command output: df: /data/nfs: Stale file handle
This NFS stale file handle error can occur on any Linux system for several reasons. The most common is a change to the inode of the mounted file or directory on the disk device. For example, a service or application opens or creates a file, deletes and closes it, and then attempts to access or delete the same file again, so its reference to that file is out of date or invalid. In other words, a file handle becomes stale when the file or directory it references is removed by another host while your client still holds an active reference to the object.
Example:
maglev@maglev-master-10-10-10-10:~$ df -h
df: /data/nfs: Stale file handle
Filesystem Size Used Avail Use% Mounted on
udev 189G 0 189G 0% /dev
tmpfs 38G 9.4M 38G 1% /run
/dev/sdb2 47G 28G 18G 62% /
tmpfs 189G 0 189G 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 189G 0 189G 0% /sys/fs/cgroup
/dev/sdb4 392G 123G 250G 34% /data
/dev/sdb3 239M 163M 76M 69% /boot/efi
/dev/sdc3 166G 5.6G 152G 4% /var
/dev/sdc1 671G 102G 536G 16% /data/maglev/srv
/dev/sdc2 923G 175G 702G 20% /data/maglev/srv/maglev-system
/dev/sdd1 5.2T 127G 4.9T 3% /data/maglev/srv/ndp
glusterfs-brick-0.glusterfs-brick:/default_vol 923G 187G 699G 22% /mnt/glusterfs/default_vol
glusterfs-brick-0.glusterfs-brick:/ndp_vol 5.2T 181G 4.9T 4% /mnt/glusterfs/ndp_vol
tmpfs 38G 0 38G 0% /run/user/1234
maglev@maglev-master-10-10-10-10:~$
Similar output is provided by the command magctl sts backup mount display.
Example:
maglev@maglev-master-10-10-10-10:~$ magctl sts backup mount display
ERROR: df: /data/nfs: Stale file handle
Note: Multiple stale file handle errors can also appear for the same NFS server with different mount points. Apply the solution to each stale file handle error.
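When several stale entries are present, the affected paths can be pulled out of the error lines instead of being read by eye. The following is a minimal sketch and not part of the documented procedure. The function name stale_paths_from_df is hypothetical, and the sketch assumes the errors are reported in English in the same format as the examples above.

```shell
# Hypothetical helper: print the path from every "Stale file handle"
# error line read on stdin. Assumes the message format
# "df: <path>: Stale file handle", optionally prefixed (for example "ERROR: ").
stale_paths_from_df() {
    sed -n 's/.*df: \(.*\): Stale file handle$/\1/p'
}

# Usage (commented out so the snippet has no side effects when sourced):
#   LC_ALL=C df -h 2>&1 >/dev/null | stale_paths_from_df
```

In the usage line, the redirection order sends only the df error lines into the pipe and discards the regular filesystem table. Each printed path is a mount point to review in the Solution steps.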
Solution
1.- Remove the NFS settings to delete the NFS from the system. Navigate to Cisco DNA Center Menu > Settings > Backup & Restore > Configure > Cisco DNA Center (NFS) and click the Remove button.
2.- Identify the stale NFS mount point in the system by running the command:
$ mount | grep -i <NFS_IP_ADDRESS_OR_FQDN>
Example:
maglev@maglev-master-10-10-10-10:~$ mount | grep -i 192.168.100.1
192.168.100.1:/dna_backups/dna_assurance_data on /data/nfs type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,acregmin=60,acdirmin=60,soft,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.10.16.2,local_lock=none,addr=10.10.16.3)
Multiple results can also be returned for the same NFS server with different mount points. Each of them might need to be unmounted.
Tip: If the secure shell is enabled in the maglev CLI (magshell), you can run the _shell command to enable full bash. Depending on the Cisco DNA Center version, you might need a token from TAC to access the full maglev bash shell.
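A plain grep on the IP address matches anywhere in the line, and the dots in the pattern match any character, so 192.168.100.1 also matches 192.168.100.10. As a hedged alternative, the sketch below filters the mount output by exact server name and NFS type, and prints one "remote local" pair per mount. The function name nfs_mounts_for_server is hypothetical, and the sketch assumes the "SRC on DIR type FSTYPE (OPTIONS)" line format shown in the example above.

```shell
# Hypothetical helper: read `mount`-style lines on stdin and print
# "<remote> <local mount point>" for each NFS mount served by host $1.
# Expected line format: SRC on DIR type FSTYPE (OPTIONS)
nfs_mounts_for_server() {
    awk -v srv="$1" '
        # Keep only nfs/nfs4 entries whose source starts with "<server>:"
        ($5 == "nfs" || $5 == "nfs4") && index($1, srv ":") == 1 {
            print $1, $3
        }
    '
}

# Usage (commented out so the snippet has no side effects when sourced):
#   mount | nfs_mounts_for_server 192.168.100.1
```

Each output line provides the two arguments used by the unmount command in the next step. Mount points that contain spaces are not handled by this sketch.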
3.- Manually unmount the NFS mount point that returns the Stale file handle error by running the command:
$ sudo umount <NFS_IP_ADDRESS_OR_FQDN>:/remote/NFS/path /local/mounting/point
Example:
maglev@maglev-master-10-10-10-10:~$ sudo umount 192.168.100.1:/dna_backups/dna_assurance_data /data/nfs
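If the plain umount command fails, for example because the mount point is busy or the server is unreachable, the standard umount options -f (force) and -l (lazy) are sometimes used as a fallback. The sketch below is an assumption-laden illustration and not part of the documented procedure: the function names run and unmount_stale are hypothetical, and the escalation order is a design choice, not a Cisco recommendation. Set DRY_RUN=1 to print the commands without running them.

```shell
# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: unmount one mount point, escalating only on failure.
unmount_stale() {
    um_target="$1"
    run sudo umount "$um_target" && return 0      # plain unmount, as in step 3
    run sudo umount -f "$um_target" && return 0   # force: meant for unreachable NFS servers
    run sudo umount -l "$um_target"               # lazy: detach now, clean up once not busy
}

# Usage (commented out so the snippet has no side effects when sourced):
#   DRY_RUN=1 unmount_stale /data/nfs
#   unmount_stale /data/nfs
```

A dry run first shows which command would be attempted. Because the force and lazy options change how in-flight I/O is handled, consider them only after the plain unmount has failed.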
4.- Once the NFS is unmounted from the filesystem, run the command df -h again and confirm that the "Stale file handle" error no longer appears. If a stale file handle entry is still present, repeat steps 2 and 3, because the NFS can have other mount points that were also in use and must be unmounted as well.
5.- Finally, navigate to Cisco DNA Center Menu > Settings > Backup & Restore > Configure > Cisco DNA Center (NFS) and reconfigure the NFS.
Validation
Validate that the NFS is now mounted correctly with no more "stale file handle" errors by running the command df -h and also by checking the NFS mount point of the backup settings with magctl:
maglev@maglev-master-10-10-10-10:~$ magctl sts backup mount display
+------------------------------------------------+------+------------+------------+------------+
| remote                                         | type | used       | available  | percentage |
+------------------------------------------------+------+------------+------------+------------+
| 192.168.100.1:/dna_backups/dna_assurance_data/ | nfs4 | 6369873920 | 3744850944 | 63%        |
+------------------------------------------------+------+------------+------------+------------+
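As an additional check, every NFS mount point can be probed directly instead of relying on the df output alone. The following is a minimal sketch under stated assumptions: the function names list_nfs_mounts and check_stale_mounts are hypothetical, the mount table is read from /proc/mounts (fields: device, mount point, filesystem type, options), and the 5-second timeout is an arbitrary value. A failed stat call can mean a stale handle or an unreachable server, so the output label covers both.

```shell
# Hypothetical helper: print the mount point of every nfs/nfs4 entry
# in a mount table (default: /proc/mounts).
list_nfs_mounts() {
    awk '$3 == "nfs" || $3 == "nfs4" { print $2 }' "${1:-/proc/mounts}"
}

# Hypothetical helper: report each NFS mount point that cannot be stat'ed.
# Prints nothing when all NFS mounts respond.
check_stale_mounts() {
    list_nfs_mounts "${1:-/proc/mounts}" | while IFS= read -r mp; do
        # timeout guards against a hung NFS server blocking the check
        if ! timeout 5 stat -- "$mp" >/dev/null 2>&1; then
            echo "STALE_OR_UNREACHABLE: $mp"
        fi
    done
}

# Usage (commented out so the snippet has no side effects when sourced):
#   check_stale_mounts
```

Empty output from check_stale_mounts, together with a clean df -h and the magctl table above, indicates that no NFS mount point is returning errors.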