SME Troubleshooting

This chapter describes basic troubleshooting methods used to resolve issues with Cisco Storage Media Encryption.

This chapter includes the following sections:

Troubleshooting Resources

For additional troubleshooting information, see the Cisco MDS 9000 Family NX-OS Troubleshooting Guide, which covers issues that may appear when deploying a storage area network (SAN) using the Cisco MDS 9000 Family of switches. It introduces tools and methodologies that are used to recognize a problem, determine its cause, and find possible solutions.

Cluster Recovery Scenarios

This section includes information on recovery procedures used when one or more switches in a SME cluster are offline or when you want to change the master switch assignment from one switch to another switch. It includes the following procedures:


Note


The procedures in this section describe troubleshooting solutions that use the CLI.

Note


The SME cluster configuration for an offline switch must be done using the CLI. SME cluster configuration for an online switch can be done using DCNM-SAN or the CLI.

Deleting an Offline Switch from a SME Cluster

To delete an offline switch when one or more switches are offline and the master switch is online, use the following procedure.

On the offline switch (for example, switch2), shut down the cluster by performing this task:


    Step 1   switch# configure terminal

    Enters configuration mode.

    Step 2   switch(config)# sme cluster ABC
    switch(config-sme-cl)# shutdown

    Shuts down the ABC cluster on the offline switch.

    Note    Repeat the procedure for every offline switch.

    On the cluster master switch, delete the offline switch (for example, switch2) by performing this task:

      Step 1   switch# configure terminal

      Enters configuration mode.

      Step 2   switch(config)# sme cluster ABC
      switch(config-sme-cl)# no node switch2

      Deletes switch2 from the ABC cluster configuration.

      Note    Repeat this step for every offline switch that was shut down in the first procedure.

      On the offline switch (switch2), delete the cluster by performing this task:


        Step 1   switch# configure terminal

        Enters configuration mode.

        Step 2   switch(config)# no sme cluster ABC

        Deletes the ABC cluster configuration.

        Note   

        Delete the cluster on every offline switch that was shut down in the first procedure.
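The three tasks above can be laid out as one ordered checklist. The following is a minimal Python sketch that only builds the command text shown in this section; it does not connect to any switch. The function name and the (switch, commands) plan structure are illustrative assumptions, not part of any Cisco tool.

```python
def delete_offline_switch_plan(cluster, master, offline):
    """Build an ordered list of (switch, commands) pairs for removing
    offline switches from an SME cluster whose master is online.

    Hypothetical helper: it only assembles the CLI commands listed in
    this section so they can be reviewed before being entered by hand.
    """
    plan = []
    # Task 1: shut down the cluster on every offline switch.
    for switch in offline:
        plan.append((switch, ["configure terminal",
                              f"sme cluster {cluster}",
                              "shutdown"]))
    # Task 2: on the master, delete each offline switch from the cluster.
    master_cmds = ["configure terminal", f"sme cluster {cluster}"]
    master_cmds += [f"no node {switch}" for switch in offline]
    plan.append((master, master_cmds))
    # Task 3: delete the cluster configuration on every offline switch.
    for switch in offline:
        plan.append((switch, ["configure terminal",
                              f"no sme cluster {cluster}"]))
    return plan


if __name__ == "__main__":
    for switch, commands in delete_offline_switch_plan("ABC", "switch1", ["switch2"]):
        print(f"On {switch}:")
        for command in commands:
            print(f"  {command}")
```

Printing the plan before starting makes it easier to confirm that every offline switch appears in all three tasks.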


        Deleting a SME Cluster with One or More Offline Switches while the Master Switch is Online

        To delete an SME cluster that includes one or more offline switches and an online master switch, use these procedures.

        Caution


        Do not remove a cluster master switch from a cluster and then try to revive the cluster on an offline switch. Since the offline switch was not part of the operational cluster, the cluster master may have progressed beyond what is in the offline switch's state. Deleting the cluster master and reviving the cluster on an offline switch can lead to data corruption.


        On the offline switch (switch2), shut down the cluster by performing this task:


          Step 1   switch# configure terminal

          Enters configuration mode.

          Step 2   switch(config)# sme cluster ABC
          switch(config-sme-cl)# shutdown

          Shuts down the ABC cluster on the offline switch.

          Note   

          Repeat the procedure for every offline switch.


          On the cluster master switch, delete the offline switch (switch2) and then delete the cluster by performing this task:


            Step 1   switch# configure terminal

            Enters configuration mode.

            Step 2   switch(config)# sme cluster ABC
            switch(config-sme-cl)# no node switch2

            Deletes switch2 from the ABC cluster configuration.

            Note   

            Repeat this step for every offline switch that was shut down in the first procedure.

            Step 3   switch(config)# no sme cluster ABC

            Deletes the ABC cluster configuration.


            On the offline switch (switch2), delete the cluster by performing this task:

              Step 1   switch# configure terminal

              Enters configuration mode.

              Step 2   switch(config)# no sme cluster ABC

              Deletes the ABC cluster configuration.

              Note   

              Delete the cluster on every offline switch that was shut down in the first procedure.


              Deleting a SME Cluster when All Switches are Offline

              To delete a SME cluster when the master switch and all other switches are offline, use these procedures.


              Note


              When all switches are offline, the cluster is offline.


              On the offline switch (for example, switch2), shut down the cluster by performing this task:


                Step 1   switch# configure terminal

                Enters configuration mode.

                Step 2   switch(config)# sme cluster ABC
                switch(config-sme-cl)# shutdown

                Shuts down the ABC cluster on the offline switch.

                Note   

                Repeat this procedure for every offline switch.


                On the cluster master switch, shut down the cluster and then delete the cluster by performing this task:


                  Step 1   switch# configure terminal

                  Enters configuration mode.

                  Step 2   switch(config)# sme cluster ABC
                  switch(config-sme-cl)# shutdown

                  Shuts down the ABC cluster.

                  Step 3   switch(config)# no sme cluster ABC

                  Deletes the ABC cluster configuration.


                  On the offline switch (switch2), delete the cluster by performing this task:


                    Step 1   switch# configure terminal

                    Enters configuration mode.

                    Step 2   switch(config)# no sme cluster ABC

                    Deletes the ABC cluster configuration.

                    Note   

                    Delete the cluster on every offline switch that was shut down in the first procedure.


                    Reviving an SME Cluster

                    To revive a cluster on the switch that has the latest SME configuration version, use these procedures.

                    Perform the following steps sequentially to revive a cluster when one or more switches are offline and the cluster is nonoperational (for example, due to a quorum loss). This recovery procedure includes deleting one or more offline switches and then reviving the cluster on the remaining switches.


                    Caution


                    An SME cluster must be revived only on the switch with the latest SME configuration version, as displayed by the show sme cluster detail command. Reviving the cluster on a switch that does not have the highest configuration version can lead to data corruption.
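The comparison behind this caution can be sketched in a few lines of Python. The sketch assumes you have already read the configuration version on each switch by hand and entered it as an integer. The function name, the dictionary input, and the integer representation are illustrative assumptions, and the output format of the show command is not modeled.

```python
def revival_candidates(versions):
    """Return the switch names that hold the highest configuration version.

    `versions` maps switch name -> configuration version as read manually
    from each switch (assumed here to be comparable integers).
    """
    if not versions:
        raise ValueError("no configuration versions were provided")
    latest = max(versions.values())
    # Return every switch that holds the highest version, in a stable order.
    return sorted(name for name, version in versions.items() if version == latest)


if __name__ == "__main__":
    observed = {"switch1": 12, "switch2": 9, "switch3": 12}  # hypothetical values
    print(revival_candidates(observed))
```

Any switch that is not in the returned list holds an older configuration version and should not be used to revive the cluster.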


                    Shut down the cluster configuration on all the switches by performing this task:


                      Step 1   switch# configure terminal

                      Enters configuration mode.

                      Step 2   switch(config)# sme cluster ABC

                      Enters the configuration submode for the SME cluster named ABC.

                      Step 3   switch(config-sme-cl)# shutdown

                      Example:
                      This change can be disruptive. Please ensure you have read the "SME Cluster Recovery Procedure" 
                      in the configuration guide. -- Are you sure you want to continue? (y/n) [n] y

                      Shuts down the ABC cluster on the switch.


                      Delete the cluster configuration on the offline switches that were shut down in the preceding task by performing this task:


                        Step 1   switch# configure terminal

                        Enters configuration mode.

                        Step 2   switch(config)# no sme cluster ABC

                        Deletes the ABC cluster configuration on the offline switch.


                        On the cluster master switch, delete all the offline switches by performing this task:


                          Step 1   switch# configure terminal

                          Enters configuration mode.

                          Step 2   switch(config)# sme cluster ABC

                          Enters the configuration submode for the SME cluster named ABC.

                          Step 3   switch(config-sme-cl)# no node switchname

                          Deletes a switch from the configuration.

                          Note   

                          Repeat for every switch that needs to be deleted.


                          Restart the cluster configuration on the remaining switches by performing this task:

                            Step 1   switch# configure terminal

                            Enters configuration mode.

                            Step 2   switch(config)# sme cluster ABC

                            Enters the configuration submode for the SME cluster named ABC.

                            Step 3   switch(config-sme-cl)# no shutdown

                            Example:
                            This change can be disruptive. Please ensure you have read the "SME Cluster Recovery Procedure" in the configuration guide. -- Are you sure you want to continue? (y/n) [n] y
                            switch(config-sme-cl)#

                            Starts the ABC cluster on a switch.


                            Troubleshooting General Issues

                            SME names may contain only alphanumeric characters, dashes, and underscores. Any other characters cause problems in the cluster configuration.
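A name can be checked against this convention before it is used. The following is a minimal Python sketch that assumes "alphanumeric" means ASCII letters and digits. It does not model any length limit, and the function name is hypothetical.

```python
import re

# Assumption: a valid name is one or more ASCII letters, digits,
# dashes, or underscores. Length limits are not modeled here.
_SME_NAME = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_sme_name(name):
    """Return True if `name` uses only alphanumeric, dash, and underscore characters."""
    return _SME_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("ABC", "tape_group-01", "my cluster", "abc.1"):
        print(candidate, is_valid_sme_name(candidate))
```

Names containing spaces or periods, such as the last two candidates, are rejected by this check.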

                            Troubleshooting Scenarios

                            The following scenarios are described in this section:

                            If DNS is not configured on all switches in a cluster

                            You can use sme.useIP for IP address or name selection when DNS is not configured on all switches in a cluster.

                            sme.useIP can be used in smeserver.properties to enable the use of IP addresses instead of switch names. By default sme.useIP is set to false and DNS names will be used. When DNS is not configured, DCNM-SAN cannot resolve the switch names.

                            When sme.useIP is set to true, DCNM-SAN uses IP addresses to communicate with the switches in the cluster using SSH. All switches are added to the cluster with an IP address. When you add a local switch, the switch name is used if the name server is configured on the switch; otherwise, the IP address is used.

                            When sme.useIP is false, DCNM-SAN will use the switch name to select interfaces. All the switches added to the clusters will be identified with names. A name server is required for this type of configuration. Otherwise, switches will not be able to communicate with other switches to form the cluster and DCNM-SAN will not be able to resolve the switch name.
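To confirm which mode a given smeserver.properties file selects, the effective value of sme.useIP can be read with a short script. The following Python sketch assumes the file uses plain key=value lines with # or ! comment lines. The function name is hypothetical, and the parsing is deliberately simpler than a full properties parser.

```python
def sme_use_ip(properties_text):
    """Return the effective sme.useIP setting from smeserver.properties text.

    Assumes simple key=value lines. Falls back to the documented default
    (false, meaning DNS names are used) when the key is absent.
    """
    value = "false"
    for raw_line in properties_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith(("#", "!")):
            continue
        key, separator, setting = line.partition("=")
        if separator and key.strip() == "sme.useIP":
            value = setting.strip()  # the last occurrence wins
    return value.lower() == "true"


if __name__ == "__main__":
    sample = "# hypothetical file contents\nsme.useIP=true\n"
    print(sme_use_ip(sample))
```

A result of False means switch names are used, so DNS must be configured on all switches in the cluster.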

                            If you need to replace an MSM-18/4 module with another MSM-18/4 module

                            On the existing MDS 9000 Family platform, a module can be replaced with another module without any change in configuration. In SME, for security reasons, an MSM-18/4 module that is configured as part of a cluster cannot simply be replaced with another MSM-18/4 module; if it is, the SME interface comes up in an inactive state. The correct procedure is to remove the SME interface from the cluster and then add it back to the cluster.

                            If an SME cluster is not successfully created

                            There are four main reasons that an SME cluster may not be successfully created. Check each of the following:

                            • SSH must be enabled on every switch that is part of a SME cluster.


                            Note


                            Only SSH/dsa and SSH/rsa are supported for SME cluster configurations using the DCNM-SAN Web Client. SSH/rsa1 is not supported for SME cluster configuration through the DCNM-SAN Web Client in release 3.2.2 (the release with the SME feature). It may or may not be supported in future releases.

                            • If the SME switches are managed using their IP addresses (instead of host names or FQDN), the entry “sme.useIP=true” must be set in the smeserver.properties file. Be sure to restart DCNM-SAN after modifying the smeserver.properties file.

                            • The DNS server must be configured.

                            • Improperly configured personal firewall software (running on Cisco DCNM-SAN) may also cause a created SME cluster to stay in the “pending” state. Be sure to create proper firewall rules to allow the necessary traffic between DCNM-SAN, the DCNM-SAN Web Client, and the switches.

                            SME Interface creation error

                            If errors occur during SME interface creation, check the following:

                            • Ensure that the service module status is online.
                            • Ensure that the Storage Service Interface (SSI) boot variable is not configured for the service module. If the SSI boot variable is configured for the service module, then the SME interface creation fails.

                            An SME interface does not come up in a cluster

                            If an SME interface does not come up, this can be due to the following:

                            • An SME license is not installed or the license has expired.
                            • An MSM-18/4 module has been replaced after the SME interface has been configured.
                            • The copy running-config startup-config command was not entered after adding or deleting an SME interface from a cluster or before rebooting the switch.

                            For the second and third scenarios, you must first remove and re-add the interface to the cluster and then enter the copy running-config startup-config command.

                            When selecting paths, a “no paths found” message is displayed

                            A tape library controller or robot can be shown as a target in the Select Tape Drives wizard. If you select the controller or robot as a target, a “no paths found” message is displayed. You will need to verify whether or not the selected target is a controller or robot.

                            When the “no paths found” message is displayed, enter the show tech and show tech-support sme commands.

                            Newly added tape drives are not showing in a cluster

                            If you add new tape drives as LUNs to a tape library after SME has already discovered available tape drives, a rescan is required from the host to discover the new LUNs.

                            If you need to contact your customer support representative or Cisco TAC

                            At some point, you may need to contact your customer support representative or Cisco TAC for additional assistance. Before doing so, enter the show tech details and show tech sme commands, and collect all logs from the C:\Program Files\Cisco Systems\MDS 9000\logs directory.

                            A syslog message is displayed when a Cisco MDS switch configured with SME in the startup configuration boots up

                            When you reboot a Cisco MDS switch that has the cluster configuration stored in the startup-config file, the following syslog message may be displayed:

                            <timestamp> <switch name> %CLUSTER-2-CLUSTER_DB_SYNC_FAIL: Cluster <cluster-id> application 3 dataset 1 database synchronization failed, reason="Invalid cluster API registration"
                            

                            This error message is expected and can be ignored.
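When reviewing collected logs, it can help to separate this expected message from other cluster errors. The following Python sketch matches only the message text quoted above. The function name and the sample timestamp, switch name, and cluster ID are hypothetical placeholders.

```python
import re

# Matches the expected boot-time message quoted in this section. The
# cluster ID is treated as any run of non-space characters (an assumption).
_EXPECTED_BOOT_MESSAGE = re.compile(
    r"%CLUSTER-2-CLUSTER_DB_SYNC_FAIL: Cluster \S+ application 3 dataset 1 "
    r'database synchronization failed, reason="Invalid cluster API registration"'
)


def is_expected_boot_message(log_line):
    """Return True if the log line is the expected, ignorable sync-fail message."""
    return _EXPECTED_BOOT_MESSAGE.search(log_line) is not None


if __name__ == "__main__":
    sample = (
        "2011 Jan 1 00:00:00 switch1 %CLUSTER-2-CLUSTER_DB_SYNC_FAIL: "
        "Cluster 0x1 application 3 dataset 1 database synchronization failed, "
        'reason="Invalid cluster API registration"'
    )
    print(is_expected_boot_message(sample))
```

A sync-fail message with any other reason string does not match and should be investigated normally.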

                            Importing a volume group file causes a 'wrap key object not found' error message

                            This error can occur in the following sequence: a tape volume group is created and exported to a file, the tape volume group is deleted, and a new tape volume group is created. When the exported volume group is then imported, the import operation fails and the error message “wrap key object not found” is displayed.

                            This error occurs because there is another volume group key active in the Key Management Center with the same index (but different versions) as the current volume group into which the import operation is performed.

                            Accounting log file shows the replication of keys failed

                            The replication of a key for a cluster fails when the transaction context is invalid or has expired. The key entry is moved to the Sme_repl_error_key table. To recover, manually move this record from the Sme_repl_error_key table to the Sme_repl_pending_key table and retry the replication process.

                            Issues with smart card(s) or card reader

                            If you have issues with smart card operations, the following will help ensure success:

                            • After a reboot, use only one instance of a supported browser.
                            • Ensure that there is no smart card in the reader while the applet or wizard is loading.
                            • When you insert a card and the wizard does not recognize the change, take out the card and reseat it in the reader. Sometimes this triggers correct recognition.
                            • As a last resort, clear the Java classloader cache. Open the Java console and press x to clear the classloader cache. Then restart the browser and try again.