Introduction
This document describes common troubleshooting scenarios for HyperFlex cluster deployment through Intersight.
Prerequisites
Requirements
Cisco recommends that you have knowledge of these topics:
- Intersight
- HyperFlex cluster deployment
Components Used
This document is not restricted to specific software and hardware versions.
The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, ensure that you understand the potential impact of any command.
Background Information
The Intersight installer workflow runs the tasks listed in this table to deploy a HyperFlex cluster:
Task Name | Definition
PreparePreClusterInfoTask | Generates configuration files that contain the information required to deploy the cluster (for example, configuration file path, software YAML file).
ServerConfigurationVaildationTask | Validates the UCS server configuration to ensure that it has the required hardware and firmware configuration (for example, disk size/PID validation and correct NICs).
PreConfigurationValidationTask | Validates network configuration before the server configuration starts (for example, DNS, NTP, vCenter reachability, duplicate IP check).
PrepareLocalImageRepoTask | Downloads software images (controller VM OVA, HXDP packages) to the local image store. This task is only included in the workflow if it is run in the connected appliance environment.
ServerConfigurationTask | Performs required configuration on the UCS servers. For FI-attached deployments, this involves the creation and association of the service profiles.
HypervisorEsxConfigurationTask | Configures the network portion on the hypervisor. This includes the use of Serial over LAN to add the uplinks to the management vSwitch and to configure the IP address, hostname, and DNS/NTP settings.
PreDeployValidationTask | Performs validations before the cluster deployment starts. Validations include network reachability checks and verification that the nodes are not already part of another cluster.
PrepareHypervisorDeploymentTask | Prepares the hypervisor for controller Virtual Machine (VM) configuration. Obtains host information from ESXi and ensures proper OS parameters are set.
HypervisorNetworkingTask | Configures the network portion on the ESXi host. Includes the configuration of the different vSwitches/Port Groups on the host that are required for the cluster to operate.
HypervisorSoftwareUpdateTask | Updates software on the hypervisor, which consists of the required VIBs on the host, if necessary.
HypervisorDatastoreTask | Creates the datastore for the controller VM, if required.
DeployHyperflexControllerVm | Deploys the storage controller OVA on the hypervisor, if required.
ConfigVmTask | Configures the network portion on the controller VM, which includes the configuration of the required parameters and the data/management networks.
DeploySoftwareVmTask | Installs HXDP packages on the controller VMs.
CollectNodeInfoTask | Collects node information such as UUID and IP addresses.
CollectInventoryDataTask | Sends inventory data to controller VMs.
CreateClusterValidationTask | Performs validations to ensure that the controller VM is ready to join the cluster. Includes MTU checks and verifies that storage services are ready.
CreateClusterTask | Creates the storage cluster and uses the controller VMs to join all of the nodes together in the cluster.
PostInstallHostConfig | Configures the host after the cluster is deployed. Includes the ESXi password change to the new password supplied in the HyperFlex Cluster Profile.
PostInstallStorageControllerVmConfig | Configures controller VMs after the cluster is deployed. Includes the controller VM password change to the new password supplied in the HyperFlex Cluster Profile.
ClusterAutoClaimTask | Claims the HyperFlex cluster to the Intersight user account.
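When a workflow fails, it can help to know which tasks had already completed. The sketch below is illustrative only: it assumes the table lists the tasks in execution order, and the helper name tasks_before is hypothetical (it is not part of Intersight). Note that PrepareLocalImageRepoTask only runs in the connected appliance environment.

```python
# Task names copied from the table above, assumed to be in workflow order.
WORKFLOW_TASKS = [
    "PreparePreClusterInfoTask",
    "ServerConfigurationVaildationTask",
    "PreConfigurationValidationTask",
    "PrepareLocalImageRepoTask",  # connected appliance environment only
    "ServerConfigurationTask",
    "HypervisorEsxConfigurationTask",
    "PreDeployValidationTask",
    "PrepareHypervisorDeploymentTask",
    "HypervisorNetworkingTask",
    "HypervisorSoftwareUpdateTask",
    "HypervisorDatastoreTask",
    "DeployHyperflexControllerVm",
    "ConfigVmTask",
    "DeploySoftwareVmTask",
    "CollectNodeInfoTask",
    "CollectInventoryDataTask",
    "CreateClusterValidationTask",
    "CreateClusterTask",
    "PostInstallHostConfig",
    "PostInstallStorageControllerVmConfig",
    "ClusterAutoClaimTask",
]


def tasks_before(failed_task):
    """Return the tasks that precede failed_task in the assumed order.

    Raises ValueError if the task name is not in the table.
    """
    index = WORKFLOW_TASKS.index(failed_task)
    return WORKFLOW_TASKS[:index]
```

For example, tasks_before("ConfigVmTask") returns every task from PreparePreClusterInfoTask through DeployHyperflexControllerVm, which narrows down what was already configured on the hosts when the failure occurred.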
Problem
These are common errors encountered during cluster deployment:
Validation errors/warnings:
- Duplicate IPs (error if on the same fabric interconnect, warning if they overlap globally)
- 2-node cluster creation
- Replication factor of 2 chosen (a data replication factor of 3 is recommended)
Runtime validation:
- DNS, NTP not reachable
- vCenter is not reachable or incorrect credentials
- Management IP addresses already in use
Deployment errors:
- Same Data VLANs for two different clusters in the same L2 domain (Uplink switch)
- Cross-over link
- ESXi IP configuration failure (due to incorrect ESXi credentials)
Solution
Based on the task that fails and the error encountered, perform the suggested action:
DNS/NTP Not Reachable
Validator_NTP_List, Status Code: 9 (FAILED), Message: There are no reachable NTP servers from list
Action: Check the DNS/NTP server IP address. If it is incorrect, modify the policy and restart the workflow.
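Before the policy is modified, it can be useful to confirm independently whether an NTP server answers at all. The sketch below is a minimal SNTP probe that uses only the Python standard library. It is an illustration, not the check the installer performs, and it tests reachability from the machine where it runs, which may differ from the path the installer uses. The function names are hypothetical.

```python
import socket
import struct

NTP_PORT = 123
NTP_EPOCH_OFFSET = 2208988800  # seconds between 1900-01-01 and 1970-01-01


def build_sntp_request():
    """Return a 48-byte SNTP client request (LI=0, VN=3, Mode=3)."""
    return b"\x1b" + 47 * b"\0"


def parse_transmit_time(packet):
    """Return the server transmit timestamp as Unix seconds."""
    if len(packet) < 48:
        raise ValueError("short NTP packet")
    # The transmit timestamp's integer seconds occupy bytes 40-43.
    seconds = struct.unpack("!I", packet[40:44])[0]
    return seconds - NTP_EPOCH_OFFSET


def ntp_reachable(server, timeout=3.0):
    """Return True if the server answers an SNTP request in time."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_sntp_request(), (server, NTP_PORT))
            reply, _ = sock.recvfrom(512)
        except OSError:  # covers timeouts and name resolution failures
            return False
    return len(reply) >= 48
```

Call ntp_reachable with each NTP server from the policy. A False result for every server is consistent with the "no reachable NTP servers" message, although a firewall between the test machine and the server can produce the same outcome.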
vCenter Not Reachable
"vCenter reachability and credential check : vCenter server is not reachable or invalid vCenter credentials."
Action: Check the vCenter IP address and credentials. If they are incorrect, modify the policy and restart the workflow.
Duplicate IP
“IP address x.x.x.x already in use. Please verify there are no duplicate IPs.”
Action: Check whether the IP address is already in use. If so, modify the policy and restart the workflow.
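A quick way to catch duplicates within the planned addressing itself is to scan the list of addresses intended for the policy. The sketch below assumes the addresses are available as plain strings. The function name is hypothetical, and the check only finds repeats inside the list. It does not detect an address that is already live on the network.

```python
import ipaddress
from collections import Counter


def find_duplicate_ips(addresses):
    """Return the addresses that appear more than once, as sorted strings."""
    # Parsing normalizes the text form and rejects malformed input.
    normalized = [ipaddress.ip_address(a.strip()) for a in addresses]
    counts = Counter(normalized)
    return sorted(str(ip) for ip, n in counts.items() if n > 1)
```

Passing the combined list of hypervisor management, controller VM management, and cluster IP addresses returns an empty list when no address is reused.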
Connection to Host Failed
failed in Task: 'Connection to host' with Error: 'Host(x.x.x.x) is not reachable via device connector.
Please check the VLAN ID, IP address and gateway settings.'
Action: Check the VLAN, IP address, and gateway settings. If they are incorrect, modify the policy and restart the workflow.
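One common cause of an unreachable host is a gateway that does not belong to the host's subnet. The sketch below checks that relationship with the standard ipaddress module. The function name and the sample values in the usage note are hypothetical, and the check does not validate the VLAN ID.

```python
import ipaddress


def gateway_in_subnet(ip, netmask, gateway):
    """Return True if the gateway lies inside the host's subnet."""
    # strict=False accepts a host address and derives the network from it.
    network = ipaddress.ip_network(f"{ip}/{netmask}", strict=False)
    return ipaddress.ip_address(gateway) in network
```

For a host at 10.1.1.10 with netmask 255.255.255.0, a gateway of 10.1.1.1 passes the check, while 10.1.2.1 does not.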
Auto Claim Failed
"failed to claim the HyperFlex device connector.
This cluster requires continued connectivity to Intersight to ensure Fault Tolerance is maintained.
The cluster cannot tolerate failures until this step is completed. Please check the cluster connectivity to Intersight and perform a manual claim. "
Action: Claim the HyperFlex cluster as outlined in the device claim procedure and restart the workflow.
Failed to Configure Server Profile Association
failed in Task: 'Failed to Configure Server Profile Association
Action:
ESXi IP configuration is done through console access via CIMC Serial over LAN (SoL). Sometimes CIMC SoL fails to bring the console to the login prompt. Check the ESXi console through the CIMC KVM, reset the CIMC, and restart the workflow.
Failed in Task: Monitor OS Boot
Configuring CIMC server: failed in Task: 'Failed to Configure Server Profile Association.' with Error: 'failed in Task: 'Monitor OS boot' with Error: 'OS Installation has failed'\"}}}}'
Action:
- Ensure that the ESXi root password is correct.
- For a first-time installation, ensure that the factory default password option is checked.
- For a re-install, ensure that the factory default password option is unchecked.
- Check for SoL access failure.
- Check if the device connector disconnected during OS boot.
For a new install, ensure that the new password provided is not the default password (Cisco123), even when the factory default password option is checked.
Otherwise, the installer is able to log in to ESXi but is not able to set the new password, because the default password is too weak.
Failed in Task: Verify OVA against Sha1
Deploying Storage Controller VM on ESXi host: Failed in Task: “Verify OVA against Sha1"
Action:
- Check whether DNS is configured on the ESXi host.
- Check whether the ESXi management IP subnet is blocked from Intersight access.
Failed in Task: Add Host to vCenter Cluster
failed in Task: Add host to vCenter Cluster with Error: Try adding host manually to vCenter and retry. failed to add the host x.x.x.x with 3 attempts
Action:
The vCenter version must be equal to or higher than the ESXi version of every host in the cluster. Upgrade vCenter to a version equal to or higher than the ESXi version, or downgrade ESXi to a lower stable version.
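The version rule can be expressed as a simple comparison. The sketch below assumes that versions are written as dotted numbers such as "7.0.3". It does not handle update or build suffixes, and the function names are hypothetical.

```python
def parse_version(text):
    """Convert a dotted version string such as '7.0.3' to a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def vcenter_supports_hosts(vcenter_version, esxi_versions):
    """Return True if vCenter is at or above the version of every ESXi host."""
    vc = parse_version(vcenter_version)
    return all(vc >= parse_version(v) for v in esxi_versions)
```

If the function returns False for the versions in the environment, the host add is expected to fail until vCenter is upgraded or ESXi is downgraded.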
Failed in Task: Network Storage Controller VM Configuration Failed
failed in Task: 'Check Network for Storage Controller VM Configuration Result' with Error: 'Configure networking failed with error: Error while connecting to ESXi host. Please check the connection and retry'
Action:
The ESXi API server occasionally does not respond in time.
- Check the hostd service status to ensure that it is running.
- Reboot ESXi and retry the deployment.
Workflow Failed Due to MTU Issues
failed in Task: 'Verify Storage Cluster' with Error: 'id: 2 entityRef: id: x.x.x.x name: x.x.x cluster message: Could not ping x.x.x.x with MTU 9000 during failover test.
Verify the VLAN and MTU on the upstream switch is correct prior to continuing. severity: warning'
Action:
Jumbo frames are not enabled along the entire path. When jumbo frames are enabled, an MTU value of 9216 must be configured on the uplink switch. Ensure that jumbo frames are enabled on every hop in the path and restart the workflow.
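To test the path manually, send a ping with the don't-fragment bit set and a payload sized so that the full packet equals the MTU. The payload is the MTU minus the IPv4 header (20 bytes, assuming no IP options) and the ICMP header (8 bytes). The sketch below does that arithmetic and builds a vmkping command line for an ESXi host. The helper names are hypothetical, and the command may also need an interface option that matches the storage data vmkernel port in the environment.

```python
IP_HEADER_BYTES = 20   # IPv4 header without options
ICMP_HEADER_BYTES = 8


def ping_payload_size(mtu):
    """Return the ICMP payload size that fills one packet of the given MTU."""
    return mtu - IP_HEADER_BYTES - ICMP_HEADER_BYTES


def vmkping_command(target_ip, mtu=9000):
    """Build a vmkping command with don't-fragment (-d) and payload size (-s)."""
    return f"vmkping -d -s {ping_payload_size(mtu)} {target_ip}"
```

For an MTU of 9000 the payload is 8972 bytes. If a ping with that size and the don't-fragment bit fails while a default-size ping succeeds, some hop in the path is not passing jumbo frames.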
Failed in Task: Copying Software Packages to Storage Controller VM
failed in Task: 'Copying Software Packages to Storage Controller VM(outbound)' with Error: 'Unexpected failure during module execution.
Action:
- Ensure there is network connectivity from SCVM to Intersight.
- Verify that the required ports are allowed in the network.
- Refer to the Pre-installation check links for network requirements.
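The connectivity checks above can be approximated with a TCP connection test. The sketch below uses only the Python standard library. The function name is hypothetical, the host and port to test must come from the pre-installation network requirements, and the result only reflects the network path from the machine where the code runs.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or unresolved name
        return False
```

A True result shows only that the TCP handshake completed. A proxy or TLS inspection device in the path can still interfere with the transfer, so treat this as a first filter rather than a full verification.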
Related Information