Deployment Guide
Last Updated: November 13, 2016
The CVD program consists of systems and solutions designed, tested, and documented to facilitate faster, more reliable, and more predictable customer deployments. For more information visit
http://www.cisco.com/go/designzone.
ALL DESIGNS, SPECIFICATIONS, STATEMENTS, INFORMATION, AND RECOMMENDATIONS (COLLECTIVELY, "DESIGNS") IN THIS MANUAL ARE PRESENTED "AS IS," WITH ALL FAULTS. CISCO AND ITS SUPPLIERS DISCLAIM ALL WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT OR ARISING FROM A COURSE OF DEALING, USAGE, OR TRADE PRACTICE. IN NO EVENT SHALL CISCO OR ITS SUPPLIERS BE LIABLE FOR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, OR INCIDENTAL DAMAGES, INCLUDING, WITHOUT LIMITATION, LOST PROFITS OR LOSS OR DAMAGE TO DATA ARISING OUT OF THE USE OR INABILITY TO USE THE DESIGNS, EVEN IF CISCO OR ITS SUPPLIERS HAVE BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
THE DESIGNS ARE SUBJECT TO CHANGE WITHOUT NOTICE. USERS ARE SOLELY RESPONSIBLE FOR THEIR APPLICATION OF THE DESIGNS. THE DESIGNS DO NOT CONSTITUTE THE TECHNICAL OR OTHER PROFESSIONAL ADVICE OF CISCO, ITS SUPPLIERS OR PARTNERS. USERS SHOULD CONSULT THEIR OWN TECHNICAL ADVISORS BEFORE IMPLEMENTING THE DESIGNS. RESULTS MAY VARY DEPENDING ON FACTORS NOT TESTED BY CISCO.
CCDE, CCENT, Cisco Eos, Cisco Lumin, Cisco Nexus, Cisco StadiumVision, Cisco TelePresence, Cisco WebEx, the Cisco logo, DCE, and Welcome to the Human Network are trademarks; Changing the Way We Work, Live, Play, and Learn and Cisco Store are service marks; and Access Registrar, Aironet, AsyncOS, Bringing the Meeting To You, Catalyst, CCDA, CCDP, CCIE, CCIP, CCNA, CCNP, CCSP, CCVP, Cisco, the Cisco Certified Internetwork Expert logo, Cisco IOS, Cisco Press, Cisco Systems, Cisco Systems Capital, the Cisco Systems logo, Cisco Unity, Collaboration Without Limitation, EtherFast, EtherSwitch, Event Center, Fast Step, Follow Me Browsing, FormShare, GigaDrive, HomeLink, Internet Quotient, IOS, iPhone, iQuick Study, IronPort, the IronPort logo, LightStream, Linksys, MediaTone, MeetingPlace, MeetingPlace Chime Sound, MGX, Networkers, Networking Academy, Network Registrar, PCNow, PIX, PowerPanels, ProConnect, ScriptShare, SenderBase, SMARTnet, Spectrum Expert, StackWise, The Fastest Way to Increase Your Internet Quotient, TransPath, WebEx, and the WebEx logo are registered trademarks of Cisco Systems, Inc. and/or its affiliates in the United States and certain other countries.
All other trademarks mentioned in this document or website are the property of their respective owners. The use of the word partner does not imply a partnership relationship between Cisco and any other company. (0809R)
© 2016 Cisco Systems, Inc. All rights reserved.
Table of Contents
System Hardware and Software Specifications
Cisco UCS Blades Distribution in Chassis
Red Hat OpenStack Platform 8 director
Reference Architecture Workflow
Cluster Manager and Proxy Server
Cluster Manager and Proxy Server
Configure Cisco UCS Fabric Interconnects
Configure the Cisco UCS Global Policies
Configure Server Ports for Blade Discovery and Rack Discovery
Create a Network Control Policy
Create Storage Profiles for the Controller and Compute Blades
Create Storage Profiles for Cisco UCS C240 M4 Blade Server
Create Service Profile Templates for Controller Nodes
Create Service Profile Templates for Compute Nodes
Create Service Profile Templates for Ceph Storage Nodes
Create Service Profile for Undercloud (OSP8 director) Node
Create Service Profiles for Controller Nodes
Create Service Profiles for Compute Nodes
Create Service Profiles for Ceph Storage Nodes
Create LUNs for the Ceph OSD and Journal Disks
Create Port Channels for Cisco UCS Fabrics
Configure the Cisco Nexus 9372 PX Switch A
Configure the Cisco Nexus 9372 PX Switch B
Configure the Interface VLAN (SVI) on the Cisco Nexus 9K Switch A
Configure the Interface VLAN (SVI) on the Cisco Nexus 9K Switch B
Configure the VPC and Port Channels on Switch A
Configure the VPC and Port Channels on the Cisco Nexus 9K Switch B
Verify the Port Channel Status on the Cisco Nexus Switches
Install the Operating System on the Undercloud Node
Post Undercloud Installation Checks
Pre-Installation Checks for Introspection
YAML Configuration Files Overview
Pre-Installation Checks Prior to Deploying Overcloud
Overcloud Post Deployment Steps
Overcloud Post-Deployment Configuration
Provision the New Server in Cisco UCS
Provision the New Blade Server in Cisco UCS
Post Deployment and Health Checks
High Availability of Software Stack
High Availability of Hardware Stack
Hardware Failures of IO Modules
Hardware Failures of Blade Servers
OpenStack Dependency on Hardware
Insert the New Blade into the Chassis
Remove Failed Blade from Inventory
Health Checks Post Replacement
Cisco Unified Computing System
Cisco Unified Computing System
Running Introspection on Failed Nodes
Overcloud Post-Deployment Issues
Cisco UCS Manager Plugin Checks
The Cisco Validated Design program consists of systems and solutions that are designed, tested, and documented to facilitate and improve customer deployments. These designs incorporate a wide range of technologies and products into a portfolio of solutions that have been developed to address the business needs of our customers.
The reference architecture described in this document is a realistic use case for deploying Red Hat OpenStack Platform 8 on Cisco UCS Blade and Rack-Mount servers. This document provides step-by-step instructions for setting up the Cisco UCS hardware and installing Red Hat OpenStack Platform director, and covers issues and workarounds encountered during installation, integration of the Cisco plugins with OpenStack, requirements for achieving High Availability in both hardware and software, lessons learned while validating the solution, and a few troubleshooting steps.
Cisco UCS Integrated Infrastructure for Red Hat OpenStack Platform is an all-in-one solution for deploying an OpenStack based private cloud using Cisco Infrastructure and Red Hat OpenStack Platform. This solution is validated and supported by Cisco and Red Hat to enable rapid infrastructure deployment and to reduce the risk in scaling from proof-of-concept to a full enterprise production environment.
Automation, virtualization, cost, and ease of deployment are the key criteria to meet the growing IT challenges. Virtualization is a key and critical strategic deployment model for reducing the Total Cost of Ownership (TCO) and achieving better utilization of the platform components like hardware, software, network and storage. The platform should be flexible, reliable and cost effective for enterprise applications.
The Cisco UCS solution implementing Red Hat OpenStack Platform provides a simple yet fully integrated and validated infrastructure to deploy virtual machines (VMs) in various sizes to suit your application needs. Cisco Unified Computing System (UCS) is a next-generation data center platform that unifies computing, network, storage access, and virtualization into a single interconnected system, which makes Cisco UCS an ideal platform for OpenStack architecture. The combined architecture of the Cisco UCS platform, Red Hat OpenStack Platform, and Red Hat Ceph Storage can accelerate your IT transformation by enabling faster deployments, greater flexibility of choice, efficiency, and lower risk. Furthermore, the Cisco Nexus series of switches provides the network foundation for the next-generation data center.
This deployment guide provides you with step-by-step instructions to install Red Hat OpenStack Platform director and Red Hat Ceph Storage on Cisco UCS Blades and Rack-Mount servers. The traditional complexities of installing OpenStack are simplified by Red Hat Openstack Platform director while Cisco UCS Manager capabilities bring an integrated, scalable, multi-chassis platform in which all resources participate in a unified management domain. The solution included in this CVD is an effort by Cisco Systems, Inc. in partnership with Red Hat, Inc., and Intel Corporation.
The audience for this document includes, but is not limited to, sales engineers, field consultants, professional services, IT managers, partner engineers, IT architects, and customers who want to take advantage of an infrastructure that is built to deliver IT efficiency and enable IT innovation. The reader of this document is expected to have the necessary training and background to install and configure Red Hat Enterprise Linux, Cisco Unified Computing System (UCS) and Cisco Nexus Switches as well as a high level understanding of OpenStack components. External references are provided where applicable and it is recommended that the reader be familiar with these documents.
Readers are also expected to be familiar with the infrastructure, network and security policies of the customer installation.
This document details the installation steps for Red Hat OpenStack Platform 8 and Red Hat Ceph Storage 1.3 architecture on the Cisco UCS platform. It also describes the daily operational challenges in running OpenStack and steps to mitigate them, High Availability use cases, Live Migration, common troubleshooting aspects of OpenStack along with Operational best practices.
This solution is focused on Red Hat OpenStack Platform 8 (based on the upstream OpenStack Liberty release) and Red Hat Ceph Storage 1.3 on Cisco Unified Computing System. The advantages of Cisco UCS and Red Hat OpenStack Platform combine to deliver an OpenStack Infrastructure as a Service (IaaS) deployment that is quick and easy to set up. The solution can scale up for greater performance and capacity or scale out for environments that require consistent, multiple deployments. The converged infrastructure of compute, networking, and storage components from Cisco UCS is a validated, enterprise-class IT platform that enables rapid deployment of business-critical applications, reduces costs, minimizes risks, increases flexibility and business agility, and scales up for future growth.
Red Hat OpenStack Platform 8 on Cisco UCS helps IT organizations accelerate cloud deployments while retaining control and choice over their environments with open and interoperable cloud solutions. It also offers a redundant architecture from a compute, network, and storage perspective. The solution comprises the following key components:
· Cisco Unified Computing System (UCS)
— Cisco UCS 6200 Series Fabric Interconnects
— Cisco VIC 1340
— Cisco VIC 1227
— Cisco 2204XP IO Module or Cisco UCS Fabric Extenders
— Cisco B200 M4 Servers
— Cisco C240 M4 Servers
· Cisco Nexus 9300 Series Switches
· Cisco Nexus Plugin for Nexus Switches
· Cisco UCS Manager Plugin for Cisco UCS
· Red Hat Enterprise Linux 7.2
· Red Hat OpenStack Platform director
· Red Hat OpenStack Platform 8
· Red Hat Ceph Storage 1.3
The scope is limited to the infrastructure pieces of the solution. It does not address the vast area of the OpenStack components and multiple configuration choices available in OpenStack.
This architecture, based on Red Hat OpenStack Platform built on Cisco UCS hardware, is an integrated foundation to create, deploy, and scale an OpenStack cloud based on the Liberty OpenStack community release. This deployment builds on Cisco's earlier validated design for Red Hat OpenStack Platform 7. The earlier deployment guide can be referenced at: http://www.cisco.com/c/en/us/td/docs/unified_computing/ucs/UCS_CVDs/ucs_openstack_osp7.html
The reference architecture use case provides a comprehensive, end-to-end example of deploying RHOSP 8 cloud on bare metal using Red Hat OpenStack Platform director and services through heat templates.
The first section in this CVD covers setting up the Cisco hardware: the blade and rack servers, chassis, and Fabric Interconnects, along with the Cisco Nexus 9000 switches. The second section explains how to install the cloud through Red Hat OpenStack Platform director. The final section includes the functional and High Availability tests on the configuration, and the best practices that evolved while validating the solution.
Figure 1 OpenStack Deployment Architecture
Figure 2 illustrates the physical topology of this solution.
The configuration, on which most of the tests were conducted, is comprised of three controller nodes, six compute nodes, three storage nodes, a pair of Cisco UCS Fabric Interconnects, and a pair of Cisco Nexus switches. This architecture is scalable horizontally and vertically within the chassis.
· More Compute Nodes and Chassis can be added as desired.
· More Ceph Nodes for storage can be added. The Ceph nodes can be Cisco UCS C240M4L or C240M4S.
· If there is a higher bandwidth requirement, the Cisco UCS 2208XP Fabric Extender can be used instead of the 2204XP model used in this configuration.
· Both Cisco UCS Fabric Interconnects and Cisco Nexus Switches can be 96 port switches instead of 48 ports as shown in Figure 2.
The Cisco Unified Computing System is an integrated, scalable, multi-chassis platform in which all resources participate in a unified management domain. The Cisco Unified Computing System accelerates the delivery of new services simply, reliably, and securely through end-to-end provisioning and migration support for both virtualized and non-virtualized systems. Cisco UCS Manager using single connect technology manages servers and chassis and performs auto-discovery to detect inventory, manage, and provision system components that are added or changed.
The Red Hat OpenStack Platform IaaS cloud on Cisco UCS servers is implemented as a collection of interacting services that control compute, storage, and networking resources.
OpenStack Networking handles creation and management of a virtual networking infrastructure in the OpenStack cloud. Infrastructure elements include networks, subnets, and routers. Because OpenStack Networking is software-defined, it can react in real-time to changing network needs, such as creation and assignment of new IP addresses.
Compute serves as the core of the OpenStack cloud by providing virtual machines on demand. Compute servers support the libvirt driver that uses KVM as the hypervisor. The hypervisor creates virtual machines and enables live migration from node to node. OpenStack also provides storage services to meet the storage requirements for the above mentioned virtual machines.
The solution also includes OpenStack Networking ML2 Core components.
The Cisco Nexus driver for OpenStack Neutron allows customers to easily build their infrastructure-as-a-service (IaaS) networks using the industry’s leading networking platform, delivering performance, scalability, and stability with the familiar manageability and control you expect from Cisco® technology.
Cisco UCS Manager Plugin configures compute blades with necessary VLANs. The Cisco UCS Manager Plugin talks to the Cisco UCS Manager application running on the Fabric Interconnect.
Table 1 and Table 2 list the Hardware and Software releases used for solution verification.
Table 1 Required Hardware Components
Role | Hardware | Quantity | Firmware Details
OSP director | Cisco UCS B200M4 blade | 1 | 2.2(5)
Controller | Cisco UCS B200M4 blade | 3 | 2.2(5)
Compute | Cisco UCS B200M4 blade | 6 | 2.2(5)
Storage | Cisco UCS C240M4L or C240M4S Rack server | 3 | 2.2(5)
Fabric Interconnects | Cisco UCS 6248UP FIs | 2 | 2.2(5)
Nexus Switches | Cisco Nexus 9372 NX-OS | 2 | 7.0(3)I1(3)
Table 2 Software Specifications
Category | Software | Version
Operating System | Red Hat Enterprise Linux | 7.2
OpenStack Platform | Red Hat OpenStack Platform | RHOSP 8
 | Red Hat OpenStack Platform director | RHOSP 8
 | Red Hat Ceph Storage | 1.3
Plugins | Cisco Nexus Plugin | RHOSP 8
 | Cisco UCS Manager Plugin | RHOSP 8
This section contains the Bill of Materials used in the configuration.
Table 3 Bill of Materials
Component | Model | Quantity | Comments
OpenStack Platform director Node | Cisco UCS B200M4 blade | 1 | CPU – 2 x E5-2630 V3; Memory – 8 x 16GB 2133 MHz DIMM – total of 128G; Local Disks – 2 x 300 GB SAS disks for Boot; Network Card – 1 x VIC 1340; Raid Controller – Cisco MRAID 12G SAS Controller
Controller Nodes | Cisco UCS B200M4 blades | 3 | CPU – 2 x E5-2630 V3; Memory – 8 x 16GB 2133 MHz DIMM – total of 128G; Local Disks – 2 x 300 GB SAS disks for Boot; Network Card – 1 x VIC 1340; Raid Controller – Cisco MRAID 12G SAS Controller
Compute Nodes | Cisco UCS B200M4 blades | 6 | CPU – 2 x E5-2660 V3; Memory – 16 x 16GB 2133 MHz DIMM – total of 256G; Local Disks – 2 x 300 GB SAS disks for Boot; Network Card – 1 x VIC 1340; Raid Controller – Cisco MRAID 12G SAS Controller
Storage Nodes (only one of LFF/SFF) | Cisco UCS C240M4L Rack Servers | 3 | CPU – 2 x E5-2630 V3; Memory – 8 x 16GB 2133 MHz DIMM – total of 128G; Internal HDD – None; Ceph OSDs – 8 x 6TB SAS Disks; Ceph Journals – 2 x 400GB SSDs; OS Boot – 2 x 1TB SAS Disks; Network Cards – 1 x VIC 1227; Raid Controller – Cisco MRAID 12G SAS Controller
 | Cisco UCS C240M4S Rack Servers | 3 | CPU – 2 x E5-2630 V3; Memory – 8 x 16GB 2133 MHz DIMM – total of 128G; Internal HDD – None; Ceph OSDs – 18 x 1.2 TB SAS Disks; Ceph Journals – 4 x 400GB SSDs; OS Boot – 2 x 1TB SAS Disks; Network Cards – 1 x VIC 1227; Raid Controller – Cisco MRAID 12G SAS Controller
Chassis | Cisco UCS 5108 Chassis | 2 |
IO Modules | Cisco UCS 2204XP Fabric Extenders | 4 |
Fabric Interconnects | Cisco UCS 6248UP Fabric Interconnects | 2 |
Switches | Cisco Nexus 9372PX Switches | 2 |
This section provides an overview of the components used in this solution.
Figure 3 lists the server distribution in the Cisco UCS Chassis.
Figure 3 Servers Distribution in Cisco UCS Chassis
The controller and compute nodes are distributed across the chassis. This provides High Availability to the stack, even though a failure of a chassis as such does not generally happen. There is only one installer node in the system, and it can be placed in any one of the chassis as shown above. In larger deployments with three or more chassis, it is recommended to distribute one controller to each chassis.
In larger deployments where the chassis are fully loaded with blades, a better approach would be to distribute the tenant and storage traffic across the Fabric Interconnects. This method ensures that the tenant traffic is distributed evenly across both the fabrics.
Service profiles will be created from the Service Profile Templates. However once successfully created, they will be unbound from the templates.
Figure 4 illustrates the network layout.
A Floating or Provider network is not necessary. VMs can be accessed either through the floating IP network or through the external network. This is determined by how the external bridge is configured, which is covered later in this document.
Each family of vNICs is placed on the same Fabric Interconnect to avoid an extra hop to the upstream Nexus switches.
The following categories of vNICs are used in the setup:
· Provisioning (PXE) vNICs are pinned to Fabric A
· Tenant vNICs are pinned to Fabric A
· Internal API vNICs are pinned to Fabric B
· External Interfaces vNICs are pinned to Fabric A
· Storage Public Interfaces are pinned to Fabric A
· Storage Management Interfaces are pinned to Fabric B
· Management vNICs are pinned to Fabric A
While configuring vNICs in the templates, with the failover option enabled on the fabrics, the vNIC order has to be specified manually as shown below.
Figure 5 vNIC Placement
The vNIC order has to be pinned as above for consistent PCI device naming. The above is an example for a controller blade; the same has to be done for all the other servers, the compute and storage nodes. This order should match NIC1, NIC2, NIC3, NIC4, and so on in the Overcloud heat templates.
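As a quick sanity check (not part of the original Cisco UCS steps), the resulting order can be confirmed after a node has been provisioned by listing its interfaces with their PCI addresses and MAC addresses and comparing the MACs against the vNICs shown in Cisco UCS Manager. The loop below is a minimal sketch that assumes a standard RHEL 7 sysfs layout:

# List every interface with its PCI address and MAC address; the PCI order
# should mirror the vNIC placement order (NIC1, NIC2, ...) used in the templates.
for dev in /sys/class/net/*/device; do
    nic=$(basename "$(dirname "$dev")")
    echo "$nic  pci=$(basename "$(readlink "$dev")")  mac=$(cat /sys/class/net/"$nic"/address)"
done | sort -k2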
Red Hat OpenStack Platform 8 (RHOSP 8) delivers an integrated foundation to create, deploy, and scale a more secure and reliable public or private OpenStack cloud. RHOSP 8 starts with the proven foundation of Red Hat Enterprise Linux and integrates Red Hat's OpenStack Platform technology to provide a production-ready cloud platform. RHOSP 8 director is based on the community Liberty OpenStack release. Red Hat OpenStack Platform 8 introduces a cloud installation and lifecycle management tool chain. It provides the following:
· Simplified deployment through ready-state provisioning of bare metal resources
· Flexible network definitions
· High Availability with Red Hat Enterprise Linux Server High Availability
· Integrated setup and Installation of Red Hat Ceph Storage 1.3
Figure 6 illustrates the reference architecture workflow.
Figure 6 Reference Architecture Workflow
Red Hat OpenStack Platform director is a tool chain, introduced with Kilo, that automates the creation of the Undercloud and Overcloud nodes as shown above. It performs the following:
· Install Operating System on Undercloud Node
· Install Undercloud Node
· Perform Hardware Introspection
· Prepare Heat templates and Install Overcloud
The Undercloud node is the deployment environment, while the Overcloud nodes are the nodes that actually render the cloud services to the tenants.
The Undercloud is the TripleO (OOO – OpenStack on OpenStack) control plane. It uses native OpenStack APIs and services to deploy, configure, and manage the production OpenStack deployment. The Undercloud defines the Overcloud with Heat templates and then deploys it through the Ironic bare metal provisioning service. Red Hat OpenStack Platform director includes predefined Heat templates for the basic server roles that comprise the Overcloud. Customizable templates allow director to deploy, redeploy, and scale complex Overclouds in a repeatable fashion.
Ironic gathers information about bare metal servers through a discovery mechanism known as introspection. Ironic pairs servers with bootable images and installs them through PXE and remote power management.
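For reference, on the Undercloud this registration and introspection is typically driven with commands such as the following sketch, which assumes the standard director workflow with the stackrc credentials file and a node definition file named ~/instackenv.json:

source ~/stackrc
openstack baremetal import --json ~/instackenv.json     # register the nodes with Ironic
openstack baremetal configure boot                      # assign the introspection kernel and ramdisk
openstack baremetal introspection bulk start            # PXE boot the nodes and collect hardware facts
openstack baremetal list                                # verify power and provisioning state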
Red Hat Openstack Platform director deploys all servers with the same generic image by injecting Puppet modules into the image to tailor it for specific server roles. It then applies host-specific customizations through Puppet including network and storage configurations. While the Undercloud is primarily used to deploy OpenStack, the Overcloud is a functional cloud available to run virtual machines and workloads.
The following subsections detail the roles that comprise the Overcloud.
This role provides endpoints for REST-based API queries to the majority of the OpenStack services. These include Compute, Image, Identity, Block, Network, and Data processing. The controller nodes also provide the supporting facilities for the APIs, database, load balancing, messaging, and distributed memory objects. They also provide external access to virtual machines. The controller can run as a standalone server or as a High Availability (HA) cluster. The current configuration uses HA.
This role provides the processing, memory, storage, and networking resources to run virtual machine instances. It runs the KVM hypervisor by default. New instances are spawned across compute nodes in a round-robin fashion based on resource availability by default. The default filters can be altered if needed; for more information, see OpenStack documentation.
Ceph is a distributed block store, object store, and file system. This role deploys Object Storage Daemon (OSD) nodes for Ceph clusters. It also installs the Ceph Monitor service on the controller. The instance distribution is influenced by the currently set filters.
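Once the Overcloud is deployed, the health of this Ceph back end can be verified from any controller (Ceph monitor) node with the standard Ceph commands below; this is a generic check rather than anything specific to this design:

sudo ceph -s          # overall cluster health, monitor quorum, and OSD count
sudo ceph osd tree    # OSD layout across the storage nodes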
OpenStack requires multiple network functions. While it is possible to collapse all network functions onto a single network interface, isolating communication streams in their own physical or virtual networks provides better performance and scalability. Each OpenStack service is bound to an IP on a particular network. In a cluster a service virtual IP is shared by the HA controllers.
The control plane installs the Overcloud through this network. All nodes must have a physical interface attached to the provisioning network. This network carries DHCP/PXE and TFTP traffic. It must be provided on a dedicated interface or as the native VLAN on the boot interface. The provisioning interface can also act as a default gateway for the Overcloud; the compute and storage nodes use this provisioning gateway interface on the Undercloud node.
The External network is used for the Horizon dashboard and the Public APIs, as well as hosting the floating IPs that are assigned to VMs. The Neutron L3 routers which perform NAT are attached to this interface. The range of IPs that are assigned to floating IPs should not include the IPs used for hosts and VIPs on this network.
This network is used for connections to the API servers, as well as RPC messages using RabbitMQ and connections to the database. The Glance Registry API uses this network, as does the Cinder API. This network is typically only reachable from inside the OpenStack Overcloud environment, so API calls from outside the cloud will use the Public APIs via the external Network.
Red Hat OpenStack Platform 8 introduces a new network, called the Management network, that provides access for system administration functions such as SSH access, DNS, and NTP traffic. In the current validated design, this network was used to communicate with the Cisco Nexus switches and UCS Manager.
Virtual machines communicate over the tenant network. It supports three modes of operation: VXLAN, GRE, and VLAN. VXLAN and GRE tenant traffic is delivered through software tunnels on a single VLAN. Individual VLANs correspond to tenant networks in cases where the VLAN tenant networks are used.
This network carries storage communication including Ceph, Cinder, and Swift traffic. The virtual machine instances communicate with the storage servers through this network. Data-intensive OpenStack deployments should isolate storage traffic on a dedicated high-bandwidth interface, that is, a 10 Gb interface. The Glance API, Swift proxy, and Ceph Public interface services are all delivered through this network.
Storage management communication can generate large amounts of network traffic. This network is shared between the front and back end storage nodes. Storage controllers use this network to access data storage nodes. This network is also used for storage clustering and replication traffic.
Network traffic types are assigned to network interfaces through Heat template customizations prior to deploying the Overcloud. Red Hat OpenStack Platform director supports several network interface types including physical interfaces, bonded interfaces and either tagged or native 802.1Q VLANs.
Server role was discussed in the previous section. Each server role requires access to specific types of network traffic. The network isolation feature allows Red Hat OpenStack Platform director to segment network traffic by particular network types. When using network isolation, each server role must have access to its required network traffic types.
By default, Red Hat OpenStack Platform director collapses all network traffic to the provisioning interface. This configuration is suitable for evaluation, proof of concept, and development environments. It is not recommended for production environments where scaling and performance are a primary concern.
Red Hat OpenStack Platform 8 supports tenant network communication through the OpenStack Networking (Neutron) service. OpenStack Networking supports overlapping IP address ranges across tenants through the Linux kernel’s network namespace capability. It also supports three default networking types:
Each tenant is assigned a network subnet mapped to an 802.1q VLAN on the physical network. This tenant networking type requires VLAN-assignment to the appropriate switch ports on the physical network.
The VXLAN mechanism driver encapsulates each layer 2 Ethernet frame sent by the VMs in a layer 3 UDP packet. The UDP packet includes an 8-byte VXLAN header, within which a 24-bit value is used for the VXLAN Segment ID. The VXLAN Segment ID is used to designate the individual VXLAN overlay network on which the communicating VMs are housed. This provides segmentation for each tenant network.
The GRE mechanism driver encapsulates each layer 2 Ethernet frame sent by the VMs in a special IP packet using the GRE protocol (IP protocol 47). The GRE header contains a 32-bit key which is used to identify a flow or virtual network in a tunnel. This provides segmentation for each tenant network.
The Cisco Nexus Plugin is bundled in OpenStack Platform 8, the Liberty release. While it can support both VLAN and VXLAN configurations, only VLAN mode is validated as part of this design. VXLAN will be considered in future releases, when the current Cisco VIC 1340 adapter is certified for VXLAN on the Red Hat operating system.
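For illustration, an administrator-created VLAN network with the Liberty-era Neutron CLI looks like the sketch below; when its DHCP or instance ports are bound, the Nexus and UCS Manager drivers trunk the corresponding VLAN. The physical network label datacentre (the director default), the VLAN ID 1001, and the subnet are example values only:

source ~/overcloudrc
neutron net-create demo-net --provider:network_type vlan \
    --provider:physical_network datacentre --provider:segmentation_id 1001
neutron subnet-create demo-net 172.16.10.0/24 --name demo-subnet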
Two components drive HA for all core and non-core OpenStack services:
· Cluster Manager
· Proxy Server
The cluster manager is responsible for the startup and recovery of inter-related services across a set of physical machines. It tracks the cluster's internal state across multiple machines. State changes trigger appropriate responses from the cluster manager to ensure service availability and data integrity.
This section describes the steps to configure the network for the Overcloud. The network setup used in this configuration is shown in Figure 4.
The configuration is done using Heat Templates on the Undercloud prior to deploying the Overcloud. These steps need to be followed after the Undercloud installation. In order to use network isolation, we have to define the Overcloud networks. Each will have an IP subnet, a range of IP addresses to use on the subnet and a VLAN ID. These parameters will be defined in the network environment file. In addition to the global settings there is a template for each of the nodes like controller, compute and Ceph that determines the NIC configuration for each role. These have to be customized to match the actual hardware configuration.
Heat communicates with Neutron API running on the Undercloud node to create isolated networks and to assign neutron ports on these networks. Neutron will assign a static IP to each port and Heat will use these static IPs to configure networking on the Overcloud nodes. A utility called os-net-config runs on each node at the time of provisioning to configure host level networking.
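For reference, these environment files are then passed to the deployment command on the Undercloud. The sketch below assumes the customized templates are kept under ~/templates and uses the node counts from this design; the exact set of environment files for this solution is covered later in this document:

source ~/stackrc
openstack overcloud deploy --templates \
  -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
  -e ~/templates/network-environment.yaml \
  --control-scale 3 --compute-scale 6 --ceph-storage-scale 3 \
  --ntp-server <ntp-server-ip>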
Table 4 lists the VLANs that are created on the configuration.
VLAN Name | VLAN Purpose | VLAN ID or VLAN Range Used in This Design for Reference
Management | Management Network to UCSM and Nexus Switches | 10
PXE | Provisioning Network VLAN | 110
Internal-API | Internal API Network | 100
External | External Network | 215
Storage Public | Storage Public Network | 120
Storage Management | Storage Cluster or Management Network | 150
Floating | Floating Network | 160
Red Hat OpenStack Platform director’s approach is to leverage Red Hat’s distributed cluster system.
The Cluster Manager is responsible for the startup and recovery of inter-related services across a set of physical machines. It tracks the cluster's internal state across multiple machines. State changes trigger appropriate responses from the cluster manager to ensure service availability and data integrity.
In the HA model, clients do not connect directly to service endpoints. Connection requests are routed to service endpoints by a proxy server.
The Cluster Manager provides state awareness of other machines to coordinate service startup and recovery, shared quorum to determine majority set of surviving cluster nodes after failure, data integrity through fencing and automated recovery of failed instances.
The Proxy servers help with load balancing connections across service end points. The nodes can be added or removed without interrupting service.
Red Hat OpenStack Platform director uses HAProxy and Pacemaker to manage HA services and load balance connection requests. With the exception of RabbitMQ and Galera, HAProxy distributes connection requests to active nodes in a round-robin fashion. Galera and RabbitMQ use persistent options to ensure requests go only to active and/or synchronized nodes. Pacemaker checks service health at one-second intervals. Timeout settings vary by service.
The combination of Pacemaker and HAProxy:
· Detects and recovers machine and application failures
· Starts and stops OpenStack services in the correct order
· Responds to cluster failures with appropriate actions including resource failover and machine restart and fencing
RabbitMQ, memcached, and mongodb do not use HAProxy server. These services have their own failover and HA mechanisms.
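The resulting cluster state can be inspected from any controller node with the standard Pacemaker commands below; these are generic checks rather than anything specific to this design:

sudo pcs status            # cluster membership and the state of each managed resource
sudo pcs resource show     # the resources (HAProxy, Galera, RabbitMQ, and so on) and where they run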
OpenStack Modular Layer 2 (ML2) allows the separation of network segment types from the device-specific implementation of those segment types. The ML2 architecture consists of multiple 'type drivers' and 'mechanism drivers'. Type drivers manage the common aspects of a specific type of network, while mechanism drivers manage the specific devices used to implement those network types.
Type drivers:
· VLAN
· GRE
· VXLAN
Mechanism drivers:
· Cisco UCS Manager
· Cisco Nexus
· Openvswitch, Linuxbridge
The Cisco Nexus driver for OpenStack Neutron allows customers to easily build their Infrastructure-as-a-Service (IaaS) networks using the industry’s leading networking platform, delivering performance, scalability, and stability with the familiar manageability and control you expect from Cisco® technology. ML2 Nexus drivers dynamically provision OpenStack managed VLANs on Cisco Nexus switches. They configure the trunk ports with dynamically created VLANs solving the logical port count issue on the Nexus switches. They provide better manageability of the network infrastructure.
ML2 Cisco UCS Manager drivers dynamically provision OpenStack managed VLANs on the Fabric Interconnects. They configure VLANs on the Controller and Compute node VNICs. The Cisco UCS Manager Plugin talks to the Cisco UCS Manager application running on the Fabric Interconnect and is part of an ecosystem for Cisco UCS Servers that consists of Fabric Interconnects and IO modules. The ML2 Cisco UCS Manager driver does not support configuration of Cisco UCS Servers whose service profiles are attached to Service Profile Templates. This prevents the same VLAN configuration from being pushed to all the service profiles based on that template. The plugin can be used after the Service Profile has been unbound from the template.
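After the Overcloud has been deployed with the Cisco plugin environment files, the enabled drivers can be confirmed on a controller node as shown below. The driver entry-point names in the comments are the usual ones for the networking-cisco package and should be verified against the plugin documentation for your release:

# Confirm the type and mechanism drivers configured for the ML2 plugin:
sudo grep -E '^(type_drivers|mechanism_drivers)' /etc/neutron/plugins/ml2/ml2_conf.ini
# Expected to include entries along the lines of:
#   type_drivers = vlan,vxlan,gre
#   mechanism_drivers = openvswitch,cisco_nexus,cisco_ucsm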
To create a virtual machine, complete the following steps:
1. Dashboard/CLI authenticates with Keystone.
2. Dashboard/CLI sends nova-boot to nova-api.
3. nova-api validates the token with keystone.
4. nova-api checks for conflicts and, if there are none, creates a new entry in the database.
5. nova-api sends rpc.call to nova-scheduler and gets updated host-entry with host-id.
6. nova-scheduler picks up the request from the queue.
7. nova-scheduler sends the rpc.cast request to nova-compute for launching an instance on the appropriate host after applying filters.
8. nova-compute picks up the request from the queue.
9. nova-compute sends the rpc.call request to nova-conductor to fetch the instance information such as host ID and flavor (RAM, CPU, and Disk).
10. nova-conductor picks up the request from the queue.
11. nova-conductor interacts with the nova database and returns the instance information, which nova-compute picks up from the queue.
12. nova-compute performs the REST with auth-token to glance-api. Then, nova-compute retrieves the Image URI from the Image Service, and loads the image from the image storage.
13. glance-api validates the auth-token with keystone and nova-compute gets the image data.
14. nova-compute performs the REST call to the network API to allocate and configure the network.
15. neutron server validates the token and creates network info.
16. Nova-compute performs REST to volume API to attach volume to the instance.
17. Cinder-api validates the token and provides block storage info to nova-compute.
18. Nova compute generates data for the hypervisor driver.
19. DHCP and/or router port bindings by Neutron on the controller nodes trigger the Cisco ML2 plugins:
a. Cisco UCS Manager driver creates VLAN and trunks the eth1 vNICs for the controller node’s service-profile
b. Nexus driver creates VLAN and trunks the switch port(s) mapped to the controller node
20. The virtual machine instance's port binding to a compute node again triggers the ML2 drivers:
a. Cisco UCS Manager driver creates VLAN and trunks the eth1 vNICs for the compute node's service-profile
b. Nexus driver creates VLAN and trunks the switches’ port(s) mapped to the compute node
Figure 7 Instance Creation Workflow
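The workflow above is exercised by any instance launch, for example with the Liberty-era CLI shown in this sketch; the image, flavor, and network names are placeholders for whatever exists in your Overcloud:

source ~/overcloudrc
NET_ID=$(neutron net-show demo-net -f value -c id)
nova boot demo-vm --flavor m1.small --image rhel7 --nic net-id=$NET_ID
nova list     # the new instance should move from BUILD to ACTIVE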
This section details the deployment hardware used in this solution.
Table 5 lists the cabling information.
Figure 8 illustrates the physical cabling used in this solution.
Please note the port numbers on the VIC 1227 card. As shown in the figure, Port 1 is on the right and Port 2 is on the left.
Figure 9 illustrates the cabling logic used in this solution.
Configure the Fabric Interconnects after the cabling is complete. To hook up the console port on the Fabrics, complete the following steps:
Please replace the appropriate addresses for your setup.
Connect the console port to the UCS 6248 Fabric Interconnect switch designated for Fabric A:
Enter the configuration method: console
Enter the setup mode; setup newly or restore from backup. (setup/restore)? setup
You have chosen to setup a new fabric interconnect. Continue? (y/n): y
Enforce strong passwords? (y/n) [y]: y
Enter the password for "admin": <password>
Enter the same password for "admin": <password>
Is this fabric interconnect part of a cluster (select 'no' for standalone)?
(yes/no) [n]:y
Which switch fabric (A|B): A
Enter the system name: UCS-6248-FAB
Physical switch Mgmt0 IPv4 address: 10.23.10.6
Physical switch Mgmt0 IPv4 netmask: 255.255.255.0
IPv4 address of the default gateway: 10.23.10.1
Cluster IPv4 address: 10.23.10.5
Configure DNS Server IPv4 address? (yes/no) [no]: y
DNS IPv4 address: <<var_nameserver_ip>>
Configure the default domain name? y
Default domain name: <<var_dns_domain_name>>
Join centralized management environment (UCS Central)? (yes/no) [n]: Press Enter
You will be prompted to review the settings.
If they are correct, answer yes to apply and save the configuration. Wait for the login prompt to make sure that the configuration has been saved.
Connect the console port to Peer UCS 6248 Fabric Interconnect switch designated for Fabric B:
Enter the configuration method: console
Installer has detected the presence of a peer Fabric interconnect. This Fabric
interconnect will be added to the cluster. Do you want to continue {y|n}? y
Enter the admin password for the peer fabric interconnect: <password>
Physical switch Mgmt0 IPv4 address: 10.23.10.7
Apply and save the configuration (select “no” if you want to re-enter)? (yes/no): yes
Verify the connectivity:
After completing the FI configuration, verify the connectivity by logging in to one of the fabrics or to the VIP address and checking the cluster state or extended state as shown below:
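A minimal sketch of that check from the Fabric Interconnect CLI is shown below; both fabric interconnects should report UP and the cluster should show HA READY:

UCS-6248-FAB-A# connect local-mgmt
UCS-6248-FAB-A(local-mgmt)# show cluster state
UCS-6248-FAB-A(local-mgmt)# show cluster extended-state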
To configure the Global policies, log into UCS Manager GUI, and complete the following steps:
1. Under Equipment > Global Policies:
a. Set the Chassis/FEX Discovery Policy to match the number of uplink ports that are cabled between the chassis or fabric extenders and to the fabric interconnects.
b. Set the Power policy based on the input power supply to the UCS chassis. In general, for a UCS chassis with 5 or more blades, a minimum of 3 power supplies in an N+1 configuration is recommended. With 4 power supplies, 2 on each PDU, the recommended power policy is Grid.
c. Set the Global Power allocation Policy as Policy driven Chassis Group cap.
d. Click Save changes to save the configuration.
In case you wish to have sub-orgs in UCS, create sub-orgs as below:
Under Servers Tab -> Service Profiles -> root -> Sub-Organizations -> Create Organization
Enter the Organization Name of your choice as below and click OK to continue.
Navigate to each fabric interconnect and configure the server ports on the fabric interconnects. Complete the following steps:
Under Equipment > Fabric Interconnects > Fabric Interconnect A > Fixed Module > Ethernet Ports:
Select the ports (Port 1 to 8) that are connected to the left side of each UCS chassis FEX 2204, right-click them and select Configure as Server Port.
Select the ports (Port 9 to 11) that are connected to the 10G MLOM (VIC1227) port1 of each UCS C240 M4, right-click them, and select Configure as Server Port.
Click Save Changes to save the configuration.
Repeat steps 1 and 2 on Fabric Interconnect B and save the configuration.
The blades and rack servers are discovered as shown below:
Navigate to each blade and rack server to make sure that the disks are in the Unconfigured Good state; if not, convert them from JBOD to Unconfigured Good as shown in the screenshot below.
Navigate to each Fabric Interconnect and configure the network uplink ports on the Fabric Interconnects. Complete the following steps:
Under Equipment > Fabric Interconnects > Fabric Interconnect A > Fixed Module > Ethernet Ports:
Select Port 17 and Port 18, which are connected to the Nexus 9k switches, right-click them, and select Configure as Uplink Port.
Click Save Changes to save the configuration.
Repeat the steps 1 and 2 on Fabric Interconnect B.
To access the KVM console of each UCS Server, create the KVM IP pools from the UCS Manager GUI, and complete the following steps:
Under LAN > Pools > root > IP Pools > IP Pool ext-mgmt, right-click and select Create Block of IPv4 Addresses.
Specify the starting IP address, subnet mask, gateway, and size.
To configure a MAC address for each Cisco UCS Server VNIC interface, create the MAC pools from the Cisco UCS Manager GUI, and complete the following steps:
Under LAN > Pools > root > Sub-Organizations > osp8 > MAC Pools, right-click and select Create MAC Pool.
Specify the name and description for the MAC pool.
To configure the UUID pools for each UCS Server, create the UUID pools from the Cisco UCS Manager GUI, complete the following steps:
Under Servers > Pools > root > Sub-Organizations > osp8 > UUID Suffix Pools, right-click and select Create UUID Suffix Pool.
Specify the name and description for the UUID pool.
Click Add.
Specify the UUID Suffixes and size for the UUID pool.
Click Finish to complete the UUID pool creation.
To create VLANs for all OpenStack networks for Controller, Compute and Ceph Storage Servers, from the UCS Manager GUI, complete the following steps:
Under LAN > Cloud > VLANs, right-click and select Create VLANs.
Specify the VLAN name as PXE-Network for Provisioning and specify the VLAN ID as 110 and click OK.
Specify the VLAN name as Storage-Public for accessing Ceph Storage Public Network and specify the VLAN ID as 120 and click OK.
Specify the VLAN name as Storage-Mgmt-Network for Managing Ceph Storage Cluster and specify the VLAN ID as 150 and click OK.
Specify the VLAN name as External-Network and specify the VLAN ID as 215 and click OK.
Specify the VLAN name as Tenant-Floating-Network for accessing Tenant instances externally and specify the VLAN ID as 160 and click OK.
Specify the VLAN name as Management for accessing UCSM and Nexus 9k Networks and specify the VLAN ID as 10 and click OK.
The screenshot below shows the output of VLANs for all the OpenStack Networks created above:
To configure the Network Control policy from the UCS Manager, complete the following steps:
Under LAN > Policies > root > Sub-Organizations > Network Control Policies, right-click and select Create Network Control Policy.
Specify the name and choose CDP as Enabled. Select the MAC register mode as "All hosts VLANs" and Action on Uplink fail as "Link Down" and click OK.
To configure VNIC templates for each Cisco UCS Server VNIC interface, create the VNIC templates from the Cisco UCS Manager GUI and complete the following steps:
The storage networks are configured with 9000 MTU.
Under LAN > Policies > root > Sub-Organizations > osp8 > VNIC Templates, right-click and select Create VNIC Template.
Create VNIC template for PXE or Provisioning network. Specify the name, description, Fabric ID, VLAN ID and choose MAC pools from the drop-down list.
Create VNIC template for Internal-API network. Specify the name, description, Fabric ID, VLAN ID and choose MAC pools from the drop-down list.
Create the VNIC template for the Tenant-Internal Network. You do not need to associate any VLANs with the Tenant-Internal network. The VLANs for a tenant will be created globally, and also on each compute blade's tenant interface, by the UCSM plugin.
Create VNIC template for External Network.
Create the VNIC template for Storage Public Network.
Create VNIC template for Storage Mgmt Cluster network.
Create VNIC template for Tenant Floating Network.
Create VNIC template for Management Network.
After completion, you can see the VNIC templates for each traffic type.
For the storage interfaces, an MTU value of 9000 has been added.
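After the Overcloud is deployed, the end-to-end 9000 MTU path on the storage networks can be verified with a do-not-fragment ping from any controller or compute node; replace the address with the storage public IP of one of your Ceph nodes (8972 bytes of payload plus 28 bytes of IP and ICMP headers equals 9000):

ping -M do -s 8972 -c 3 <ceph-node-storage-public-ip>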
To configure the Boot policy for the Cisco UCS Servers, create a Boot Policy from the Cisco UCS Manager GUI and complete the following steps:
Under Server > Policies > root > Sub-Organizations > Boot Policies, right-click and select Create Boot Policy.
Specify the name and description. Select the First boot order as LAN boot and specify the actual VNIC name of the PXE network (PXE-NIC). Then select the second boot order and click Add Local LUN.
Specify the VNIC Name as PXE-NIC.
Make sure the First boot order is PXE NIC and second boot order is Local LUN and click OK.
A maintenance policy determines a pre-defined action to take when there is a disruptive change made to the service profile associated with a server. When creating a maintenance policy you have to select a reboot policy which defines when the server can reboot once the changes are applied.
To configure the Maintenance policy from the Cisco UCS Manager, complete the following steps:
Under Server > Policies > root > Sub-Organizations > Maintenance Policies, right-click and select Create Maintenance Policy.
This policy allows you to determine whether IPMI commands can be sent directly to the server, using the IP address (KVM IP address).
To configure the IPMI Access profiles from the Cisco UCS Manager, complete the following steps:
Under Server > Policies > root > Sub-Organizations > IPMI Access Profiles, right-click and select Create IPMI Access Profile.
Specify the name, set IPMI Over LAN to Enabled, and click "+".
Specify the username and password. Choose Admin for the Role and click OK.
Click OK to create the IPMI access profile.
Cisco UCS uses the priority set in the power control policy, along with the blade type and configuration, to calculate the initial power allocation for each blade within a chassis. During normal operation, the active blades within a chassis can borrow power from idle blades within the same chassis. If all blades are active and reach the power cap, service profiles with higher priority power control policies take precedence over service profiles with lower priority power control policies.
To configure the Power Control policy from the UCS Manager, complete the following steps:
Under Server > Policies > root > Sub-Organizations > Power Control Policies, right-click and select Create Power Control Policy.
Specify the name and description. Choose Power Capping as No Cap.
No Cap keeps the server running at full capacity regardless of the power requirements of the other servers in its power group. Setting the priority to no-cap prevents Cisco UCS from leveraging unused power from that particular blade server. The server is allocated the maximum amount of power that the blade can reach.
Create a QOS system class as shown below:
Set the MTU of the Best Effort class to 9000; this is leveraged in the vNIC templates for the storage public and storage management vNICs.
To allow flexibility in defining the number of storage disks, roles and usage of these disks, and other storage parameters, you can create and use storage profiles. LUNs configured in a storage profile can be used as boot LUNs or data LUNs, and can be dedicated to a specific server. You can also specify a local LUN as a boot device. However, LUN resizing is not supported.
To configure Storage profiles from the Cisco UCS Manager, complete the following steps:
Under Storage > Storage Provisioning > Storage Profiles > root > Sub-Organizations, right-click and select Create Storage Profile.
Specify the name and click “+”.
Specify the Local LUN name, set the size to 250 GB, and click Auto Deploy.
To configure RAID levels and configure the number of disks for the disk group, select Create Disk Group Policy.
Specify the name and choose RAID level as RAID 1 Mirrored. RAID1 is recommended for the Local boot LUNs.
Select Disk group Configuration (Manual) and click “+”. Keep the Virtual Drive configuration with the default values.
Specify Disk Slot Number as 1 and Role as Normal.
Create another Local Disk configuration with the Slot number as 2 and click OK.
In this solution, we used Local Disk 1 and Disk 2 as the boot LUNs with RAID 1 mirror configuration.
Choose the Disk group policy Boot Disk-OS for the Local Boot LUN.
Click OK to confirm the Storage profile creation.
To configure the Storage profiles from the Cisco UCS Manager, complete the following steps:
Specify the Storage profile name as C240-Ceph for the Ceph Storage Servers. Click “+”.
Specify the LUN name and size in GB. For the Disk group policy creation, select Disk Group Configuration for Ceph nodes as Ceph-OS-Boot similar to “BootDisk-OS” disk group policy as above.
After successful creation of Disk Group Policy, choose Disk Group Configuration as Ceph-OS-Boot and click OK.
Click OK to complete the Storage Profile creation for the Ceph Nodes.
For the Cisco UCS C240 M4 servers, the LUNs for the Ceph OSD disks (6TB SAS) and Ceph journal disks (400GB SSDs) are still created through the Ceph storage profile. Due to Cisco UCS Manager limitations, these OSD and journal LUNs have to be created after the Cisco UCS C240 M4 server has been successfully associated with the Ceph storage service profile.
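Once the node has been associated, the LUNs created, and the Overcloud image deployed on it, the boot, OSD, and journal LUNs can be cross-checked from the operating system with a generic listing such as the one below; the ROTA column distinguishes the SSD journal LUNs (0) from the rotational OSD LUNs (1):

lsblk -o NAME,SIZE,TYPE,ROTA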
To configure the Service Profile Templates for the Controller Nodes, complete the following steps:
Under Servers > Service Profile Templates > root > Sub-Organizations, right-click and select Create Service Profile Template.
Specify the Service profile template name for the Controller node as OSP8-Controller-SP-Template. Choose the UUID pools previously created from the drop-down list and click Next.
For Storage Provisioning, choose Expert and click Storage profile Policy and choose the Storage profile Blade-OS-boot previously created from the drop-down list and click Next.
For Networking, choose Expert and click “+”.
Create the VNIC interface for PXE or Provisioning network as PXE-NIC and click the check box Use VNIC template.
Under vNIC template, choose the PXE-NIC template previously created from the drop-down list and choose Linux for the Adapter Policy.
Create the VNIC interface for Tenant Internal Network as Tenant-Internal and then under vNIC template, choose the “Tenant-Internal” template we created before from the drop-down list and choose Adapter Policy as “Linux”.
Create the VNIC interface for Internal API network as Internal-API and click the check box for Use VNIC template.
Under vNIC template, choose the Internal-API-NIC template previously created from the drop-down list and choose Linux for the Adapter Policy.
Create the VNIC interface for Storage Public Network as Storage-Pub and click the check box for Use VNIC template.
Under vNIC template, choose the Storage-Pub-NIC template previously created from the drop-down list and choose Linux for the Adapter Policy.
Create the VNIC interface for Storage Mgmt Cluster Network as Storage-Mgmt and click the check box for Use VNIC template.
Under vNIC template, choose the Storage-Mgmt-NIC template previously created from the drop-down list and choose Linux for the Adapter Policy.
Create the VNIC interface for Floating Network as Tenant-Floating and click the check box the Use VNIC template.
Under the vNIC template, choose the Tenant-Floating template previously created from the drop-down list and choose Linux for the Adapter Policy.
Create the VNIC interface for External Network as External-NIC and click the check box the Use VNIC template.
Under the vNIC template, choose the External-NIC template previously created from the drop-down list and choose Linux for the Adapter Policy.
Create the VNIC interface for Management Network as Management-NIC and click the check box the Use VNIC template.
Under the vNIC template, choose the Management-NIC template previously created from the drop-down list and choose Linux for the Adapter Policy.
After a successful VNIC creation, click Next.
Under the SAN connectivity, choose No VHBAs and click Next.
Under Zoning, click Next.
Under VNIC/VHBA Placement, choose the vNICs PCI order as shown below and click Next.
Under vMedia Policy, click Next.
Under Server Boot Order, choose the boot policy as PXE-LocalBoot previously created, from the drop-down list and click Next.
Under Maintenance Policy, choose Server_Ack previously created, from the drop-down list and click Next.
Under Operational Policies, choose the IPMI Access Profile as IPMI_admin previously created, from the drop-down list and choose the Power Control Policy as No_Power_Cap and click Finish.
To create the Service Profile templates for the Compute nodes, complete the following steps:
Specify the Service profile template name for the Compute node as OSP8-Compute-SP-Template.
Choose the UUID pools previously created from the drop-down list and click Next.
For Storage Provisioning, choose Expert and click Storage Profile Policy and choose the Storage profile Blade-OS-boot previously created, from the drop-down list and click Next.
For Networking, choose Expert and click “+".
Create the VNIC interface for PXE or Provisioning network as PXE-NIC and click the check box for Use VNIC template.
Under the vNIC template, choose the PXE-NIC template previously created, from the drop-down list and choose Linux for the Adapter Policy.
Create the VNIC interface for Tenant Internal Network as Tenant-Internal and then under vNIC template, choose the “Tenant-Internal” template we created before from the drop-down list and choose Adapter Policy as “Linux”.
Due to Cisco UCS Manager Plugin limitations, we have created eth1 as the VNIC for the Tenant Internal Network.
Create the VNIC interface for Internal API network as Internal-API and click the check box for VNIC template.
Under the vNIC template, choose the Internal-API template previously created, from the drop-down list and choose Linux for the Adapter Policy.
Create the VNIC interface for Storage Public Network as Storage-Pub and click the check box for Use VNIC template.
Under the vNIC template, choose the Storage-Pub-NIC template previously created, from the drop-down list and choose Linux for the Adapter Policy.
Create the VNIC interface for Management Network as Management and click the check box for Use VNIC template.
Under the vNIC template, choose the Management template previously created, from the drop-down list and choose Linux for the Adapter Policy.
After a successful VNIC creation, click Next.
Under SAN connectivity, choose No VHBAs and click Next.
Under Zoning, click Next.
Under VNIC/VHBA Placement, choose the vNICs PCI order as shown below and click Next.
Under vMedia Policy, click Next.
Under Server Boot Order, choose the boot policy PXE-LocalBoot previously created, from the drop-down list and click Next.
Under Maintenance Policy, choose Server_Ack previously created, from the drop-down list and click Next.
Under Operational Policies, choose the IPMI Access Profile as IPMI_admin previously created, from the drop-down list and choose the Power Control Policy as No_Power_Cap and click Finish.
To create the Service Profile templates for the Ceph Storage nodes, complete the following steps:
Specify the Service profile template name for the Ceph storage node as OSP8-Ceph-Storage-SP-Template. Choose the UUID pools previously created, from the drop-down list and click Next.
Create vNICs for PXE, Storage-Pub, and Storage-Mgmt by following the steps detailed here. You do not need the Management network on the Ceph nodes, since this network was created only for UCSM and the Nexus switches.
Click Next and choose "Server_Ack" under Maintenance Policy. Then select "No_Power_Cap" under Power Control Policy. Click Finish to complete the Service profile template creation for the Ceph nodes.
To configure the Service Profile for Undercloud (OSP8 director) Node, complete the following steps:
As there is only one node for Undercloud, a single Service Profile is created. There are no Service Profile Templates for the Undercloud node.
Under Servers > Service Profiles > root > Sub-Organizations > osp8, right-click and select Create Service Profile (expert).
Specify the Service profile name for Undercloud node as OSP8-director. Choose the UUID pools previously created from the drop-down list and click Next.
For Storage Provisioning, choose Expert and click Storage profile Policy and choose the Storage profile Blade-OS-boot previously created from the drop-down list and click Next.
For Networking, choose Expert and click “+”.
Create the VNIC interface for PXE or Provisioning network as PXE-NIC and click the check box Use VNIC template.
Under vNIC template, choose the PXE-NIC template previously created from the drop-down list and choose Linux for the Adapter Policy.
Create the VNIC interface for Management network as Management and from the drop-down list choose MAC pools created before. Click Fabric B and check Enable Failover.
Under VLANs, select Management as the Native VLAN, then choose Linux for the Adapter Policy and Enable_CDP for the Network Control Policy.
Create the VNIC interface for External network as External-NIC and from the drop-down list choose MAC pools created before. Click Fabric B and check Enable Failover.
Under VLANs, select External as the Native VLAN, then choose Linux for the Adapter Policy and Enable_CDP for the Network Control Policy.
Create the VNIC interface for Floating network as Tenant-Floating and click the check box Use VNIC template.
Under vNIC template, choose the Floating Network (optional) template previously created from the drop-down list and choose Linux for the Adapter Policy.
The Tenant Floating NIC was created on this system only to verify that the VMs can be accessed from the director node. If you will access the VMs from another node outside of this setup and the floating network is routable, this step is not necessary.
After a successful VNIC creation, click Next.
Under the SAN connectivity, choose No VHBAs and click Next.
Under Zoning, click Next.
Under VNIC/VHBA Placement, choose the vNICs PCI order as shown below and click Next.
Under vMedia Policy, click Next.
Under Server Boot Order, choose the boot policy as Create a Specific Boot Policy from the drop-down list. Make sure you select Local CD/DVD as the first boot order and Local LUN as the second boot order, and click Next.
Under Maintenance Policy, choose Server_Ack previously created, from the drop-down list and click Next.
Under Server Assignment, choose “Select existing server” and select the respective blade assigned for director node and click Next.
Under Operational Policies, choose the Power Control Policy as “No_Power_Cap” and click Finish.
To create Service profiles for Controller nodes, complete the following steps:
Under Servers > Service Profile Templates > root > Sub-Organizations, select the Controller Service profile template and click Create Service Profiles from Templates.
Specify the Service profile name and the number of instances as 3 for the Controller nodes.
Make sure the Service profiles for the Controller nodes have been created under Servers > Service Profiles > root > Sub-Organizations (osp8).
To create Service profiles for Compute nodes, complete the following steps:
Under Servers > Service Profile Templates > root > Sub-Organizations, select the Compute Service profile template and click Create Service Profiles from Templates, similar to the Controller above.
Specify the profile name and set the number of instances to 6 for compute nodes.
Make sure the Service profiles for the Compute nodes have been created under the Sub Orgs.
To create Service profiles for Ceph Storage nodes, complete the following steps:
Under Servers > Service Profile Templates > root > Sub-Organizations, select the Ceph Storage Service profile template and click Create Service Profiles from Templates.
Specify the Service profile name and set the number of instances to 3 for the Ceph Storage nodes.
Make sure the Service profiles for the Ceph Storage nodes have been created under Sub Orgs.
Verify the Service profile association with the respective UCS Servers.
After a successful Ceph Storage server association, create the remaining LUNs for the Ceph OSD disks and Journal disks.
To create the Ceph Journal LUNs, complete the following steps:
Under Storage > Storage Provisioning > root > Sub-Organizations, select the previously created Ceph Storage profile C240-Ceph, click Local LUNs, and click Create Local LUN.
Specify the name as Journal1 and set the size in GB to 350 for the 400GB SSD disks and click Create Disk Group Policy.
Specify the Disk group policy name and choose the RAID level as RAID 0 and select Disk Group Configuration (Manual).
Specify the Slot ID as 3, which is the physical disk slot number for 400GB SSDs for the Journal LUN1 and click OK.
Click OK to confirm the Disk group policy creation.
From the drop-down list, choose the Disk group policy for the Journal LUN as Journal1.
Similar to the above, create the Local LUN as Journal2 with Disk group policy as Journal2 using 400GB SSD on Disk Slot4.
To create the Ceph OSD LUN, complete the following steps:
Under Storage > Storage Provisioning > root > Sub-Organizations, select the previously created Ceph Storage profile C240-Ceph, click Local LUNs, and click Create Local LUN.
Specify the name as OSD1 and the size in GB as 5500 for the 6TB SAS disks and click Create Disk Group Policy.
Specify the Disk group policy name, choose the RAID level as RAID 0, and select Disk Group Configuration (Manual).
Click “+” and specify the Slot ID as 5, which is the physical disk slot number of the 6TB SAS disk for Ceph OSD LUN1, and click OK.
From the drop-down list, choose the Disk group policy for OSD1 as OSD1.
Similarly, create the remaining OSD LUNs (OSD2 through OSD8) with their corresponding Disk group policies using the 6TB SAS disks in slots 6 through 12.
Make sure the LUNs for Journals and OSDs are created as shown below.
Make sure all the Ceph Storage servers have identical LUN IDs and Device IDs for all the LUNs (OS-boot, Journal, and OSD), as shown in the table below:
Physical Disk Slot | Disk Type | Disk Size | RAID Level | LUN Size | LUN ID | Device ID
Disk 1 | SAS | 300 GB | RAID 1 | 250 GB | 1000 | 0
Disk 2 | SAS | 300 GB | RAID 1 (mirrored with Disk 1) | 250 GB | 1000 | 0
Disk 3 | SSD | 400 GB | RAID 0 | 350 GB | 1001 | 1
Disk 4 | SSD | 400 GB | RAID 0 | 350 GB | 1002 | 2
Disk 5 | SAS | 6 TB | RAID 0 | 5500 GB | 1003 | 3
Disk 6 | SAS | 6 TB | RAID 0 | 5500 GB | 1004 | 4
Disk 7 | SAS | 6 TB | RAID 0 | 5500 GB | 1005 | 5
Disk 8 | SAS | 6 TB | RAID 0 | 5500 GB | 1006 | 6
Disk 9 | SAS | 6 TB | RAID 0 | 5500 GB | 1007 | 7
Disk 10 | SAS | 6 TB | RAID 0 | 5500 GB | 1008 | 8
Disk 11 | SAS | 6 TB | RAID 0 | 5500 GB | 1009 | 9
Disk 12 | SAS | 6 TB | RAID 0 | 5500 GB | 1010 | 10
All the disks have to be in Equipped state; LUNs have to be in Applied and Operable state as shown above.
Figure 10 illustrates the virtual Port Channel configuration.
Figure 10 Nexus vPC Configuration
To create Port Channels from the UCS Manager GUI, complete the following steps:
Under LAN à Cloud à Fabric A à Port Channels à right-click and select Create Port Channel.
Specify the ID and name for the port channel and click Next.
Select ports 17 and 18 from the left pane, move them to the right pane under Ports in the Port Channel, and click Finish.
Repeat the steps shown above on Fabric B with Port Channel ID 18.
To configure the Cisco Nexus 9372 PX Switch A, complete the following step:
Connect the console port to the Nexus 9372 PX switch designated for Fabric A:
---- Basic System Configuration Dialog VDC: 1 ----
This setup utility will guide you through the basic configuration of the system. Setup configures only enough connectivity for management of the system.
*Note: setup is mainly used for configuring the system initially, when no configuration is present. So setup always assumes system defaults and not the current system configuration values.
Press Enter at anytime to skip a dialog. Use ctrl-c at anytime to skip the remaining dialogs.
Would you like to enter the basic configuration dialog (yes/no): yes
Do you want to enforce secure password standard (yes/no) [y]:
Create another login account (yes/no) [n]:
Configure read-only SNMP community string (yes/no) [n]:
Configure read-write SNMP community string (yes/no) [n]:
Enter the switch name : OSP8-N9K-FAB-A
Continue with Out-of-band (mgmt0) management configuration? (yes/no) [y]:
Mgmt0 IPv4 address : 10.23.10.3
Mgmt0 IPv4 netmask : 255.255.255.0
Configure the default gateway? (yes/no) [y]:
IPv4 address of the default gateway : 10.23.10.1
Configure advanced IP options? (yes/no) [n]:
Enable the telnet service? (yes/no) [n]:
Enable the ssh service? (yes/no) [y]:
Type of ssh key you would like to generate (dsa/rsa) [rsa]:
Number of rsa key bits <1024-2048> [2048]:
Configure the ntp server? (yes/no) [n]: y
NTP server IPv4 address : <<ntp_server_ip>>
Configure CoPP system profile (strict/moderate/lenient/dense/skip) [strict]:
The following configuration will be applied:
password strength-check
switchname OSP8-N9k-FAB-A
vrf context management
ip route 0.0.0.0/0 10.23.10.1
exit
no feature telnet
ssh key rsa 2048 force
feature ssh
ntp server <<var_global_ntp_server_ip>>
copp profile strict
interface mgmt0
ip address 10.23.10.3 255.255.255.0
no shutdown
Would you like to edit the configuration? (yes/no) [n]: Enter
Use this configuration and save it? (yes/no) [y]: Enter
[########################################] 100%
Copy complete.
To configure the Cisco Nexus 9372 PX Switch B, complete the following step:
Connect the console port to the Nexus 9372 PX switch designated for Fabric B:
---- Basic System Configuration Dialog VDC: 1 ----
This setup utility will guide you through the basic configuration of the system. Setup configures only enough connectivity for management of the system.
*Note: setup is mainly used for configuring the system initially, when no configuration is present. So setup always assumes system defaults and not the current system configuration values.
Press Enter at anytime to skip a dialog. Use ctrl-c at anytime to skip the remaining dialogs.
Would you like to enter the basic configuration dialog (yes/no): yes
Do you want to enforce secure password standard (yes/no) [y]:
Create another login account (yes/no) [n]:
Configure read-only SNMP community string (yes/no) [n]:
Configure read-write SNMP community string (yes/no) [n]:
Enter the switch name : OSP8-N9k-FAB-B
Continue with Out-of-band (mgmt0) management configuration? (yes/no) [y]:
Mgmt0 IPv4 address : 10.23.10.4
Mgmt0 IPv4 netmask : 255.255.255.0
Configure the default gateway? (yes/no) [y]:
IPv4 address of the default gateway : 10.23.10.1
Configure advanced IP options? (yes/no) [n]:
Enable the telnet service? (yes/no) [n]:
Enable the ssh service? (yes/no) [y]:
Type of ssh key you would like to generate (dsa/rsa) [rsa]:
Number of rsa key bits <1024-2048> [2048]:
Configure the ntp server? (yes/no) [n]: y
NTP server IPv4 address : <<ntp_server_ip>>
Configure CoPP system profile (strict/moderate/lenient/dense/skip) [strict]:
The following configuration will be applied:
password strength-check
switchname OSP8-N9k-FAB-B
vrf context management
ip route 0.0.0.0/0 10.23.10.1
exit
no feature telnet
ssh key rsa 2048 force
feature ssh
ntp server <<var_global_ntp_server_ip>>
copp profile strict
interface mgmt0
ip address 10.23.10.4 255.255.255.0
no shutdown
Would you like to edit the configuration? (yes/no) [n]: Enter
Use this configuration and save it? (yes/no) [y]: Enter
[########################################] 100%
Copy complete.
Log into each of the Nexus switches and check the NX-OS version as shown below.
OSP8-N9K-FAB-A# show version
Cisco Nexus Operating System (NX-OS) Software
………
………
Software
BIOS: version 07.34
NXOS: version 7.0(3)I1(3)
BIOS compile time: 08/11/2015
NXOS image file is: bootflash:///n9000-dk9.7.0.3.I1.3.bin
NXOS compile time: 8/21/2015 3:00:00 [08/21/2015 10:27:18]
Make sure that the NX-OS software version is 7.0(3)I1(3), as this is the version that was validated. Upgrade or downgrade the switch to this version as follows:
· Go to https://software.cisco.com/download/navigator.html
· On the Products tab, select Switches, then Data Center Switches, and then the Nexus 9000 Series switches.
· Select the model, for example 9372, as shown below.
· Select NX-OS System Software, expand All Releases, and download version 7.0(3)I1(3).
· Upgrade or downgrade the software by following the instructions in the Nexus 9000 guide. The upgrade/downgrade procedure can also be referenced from here.
To enable the features on the switch, enter the following:
OSP8-N9K-FAB-A# config terminal
OSP8-N9k-FAB-A(config)# feature udld
OSP8-N9K-FAB-A(config)# feature interface-vlan
OSP8-N9K-FAB-A(config)# feature hsrp
OSP8-N9K-FAB-A(config)# feature lacp
OSP8-N9K-FAB-A(config)# feature vpc
OSP8-N9K-FAB-A(config)# exit
Repeat the same steps on Nexus 9372 Switch B.
To enable the Jumbo MTU, enter the following:
OSP8-N9K-FAB-A# config terminal
OSP8-N9K-FAB-A(config)# system jumbomtu 9216
OSP8-N9K-FAB-A(config)# exit
Repeat the same steps on Nexus 9372 Switch B.
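To confirm that the jumbo MTU took effect, you can check the running configuration on each switch; the expected output line is shown here for illustration only:
OSP8-N9K-FAB-A# show running-config | include jumbomtu
system jumbomtu 9216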
To create VLANs, enter the following:
OSP8-N9K-FAB-A# config terminal
OSP8-N9K-FAB-A(config)# vlan 10
OSP8-N9K-FAB-A(config-vlan)# name Management
OSP8-N9K-FAB-A(config-vlan)# no shut
OSP8-N9K-FAB-A(config-vlan)# exit
OSP8-N9K-FAB-A(config)# vlan 100
OSP8-N9K-FAB-A(config-vlan)# name Internal-API
OSP8-N9K-FAB-A(config-vlan)# no shut
OSP8-N9K-FAB-A(config-vlan)# exit
OSP8-N9K-FAB-A(config)# vlan 110
OSP8-N9K-FAB-A(config-vlan)# name PXE-Network
OSP8-N9K-FAB-A(config-vlan)# no shut
OSP8-N9K-FAB-A(config-vlan)# exit
OSP8-N9K-FAB-A(config)# vlan 120
OSP8-N9K-FAB-A(config-vlan)# name Storage-Public-Network
OSP8-N9K-FAB-A(config-vlan)# no shut
OSP8-N9K-FAB-A(config-vlan)# exit
OSP8-N9K-FAB-A(config)# vlan 150
OSP8-N9K-FAB-A(config-vlan)# name Storage-Mgmt-Network
OSP8-N9K-FAB-A(config-vlan)# no shut
OSP8-N9K-FAB-A(config-vlan)# exit
OSP8-N9K-FAB-A(config)# vlan 160
OSP8-N9K-FAB-A(config-vlan)# name Tenant-Floating-IP-Network
OSP8-N9K-FAB-A(config-vlan)# no shut
OSP8-N9K-FAB-A(config-vlan)# exit
OSP8-N9K-FAB-A(config)# vlan 215
OSP8-N9K-FAB-A(config-vlan)# name External-Network
OSP8-N9K-FAB-A(config-vlan)# no shut
OSP8-N9K-FAB-A(config-vlan)# exit
OSP8-N9K-FAB-A(config)#
Repeat the same steps on Nexus 9372 Switch B.
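As a quick spot check, list the VLANs on each switch and confirm that the VLAN IDs and names match those configured above (output omitted here):
OSP8-N9K-FAB-A# show vlan brief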
To configure the Interface VLAN on the Cisco Nexus 9K Switch A, enter the following:
OSP8-N9K-FAB-A(config)#
OSP8-N9K-FAB-A(config)# interface Vlan10
OSP8-N9K-FAB-A(config-if)# description Management
OSP8-N9K-FAB-A(config-if)# no shutdown
OSP8-N9K-FAB-A(config-if)# no ip redirects
OSP8-N9K-FAB-A(config-if)# ip address 10.23.10.253/24
OSP8-N9K-FAB-A(config-if)# no ipv6 redirects
OSP8-N9K-FAB-A(config-if)# hsrp version 2
OSP8-N9K-FAB-A(config-if-hsrp)# hsrp 10
OSP8-N9K-FAB-A(config-if-hsrp)# preempt
OSP8-N9K-FAB-A(config-if-hsrp)# priority 110
OSP8-N9K-FAB-A(config-if-hsrp)# ip 10.23.10.1
OSP8-N9K-FAB-A(config-if-hsrp)#exit
OSP8-N9K-FAB-A(config)# interface Vlan100
OSP8-N9K-FAB-A(config-if)# description Internal-API
OSP8-N9K-FAB-A(config-if)# no shutdown
OSP8-N9K-FAB-A(config-if)# no ip redirects
OSP8-N9K-FAB-A(config-if)# ip address 10.23.100.253/24
OSP8-N9K-FAB-A(config-if)# no ipv6 redirects
OSP8-N9K-FAB-A(config-if)# hsrp version 2
OSP8-N9K-FAB-A(config-if-hsrp)# hsrp 100
OSP8-N9K-FAB-A(config-if-hsrp)# preempt
OSP8-N9K-FAB-A(config-if-hsrp)# priority 110
OSP8-N9K-FAB-A(config-if-hsrp)# ip 10.23.100.1
OSP8-N9K-FAB-A(config-if-hsrp)#exit
OSP8-N9K-FAB-A(config)# interface Vlan110
OSP8-N9K-FAB-A(config-if)# description PXE_Network
OSP8-N9K-FAB-A(config-if)# no shutdown
OSP8-N9K-FAB-A(config-if)# no ip redirects
OSP8-N9K-FAB-A(config-if)# ip address 10.23.110.253/24
OSP8-N9K-FAB-A(config-if)# no ipv6 redirects
OSP8-N9K-FAB-A(config-if)# hsrp version 2
OSP8-N9K-FAB-A(config-if-hsrp)# hsrp 110
OSP8-N9K-FAB-A(config-if-hsrp)# preempt
OSP8-N9K-FAB-A(config-if-hsrp)# priority 110
OSP8-N9K-FAB-A(config-if-hsrp)# ip 10.23.110.1
OSP8-N9K-FAB-A(config-if-hsrp)#exit
OSP8-N9K-FAB-A(config)# interface Vlan120
OSP8-N9K-FAB-A(config-if)# description Storage_Public_Network
OSP8-N9K-FAB-A(config-if)# no shutdown
OSP8-N9K-FAB-A(config-if)# no ip redirects
OSP8-N9K-FAB-A(config-if)# ip address 10.23.120.253/24
OSP8-N9K-FAB-A(config-if)# no ipv6 redirects
OSP8-N9K-FAB-A(config-if)# hsrp version 2
OSP8-N9K-FAB-A(config-if-hsrp)# hsrp 120
OSP8-N9K-FAB-A(config-if-hsrp)# preempt
OSP8-N9K-FAB-A(config-if-hsrp)# priority 110
OSP8-N9K-FAB-A(config-if-hsrp)# ip 10.23.120.1
OSP8-N9K-FAB-A(config-if-hsrp)#exit
OSP8-N9K-FAB-A(config)# interface Vlan150
OSP8-N9K-FAB-A(config-if)# description Storage_ClusterMgmt_Network
OSP8-N9K-FAB-A(config-if)# no shutdown
OSP8-N9K-FAB-A(config-if)# no ip redirects
OSP8-N9K-FAB-A(config-if)# ip address 10.23.150.253/24
OSP8-N9K-FAB-A(config-if)# no ipv6 redirects
OSP8-N9K-FAB-A(config-if)# hsrp version 2
OSP8-N9K-FAB-A(config-if-hsrp)# hsrp 150
OSP8-N9K-FAB-A(config-if-hsrp)# preempt
OSP8-N9K-FAB-A(config-if-hsrp)# priority 110
OSP8-N9K-FAB-A(config-if-hsrp)# ip 10.23.150.1
OSP8-N9K-FAB-A(config-if-hsrp)#exit
OSP8-N9K-FAB-A(config)# interface Vlan160
OSP8-N9K-FAB-A(config-if)# description Tenant_Floating_Network
OSP8-N9K-FAB-A(config-if)# no shutdown
OSP8-N9K-FAB-A(config-if)# no ip redirects
OSP8-N9K-FAB-A(config-if)# ip address 10.23.160.253/24
OSP8-N9K-FAB-A(config-if)# no ipv6 redirects
OSP8-N9K-FAB-A(config-if)# hsrp version 2
OSP8-N9K-FAB-A(config-if-hsrp)# hsrp 160
OSP8-N9K-FAB-A(config-if-hsrp)# preempt
OSP8-N9K-FAB-A(config-if-hsrp)# priority 110
OSP8-N9K-FAB-A(config-if-hsrp)# ip 10.23.160.1
OSP8-N9K-FAB-A(config-if-hsrp)#exit
OSP8-N9K-FAB-A(config)# interface Vlan215
OSP8-N9K-FAB-A(config-if)# description External_Network
OSP8-N9K-FAB-A(config-if)# no shutdown
OSP8-N9K-FAB-A(config-if)# no ip redirects
OSP8-N9K-FAB-A(config-if)# ip address 172.22.215.253/24
OSP8-N9K-FAB-A(config-if)# no ipv6 redirects
OSP8-N9K-FAB-A(config-if)# exit
OSP8-N9K-FAB-A(config)# copy running-config startup-config
To configure the Interface VLAN on the Cisco Nexus 9K Switch B, enter the following:
OSP8-N9k-FAB-B(config)#
OSP8-N9k-FAB-B(config)# interface Vlan10
OSP8-N9k-FAB-B(config-if)# description Management
OSP8-N9k-FAB-B(config-if)# no shutdown
OSP8-N9k-FAB-B(config-if)# no ip redirects
OSP8-N9k-FAB-B(config-if)# ip address 10.23.10.254/24
OSP8-N9k-FAB-B(config-if)# no ipv6 redirects
OSP8-N9k-FAB-B(config-if)# hsrp version 2
OSP8-N9k-FAB-B(config-if-hsrp)# hsrp 10
OSP8-N9k-FAB-B(config-if-hsrp)# preempt
OSP8-N9k-FAB-B(config-if-hsrp)# priority 100
OSP8-N9k-FAB-B(config-if-hsrp)# ip 10.23.10.1
OSP8-N9k-FAB-B(config-if-hsrp)# exit
OSP8-N9k-FAB-B(config)# interface Vlan100
OSP8-N9k-FAB-B(config-if)# description Internal-API
OSP8-N9k-FAB-B(config-if)# no shutdown
OSP8-N9k-FAB-B(config-if)# no ip redirects
OSP8-N9k-FAB-B(config-if)# ip address 10.23.100.254/24
OSP8-N9k-FAB-B(config-if)# no ipv6 redirects
OSP8-N9k-FAB-B(config-if)# hsrp version 2
OSP8-N9k-FAB-B(config-if-hsrp)# hsrp 100
OSP8-N9k-FAB-B(config-if-hsrp)# preempt
OSP8-N9k-FAB-B(config-if-hsrp)# priority 100
OSP8-N9k-FAB-B(config-if-hsrp)# ip 10.23.100.1
OSP8-N9k-FAB-B(config-if-hsrp)# exit
OSP8-N9k-FAB-B(config)# interface Vlan110
OSP8-N9k-FAB-B(config-if)# description PXE_Network
OSP8-N9k-FAB-B(config-if)# no shutdown
OSP8-N9k-FAB-B(config-if)# no ip redirects
OSP8-N9k-FAB-B(config-if)# ip address 10.23.110.254/24
OSP8-N9k-FAB-B(config-if)# no ipv6 redirects
OSP8-N9k-FAB-B(config-if)# hsrp version 2
OSP8-N9k-FAB-B(config-if-hsrp)# hsrp 110
OSP8-N9k-FAB-B(config-if-hsrp)# preempt
OSP8-N9k-FAB-B(config-if-hsrp)# priority 100
OSP8-N9k-FAB-B(config-if-hsrp)# ip 10.23.110.1
OSP8-N9k-FAB-B(config-if-hsrp)# exit
OSP8-N9k-FAB-B(config)# interface Vlan120
OSP8-N9k-FAB-B(config-if)# description Storage_Public_Network
OSP8-N9k-FAB-B(config-if)# no shutdown
OSP8-N9k-FAB-B(config-if)# no ip redirects
OSP8-N9k-FAB-B(config-if)# ip address 10.23.120.254/24
OSP8-N9k-FAB-B(config-if)# no ipv6 redirects
OSP8-N9k-FAB-B(config-if)# hsrp version 2
OSP8-N9k-FAB-B(config-if-hsrp)# hsrp 120
OSP8-N9k-FAB-B(config-if-hsrp)# preempt
OSP8-N9k-FAB-B(config-if-hsrp)# priority 100
OSP8-N9k-FAB-B(config-if-hsrp)# ip 10.23.120.1
OSP8-N9k-FAB-B(config-if-hsrp)# exit
OSP8-N9k-FAB-B(config)# interface Vlan150
OSP8-N9k-FAB-B(config-if)# description Storage_ClusterMgmt_Network
OSP8-N9k-FAB-B(config-if)# no shutdown
OSP8-N9k-FAB-B(config-if)# no ip redirects
OSP8-N9k-FAB-B(config-if)# ip address 10.23.150.254/24
OSP8-N9k-FAB-B(config-if)# no ipv6 redirects
OSP8-N9k-FAB-B(config-if)# hsrp version 2
OSP8-N9k-FAB-B(config-if-hsrp)# hsrp 150
OSP8-N9k-FAB-B(config-if-hsrp)# preempt
OSP8-N9k-FAB-B(config-if-hsrp)# priority 100
OSP8-N9k-FAB-B(config-if-hsrp)# ip 10.23.150.1
OSP8-N9k-FAB-B(config-if-hsrp)# exit
OSP8-N9k-FAB-B(config)# interface Vlan160
OSP8-N9k-FAB-B(config-if)# description Tenant_Floating_Network
OSP8-N9k-FAB-B(config-if)# no shutdown
OSP8-N9k-FAB-B(config-if)# no ip redirects
OSP8-N9k-FAB-B(config-if)# ip address 10.23.160.254/24
OSP8-N9k-FAB-B(config-if)# no ipv6 redirects
OSP8-N9k-FAB-B(config-if)# hsrp version 2
OSP8-N9k-FAB-B(config-if-hsrp)# hsrp 160
OSP8-N9k-FAB-B(config-if-hsrp)# preempt
OSP8-N9k-FAB-B(config-if-hsrp)# priority 100
OSP8-N9k-FAB-B(config-if-hsrp)# ip 10.23.160.1
OSP8-N9k-FAB-B(config-if-hsrp)# exit
OSP8-N9k-FAB-B(config)# interface Vlan215
OSP8-N9k-FAB-B(config-if)# description External_Network
OSP8-N9k-FAB-B(config-if)# no shutdown
OSP8-N9k-FAB-B(config-if)# no ip redirects
OSP8-N9k-FAB-B(config-if)# ip address 172.22.215.254/24
OSP8-N9k-FAB-B(config-if)# no ipv6 redirects
OSP8-N9k-FAB-B(config-if)# exit
OSP8-N9K-FAB-B(config)# copy running-config startup-config
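With the SVIs configured on both switches, a quick spot check of the HSRP pairing and interface state can be done with standard NX-OS commands (output omitted here):
OSP8-N9K-FAB-A# show hsrp brief
OSP8-N9K-FAB-A# show ip interface brief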
To configure the VPC and Port Channels on Switch A, enter the following:
OSP8-N9K-FAB-A(config)# vpc domain 1
OSP8-N9K-FAB-A(config-vpc-domain)# role priority 10
OSP8-N9K-FAB-A(config-vpc-domain)# peer-keepalive destination 10.23.10.4
OSP8-N9K-FAB-A(config-vpc-domain)# peer-gateway
OSP8-N9K-FAB-A(config-vpc-domain)# exit
OSP8-N9K-FAB-A(config)# interface port-channel1
OSP8-N9K-FAB-A(config-if)# description VPC peerlink for Nexus 9k Switch A & B
OSP8-N9K-FAB-A(config-if)# switchport mode trunk
OSP8-N9K-FAB-A(config-if)# spanning-tree port type network
OSP8-N9K-FAB-A(config-if)# speed 10000
OSP8-N9K-FAB-A(config-if)# vpc peer-link
OSP8-N9K-FAB-A(config-if)# exit
OSP8-N9K-FAB-A(config)# interface Ethernet1/1
OSP8-N9K-FAB-A(config-if)# description connected to Peer Nexus 9k-B port1/1
OSP8-N9K-FAB-A(config-if)# switchport mode trunk
OSP8-N9K-FAB-A(config-if)# speed 10000
OSP8-N9K-FAB-A(config-if)# channel-group 1 mode active
OSP8-N9K-FAB-A(config-if)# exit
OSP8-N9K-FAB-A(config)# interface Ethernet1/2
OSP8-N9K-FAB-A(config-if)# description connected to Peer Nexus 9k-B port1/2
OSP8-N9K-FAB-A(config-if)# switchport mode trunk
OSP8-N9K-FAB-A(config-if)# speed 10000
OSP8-N9K-FAB-A(config-if)# channel-group 1 mode active
OSP8-N9K-FAB-A(config-if)# exit
OSP8-N9K-FAB-A(config)# interface port-channel17
OSP8-N9K-FAB-A(config-if)# description Port-channel for UCS_Fabric_A port_17 & port_18
OSP8-N9K-FAB-A(config-if)# vpc 17
OSP8-N9K-FAB-A(config-if)# exit
OSP8-N9K-FAB-A(config)# interface port-channel18
OSP8-N9K-FAB-A(config-if)# description Port-channel for UCS_Fabric_B port_17 & port_18
OSP8-N9K-FAB-A(config-if)# vpc 18
OSP8-N9K-FAB-A(config-if)# exit
OSP8-N9K-FAB-A(config)# interface Ethernet1/17
OSP8-N9K-FAB-A(config-if)# description Uplink from UCS_Fabric_A_Port_17
OSP8-N9K-FAB-A(config-if)# channel-group 17 mode active
OSP8-N9K-FAB-A(config-if)# exit
OSP8-N9K-FAB-A(config)# interface Ethernet1/18
OSP8-N9K-FAB-A(config-if)# description Uplink from UCS_Fabric_B_Port_17
OSP8-N9K-FAB-A(config-if)# channel-group 18 mode active
OSP8-N9K-FAB-A(config-if)# exit
OSP8-N9K-FAB-A(config)# interface port-channel17
OSP8-N9K-FAB-A(config-if)# switchport mode trunk
OSP8-N9K-FAB-A(config-if)# switchport trunk allowed vlan 10,100,110,120,150,160,215
OSP8-N9K-FAB-A(config-if)# spanning-tree port type edge trunk
OSP8-N9K-FAB-A(config-if)# mtu 9216
OSP8-N9K-FAB-A(config-if)# exit
OSP8-N9K-FAB-A(config)# interface port-channel18
OSP8-N9K-FAB-A(config-if)# switchport mode trunk
OSP8-N9K-FAB-A(config-if)# switchport trunk allowed vlan 10,100,110,120,150,160,215
OSP8-N9K-FAB-A(config-if)# spanning-tree port type edge trunk
OSP8-N9K-FAB-A(config-if)# mtu 9216
OSP8-N9K-FAB-A(config-if)# exit
OSP8-N9K-FAB-A(config)# copy running-config startup-config
To configure the VPC and Port Channels on the Cisco Nexus 9K Switch B, enter the following:
OSP8-N9k-FAB-B(config)# vpc domain 1
OSP8-N9K-FAB-B(config-vpc-domain)# role priority 10
OSP8-N9k-FAB-B(config-vpc-domain)# peer-keepalive destination 10.23.10.3
OSP8-N9k-FAB-B(config-vpc-domain)# peer-gateway
OSP8-N9k-FAB-B(config-vpc-domain)# exit
OSP8-N9k-FAB-B(config)# interface port-channel1
OSP8-N9k-FAB-B(config-if)# description VPC peerlink for Nexus 9k Switch A & B
OSP8-N9k-FAB-B(config-if)# switchport mode trunk
OSP8-N9k-FAB-B(config-if)# spanning-tree port type network
OSP8-N9k-FAB-B(config-if)# speed 10000
OSP8-N9k-FAB-B(config-if)# vpc peer-link
OSP8-N9k-FAB-B(config-if)# exit
OSP8-N9k-FAB-B(config)# interface Ethernet1/1
OSP8-N9k-FAB-B(config-if)# description connected to Peer Nexus 9k-A port1/1
OSP8-N9k-FAB-B(config-if)# switchport mode trunk
OSP8-N9k-FAB-B(config-if)# speed 10000
OSP8-N9k-FAB-B(config-if)# channel-group 1 mode active
OSP8-N9k-FAB-B(config-if)# exit
OSP8-N9k-FAB-B(config)# interface Ethernet1/2
OSP8-N9k-FAB-B(config-if)# description connected to Peer Nexus 9k-A port1/2
OSP8-N9k-FAB-B(config-if)# switchport mode trunk
OSP8-N9k-FAB-B(config-if)# speed 10000
OSP8-N9k-FAB-B(config-if)# channel-group 1 mode active
OSP8-N9k-FAB-B(config-if)# exit
OSP8-N9K-FAB-B(config)# interface port-channel17
OSP8-N9K-FAB-B(config-if)# description Port-channel for UCS_Fabric_A port_17 & port_18
OSP8-N9K-FAB-B(config-if)# vpc 17
OSP8-N9K-FAB-B(config-if)# exit
OSP8-N9K-FAB-B(config)# interface port-channel18
OSP8-N9K-FAB-B(config-if)# description Port-channel for UCS_Fabric_B port_17 & port_18
OSP8-N9K-FAB-B(config-if)# vpc 18
OSP8-N9K-FAB-B(config-if)# exit
OSP8-N9K-FAB-B(config)# interface Ethernet1/17
OSP8-N9K-FAB-B(config-if)# description Uplink from UCS_Fabric_A_Port_18
OSP8-N9K-FAB-B(config-if)# channel-group 17 mode active
OSP8-N9K-FAB-B(config-if)# exit
OSP8-N9K-FAB-B(config)# interface Ethernet1/18
OSP8-N9K-FAB-B(config-if)# description Uplink from UCS_Fabric_B_Port_18
OSP8-N9K-FAB-B(config-if)# channel-group 18 mode active
OSP8-N9K-FAB-B(config-if)# exit
OSP8-N9K-FAB-B(config)# interface port-channel17
OSP8-N9K-FAB-B(config-if)# switchport mode trunk
OSP8-N9K-FAB-B(config-if)# switchport trunk allowed vlan 10,100,110,120,150,160,215
OSP8-N9K-FAB-B(config-if)# spanning-tree port type edge trunk
OSP8-N9K-FAB-B(config-if)# mtu 9216
OSP8-N9K-FAB-B(config-if)# exit
OSP8-N9K-FAB-B(config)# interface port-channel18
OSP8-N9K-FAB-B(config-if)# switchport mode trunk
OSP8-N9K-FAB-B(config-if)# switchport trunk allowed vlan 10,100,110,120,150,160,215
OSP8-N9K-FAB-B(config-if)# spanning-tree port type edge trunk
OSP8-N9K-FAB-B(config-if)# mtu 9216
OSP8-N9K-FAB-B(config-if)# exit
OSP8-N9K-FAB-B(config)# copy running-config startup-config
After successfully creating the virtual Port Channels on both Nexus switches, verify the Port Channel and vPC status on each Nexus 9K switch.
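Standard NX-OS commands for this check (run them on both switches; output omitted here):
OSP8-N9K-FAB-A# show vpc brief
OSP8-N9K-FAB-A# show port-channel summary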
To verify the status on the Fabric Interconnects, complete the following steps as shown in the screenshots below:
Prior to starting the operating system installation on the Undercloud node, perform the following pre-validation checks:
If you are planning to use Jumbo frames for the storage network, make sure to enter the following information in the templates as shown in the screenshot below.
When the service profiles are created from the templates, unbind them from the templates if they were created as updating templates. This is to accommodate the Cisco UCS Manager plugin: keeping a compute host's service profile bound to the template prevents the plugin from individually configuring each compute host with tenant-based VLANs, so the service profile for each compute host needs to be unbound from the template. Please check the current limitations outlined on the UCSM Liberty plugin web page.
The VLAN ID is included in the OpenStack configuration, so do not tag the external VLAN as native on the external interface of the Overcloud service profiles.
The provisioning interfaces should be Native for both Undercloud and Overcloud setups.
While planning your networks, make sure that none of the networks defined here overlap with any of your data center networks.
The disks should be in the same order across all storage nodes.
It is highly recommended to install the operating system with versionlock as outlined in the steps below. Versionlock restricts yum to installing or upgrading a package only to the specific version listed by the versionlock plugin.
The steps outlined in this document, including a few of the configurations, are bound to the installed packages. Installing the same set of packages as in this Cisco Validated Design ensures the accuracy of the solution with minimal deviations. Installing Red Hat OpenStack Platform 8 on Cisco blade and rack servers without versionlock should still work, but note that the configurations and install steps may then differ from those in this document.
Any updates to the Undercloud stack later through yum install may conflict with the version-locked packages. You may have to relax the lock files for such updates when required. It is strongly recommended to complete the install with versionlock first, followed by the Overcloud install, before attempting any such updates.
Download the versionlock file from Cisco Systems https://communities.cisco.com/docs/DOC-70256
To install the Operating System on the Undercloud Node, complete the following steps:
Download Red Hat Enterprise Linux 7.2 from http://access.redhat.com.
Launch the KVM Console; UCS Manager > Equipment Tab > General > KVM Console.
In the KVM Console Menu, Activate Virtual Devices under Virtual Media and then click Map CD/DVD, attach the downloaded ISO as shown below and then reboot the server.
When the system boots up, press F6 for the boot menu.
The ISO image takes you to the below screen. Select Install Red Hat Enterprise Linux 7.2.
Select the default language and time zone.
In the Software Selection screen, select Server with GUI.
Select manual partitioning as shown below.
Update the values and allocate around 100 GB to the root partition.
Select the Network tab and configure the External and Management NICs as shown below. Compare the MAC addresses in UCSM with the MAC addresses displayed by the installer to identify the correct vNIC to configure.
The Floating network has been added on the test bed for logging into the VMs from director node. This step is optional.
As shown above, configure the External and optional Floating NICs here.
Add External-Network as the public network. This is the interface through which the Undercloud pulls the necessary files from the Red Hat website during the installation. The Management interface is present on the test bed to log into the Fabric Interconnects and/or Nexus switches; this is how IPMI connectivity happens during introspection and the Overcloud installation. If you already have a routable network for UCSM and the Nexus switches, you do not need this interface. Leave the PXE interface NIC unconfigured; it will be configured later through the Undercloud installation. The Floating interface on the director node is not mandatory either; it was added on the test system to log into VMs from the director node.
Enter the root password, optionally create the stack user, reboot the server when prompted, and accept the license agreement.
Run Post Install Steps before proceeding:
Register the director node with Subscription Management with the release set to 7.2, and then attach the pool that carries the OpenStack entitlements.
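A minimal sketch of the registration sequence (the user name, password, and pool ID are placeholders; adjust to your environment):
subscription-manager register --username <rhn_user> --password <rhn_password>
subscription-manager release --set=7.2
subscription-manager list --available          # note the pool ID that carries the OpenStack entitlement
subscription-manager attach --pool=<openstack_pool_id>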
Please do not run yum update after registration until you have installed the versionlock plugin as outlined below.
Download the versionlock plugin package:
yumdownloader yum-plugin-versionlock-1.1.31-34.el7.noarch
You may ignore any Public key messages for now. Just check for the existence of the downloaded file.
Install this rpm.
yum localinstall '/whatever-dir/yum-plugin-versionlock-1.1.31-34.el7.noarch.rpm'
[root@osp8-director ~]# rpm -qa | grep yum-plugin-versionlock
yum-plugin-versionlock-1.1.31-34.el7.noarch
Check for the existence of versionlock.conf and versionlock.list in /etc/yum/pluginconf.d/.
Update versionlock.conf as shown below:
Uncomment the line follow_obsoletes
follow_obsoletes = 1
Download the versionlock.list file from https://communities.cisco.com/docs/DOC-70256
Download and Extract the zip file cisco-osp8-cvd.zip from the above web page.
The version lock files are in versionlock directory in the zip file.
Copy the downloaded list file to /etc/yum/pluginconf.d/
The yum versionlock list command should show the contents of /etc/yum/pluginconf.d/versionlock.list.
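A minimal sketch, assuming the zip file was extracted in the current directory and the list file sits under its versionlock directory:
cp versionlock/versionlock.list /etc/yum/pluginconf.d/versionlock.list
yum versionlock list | head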
Run ifconfig to check the health of the configured interfaces. The pxe interface should not have been configured at this stage.
Check name resolution and external connectivity. This is needed for yum updates and registration.
Validate by running wget www.cisco.com or wget subscription.rhn.redhat.com
Install the NTP server and synchronize the clock on the director node:
yum install ntp -y
Update /etc/ntp.conf with the appropriate NTP server address and restart ntpd.
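A minimal sketch (the server address is a placeholder): add a line such as the following to /etc/ntp.conf in place of the default server/pool entries, then enable and restart the service:
server <ntp_server_ip> iburst
systemctl enable ntpd
systemctl restart ntpd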
[root@osp8-director ~]# service ntpd restart
Redirecting to /bin/systemctl restart ntpd.service
Check the time sync; if it is off, restart ntpd to force the time to synchronize:
[root@osp8-director ~]# ntpdate -dv 171.68.38.66
13 Sep 19:32:28 ntpdate[26862]: ntpdate 4.2.6p5@1.2349-o Tue May 3 14:57:04 UTC 2016 (1)
Looking for host 171.68.38.66 and service ntp
host found : mtv5-ai27-dcm10n-ntp2.cisco.com
transmit(171.68.38.66)
receive(171.68.38.66)
…………
13 Sep 19:32:35 ntpdate[26862]: adjust time server 171.68.38.66 offset 0.000115 sec
delay 0.02707, dispersion 0.00000
offset 0.000085
The clock is now synchronized to within 115 microseconds. A clock offset of less than 20 milliseconds is generally recommended. If NTP servers are not accessible from the Overcloud nodes or the director node, you may set up the director node as your NTP server; refer to the Linux/Red Hat documentation for making the director node your NTP server.
Refer to bug 1178497. At the time of writing this document, the fix for this bug was not in the mainstream release. Follow the workaround steps in the bug and reboot the node.
Take a backup of /boot/initramfs-<kernel> so you can revert in case something goes wrong:
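For example (a minimal sketch; the backup file name is arbitrary):
cp -p /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak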
Edit /usr/lib/dracut/modules.d/99shutdown/module-setup.sh and /usr/lib/dracut/modules.d/99shutdown/shutdown.sh after taking a backup of both files.
module-setup.sh
Change
inst_multiple umount poweroff reboot halt losetup
to
inst_multiple umount poweroff reboot halt losetup stat
shutdown.sh
Insert the following block of code after the line that sources /lib/dracut-lib.sh:
if [ "$(stat -c '%T' -f /)" = "tmpfs" ]; then
mount -o remount,rw /
fi
Recreate the initramfs:
dracut --force
Unmask the shutdown service:
systemctl unmask dracut-shutdown.service
Reboot the node
This completes the OS Installation on the director node.
To install Undercloud, complete the following steps:
Create Stack User:
If the stack user was not created as part of the installation earlier, it has to be created for the Undercloud now.
useradd stack
passwd stack
echo "stack ALL=(root) NOPASSWD:ALL" | tee -a /etc/sudoers.d/stack
chmod 0440 /etc/sudoers.d/stack
Become the Stack user and create the following:
su - stack
mkdir -p ~/images
mkdir -p ~/templates
Set the hostname to the FQDN of the director node, for example:
sudo hostnamectl set-hostname osp8-director.cisco.com
sudo hostnamectl set-hostname --transient osp8-director.cisco.com
Update /etc/hosts:
sudo vi /etc/hosts as below
#External Interface
172.22.215.24 osp8-director.cisco.com osp8-director
# local
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
Update resolv.conf if needed:
sudo vi /etc/resolv.conf and update as needed. For example:
search cisco.com
nameserver 8.8.8.8
It is recommended to use your organization's DNS server. Nameserver 8.8.8.8 is used here for reference purposes only.
If you have not registered the Undercloud node, register the system with the Red Hat Network, get the appropriate pool ID for the OpenStack entitlements, and attach the pool.
Disable and enable only the required repositories:
sudo subscription-manager repos --disable=*
sudo subscription-manager repos --enable=rhel-7-server-rpms \
--enable=rhel-7-server-extras-rpms \
--enable=rhel-7-server-openstack-8-rpms \
--enable=rhel-7-server-openstack-8-director-rpms \
--enable=rhel-7-server-rh-common-rpms
Install Undercloud packages:
Make sure that Versionlock is in place by running yum versionlock list.
sudo yum install -y python-tripleoclient
sudo yum update -y
Create undercloud.conf file:
cp /usr/share/instack-undercloud/undercloud.conf.sample ~/undercloud.conf
The following are the values used in this configuration; 10.23.110.0/24 is the PXE (provisioning) network:
image_path = /home/stack/images
local_ip = 10.23.110.26/24
network_gateway = 10.23.110.26
undercloud_public_vip = 10.23.110.27
undercloud_admin_vip = 10.23.110.28
local_interface = enp6s0
network_cidr = 10.23.110.0/24
masquerade_network = 10.23.110.0/24
dhcp_start = 10.23.110.51
dhcp_end = 10.23.110.80
inspection_interface = br-ctlplane
inspection_iprange = 10.23.110.81,10.23.110.110
undercloud_debug = true
By using the provisioning interface on the director node for local_ip and network_gateway, the director is configured to act as the gateway for all the Overcloud nodes.
Update enic driver:
Download the Cisco enic driver appropriate for your UCSM version. The version used in this configuration, with UCSM 2.2(5) and Red Hat Enterprise Linux 7.2, was 2.1.1.93. Go to http://software.cisco.com/download/navigator.html.
On the download page, select Servers - Unified Computing under Products. In the right-hand menu, select your class of servers, for example Cisco UCS B-Series Blade Server Software, and then select Unified Computing System (UCS) Drivers on the following page.
Select your firmware version under All Releases, for example 2.2(5d), and download the ISO image of UCS-related drivers matching your firmware, for example ucs-bxxx-drivers.2.2.5d.iso.
Download the iso file to your undercloud machine and mount the iso:
[root@osp8-director ~]# mount ucs-bxxx-drivers.2.2.5d.iso /mnt
mount: /dev/loop0 is write-protected, mounting read-only
cd /mnt/Linux/Network/Cisco/VIC/RHEL/RHEL7.2
cp kmod-enic-2.1.1.93-rhel7u2.el7.x86_64.rpm /tmp
umount /mnt
Install the appropriate enic driver on the director machine.
rpm -ivh /tmp/kmod-enic-2.1.1.93-rhel7u2.el7.x86_64.rpm
Validate by running modinfo:
[root@osp8-director ~]# modinfo enic
filename: /lib/modules/3.10.0-327.28.3.el7.x86_64/weak-updates/enic/enic.ko
version: 2.1.1.93
license: GPL v2
author: Scott Feldman <scofeldm@cisco.com>
description: Cisco VIC Ethernet NIC Driver
rhelversion: 7.2
srcversion: D272F11F27065C9714656F4
alias: pci:v00001137d00000071sv*sd*bc*sc*i*
alias: pci:v00001137d00000044sv*sd*bc*sc*i*
alias: pci:v00001137d00000043sv*sd*bc*sc*i*
depends:
vermagic: 3.10.0-327.el7.x86_64 SMP mod_unload modversions
parm: rxcopybreak:Maximum size of packet that is copied to a new buffer on receive (uint)
Copy the enic file to your ~/images directory created above.
cp /lib/modules/3.10.0-327.28.3.el7.x86_64/weak-updates/enic/enic.ko ~/images
[root@osp8-director images]# ls -l enic.ko
-rw-r--r--. 1 stack stack 3982019 Aug 29 05:55 enic.ko
This enic.ko file will be used to customize Overcloud image file later.
Install the libguestfs tools needed to customize the Overcloud image file:
sudo yum install libguestfs-tools -y
The system is ready to run the Undercloud installation.
Run the following as stack user:
cd /home/stack
openstack undercloud install
This might take around 10-15 minutes.
To debug any Undercloud install failures, check the files in /home/stack/.instack/.
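For example (a minimal sketch; the exact log file names may vary by release):
ls -lt /home/stack/.instack/
tail -n 100 /home/stack/.instack/install-undercloud.log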
To perform the Undercloud installation checks, complete the following steps:
Check /etc/resolv.conf
[root@osp8-director ~]# cat /etc/resolv.conf
# Generated by NetworkManager
search cisco.com
nameserver 8.8.8.8
Check control plane bridge:
A new bridge, br-ctlplane, should have been created on the PXE interface as part of the Undercloud install, as shown below. Validate the MAC and IP addresses.
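A quick way to confirm this (a spot check; the bridge and addresses follow the undercloud.conf shown earlier):
ip addr show br-ctlplane
sudo ovs-vsctl show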
Check /var/log/ironic/* files, to understand and fix any issues.
To do the pre-installation check, complete the following steps:
Log into the director node as the stack user and source stackrc file.
Check neutron and subnet lists:
Source stackrc as stack user and run neutron net-list, neutron net-show, neutron subnet-list and neutron subnet-show for br-ctlplane:
The allocation_pools, dns_nameservers, and cidr values should match whatever was specified earlier in the undercloud.conf file. If not, update them with neutron subnet-update.
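A hedged example of correcting the values, using the subnet ID from neutron subnet-list and the ranges from the undercloud.conf shown earlier (the subnet ID is a placeholder and the values are illustrative):
neutron subnet-update <subnet-id> --dns-nameserver 8.8.8.8
neutron subnet-update <subnet-id> --allocation-pool start=10.23.110.51,end=10.23.110.80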
Check /etc/ironic-inspector/* files:
vi /etc/ironic-inspector/inspector.conf /etc/ironic-inspector/dnsmasq.conf
The dnsmasq.conf dhcp_range should match the range in the undercloud.conf file. This will help you spot any errors that may have occurred while running the Undercloud install earlier. The default PXE timeout is 60 minutes, which means that if you have many servers to introspect and introspection takes longer than 60 minutes, it is bound to fail.
Update /etc/ironic-inspector/inspector.conf with timeout variable under discovered section:
timeout=0
Restart ironic in case these files are updated.
[root@osp8-director ~]# systemctl restart openstack-ironic-conductor.service openstack-ironic-inspector-dnsmasq.service openstack-ironic-api.service openstack-ironic-inspector.service
This may be necessary only in larger deployments and depends on factors such as the network speed for downloading the ramdisk files, CPU speed, and so on.
Prepare instack.json file.
This file should contain all the nodes, controllers, computes and storage nodes that need to be introspected.
A sample instackenv.json file is provided in Appendix. Below is an explanation of how to build this file for a node.
[stack@osp8-director ~]$ cat instackenv.json
{
"nodes": [
{
"pm_user": "<ipmi admin user>",
"pm_password": "<password>",
"pm_type": "pxe_ipmitool",
"pm_addr": "10.23.10.57",
"mac": [
"00:25:b5:00:00:08"
],
"memory": "262144",
"disk": "250",
"arch": "x86_64",
"cpu": "32"
},
……………………
…………………….
pm_user and pm_password are the IPMI user and password configured earlier for this node's service profile or template.
pm_type = "pxe_ipmitool"; leave this as is.
pm_addr is the IPMI address allocated to that node. This can be obtained from the CIMC tab in equipment.
The MAC address is the discovery NIC or pxe interface for that node.
The memory, disk, and CPU can be obtained under the same Inventory tab for that node.
Make sure that the storage lun is applied and in operable state after applying the storage policy.
Build the instackenv.json file for all the hosts that have to be introspected as above.
Make sure to maintain consistent indentation with white spaces or tabs.
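A quick way to catch JSON syntax errors before importing the file (a minimal check):
python -m json.tool instackenv.json > /dev/null && echo "instackenv.json is valid JSON"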
Check that the IPMI connectivity works for all the hosts:
You can run a quick check to validate this from instackenv.json file:
[stack@osp8-director ~]$ for i in `grep pm_addr instackenv.json | cut -d "\"" -f4`
do
ipmitool -I lanplus -H $i -U <ipmi admin user> -P <replace with your ipmi password> chassis power status
done
Chassis Power is off
Chassis Power is off
Chassis Power is off
Chassis Power is off
Chassis Power is off
Chassis Power is off
Chassis Power is off
Chassis Power is off
Chassis Power is off
Chassis Power is off
Chassis Power is off
Chassis Power is off
The chassis power status should be either On or Off depending on whether the server is up or down in UCS. However any errors like the example shown below need investigation:
Error: Unable to establish IPMI v2 / RMCP+ session
Unable to get Chassis Power Status
Make the LVM changes for bug 1323024.
Update the activation section in /etc/lvm/lvm.conf:
# Configuration section activation.
activation {
……
……
#Add the line below, where rhel/home comes from the vgdisplay/lvdisplay output or /etc/fstab
……
auto_activation_volume_list = ["rhel/home"]
}
Reboot the node after making the above change.
Download the Image files needed for introspection and Overcloud:
sudo yum install rhosp-director-images rhosp-director-images-ipa
cp /usr/share/rhosp-director-images/overcloud-full-latest-8.0.tar ~/images/.
cp /usr/share/rhosp-director-images/ironic-python-agent-latest-8.0.tar ~/images/.
cd ~/images
for tarfile in *.tar; do tar -xf $tarfile; done
Download the RHEL 7.2 KVM Guest Image from access.redhat.com to this directory.
[stack@osp8-director images]$ /bin/ls -1
enic.ko
ironic-python-agent-latest-8.0.tar
ironic-python-agent.initramfs
ironic-python-agent.kernel
overcloud-full-latest-8.0.tar
overcloud-full.initrd
overcloud-full.qcow2
overcloud-full.vmlinuz
You may remove the tar files if desired.
rhel-guest-image-7.2-20151102.0.x86_64.qcow2
Customize the Overcloud image with enic drivers and fencing packages.
Run the following as the root user. Navigate to your download directory and issue the following commands:
cd /home/stack/images
export LIBGUESTFS_BACKEND=direct
Update fencing packages.
Download the fencing packages from Red Hat web site.
sudo yumdownloader fence-agents-cisco-ucs-4.0.11-27.el7_2.9 fence-agents-common-4.0.11-27.el7_2.9 fence-agents-scsi-4.0.11-27.el7_2.9
Upload the downloaded files to overcloud image.
for i in *.rpm
do
virt-customize -a overcloud-full.qcow2 --upload $i:/root
done
Validate that the packages do exist in /root
virt-ls -a overcloud-full.qcow2 /root | grep rpm
fence-agents-cisco-ucs-4.0.11-27.el7_2.9.x86_64.rpm
fence-agents-common-4.0.11-27.el7_2.9.x86_64.rpm
fence-agents-scsi-4.0.11-27.el7_2.9.x86_64.rpm
Install these packages in the overcloud image file
[root@osp8-director images]# virt-customize -a overcloud-full.qcow2 --run-command 'yum localinstall -y /root/fence-agents-common-4.0.11-27.el7_2.9.x86_64.rpm /root/fence-agents-cisco-ucs-4.0.11-27.el7_2.9.x86_64.rpm /root/fence-agents-scsi-4.0.11-27.el7_2.9.x86_64.rpm'
[ 0.0] Examining the guest ...
[ 4.0] Setting a random seed
[ 4.0] Running: yum localinstall -y /root/fence-agents-common-4.0.11-27.el7_2.9.x86_64.rpm /root/fence-agents-cisco-ucs-4.0.11-27.el7_2.9.x86_64.rpm /root/fence-agents-scsi-4.0.11-27.el7_2.9.x86_64.rpm
[ 12.0] Finishing off
Update the grub file:
virt-copy-out -a overcloud-full.qcow2 /etc/default/grub /home/stack/images/
Edit the grub file and change the following line (you are appending net.ifnames=0 and biosdevname=0):
GRUB_CMDLINE_LINUX="console=tty0 console=ttyS0,115200n8 crashkernel=auto rhgb quiet net.ifnames=0 biosdevname=0"
virt-copy-in -a overcloud-full.qcow2 ./grub /etc/default/
After this, proceed with the remaining customizations.
Update the enic drivers:
[root@osp8-director images]# virt-ls -R -a overcloud-full.qcow2 /lib/modules | grep enic # Get the directory where enic exists
/3.10.0-327.18.2.el7.x86_64/kernel/drivers/net/ethernet/cisco/enic
/3.10.0-327.18.2.el7.x86_64/kernel/drivers/net/ethernet/cisco/enic/enic.ko
enic driver exists in /lib/modules/3.10.0-327.18.2.el7.x86_64/kernel/drivers/net/ethernet/cisco
Copy the enic driver to the above location
virt-copy-in -a overcloud-full.qcow2 ./enic.ko /lib/modules/3.10.0-327.18.2.el7.x86_64/kernel/drivers/net/ethernet/cisco/enic/
The location of the enic driver depends on the kernel packaged in the Overcloud image file; change the path if needed.
Update the root password:
virt-customize -a overcloud-full.qcow2 --root-password password:<password>
Change the permissions back to stack user:
chown stack:stack /home/stack/images/*
This is how the overcloud-full.qcow2 may look after the update:
[root@osp8-director images]# ls -l overcloud-full.qcow2
-rw-r--r--. 1 stack stack 1096679424 Oct 19 14:57 overcloud-full.qcow2
Updating the image with a root password is not required; however, it is useful for logging in through the KVM console to debug issues in case of Overcloud installation failures.
The enic.ko was extracted earlier on the director node after installing the enic rpm. This ensures that both the director and the Overcloud images use the same enic driver.
The grub file has been modified so that the interface names appear as eth0, eth1, and so on.
The fence_cisco_ucs packages have been updated to take care of the HA bug 1298430.
Upload the images to OpenStack. As the stack user, run the following:
su - stack
source stackrc
cd ~/images
openstack overcloud image upload --image-path /home/stack/images/
openstack image list
Before running introspection and the Overcloud installation, it is recommended to initialize the boot LUNs. This is required if you are repeating the install or reusing old disks.
Boot the server in UCS, press Ctrl-R, then F2, re-initialize the boot LUNs as shown below, and then power off the servers.
Make sure that all the servers are powered off before introspection.
Reboot the Undercloud node and start the introspection.
To run Introspection, complete the following steps:
As stack user:
source ~/stackrc
openstack baremetal import --json ~/instackenv.json
openstack baremetal configure boot
ironic node-list
openstack baremetal introspection bulk start
Check the status of Introspection:
openstack baremetal introspection bulk status
Refer to the Troubleshooting section for any failures around introspection and how to resolve them.
Red Hat OpenStack Platform 8 comes with pre-created flavors that can be queried with openstack flavor list. Set the required properties on these flavors as follows:
openstack flavor set --property "cpu_arch"="x86_64" --property "capabilities:boot_option"="local" baremetal
openstack flavor set --property "cpu_arch"="x86_64" --property "capabilities:boot_option"="local" --property "capabilities:profile"="compute" compute
openstack flavor set --property "cpu_arch"="x86_64" --property "capabilities:boot_option"="local" --property "capabilities:profile"="control" control
openstack flavor set --property "cpu_arch"="x86_64" --property "capabilities:boot_option"="local" --property "capabilities:profile"="ceph-storage" ceph-storage
The flavors have to be assigned to every category of servers. Identify the servers based on the IPMI addresses defined earlier in the instackenv.json file:
[stack@osp8-director ~]$ for i in $(ironic node-list | awk ' /power/ { print $2 } ')
do
abc=`ironic node-show $i | grep "10.23" | awk '{print $7}'`
echo $i $abc
done
[stack@osp8-director ~]$ for i in b7dde876-354a-4688-8550-aec8f64c582c e4563ca5-2f12-4e08-9905-f770f740ad2b \
> 285965a9-9713-4301-8ad5-7aa3ef5dd1c2
> do
> ironic node-update $i add properties/capabilities='profile:control,boot_option:local'
> done
[stack@osp8-director ~]$ for i in b4dc04ac-0c69-4000-9c4d-2d82d141905f 036cae70-bdee-427c-987c-a6a2d8a32292 \
> 8570c96e-f9cd-44ff-a1d8-0252bc405c24 af46cd81-c78e-47c5-94e3-44d9d669410c 19260dbb-29a9-4810-b39d-85cc6e1d886f \
> d4dae332-4595-43be-9b63-5a64331ea33b
> do
> ironic node-update $i add properties/capabilities='profile:compute,boot_option:local'
> done
[stack@osp8-director ~]$ for i in 179befe6-2510-4311-ad9f-4880454fdaff \
> ff0dadfe-e2f3-408f-b69d-01398bb9699d b59f57e3-d5e1-499a-80c1-aac0c78c9534
> do
> ironic node-update $i add properties/capabilities='profile:ceph-storage,boot_option:local'
> done
The added profiles can be queried for validation:
[stack@osp8-director ~]$ instack-ironic-deployment --show-profile
Preparing for deployment...
Querying assigned profiles ...
b7dde876-354a-4688-8550-aec8f64c582c
"profile:control,boot_option:local"
e4563ca5-2f12-4e08-9905-f770f740ad2b
"profile:control,boot_option:local"
285965a9-9713-4301-8ad5-7aa3ef5dd1c2
"profile:control,boot_option:local"
b4dc04ac-0c69-4000-9c4d-2d82d141905f
"profile:compute,boot_option:local"
036cae70-bdee-427c-987c-a6a2d8a32292
"profile:compute,boot_option:local"
8570c96e-f9cd-44ff-a1d8-0252bc405c24
"profile:compute,boot_option:local"
af46cd81-c78e-47c5-94e3-44d9d669410c
"profile:compute,boot_option:local"
19260dbb-29a9-4810-b39d-85cc6e1d886f
"profile:compute,boot_option:local"
d4dae332-4595-43be-9b63-5a64331ea33b
"profile:compute,boot_option:local"
179befe6-2510-4311-ad9f-4880454fdaff
"profile:ceph-storage,boot_option:local"
ff0dadfe-e2f3-408f-b69d-01398bb9699d
"profile:ceph-storage,boot_option:local"
b59f57e3-d5e1-499a-80c1-aac0c78c9534
"profile:ceph-storage,boot_option:local"
DONE.
Prepared.
You can also query the servers and their associated profiles with openstack overcloud profiles list.
You can validate the IPMI address, MAC address, and server profile for each node as shown below:
for i in $(ironic node-list | awk '/None/ {print $2}' );
do
ipmi_addr=`ironic node-show $i | grep "10.23" | awk '{print $7}'`
mac_addr=`ironic node-port-list $i | awk '/00:25/ {print $4}'`
profile=`ironic node-show $i | grep -io "u'profile:.*:local"`
echo $i $ipmi_addr $mac_addr $profile
done
Before delving into the Overcloud installation, it is necessary to understand and change the templates for your configuration. Red Hat OpenStack Platform director provides a lot of flexibility in configuring the Overcloud. At the same time, understanding the parameters and providing the right inputs to heat through these templates is paramount.
Before attempting the Overcloud install, it is necessary to understand and set up the Overcloud heat templates. For complete details on the templates, please refer to the Red Hat online documentation on OpenStack.
The Overcloud is installed through the command-line interface. A top-down walkthrough of the YAML and configuration files is provided here.
The files are sensitive to whitespaces and tabs.
Refer to Appendix, for run.sh, the command used to deploy Overcloud.
The heat templates have to be customized depending on the network layout and NIC interface configurations in the setup. The templates are standard heat templates in YAML format. They are included in the Appendix section.
The network configuration templates included with the director fall into two categories and are located in /usr/share/openstack-tripleo-heat-templates/network/config.
More details are available at this link.
· Single NIC VLANs
· Bond with VLANs
The single NIC VLANs model assumes that you have a single interface carrying all the VLANs configured in the system.
In the Cisco UCS configuration, a hybrid model was adopted, with a dedicated VLAN on a separate interface for each network. This allows fine-grained control of policies such as QoS if needed, although such policies were not applied in this design for simplicity. NIC2 (eth1) was used as the tenant interface.
As stack user mkdir –p /home/stack/templates/nic-configs
Copy the template files from /usr/share/openstack-tripleo-heat-templates. Refer to Red Hat online documentation.
Create network-environment.yaml per above documentation or use Appendix for reference. Sample template files can also be downloaded from https://communities.cisco.com/docs/DOC-70256
Download the zip file and extract the templates directory into /home/stack/templates. Make changes as needed to these templates.
[stack@osp8-director templates]$ ls *.yaml
ceph.yaml management.yaml network-management.yaml timezone.yaml cisco-plugins.yaml network-environment.yaml storage-environment.yaml wipe-disk.yaml
[stack@osp8-director nic-configs]$ ls *.yaml
ceph-storage.yaml compute.yaml controller.yaml
Some of the above files may have to be created. These files are referenced in the Overcloud deploy command either directly or through another file. ceph.yaml has to be modified directly in /usr/share/openstack-tripleo-heat-templates.
The first section is the resource_registry. The parameter defaults section has to be customized. The following are a few important points to note in the network-environment.yaml file:
· Enter the network CIDR values in the parameters section.
· The Control Plane Default Route is the gateway router for the provisioning network, or the Undercloud IP. This should match the network_gateway and masquerade_network in your undercloud.conf file.
· EC2Metadata IP is the Undercloud IP.
· The Neutron External Network Bridge should be set to "''", an empty string, to allow multiple external networks or VLANs. If you are using the same external network for the VMs instead of floating IPs, replace the string "''" with br-ex.
· No bonding is used in this configuration; this will be addressed in future releases.
This parameter section overrides the ones mentioned in the networking-environment file. The get_param calls for the defined parameters. The following are important points to be considered for Controller.yaml file:
· The PXE interface NIC1 should have dhcp set to false to configure static IPs, with the next hop going to the Undercloud node.
· The external bridge is configured with the External Interface Default Route on the External Network VLAN ID.
· An MTU value of 9000 is added where needed. Both storage networks are configured with MTU 9000.
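As an illustration only, a jumbo-frame storage interface entry in controller.yaml might look like the following os-net-config snippet; the NIC number is an assumption and must match your vNIC ordering in the service profile (the complete files are in the Appendix):
            - type: interface
              name: nic5                # storage vNIC; number is illustrative
              mtu: 9000
              use_dhcp: false
              addresses:
                - ip_netmask: {get_param: StorageIpSubnet}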
The same rules as for the Controller apply to compute.yaml:
· The PXE interface NIC1 is configured with dhcp set to false. There are no external IPs available for the Compute and Storage nodes; hence NATing is done through the Undercloud node. For this purpose, the Control Plane Default Route is the network gateway defined in the undercloud.conf file, which is also the Undercloud local_ip.
· Only the Storage Public network is defined, along with the Tenant networks, on the Compute nodes.
· For ceph-storage.yaml, the same rules as for compute.yaml above apply.
· Only the Storage Public and Storage Cluster networks are defined in that file.
Configuring ceph.yaml is tricky and needs to be done carefully, because the partitions are being laid out even before the operating system is installed. The configuration also changes depending on whether you are using the C240 M4 LFF or the C240 M4 SFF.
An overview of the current limitations of Red Hat OpenStack Platform director and Cisco UCS, and the workarounds, is provided for reference.
Disk ordering is inconsistent between boots; however, for Ceph to work you need consistent disk ordering. Post boot you can set up the disk labels by-uuid or by-partuuid.
This also makes it a challenge to use JBODs in Ceph the conventional way. Using RAID-0 LUNs in place of JBODs is equally challenging: the LUN IDs have to be consistent every time a server reboots, and the order in which they are deployed in UCS is unpredictable. The following workarounds evolved with this configuration to meet these requirements. The internal SSD drives in both the C240 LFF and SFF models are not used, because they are not visible to the RAID controller in the current version of UCSM and would pose challenges to Red Hat OpenStack Platform director (they are visible to the BIOS, but LUNs cannot be carved out since the RAID controller does not see them, and they appear as JBODs to the kernel, thus breaking the LUN and JBOD IDs).
Figure 11 Cisco UCS C240 M4 – Large Form Factor with 12 Slots
Figure 12 Cisco UCS C240 M4 – Small Form Factor with 24 Slots
As mentioned earlier, storage profiles are used from the UCS side on these servers:
Make sure that you do not have a local disk configuration policy in UCS for these servers. It should read as No Disk Policy.
Create the storage profile and disk group policies as below under the template. There is one disk group policy for each slot: one RAID-10 policy for the OS LUN and one RAID-0 policy for each of the remaining slots.
Navigate to Create Storage Profile -> Create Local Lun -> Create Disk Group Policy (Manual) -> Create Local disk configuration. This binds the disk slot to each LUN created.
Create the boot LUN first from the first 2 slots and then apply it. This gives LUN-0 to the boot LUN.
Create the second and third LUNs from the SSD slots (as in the C240 M4 LFF). This creates RAID-0 LUNs, LUN-1 and LUN-2, on the SSD disks.
The rest of the LUNs can be created and applied in any order.
With the above procedure, we are assured that LUN-0 is for the operating system, LUN-1 and LUN-2 are for the SSDs, and the rest are for the HDDs. This in turn decodes to /dev/sda for the boot LUN, /dev/sdb for SSD1, /dev/sdc for SSD2, and the rest for the HDDs.
Do not apply all the LUNs at the same time in the storage profile. First apply the boot LUN, which should become LUN-0, followed by the SSD LUNs and then the rest of the HDD LUNs. Failure to follow this order will cause LUN assignment in random order, and heat will deploy on whatever boot LUN is presented to it first.
Follow a similar procedure for the C240 SFF servers too. A minimum of 4 SSD journals is recommended for the C240 M4 SFF: the first two SSD LUNs with 5 partitions each and the remaining two with 4 partitions each.
For Red Hat OpenStack Platform director to successfully deploy Ceph on these disks, a GPT label needs to be pre-created on each disk. This is achieved by including the wipe-disk.yaml file, which creates these labels with the sgdisk utility. Please refer to the Appendix for details about wipe-disk.yaml.
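In essence, the first-boot script loops over the Ceph data and journal disks and writes a fresh GPT label on each one. A minimal sketch of the underlying commands (the disk list here is only an example; the actual script is wipe-disk.yaml in the Appendix):
for disk in /dev/sdb /dev/sdc /dev/sdd; do    # example device list only
  sgdisk -Z $disk     # zap any existing MBR/GPT structures
  sgdisk -og $disk    # write a new, empty GPT label
done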
In the current version there is only one ceph.yaml file for all the servers, so this mapping has to be uniform across the storage servers.
While the contents of ceph.yaml in the Appendix are self-explanatory, the following is how the mappings between SSDs and HDDs need to be done:
ceph::profile::params::osds:
  '/dev/sdd':
    journal: '/dev/sdb1'
  '/dev/sde':
    journal: '/dev/sdb2'
  '/dev/sdf':
    journal: '/dev/sdb3'
  '/dev/sdg':
    journal: '/dev/sdb4'
  '/dev/sdh':
    journal: '/dev/sdc1'
  '/dev/sdi':
    journal: '/dev/sdc2'
  '/dev/sdj':
    journal: '/dev/sdc3'
  '/dev/sdk':
    journal: '/dev/sdc4'
The above is an example for the C240 M4 LFF server. Based on the LUN IDs created above, /dev/sdb and /dev/sdc are the journal devices. The four entries for each journal direct Red Hat OpenStack Platform to create 4 partitions on each SSD disk. The entries on the left are the HDD disks; do not append a partition number to these left-side HDD devices.
A similar approach can be followed for SFF servers.
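For orientation only, a hedged sketch of how the SFF mapping could look, assuming the boot LUN is /dev/sda, the four SSD journal LUNs come up as /dev/sdb through /dev/sde, and the HDD LUNs follow from /dev/sdf onward; verify the actual device order on your nodes before using it:
ceph::profile::params::osds:
  '/dev/sdf':
    journal: '/dev/sdb1'
  '/dev/sdg':
    journal: '/dev/sdb2'
  # ...continue mapping the remaining HDDs to /dev/sdb3-5, /dev/sdc1-5, /dev/sdd1-4 and /dev/sde1-4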
The ceph.yaml file was copied to /usr/share/openstack-tripleo-heat-templates/puppet/hieradata/.
The parameters section of cisco-plugins.yaml specifies the plugin parameters (a sample excerpt follows this list):
NetworkUCSMIp: The UCS Manager IP.
NetworkUCSMHostList: The mapping between the tenant MAC address derived from UCS and the service profile name, comma separated. This list has to be built for all the compute and controller nodes.
The Nexus switch configuration lists both Nexus switches' details, their IPs and passwords.
Servers: The list should specify the interface MAC of each controller and compute node and the port-channel numbers created on the Nexus switches.
NetworkNexusManagedPhysicalNetwork: physnet-tenant, the parameter you pass in the Overcloud deploy command.
NetworkNexusVlanNamePrefix: 'q-'. This is the prefix for the VLAN names that will be created on the switches.
NetworkNexusVxlanGlobalConfig: false. VXLAN is not used and was not validated as part of this CVD.
NeutronServicePlugins: Leave the default string as is. A typo here may still let the Overcloud deploy successfully but will cause VM creation to fail later.
NeutronTypeDrivers: vlan. The only type driver validated in this CVD.
NeutronCorePlugin: 'ml2'
NeutronNetworkVLANRanges: 'physnet-tenant:250:700,floating:160:160'. The range you are passing to the Overcloud deploy.
Leave the controllerExtraConfig parameters at their defaults as in the templates; refer to the Appendix.
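For reference, a sketch of how these entries could appear in the parameter_defaults section of cisco-plugins.yaml; the IP, MAC, and service profile values are placeholders taken from the examples in this document, and the full file is in the Appendix:
parameter_defaults:
  NetworkUCSMIp: '10.23.10.5'                  # UCS Manager IP (placeholder)
  NetworkUCSMHostList: '00:25:b5:00:00:29:org-root/org-osp8/ls-Openstack_Compute_Node4, ...'   # one mac:service-profile entry per node
  NetworkNexusManagedPhysicalNetwork: 'physnet-tenant'
  NetworkNexusVlanNamePrefix: 'q-'
  NetworkNexusVxlanGlobalConfig: 'false'
  NeutronTypeDrivers: 'vlan'
  NeutronCorePlugin: 'ml2'
  NeutronNetworkVLANRanges: 'physnet-tenant:250:700,floating:160:160'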
wipe-disk.yaml is configured as part of firstboot to create GPT labels on the Storage node disks.
To perform the pre-installation checks, complete the following steps:
Check for the existence of all the templates in the templates and nic-configs directories as mentioned earlier.
Run ironic node-list to check that all the servers are available, powered off, and not in maintenance.
While investigating why a server is not in the state listed above, you can use the following ironic commands to change its state:
After sourcing stackrc file;
ironic node-set-power-state <uuid> off
ironic node-set-provision-state <uuid> provide
ironic node-set-maintenance <uuid> false
In the case of larger deployments, the default value for the maximum resources per stack may not be sufficient and should be increased.
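One common adjustment, offered here as a hedged example, is to raise the heat engine's max_resources_per_stack limit on the Undercloud (a value of -1 removes the limit); the setting is picked up when the heat engine restarts, which the Undercloud reboot in the next step also accomplishes:
# In /etc/heat/heat.conf on the Undercloud node, under [DEFAULT]:
max_resources_per_stack = -1
[stack@osp8-director ~]$ sudo systemctl restart openstack-heat-engine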
Reboot the Undercloud node.
With the templates in place, run the Overcloud deploy command provided in the Appendix. Running openstack help overcloud deploy shows all the arguments that can be passed to the deployment command.
A snippet is provided below:
[stack@osp8-director ~]$ cat run.sh
#!/bin/bash
openstack overcloud deploy --templates \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /home/stack/templates/network-environment.yaml \
-e /home/stack/templates/network-management.yaml \
-e /home/stack/templates/storage-environment.yaml \
-e /home/stack/templates/timezone.yaml \
-e /home/stack/templates/cisco-plugins.yaml \
--control-flavor control --compute-flavor compute --ceph-storage-flavor ceph-storage \
--compute-scale 4 --control-scale 3 --ceph-storage-scale 3 \
--libvirt-type kvm \
--ntp-server 171.68.38.66 \
--neutron-network-type vlan \
--neutron-bridge-mappings datacentre:br-ex,physnet-tenant:br-tenant,floating:br-floating \
--neutron-network-vlan-ranges physnet-tenant:250:700,floating:160:160 \
--neutron-disable-tunneling --timeout 90 \
--verbose --debug --log-file overcloud_new.log
The following are a few parameters that need to be noted:
--control-flavor control --compute-flavor compute --ceph-storage-flavor ceph-storage
The NTP server is the server to be used in the Overcloud /etc/ntp.conf file.
neutron-network-type is vlan.
neutron-network-vlan-ranges is physnet-tenant:250:700,floating:160:160. Here VLANs 250 through 700 are reserved for tenants, while VLAN 160 is for the floating IP network.
The verbose, debug, and log-file options are self-explanatory.
After successful deployment, the deploy command should show you the following:
DEBUG: os_cloud_config.utils.clients Creating nova client.
overcloud Endpoint: http://172.22.215.91:5000/v2.0/
overcloud Deployed
DEBUG: openstackclient.shell clean_up DeployOvercloud
Write down the endpoint URL to launch the dashboard later. This completes Overcloud deployment.
Overcloud deployment may fail for several reasons: human error (for example, passing incorrect parameters or erroneous yaml configuration files), timeouts, or bugs. It is beyond the scope of this document to cover all possible failures; however, a few scenarios encountered on this configuration, with explanations, are provided in the Troubleshooting section of this document.
To perform the post deployment process, complete the following steps:
Run nova list and login as heat-admin to each host:
[stack@osp8-director ~]$ for i in $(nova list | awk '/ACTIVE/ {print $12}'|cut -d "=" -f2 );
> do
> ssh -l heat-admin -o StrictHostKeyChecking=no $i "touch /tmp/abc; ls -l /tmp/abc"
> done
A command like the one above validates that all the servers are up and running.
Check that the servers are registered with the Red Hat Network.
Running subscription-manager status on each node reveals the status of this registration.
Query the Ceph pools and tree.
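These checks can be run from any controller node over the provisioning network; a brief example:
[heat-admin@overcloud-controller-0 ~]$ sudo ceph -s           # overall cluster health
[heat-admin@overcloud-controller-0 ~]$ sudo ceph osd lspools  # list the Ceph pools
[heat-admin@overcloud-controller-0 ~]$ sudo ceph osd tree     # OSD-to-host layout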
Also check the status of pcs resources as:
pcs resource cleanup
sleep 15
pcs status
pcs status | egrep -i "error|stop"
To perform the post-deployment configuration, complete the following steps:
Start fence_cisco_ucs.
Run fence_cisco_ucs and pass the UCSM IP and password to it. Openstack_Controller_Node[1,2,3] are the service profile names for the controllers; replace the string accordingly.
for i in 1 2 3
do
fence_cisco_ucs --ip=<UCSM IP> --username=admin --password=<password> \
--plug="Openstack_Controller_Node${i}" --suborg="/org-osp8/" --missing-as-off --action=on --ssl-insecure -z;
done
Success: Powered ON
Success: Powered ON
Success: Powered ON
Replace the name of the controller service profile and your org name accordingly.
You can pick this up from the General tab of UCS Manager too.
Here the service profile name is Openstack_Controller_Node2 and the sub-organization is org-osp8.
Configuring Pacemaker.
Before proceeding with the Pacemaker configuration, it is necessary to understand the relationship between the service profile names in UCS and the node names dynamically created by OpenStack as part of the Overcloud deployment.
Either log in through the console or extract the information from /etc/neutron/plugin.ini on any of the controller nodes.
plugin.ini is updated by the Cisco plugins and contains this information. Open the /etc/neutron/plugin.ini file and go to the end of the file. Extract the controller entries.
The following is an example of an extraction from the plugin.ini file:
ucsm_host_list could be populated as below in plugin.ini. In case it has .localdomain appended, it needs to be removed with the current set of patches; details are provided later.
ucsm_host_list=overcloud-compute-2:org-root/org-osp8/ls-Openstack_Compute_Node1, overcloud-compute-3:org-root/org-osp8/ls-Openstack_Compute_Node2, overcloud-compute-0:org-root/org-osp8/ls-Openstack_Compute_Node3, overcloud-compute-1:org-root/org-osp8/ls-Openstack_Compute_Node4, overcloud-controller-2:org-root/org-osp8/ls-Openstack_Controller_Node1, overcloud-controller-1:org-root/org-osp8/ls-Openstack_Controller_Node2, overcloud-controller-0:org-root/org-osp8/ls-Openstack_Controller_Node3
Leave out the org-root/<organization-name> prefix; instead, extract just the name of the host and the service profile name. There is no need to add the organization here because the fencing packages take the org name as input during startup.
overcloud-controller-2:Openstack_Controller_Node1,
overcloud-controller-1:Openstack_Controller_Node2,
overcloud-controller-0:Openstack_Controller_Node3
Note that the mapping is not sequential; for example, overcloud-controller-0 is mapped to service profile Openstack_Controller_Node3, and so on. There is no need to extract the compute hosts, as the fencing packages run only on the controller nodes to form the quorum.
Create a shell script with the following information and execute it:
#!/bin/bash
# Note that ';' as a separator instead of ',' from plugin.ini
sudo pcs stonith create ucs-fence-controller fence_cisco_ucs \
pcmk_host_map="overcloud-controller-1:Openstack_Controller_Node2;overcloud-controller-0:Openstack_Controller_Node3;overcloud-controller-2:Openstack_Controller_Node1" suborg="/org-osp8/" \
ipaddr=<UCSM IP> login=admin passwd=<password> ssl=1 ssl_insecure=1 op monitor interval=60s
sleep 5;
pcs stonith update ucs-fence-controller power_timeout=60
pcs stonith update ucs-fence-controller meta failure-timeout=300s
pcs property set cluster-recheck-interval=300s
sleep 5;
pcs property set cluster-recheck-interval=300s
sudo pcs property set stonith-enabled=true
pcs property set stonith-timeout=300s
pcs resource cleanup
sleep 10;
sudo pcs stonith show ucs-fence-controller
sudo pcs property show
Querying ucs-fence-controller will reveal the mappings created.
[root@overcloud-controller-0 ~]# sudo pcs stonith show ucs-fence-controller
Resource: ucs-fence-controller (class=stonith type=fence_cisco_ucs)
Attributes: pcmk_host_map=overcloud-controller-2:Openstack_Controller_Node1;overcloud-controller-1:Openstack_Controller_Node2;overcloud-controller-0:Openstack_Controller_Node3 suborg=/org-osp8/ ipaddr=10.23.10.5 login=admin passwd=whatever password ssl=1 ssl_insecure=1 power_timeout=60
Meta Attrs: failure-timeout=300s
Operations: monitor interval=60s (ucs-fence-controller-monitor-interval-60s)
Overcloud post-deployment fixes for the UCSM and Nexus plugins.
The following two patches are needed for the UCSM and Nexus plugins. Download the patches from the zip file packaged in https://communities.cisco.com/docs/DOC-70256
Extract the zip file and copy config.py, mech_cisco_ucsm.py and nexus_network_driver.py from cisco-osp8-cvd/plugin_patches/ into a temporary directory on the director node.
Copy config.py to all the 3 controllers to /usr/lib/python2.7/site-packages/networking_cisco/plugins/ml2/drivers/cisco/ucsm/
Copy mech_cisco_ucsm.py to all the 3 controllers to /usr/lib/python2.7/site-packages/networking_cisco/plugins/ml2/drivers/cisco/ucsm/
Copy nexus_network_driver.py to all the 3 controllers to /usr/lib/python2.7/site-packages/networking_cisco/plugins/ml2/drivers/cisco/nexus/
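A hedged example of pushing the patched files from the director node; the controller IP placeholders must be replaced with the ctlplane addresses reported by nova list:
[stack@osp8-director ~]$ for ip in <controller-0-ip> <controller-1-ip> <controller-2-ip>; do
  scp config.py mech_cisco_ucsm.py nexus_network_driver.py heat-admin@$ip:/tmp/
  ssh heat-admin@$ip "sudo cp /tmp/config.py /tmp/mech_cisco_ucsm.py \
    /usr/lib/python2.7/site-packages/networking_cisco/plugins/ml2/drivers/cisco/ucsm/ && \
    sudo cp /tmp/nexus_network_driver.py \
    /usr/lib/python2.7/site-packages/networking_cisco/plugins/ml2/drivers/cisco/nexus/"
done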
Check the /etc/neutron/plugin.ini file and remove the '.localdomain' entries from ucsm_host_list so that it reads something like below:
ucsm_host_list=overcloud-compute-1:org-root/org-osp8/ls-Openstack_Compute_Node1, overcloud-compute-2:org-root/org-osp8/ls-Openstack_Compute_Node2, overcloud-compute-0:org-root/org-osp8/ls-Openstack_Compute_Node3, overcloud-compute-3:org-root/org-osp8/ls-Openstack_Compute_Node4, overcloud-controller-2:org-root/org-osp8/ls-Openstack_Controller_Node1, overcloud-controller-1:org-root/org-osp8/ls-Openstack_Controller_Node2, overcloud-controller-0:org-root/org-osp8/ls-Openstack_Controller_Node3
Add ucsm_virtio_eth_ports='Tenant-Internal' to the UCSM section at the end of /etc/neutron/plugin.ini. Tenant-Internal is the UCS-configured interface for tenant traffic.
ucsm_virtio_eth_ports='Tenant-Internal'
Restart neutron
pcs resource restart neutron-server
Please check the readme file in plugin-patches directory of the zip file.
To launch the dashboard URL created after successful installation of Overcloud, complete the following steps:
Go to http://172.22.215.91 (the URL provided after the Overcloud deployment) and log in as admin with the password from the overcloudrc file (under $HOME of the stack user).
Log into the system and navigate the tabs for any errors.
Update the system defaults.
Functional validation includes the following:
· Navigating the dashboard across the admin, project, and users tabs to spot any issues
· Creating tenants, networks, routers, and instances
· Creating multiple tenants, multiple networks, and instances within different networks for the same tenant, with additional volumes, with the following criteria:
— Successful creation of Instances through CLI and validated through dashboard
— Login to VM from the console.
— Login to VMs through Floating IP’s.
— Reboot VMs
— Check the VLANs created both in UCSM and on the Nexus switches. The VLANs should be available globally and also on both port-channels created on each switch:
Log into the Nexus switch
conf term
show vlan | grep q-
show running-config interface port-channel 17-18
The basic flow of creating and deleting instances through the command line and the Horizon dashboard was tested. Creating multiple tenants and VLAN provisioning across the Nexus switches and Cisco UCS Manager were verified while adding and deleting instances.
For detailed information about validating Overcloud, refer to the Red Hat OpenStack Platform guide.
Scaling up the pod with growing business needs is a must: as the business grows, both compute and storage must be added by adding more hosts.
Scale-up of compute and storage was attempted as follows. Follow the steps below, with the documented workarounds, to add compute and storage nodes to the cluster.
To provision the new server in UCS, complete the following steps:
Rack the new C240 M4 server(s). Because there is a single ceph.yaml in the current OpenStack version, populate the hard disks in these storage servers in the same order as they exist in the other servers.
Attach a console and discover the storage server(s) in UCS. Factory reset to defaults if needed and make them UCS managed.
Refer to the earlier section on creating service profiles from the Storage template. Create a new service profile from the template, unbind it from the template, remove the storage policy that was attached to it earlier, and associate the service profile to the server.
Upgrade firmware if needed.
Check the installed firmware on the new node and make sure that it is upgraded to the same version as other storage servers.
Create a new Storage Profile for Disks.
Before creating the storage profile, log into the Equipment tab and make sure that all the new storage servers have the disks in place and that they are physically in the same slots as the other storage servers.
The storage profile used earlier with the other servers cannot be reused right away, because the LUNs have to be added to the new server in the same order as was done earlier. In case you are discovering more than one storage server at this stage, a single new profile created as below will serve the purpose. While creating this new storage profile, you can reuse the existing disk group configuration policies created earlier.
Go to the service profile of the new server and to the storage tab to create a new storage profile as shown below. Make sure that the local disk config policy is set to No Disk Policy.
Attach this storage profile to the service profile. This creates the first boot LUN, LUN-0, on the server. Go back to the Equipment tab and Inventory/Storage to check that this first LUN has been added. This will be the boot LUN, LUN-0, visible to the server BIOS. In case multiple servers are being added in this step, attach the new storage profile created above to all of their service profiles; this in turn creates LUN-0 on all the nodes.
A subsequent update to this storage profile will be propagated across all these new service profiles.
Go to Storage tab in UCSM and update the storage profile.
Create and attach the SSD LUNs, which will be LUN-1 and LUN-2. Wait a few minutes to make sure that all the new servers get these LUNs in the same order: boot as LUN-0, Journal1 as LUN-1 and Journal2 as LUN-2.
Verify from the equipment tab.
This will be consistent with the other servers, and we can expect sda for the boot LUN and sdb and sdc for the SSD LUNs used for the journals.
Add all the HDD LUNs later.
The steps above do not represent the actual boot order. You may have to observe the actual boot order from KVM console to verify.
If the boot disks are being repurposed and are not new, re-initialize the boot LUN through the BIOS: boot the server, press Ctrl-R (or F2) and reinitialize the virtual drive for the boot LUN.
Get the hardware inventory needed for introspection.
Go to the Equipment tab > Inventory > CIMC and get the IPMI address.
Under the same Inventory tab, go to the NIC subtab and get the PXE MAC address of the server. The same inventory should have the CPU and memory details.
Specify the NIC order in the service profile. This should be the same as the other storage servers, with the provisioning interface as the first one.
Check the boot policy of the server. Validate that it is the same as the other storage servers: it should be LAN PXE first, followed by the local LUN.
To run Introspection, complete the following steps:
Prepare json file for introspection:
[stack@osp8-director ~]$ cat storage-new.json
{
  "nodes": [
    {
      "pm_user": "admin",
      "pm_password": "<passwd>",
      "pm_type": "pxe_ipmitool",
      "pm_addr": "10.23.10.56",
      "mac": [
        "00:25:b5:00:00:33"
      ],
      "memory": "131072",
      "disk": "250",
      "arch": "x86_64",
      "cpu": "24"
    }
  ]
}
Check IPMI Connectivity:
[stack@osp8-director ~]$ ipmitool -I lanplus -H 10.23.10.56 -U admin -P <passwd> chassis power off
Chassis Power Control: Down/Off
Initialize the boot LUNs; in case you are reusing old disks, it is recommended to initialize the boot LUNs.
Run discovery and introspection.
[stack@osp8-director ~]$ openstack baremetal import --json ~/storage-new.json
openstack baremetal configure boot
ironic node-list
[stack@osp8-director ~]$ ironic node-set-maintenance 948d704b-c82b-4b9a-8d01-ad4899ce725f true
[stack@osp8-director ~]$ openstack baremetal introspection start 948d704b-c82b-4b9a-8d01-ad4899ce725f
[stack@osp8-director ~]$ openstack baremetal introspection status 948d704b-c82b-4b9a-8d01-ad4899ce725f
Repeat the steps above if you want to add multiple nodes.
Wait until the introspection is complete. The status command should show finished as True and error as None. Then set the maintenance flag back to false:
ironic node-set-maintenance 3e72dd8e-c6bd-4bd9-a252-64c20e3c1d33 False
Update node properties:
[stack@osp8-director ~]$ ironic node-update 948d704b-c82b-4b9a-8d01-ad4899ce725f \
> add properties/capabilities='profile:ceph-storage,boot_option:local'
There are now 4 ceph-storage nodes: the storage count has been incremented from 3 to 4. In the deploy command below, --ceph-storage-scale 4 indicates the total number of storage nodes in the Overcloud.
#!/bin/bash
openstack overcloud deploy --templates --ceph-storage-scale 4 \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /home/stack/templates/network-environment.yaml \
-e /home/stack/templates/network-management.yaml \
-e /home/stack/templates/storage-environment.yaml \
-e /home/stack/templates/timezone.yaml \
-e /home/stack/templates/cisco-plugins.yaml \
--log-file overcloud_storage-add.log
During the addition of nodes, Ceph health was observed to be fine.
The heat resource list shows that the node is being added.
The addition of nodes completes with the following message:
2016-09-19 00:11:09 [overcloud-CephStorageNodesPostDeployment-khpaqsy6woeh]: UPDATE_COMPLETE Stack UPDATE completed successfully
2016-09-19 00:11:09 [CephStorageNodesPostDeployment]: UPDATE_COMPLETE state changed
Stack overcloud UPDATE_COMPLETE
Overcloud Endpoint: http://172.22.215.16:5000/v2.0
Overcloud Deployed
To perform the post-deployment health checks, complete the following steps:
Check for the existence of the new node with the ironic and nova commands (an example follows these checks).
Check status of Ceph cluster.
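For example, run the ironic and nova checks from the Undercloud and the Ceph check from a controller node:
[stack@osp8-director ~]$ source ~/stackrc; ironic node-list; nova list | grep ceph
[heat-admin@overcloud-controller-0 ~]$ sudo ceph -s    # the OSD count should reflect the new node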
This completes the addition of the storage node to the cluster.
Insert the new Cisco UCS B200 M4 blade server into an empty slot in the chassis, with a similar configuration of local disks.
Refer to the earlier section on creating service profiles from the Compute template. Create a new service profile from the template, unbind it from the template, remove the storage policy that was attached to it earlier, and associate the service profile to the server.
Upgrade firmware if needed.
Check the installed firmware on the new node and make sure that it is upgraded to the same version as other compute nodes.
Get the hardware inventory details needed for introspection. These include the IPMI address, provisioning MAC address, boot LUN size, CPU, and memory.
Collect the hardware inventory to create the json file for introspection.
Check the boot policy of the server. Validate that it is the same as the other compute nodes: it should be LAN PXE first, followed by the local LUN.
Update the cisco-plugins.yaml file with the details of this new server. Update the tenant NIC MAC address as below.
Append an entry to the UCSM host list and to the Nexus switch entries in the Cisco plugins file, as below:
00:25:b5:00:00:29:org-root/org-osp8/ls-Openstack_Compute_Node4,
},
"00:25:b5:00:00:29": {
"ports": "port-channel:17,port-channel:18"
}
Make sure that this service profile is not bound to its template, and check the order of the NICs as below.
Check the status of the boot LUN and make sure that the local disk config policy is set to No Disk Policy.
With the above, the server is ready for introspection and the Overcloud deploy.
To run Introspection, complete the following steps:
Prepare json file for introspection:
[stack@osp8-director ~]$ cat compute-new.json
{
  "nodes": [
    {
      "pm_user": "admin",
      "pm_password": "<passwd>",
      "pm_type": "pxe_ipmitool",
      "pm_addr": "10.23.10.78",
      "mac": [
        "00:25:b5:00:00:2d"
      ],
      "memory": "262144",
      "disk": "250",
      "arch": "x86_64",
      "cpu": "40"
    }
  ]
}
Check IPMI Connectivity:
[stack@osp8-director ~]$ ipmitool -I lanplus -H 10.23.10.78 -U admin -P <passwd> chassis power off
Chassis Power Control: Down/Off
Run discovery and introspection:
[stack@osp8-director ~]$ openstack baremetal import --json ~/compute-new.json
[stack@osp8-director ~]$ openstack baremetal configure boot
[stack@osp8-director ~]$ ironic node-list
[stack@osp8-director ~]$ ironic node-set-maintenance ddb9093d-4ef8-4d24-81fd-f6ddc29900e1 true
[stack@osp8-director ~]$ openstack baremetal introspection start ddb9093d-4ef8-4d24-81fd-f6ddc29900e1
[stack@osp8-director ~]$ openstack baremetal introspection status ddb9093d-4ef8-4d24-81fd-f6ddc29900e1
+----------+-------+
| Field | Value |
+----------+-------+
| error | None |
| finished | True |
+----------+-------+
Wait till the introspection is complete. The status command should yield finished as True and Error as none. Alternatively open a KVM console to observe the status of introspection.
[stack@osp8-director ~]$ ironic node-set-maintenance ddb9093d-4ef8-4d24-81fd-f6ddc29900e1 false
Update node properties
[stack@osp8-director ~]$ ironic node-update ddb9093d-4ef8-4d24-81fd-f6ddc29900e1 \
> add properties/capabilities='profile:compute,boot_option:local'
Check the status of the added entries with ironic node-show. Repeat the above for all the nodes you would like to add, as the Overcloud deploy can add all of them in a single pass.
Run the Overcloud deployment command. The number of compute nodes has been incremented from 3 to 4; in the command below, --compute-scale 4 indicates the total number of compute nodes in the Overcloud.
[stack@osp8-director ~]$ cat run_compute.sh
#!/bin/bash
openstack overcloud deploy --templates --compute-scale 4 \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /home/stack/templates/network-environment.yaml \
-e /home/stack/templates/network-management.yaml \
-e /home/stack/templates/storage-environment.yaml \
-e /home/stack/templates/timezone.yaml \
-e /home/stack/templates/cisco-plugins.yaml \
--log-file overcloud_compute.log
2016-09-19 18:40:56 [overcloud]: UPDATE_COMPLETE Stack UPDATE completed successfully
Stack overcloud UPDATE_COMPLETE
Overcloud Endpoint: http://172.22.215.16:5000/v2.0
Overcloud Deployed
To perform the deployment and health checks, complete the following steps:
Log into each controller node and check for the existence of the new compute node in /etc/neutron/plugin.ini. If it is missing, add it to each Nexus switch section and to the UCSM host list in the plugin.ini file. Make sure to make the changes across all the controller nodes.
[root@overcloud-controller-0 ~]# grep compute-3 /etc/neutron/plugin.ini
overcloud-compute-3.localdomain=port-channel:17,port-channel:18
overcloud-compute-3.localdomain=port-channel:17,port-channel:18
ucsm_host_list=overcloud-compute-3.localdomain:org-root/org-osp8/ls-Openstack_Compute_Node4,…
Restart Neutron
pcs resource restart neutron-server
Restart the nova services as part of the post-deployment:
pcs resource restart openstack-nova-scheduler
pcs resource restart openstack-nova-consoleauth
pcs resource restart openstack-nova-api
Check the status of the PCS cluster and clean up failed resources if needed with pcs resource cleanup.
Check the status through ironic node-list and nova list.
Check with nova service-list after sourcing overcloudrc.
Log into the dashboard to check the status of the newly added node.
The system should be up and running and will deploy VMs on the newly added node.
Create a few VMs to make sure that the newly added compute host receives some of them and that the plugins are working fine.
Both the hardware and software stacks were injected with faults to trigger the failure of a running process on a node or the unavailability of hardware for a short or extended period of time. With the fault in place, the functional validations described above were performed. The purpose is to achieve business continuity without interruption to clients; however, some performance degradation is inevitable and has been documented wherever it was captured as part of the tests.
The status of OpenStack services was checked with pcs status on a controller node as below:
[root@overcloud-controller-0 ~]# pcs status
Cluster name: tripleo_cluster
Last updated: Mon Sep 19 06:37:03 2016 Last change: Sun Sep 18 16:56:59 2016 by root via cibadmin on overcloud-controller-0
Stack: corosync
Current DC: overcloud-controller-2 (version 1.1.13-10.el7_2.2-44eb2dd) - partition with quorum
3 nodes and 113 resources configured
Online: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
Full list of resources:
ip-10.23.110.75 (ocf::heartbeat:IPaddr2): Started overcloud-controller-0
Clone Set: haproxy-clone [haproxy]
Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
ip-10.23.120.50 (ocf::heartbeat:IPaddr2): Started overcloud-controller-1
ip-10.23.150.50 (ocf::heartbeat:IPaddr2): Started overcloud-controller-2
ip-10.23.100.51 (ocf::heartbeat:IPaddr2): Started overcloud-controller-0
ip-172.22.215.16 (ocf::heartbeat:IPaddr2): Started overcloud-controller-1
Master/Slave Set: redis-master [redis]
Masters: [ overcloud-controller-2 ]
Slaves: [ overcloud-controller-0 overcloud-controller-1 ]
Master/Slave Set: galera-master [galera]
Masters: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
Clone Set: mongod-clone [mongod]
Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
Clone Set: rabbitmq-clone [rabbitmq]
Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
Clone Set: memcached-clone [memcached]
Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
ip-10.23.100.50 (ocf::heartbeat:IPaddr2): Started overcloud-controller-2
Clone Set: openstack-nova-scheduler-clone [openstack-nova-scheduler]
Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
Clone Set: neutron-l3-agent-clone [neutron-l3-agent]
Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
Clone Set: openstack-ceilometer-alarm-notifier-clone [openstack-ceilometer-alarm-notifier]
Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
Clone Set: openstack-heat-engine-clone [openstack-heat-engine]
Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
Clone Set: openstack-ceilometer-api-clone [openstack-ceilometer-api]
Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
Clone Set: neutron-metadata-agent-clone [neutron-metadata-agent]
Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
Clone Set: neutron-ovs-cleanup-clone [neutron-ovs-cleanup]
Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
Clone Set: neutron-netns-cleanup-clone [neutron-netns-cleanup]
Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
Clone Set: openstack-heat-api-clone [openstack-heat-api]
Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
Clone Set: openstack-cinder-scheduler-clone [openstack-cinder-scheduler]
Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
Clone Set: openstack-nova-api-clone [openstack-nova-api]
Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
Clone Set: openstack-heat-api-cloudwatch-clone [openstack-heat-api-cloudwatch]
Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
Clone Set: openstack-ceilometer-collector-clone [openstack-ceilometer-collector]
Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
Clone Set: openstack-ceilometer-notification-clone [openstack-ceilometer-notification]
Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
Clone Set: neutron-dhcp-agent-clone [neutron-dhcp-agent]
Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
Clone Set: openstack-glance-api-clone [openstack-glance-api]
Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
Clone Set: neutron-openvswitch-agent-clone [neutron-openvswitch-agent]
Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
Clone Set: openstack-nova-novncproxy-clone [openstack-nova-novncproxy]
Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
Clone Set: delay-clone [delay]
Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
Clone Set: httpd-clone [httpd]
Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
Clone Set: openstack-keystone-clone [openstack-keystone]
Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
Clone Set: openstack-nova-consoleauth-clone [openstack-nova-consoleauth]
Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
Clone Set: openstack-glance-registry-clone [openstack-glance-registry]
Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
Clone Set: openstack-cinder-api-clone [openstack-cinder-api]
Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
Clone Set: openstack-ceilometer-central-clone [openstack-ceilometer-central]
Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
Clone Set: neutron-server-clone [neutron-server]
Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
Clone Set: openstack-ceilometer-alarm-evaluator-clone [openstack-ceilometer-alarm-evaluator]
Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
Clone Set: openstack-heat-api-cfn-clone [openstack-heat-api-cfn]
Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
openstack-cinder-volume (systemd:openstack-cinder-volume): Started overcloud-controller-0
Clone Set: openstack-nova-conductor-clone [openstack-nova-conductor]
Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
ucs-fence-controller (stonith:fence_cisco_ucs): Started overcloud-controller-1
PCSD Status:
overcloud-controller-0: Online
overcloud-controller-1: Online
overcloud-controller-2: Online
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
A few identified services running on these nodes were restarted or killed, and/or the nodes were rebooted.
For example:
Master/Slave Set: redis-master [redis]
Masters: [ overcloud-controller-2 ]
Slaves: [ overcloud-controller-0 overcloud-controller-1 ]
Per the above, the redis master is overcloud-controller-2. This node was rebooted, and the behavior during the reboot and any impact on the VMs were observed.
The Ceph node monitors and services were also restarted to test for any interruption of volume creation and booting of the VMs, but no issues were observed.
Cisco UCS Fabric Interconnects work in pairs with built-in HA. While both of them serve traffic during normal operation, a surviving member can keep the system up and running on its own. Depending on the overprovisioning used in the deployment, some degradation in performance may be expected.
The Fabric Interconnects were rebooted one after the other, and the functional tests mentioned earlier were performed.
· Check the status of the FIs
Check the status of the UCS Fabric cluster before the reboot:
UCS-OSP8-FAB-B# show cluster extended-state
Cluster Id: 0x3bbf9944066711e5-0xa8888c604f640804
Start time: Wed May 18 08:38:27 2016
Last election time: Wed May 18 09:13:24 2016
B: UP, PRIMARY
A: UP, SUBORDINATE
B: memb state UP, lead state PRIMARY, mgmt services state: UP
A: memb state UP, lead state SUBORDINATE, mgmt services state: UP
heartbeat state PRIMARY_OK
INTERNAL NETWORK INTERFACES:
eth1, UP
eth2, UP
HA READY   <-- The system should be in the HA READY state before invoking any of the HA tests on the Fabrics.
Detailed state of the device selected for HA storage:
Chassis 1, serial: FOX1832G67B, state: active
Chassis 2, serial: FOX1831G2L5, state: active
Server 2, serial: FCH1913V0VJ, state: active
· Check the status of OpenStack PCS Cluster before reboot
Current DC: overcloud-controller-2 (version 1.1.13-10.el7_2.2-44eb2dd) - partition with quorum
3 nodes and 113 resources configured
Online: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
Full list of resources:
ip-10.23.110.75 (ocf::heartbeat:IPaddr2): Started overcloud-controller-0
Clone Set: haproxy-clone [haproxy]
Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
ip-10.23.120.50 (ocf::heartbeat:IPaddr2): Started overcloud-controller-1
ip-10.23.150.50 (ocf::heartbeat:IPaddr2): Started overcloud-controller-2
ip-10.23.100.51 (ocf::heartbeat:IPaddr2): Started overcloud-controller-0
ip-172.22.215.16 (ocf::heartbeat:IPaddr2): Started overcloud-controller-1
· Check the status of VMs.
· Run script to login to all the VMs.
[stack@osp8-director scripts]$ date;./tenantips.sh;date
Tue Aug 23 19:38:42 PDT 2016
inet 10.20.191.5 netmask 255.255.255.0 broadcast 10.20.191.255
inet 10.20.191.4 netmask 255.255.255.0 broadcast 10.20.191.255
· Reboot Fabric B (primary)
Log into the UCS Fabric command line interface and reboot the Fabric:
UCS-OSP8-FAB-B# connect local-mgmt
UCS-OSP8-FAB-B(local-mgmt)# reboot
Before rebooting, please take a configuration backup.
Do you still want to reboot? (yes/no):yes
nohup: ignoring input and appending output to `nohup.out'
Broadcast message from root (Tue Aug 23 19:13:21 2016):
All shells being terminated due to system /sbin/reboot
Connection to 10.23.10.7 closed
The following is a list of health checks and observations:
· Check for VIP and Fabric B pings. Both should be down immediately; the VIP recovers after a couple of minutes.
UCS-OSP8-FAB-A# show cluster extended-state
Cluster Id: 0x3bbf9944066711e5-0xa8888c604f640804
Start time: Wed May 18 09:12:15 2016
Last election time: Tue Aug 23 19:15:51 2016
A: UP, PRIMARY, (Management services: INIT IN PROGRESS)
B: DOWN, INAPPLICABLE
A: memb state UP, lead state PRIMARY, mgmt services state: INVALID
B: memb state DOWN, lead state INAPPLICABLE, mgmt services state: DOWN
heartbeat state SECONDARY_FAILED
INTERNAL NETWORK INTERFACES:
eth1, UP
eth2, UP
HA NOT READY
Management services: initialization in progress on local Fabric Interconnect
Detailed state of the device selected for HA storage:
Chassis 1, serial: FOX1832G67B, state: active
Chassis 2, serial: FOX1831G2L5, state: active
Server 2, serial: FCH1913V0VJ, state: active
· Log into the dashboard. The system could be slower, but works fine.
· Check the PCS cluster status on one of the controller nodes. The system could be slow in the beginning but should respond as follows:
PCSD Status:
overcloud-controller-0: Online
overcloud-controller-1: Online
overcloud-controller-2: Online
Perform a quick health check on creating VMs. Run the sanity checks on the Nexus switches too for any impact on the port-channels while Fabric B is down.
· Create virtual machines
Perform a quick health check on creating VMs and checking the status of the existing instances.
Fabric B may take around 15 minutes to come back online.
Reboot Fabric A
· Connect to Fabric A now and check the cluster status. The system should show HA READY before rebooting Fabric A.
· Reboot Fabric A by connecting to local-mgmt, as was done for Fabric B above.
· Perform health checks similar to the ones done for Fabric B earlier.
· The test went fine with the (patched) fence_cisco_ucs package in place.
IO Module failures seldom happen in UCS infrastructure, and in most cases they are the result of human error. The failure tests were included to validate business continuity.
Multiple tenants with multiple networks and virtual machines were created. VMs belonging to the same tenant but on different networks and different chassis were identified. One of the IO Modules was pulled out of the chassis and the L3 traffic was validated.
Cisco Nexus switches are deployed in pairs and provide the upstream connectivity of the virtual machines to the outside of the fabric. The Cisco Nexus plugin creates VLANs on these switches both globally and on the port-channels. The Nexus plugin replays these VLANs, rebuilding the VLAN information on a rebooted switch once it comes back up. To test the HA of these switches, multiple networks and instances were created and one of the switches was rebooted. The connectivity of the VMs through the floating network was checked, and the time it took for the plugins to replay was noted, as below.
Nexus Switches
Software
BIOS: version 07.17
NXOS: version 7.0(3)I1(3)
BIOS compile time: 09/10/2014
NXOS image file is: bootflash:///n9000-dk9.7.0.3.I1.3.bin
NXOS compile time: 8/21/2015 3:00:00 [08/21/2015 10:27:18]
Hardware
cisco Nexus9000 C9372PX chassis
Intel(R) Core(TM) i3-3227U C with 16402540 kB of memory.
Processor Board ID SAL1913CBFP
Device name: OSP8-N9K-FAB-B
bootflash: 51496280 kB
Kernel uptime is 43 day(s), 19 hour(s), 18 minute(s), 29 second(s)
Last reset at 921588 usecs after Tue Jun 21 23:31:14 2016
Reason: Reset due to upgrade
System version: 6.1(2)I3(2)
Service:
N9K-A
interface port-channel17
description OSP8-FAB-A
switchport mode trunk
switchport trunk allowed vlan 1,10,100,110,120,130,150,160,215
switchport trunk allowed vlan add 255,257-258,262,264,266,268,271
………………………………….
interface port-channel18
description OSP8-FAB-A
switchport mode trunk
switchport trunk allowed vlan 1,10,100,110,120,130,150,160,215
switchport trunk allowed vlan add 255,257-258,262,264,266,268,271
…………………………………..
N9K-B
interface port-channel17
description OSP8-FAB-B
switchport mode trunk
switchport trunk allowed vlan 1,10,100,110,120,130,150,160,215
switchport trunk allowed vlan add 254-255,257-258,262,264,266,268
…………………….
interface port-channel18
description OSP8-FAB-B
switchport mode trunk
switchport trunk allowed vlan 1,10,100,110,120,130,150,160,215
switchport trunk allowed vlan add 254-255,257-258,262,264,266,268
………………………
The switch was rebooted. It came up fine and the port-channel entries remained intact. The connectivity of the VMs was fine too.
The switch came back up by Thu Aug 4 12:11:14 PDT 2016:
Kernel uptime is 0 day(s), 0 hour(s), 2 minute(s), 1 second(s)
Last reset at 692395 usecs after Thu Aug 4 18:53:23 2016
The test was repeated on the other switch.
Figure 13 Cisco UCS Manager VLANs
Creating a tenant creates VLANs on the compute nodes. However, if a VM from one tenant is deleted, the VLAN on the compute nodes remains until the last VM of that tenant is deleted.
Delete one network from one tenant.
Delete tenant310 with network 160; the segmentation ID is 323.
Delete instances tenant310_160_inst3 and tenant310_160_inst4.
The router and network entries remain.
The global VLAN remains.
The Nexus VLAN remains:
OSP8-N9K-FAB-A(config)# show vlan id 323
VLAN Name Status Ports
---- -------------------------------- --------- -------------------------------
323 VLAN0323 active Po17, Po18, Eth1/17, Eth1/18
Connectivity from the external client machine over the floating IP to the VMs was verified.
Command used (run for each VM created):
ssh -i tenant349kp.pem -o StrictHostKeyChecking=no cloud-user@10.23.160.77 /tmp/run.sh
Host is tenant208-108-inst1 and Network is inet 10.1.108.5 netmask 255.255.255.0 broadcast 10.1.108.255
Host is tenant208-108-inst2 and Network is inet 10.1.108.6 netmask 255.255.255.0 broadcast 10.1.108.255
…….
…….
…….
Host is tenant348-148-inst1 and Network is inet 10.2.148.5 netmask 255.255.255.0 broadcast 10.2.148.255
Host is tenant348-148-inst2 and Network is inet 10.2.148.6 netmask 255.255.255.0 broadcast 10.2.148.255
Host is tenant348-198-inst3 and Network is inet 10.2.198.5 netmask 255.255.255.0 broadcast 10.2.198.255
A script was created and pushed with scp that in turn runs ifconfig on each VM and gathers the details. This was validated for the VMs created above.
Controllers are key to the health of the cloud, since they host most of the OpenStack services. There are three types of controller failures that could happen: a server reboot; pulling the blade out of the chassis while the system is up and running and putting it back; and pulling the blade from the chassis and replacing it, simulating a total failure of the controller node.
Run a health check beforehand to make sure that the system is healthy.
· Run nova list after sourcing stackrc as the stack user on the Undercloud node to verify that all the controllers are in a healthy state as below.
· Run pcs status on the controller nodes and grep for errors or stopped resources.
[root@overcloud-controller-0 ~]# pcs status
Cluster name: tripleo_cluster
Last updated: Tue Sep 20 11:19:36 2016 Last change: Mon Sep 19 20:47:46 2016 by root via cibadmin on overcloud-controller-0
Stack: corosync
Current DC: overcloud-controller-1 (version 1.1.13-10.el7_2.2-44eb2dd) - partition with quorum
3 nodes and 113 resources configured
Online: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
Full list of resources:
ip-10.23.110.59 (ocf::heartbeat:IPaddr2): Started overcloud-controller-0
Clone Set: haproxy-clone [haproxy]
Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
ip-10.23.120.50 (ocf::heartbeat:IPaddr2): Started overcloud-controller-1
ip-10.23.150.50 (ocf::heartbeat:IPaddr2): Started overcloud-controller-2
ip-10.23.100.51 (ocf::heartbeat:IPaddr2): Started overcloud-controller-0
ip-172.22.215.16 (ocf::heartbeat:IPaddr2): Started overcloud-controller-1
Master/Slave Set: redis-master [redis]
Masters: [ overcloud-controller-1 ]
……
……
PCSD Status:
overcloud-controller-0: Online
overcloud-controller-1: Online
overcloud-controller-2: Online
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
· Reboot the first controller node and check the pcs status and the connectivity of the VMs.
· When the controller comes up, wait until all the services managed through PCS are up and running.
· Repeat the reboot for the second node, and then the third node after the second comes up fully.
The following is a list of health checks and observations:
· Do not reboot the second controller unless the prior one has come up first. Check the Pacemaker status, the health of the quorum (corosync), and the health of the dashboard.
· Prior to the reboot, check the connectivity of the VMs:
[stack@osp8-director scripts]$ date; ./tenantips.sh >> /dev/null; date;
Tue Sep 20 11:26:18 PDT 2016
Tue Sep 20 11:26:22 PDT 2016
[stack@osp8-director scripts]$
It took around 4 seconds to log in, execute ifconfig, and log out from all 20 VMs.
· A minimum of two controllers is needed for healthy operation.
· While the first node is booting up, it takes time for the pcs status command to complete.
PCS will report one server as offline:
PCSD Status:
overcloud-controller-0: Online
overcloud-controller-1: Offline
overcloud-controller-2: Online
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
Corosync will report that it gets only 2 votes out of 3 while the server is being rebooted. This is normal.
· Ceph reports that one monitor is down:
[root@overcloud-controller-0 cluster]# ceph -s
cluster e1fa36c0-7ed9-11e6-90fa-0025b5000000
health HEALTH_WARN
1 mons down, quorum 1,2 overcloud-controller-0,overcloud-controller-2
monmap e2: 3 mons at {overcloud-controller-0=10.23.120.52:6789/0,overcloud-controller-1=10.23.120.51:6789/0,overcloud-controller-2=10.23.120.61:6789/0}
election epoch 8, quorum 1,2 overcloud-controller-0,overcloud-controller-2
osdmap e105: 24 osds: 24 up, 24 in
pgmap v15522: 448 pgs, 4 pools, 22936 MB data, 5865 objects
69496 MB used, 128 TB / 128 TB avail
448 active+clean
· When the node comes up, the routers remain on the other 2 controllers and do not fall back. This can also be queried with ip netns.
· Logging in to the VMs is observed to be slow.
· If the controller node does not come up, check through the KVM console to spot any issues, and hold off rebooting the second node until the first has rebooted healthily.
One of the controller blades was pulled out while the system was up and running. Validation tests such as VM creation were done prior to the pull, and the status was checked while the blade was out of the chassis. This simulates a complete blade failure. After around 60 minutes the blade was re-inserted into the chassis.
The same behavior observed during the reboot tests was noticed during the blade pull tests. However, unlike a reboot, which completes in 5-10 minutes, this test ran for an extended period of 60 minutes to check the status of the cluster.
· Cisco UCS marks the blade as ‘removed’ and prompts to resolve the slot issue.
· Ironic gives up as it cannot bring the server back online, and sets Maintenance mode to True for this node.
· Ceph storage reports that 1 out of 3 monitors is down, similar to the above. All 3 controllers run one monitor each; however, all the OSDs remain up and running.
· After inserting the blade back into the same slot of the chassis, manual intervention was needed to correct the above:
— Insert the blade back into the slot and resolve the slot issue in UCS.
— ironic node-set-power-state 2804800a-a8cb-4170-8015-0bae8163661c on
— ironic node-set-maintenance 2804800a-a8cb-4170-8015-0bae8163661c false
— Wait for a minute and check these columns again with ironic node-list.
— You may observe in nova service-list that the services on the re-inserted node are down. You may have to wait a few minutes before they come up.
[root@overcloud-controller-0 ~]# nova service-list | grep overcloud-controller-1
| 8 | nova-scheduler | overcloud-controller-1.localdomain | internal | enabled | down | 2016-09-20T19:20:16.000000 | - |
| 11 | nova-conductor | overcloud-controller-1.localdomain | internal | enabled | down | 2016-09-20T19:20:18.000000 | - |
| 71 | nova-consoleauth | overcloud-controller-1.localdomain | internal | enabled | down | 2016-09-20T19:20:17.000000 | - |
— Log into the controller node, check pcs status, and resolve any processes that were not brought up by running 'pcs resource cleanup'.
— An occasional issue, as reported in bug 1368594, was observed.
Unlike the above two types of failures, in this test the blade is completely removed and a new one is added. A few issues were encountered while rebuilding the failed controller blade and adding it as a replacement. The fix for bug 1298430 gives business continuity, but there is still a need to fix the failed blade. While this issue is being investigated, an interim solution was developed to circumvent the above limitation; it is included in the Hardware Failures section, which also addresses the different types of hardware failures that can happen on a controller blade and how to mitigate them, considering the dependency of the controller blade on IPMI and MAC addresses.
Tests and Observations are as follows:
· Many Instances were provisioned across the pod and reboot of the Compute Node was attempted.
[root@overcloud-compute-3 ~]# virsh list
Id Name State
----------------------------------------------------
2 instance-0000000f running
3 instance-0000001b running
4 instance-00000021 running
5 instance-00000030 running
6 instance-0000003c running
7 instance-00000048 running
· A compute host to be rebooted, and the VMs that could be impacted, were identified.
About 6 VMs were up and running on it.
· Identify the floating IPs for these VMs from nova list --all-tenants and capture the data needed to log in without a password and run the ifconfig script. The script sshes to all the VMs, runs ifconfig, and returns serially.
Running the script (north-south) for all the VMs:
[stack@osp8-director scripts]$ date;./tenantips.sh ; date
Mon Sep 19 14:26:55 PDT 2016
inet 10.20.155.4 netmask 255.255.255.0 broadcast 10.20.155.255
inet 10.20.155.3 netmask 255.255.255.0 broadcast 10.20.155.255
inet 10.20.105.4 netmask 255.255.255.0 broadcast 10.20.105.255
inet 10.20.105.3 netmask 255.255.255.0 broadcast 10.20.105.255
inet 10.20.154.4 netmask 255.255.255.0 broadcast 10.20.154.255
inet 10.20.154.5 netmask 255.255.255.0 broadcast 10.20.154.255
inet 10.20.104.4 netmask 255.255.255.0 broadcast 10.20.104.255
inet 10.20.104.3 netmask 255.255.255.0 broadcast 10.20.104.255
inet 10.20.153.4 netmask 255.255.255.0 broadcast 10.20.153.255
inet 10.20.153.3 netmask 255.255.255.0 broadcast 10.20.153.255
inet 10.20.103.4 netmask 255.255.255.0 broadcast 10.20.103.255
inet 10.20.103.3 netmask 255.255.255.0 broadcast 10.20.103.255
inet 10.20.152.4 netmask 255.255.255.0 broadcast 10.20.152.255
inet 10.20.152.5 netmask 255.255.255.0 broadcast 10.20.152.255
inet 10.20.102.4 netmask 255.255.255.0 broadcast 10.20.102.255
inet 10.20.102.3 netmask 255.255.255.0 broadcast 10.20.102.255
inet 10.20.151.4 netmask 255.255.255.0 broadcast 10.20.151.255
inet 10.20.151.3 netmask 255.255.255.0 broadcast 10.20.151.255
inet 10.20.101.4 netmask 255.255.255.0 broadcast 10.20.101.255
inet 10.20.101.3 netmask 255.255.255.0 broadcast 10.20.101.255
Mon Sep 19 14:26:59 PDT 2016
It took around 3 seconds to log in to all the VMs.
The tenantips.sh script does something like the following:
[stack@osp8-director scripts]$ tail -3 tenantips.sh
ssh -i tenant301kp.pem -o StrictHostKeyChecking=no cloud-user@10.23.160.33 /usr/sbin/ifconfig | grep "10.20"
ssh -i tenant301kp.pem -o StrictHostKeyChecking=no cloud-user@10.23.160.32 /usr/sbin/ifconfig | grep "10.20"
ssh -i tenant301kp.pem -o StrictHostKeyChecking=no cloud-user@10.23.160.31 /usr/sbin/ifconfig | grep "10.20"
Set resume_guests_state_on_host_boot=true in nova.conf on the compute nodes to get the instances back online after a reboot (see the example below).
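A hedged example of the setting on a compute node; nova reads it when openstack-nova-compute starts after the host reboots, so no separate restart is strictly required:
# In /etc/nova/nova.conf on each compute node, under [DEFAULT]:
resume_guests_state_on_host_boot=true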
· Rebooted the Compute Node overcloud-compute-3
· The instances came up fine, and the same script used to validate login with the floating IPs worked fine.
· By default, guests will not come up unless resume_guests_state_on_host_boot is set to true; if this parameter is not set before the reboot, the instances remain shut off after the host boots.
· Now reboot the node and check the connectivity of the VMs during and after the reboot:
[root@overcloud-compute-3 ~]# reboot
PolicyKit daemon disconnected from the bus.
We are no longer a registered authentication agent.
Connection to 10.23.110.57 closed by remote host.
Connection to 10.23.110.57 closed.
[stack@osp8-director ~]$
Using the same script while the server is rebooting yields connectivity failures for the 6 VMs on this host:
[stack@osp8-director scripts]$ date;./tenantips.sh ; date
Mon Sep 19 14:32:37 PDT 2016
ssh: connect to host 10.23.160.54 port 22: No route to host
inet 10.20.155.3 netmask 255.255.255.0 broadcast 10.20.155.255
inet 10.20.105.4 netmask 255.255.255.0 broadcast 10.20.105.255
inet 10.20.105.3 netmask 255.255.255.0 broadcast 10.20.105.255
ssh: connect to host 10.23.160.49 port 22: No route to host
inet 10.20.154.5 netmask 255.255.255.0 broadcast 10.20.154.255
inet 10.20.104.4 netmask 255.255.255.0 broadcast 10.20.104.255
inet 10.20.104.3 netmask 255.255.255.0 broadcast 10.20.104.255
ssh: connect to host 10.23.160.44 port 22: No route to host
inet 10.20.153.3 netmask 255.255.255.0 broadcast 10.20.153.255
inet 10.20.103.4 netmask 255.255.255.0 broadcast 10.20.103.255
inet 10.20.103.3 netmask 255.255.255.0 broadcast 10.20.103.255
inet 10.20.152.4 netmask 255.255.255.0 broadcast 10.20.152.255
ssh: connect to host 10.23.160.38 port 22: No route to host
inet 10.20.102.4 netmask 255.255.255.0 broadcast 10.20.102.255
ssh: connect to host 10.23.160.36 port 22: No route to host
inet 10.20.151.4 netmask 255.255.255.0 broadcast 10.20.151.255
inet 10.20.151.3 netmask 255.255.255.0 broadcast 10.20.151.255
inet 10.20.101.4 netmask 255.255.255.0 broadcast 10.20.101.255
ssh: connect to host 10.23.160.31 port 22: No route to host
Mon Sep 19 14:34:27 PDT 2016
Once the node comes up fine, the VMs are accessible.
[stack@osp8-director ~]$ ssh -l heat-admin 10.23.110.57
Last login: Mon Sep 19 14:36:08 2016 from 10.23.110.26
[heat-admin@overcloud-compute-3 ~]$ sudo -i
[root@overcloud-compute-3 ~]# date
Mon Sep 19 14:36:49 PDT 2016
[root@overcloud-compute-3 ~]# virsh list
Id Name State
----------------------------------------------------
2 instance-0000000f running
3 instance-0000001b running
4 instance-00000021 running
5 instance-00000030 running
6 instance-0000003c running
7 instance-00000048 running
All the VMs are now accessible through their floating IPs as well.
[stack@osp8-director scripts]$ date;./tenantips.sh ; date
Mon Sep 19 14:37:05 PDT 2016
inet 10.20.155.4 netmask 255.255.255.0 broadcast 10.20.155.255
inet 10.20.155.3 netmask 255.255.255.0 broadcast 10.20.155.255
inet 10.20.105.4 netmask 255.255.255.0 broadcast 10.20.105.255
inet 10.20.105.3 netmask 255.255.255.0 broadcast 10.20.105.255
inet 10.20.154.4 netmask 255.255.255.0 broadcast 10.20.154.255
inet 10.20.154.5 netmask 255.255.255.0 broadcast 10.20.154.255
inet 10.20.104.4 netmask 255.255.255.0 broadcast 10.20.104.255
inet 10.20.104.3 netmask 255.255.255.0 broadcast 10.20.104.255
inet 10.20.153.4 netmask 255.255.255.0 broadcast 10.20.153.255
inet 10.20.153.3 netmask 255.255.255.0 broadcast 10.20.153.255
inet 10.20.103.4 netmask 255.255.255.0 broadcast 10.20.103.255
inet 10.20.103.3 netmask 255.255.255.0 broadcast 10.20.103.255
inet 10.20.152.4 netmask 255.255.255.0 broadcast 10.20.152.255
inet 10.20.152.5 netmask 255.255.255.0 broadcast 10.20.152.255
inet 10.20.102.4 netmask 255.255.255.0 broadcast 10.20.102.255
inet 10.20.102.3 netmask 255.255.255.0 broadcast 10.20.102.255
inet 10.20.151.4 netmask 255.255.255.0 broadcast 10.20.151.255
inet 10.20.151.3 netmask 255.255.255.0 broadcast 10.20.151.255
inet 10.20.101.4 netmask 255.255.255.0 broadcast 10.20.101.255
inet 10.20.101.3 netmask 255.255.255.0 broadcast 10.20.101.255
Mon Sep 19 14:37:09 PDT 2016
One of the compute blades was pulled out while the system was up and running. This was an extended test lasting about 60 minutes, after which the blade was re-inserted into the chassis.
Observations:
· Results were similar to reboot tests above.
· Cisco UCS Manager raised a fault to resolve the host because it was pulled out of the chassis. The fault was acknowledged and the blade was re-inserted.
· The guest VMs came back up because the resume_guests_state_on_host_boot property was set to true at the host level.
· As in the controller blade pull tests, nova set the instance state to ‘NOSTATE’ and ironic put the blade into maintenance mode.
· The same recovery steps were applied after the blade returned to ‘ok’ status in UCS Manager: turn maintenance mode off through ironic and reset the state with nova.
A compute blade was pulled from the chassis completely and the server was decommissioned in Cisco UCS to simulate a complete failure of a compute blade. An attempt was then made to remove this node from OpenStack and add a new blade to the cloud, reusing the existing service profile. The task list and observations from this compute blade replacement test follow.
Blade replacement is a two-phase process: first remove the faulty blade from the system and restore the VMs, then add a new blade.
To delete a node, complete the following steps:
Pull the blade from the chassis.
Evacuate the VMs from the failed node.
Issue nova host-list and nova service-list to find the status of this compute node, then evacuate the VMs from it (a minimal set of pre-checks is sketched below).
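A minimal sketch of these pre-checks; the host name matches this test bed, and the optional nova service-disable call keeps the scheduler from placing new VMs on the failed node.
[root@overcloud-controller-0 ~]# nova host-list | grep compute
[root@overcloud-controller-0 ~]# nova service-list --binary nova-compute
[root@overcloud-controller-0 ~]# nova service-disable overcloud-compute-3.localdomain nova-compute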
[root@overcloud-controller-0 ~]# nova host-evacuate overcloud-compute-3.localdomain --on-shared-storage
VMs are accessible through floating IPs.
[stack@osp8-director scripts]$ date; ./tenantips.sh ; date
Mon Sep 19 15:08:09 PDT 2016
inet 10.20.155.4 netmask 255.255.255.0 broadcast 10.20.155.255
inet 10.20.155.3 netmask 255.255.255.0 broadcast 10.20.155.255
inet 10.20.105.4 netmask 255.255.255.0 broadcast 10.20.105.255
inet 10.20.105.3 netmask 255.255.255.0 broadcast 10.20.105.255
inet 10.20.154.4 netmask 255.255.255.0 broadcast 10.20.154.255
inet 10.20.154.5 netmask 255.255.255.0 broadcast 10.20.154.255
inet 10.20.104.4 netmask 255.255.255.0 broadcast 10.20.104.255
inet 10.20.104.3 netmask 255.255.255.0 broadcast 10.20.104.255
inet 10.20.153.4 netmask 255.255.255.0 broadcast 10.20.153.255
inet 10.20.153.3 netmask 255.255.255.0 broadcast 10.20.153.255
inet 10.20.103.4 netmask 255.255.255.0 broadcast 10.20.103.255
inet 10.20.103.3 netmask 255.255.255.0 broadcast 10.20.103.255
inet 10.20.152.4 netmask 255.255.255.0 broadcast 10.20.152.255
inet 10.20.152.5 netmask 255.255.255.0 broadcast 10.20.152.255
inet 10.20.102.4 netmask 255.255.255.0 broadcast 10.20.102.255
inet 10.20.102.3 netmask 255.255.255.0 broadcast 10.20.102.255
inet 10.20.151.4 netmask 255.255.255.0 broadcast 10.20.151.255
inet 10.20.151.3 netmask 255.255.255.0 broadcast 10.20.151.255
inet 10.20.101.4 netmask 255.255.255.0 broadcast 10.20.101.255
inet 10.20.101.3 netmask 255.255.255.0 broadcast 10.20.101.255
Mon Sep 19 15:08:12 PDT 2016
No VMs on failed host now.
[root@overcloud-controller-0 ~]# nova-manage vm list | egrep -v "overcloud-compute-3" | grep active | wc -l
No handlers could be found for logger "oslo_config.cfg"
20
[root@overcloud-controller-0 ~]# nova-manage vm list | egrep -i "overcloud-compute-3"
No handlers could be found for logger "oslo_config.cfg"
[root@overcloud-controller-0 ~]#
Because the blade was completely pulled from the chassis, ironic could not reach its power management interface and the node deletion failed:
ipmitool -I lanplus -H <ipmi address> -U admin -P <password> chassis power status
Error: Unable to establish IPMI v2 / RMCP+ session
Error: Unable to establish IPMI v2 / RMCP+ session
Error: Unable to establish IPMI v2 / RMCP+ session
Unable to get Chassis Power Status
The workaround to delete the blade in its current state is as follows:
Change the node’s status from error to available in ironic node-list.
Edit /etc/ironic/ironic.conf and temporarily update the enabled drivers as below:
#enabled_drivers=pxe_ipmitool,pxe_ssh,pxe_drac
enabled_drivers=fake
Restart openstack-ironic-conductor
sudo service openstack-ironic-conductor restart
ironic node-update NODE_UUID replace driver=fake
[stack@osp8-director ~]$ ironic node-show ddb9093d-4ef8-4d24-81fd-f6ddc29900e1| grep fake
| driver | fake |
[stack@osp8-director ~]$
Run nova service-list to identify the service IDs, then delete the service IDs associated with this node:
nova service-list
nova service-delete $id
[stack@osp8-director ~]$ ironic node-set-provision-state ddb9093d-4ef8-4d24-81fd-f6ddc29900e1 deleted
Delete the node from ironic
[stack@osp8-director ~]$ ironic node-delete ddb9093d-4ef8-4d24-81fd-f6ddc29900e1
Deleted node ddb9093d-4ef8-4d24-81fd-f6ddc29900e1
[stack@osp8-director ~]$ nova delete f3510590-6c77-4924-9f09-4e5763641ca0
Request to delete server f3510590-6c77-4924-9f09-4e5763641ca0 has been accepted.
Revert the “fake” driver change by editing /etc/ironic/ironic.conf:
enabled_drivers=pxe_ipmitool,pxe_ssh,pxe_drac
#enabled_drivers=fake
Restart ironic-conductor to pick up the drivers again.
service openstack-ironic-conductor restart
The deleted node should no longer appear in ironic node-list or nova list.
Log into the dashboard to confirm that overcloud-compute-3 no longer exists.
When the compute blade has been completely removed from OpenStack, a new blade can be added. The procedure for adding a new compute blade is the same as the one described earlier for scaling up the compute pod.
Ceph, the software-defined storage deployed by the Red Hat OpenStack Platform director, has high availability built in. By default, the cluster replicates placement groups and keeps three copies of the data distributed across the hosts.
The parameter osd_pool_default_size = 3 in ceph.conf enables this behavior by default at install time; it can be confirmed on the running cluster as shown below.
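A minimal sketch of confirming the effective replication size on the running cluster; vms is used here only as an example pool name and should be replaced with a pool present in your deployment.
[root@overcloud-controller-0 ~]# ceph osd dump | grep 'replicated size'
[root@overcloud-controller-0 ~]# ceph osd pool get vms size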
Decompiling the CRUSH map from the existing cluster, as shown below, reveals the bucket types in use and the default replication rule:
ceph osd getcrushmap -o /tmp/crushmap.bin
crushtool -d /tmp/crushmap.bin -o /tmp/crushmap.txt
rule replicated_ruleset {
ruleset 0
type replicated                              # Defaults to replication mode
min_size 1
max_size 10
step take default
step chooseleaf firstn 0 type host           # Default distribution of PG copies across hosts
step emit
}
Whenever a Ceph node goes down, the cluster starts rebuilding from the remaining replica copies. While this is expected Ceph behavior, it adds CPU and memory overhead. This is one reason to have a minimum of three Ceph nodes and to leave a good amount of free space in the storage cluster, which gives Ceph room to move placement groups around after a failure. The more nodes there are, the better, because the rebuild activity is distributed across the cluster. Parameters such as osd_max_backfills can be used to control this activity and its impact on CPU (a minimal example follows); it is not feasible to cover all of the recovery parameters in this document.
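If the recovery traffic needs to be throttled while a node is being rebuilt, the backfill and recovery parameters can be injected at runtime. This is a minimal sketch; the values are illustrative only, not tuned recommendations.
[root@overcloud-controller-0 ~]# ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 1'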
Note that recovery kicks in as part of the tests below. The Ceph cluster status may show warnings while the tests are being conducted because placement groups are being moved, which may also cause performance degradation on the storage cluster. Hence it is important to check the health of the nodes while adding or rebuilding a node.
Check the status of the cluster:
[root@overcloud-controller-0 ~]# ceph -s
cluster e1fa36c0-7ed9-11e6-90fa-0025b5000000
health HEALTH_OK
monmap e2: 3 mons at {overcloud-controller-0=10.23.120.52:6789/0,overcloud-controller-1=10.23.120.51:6789/0,overcloud-controller-2=10.23.120.61:6789/0}
election epoch 6, quorum 0,1,2 overcloud-controller-1,overcloud-controller-0,overcloud-controller-2
osdmap e73: 32 osds: 32 up, 32 in
pgmap v589: 448 pgs, 4 pools, 22926 MB data, 5865 objects
69543 MB used, 171 TB / 171 TB avail
448 active+clean
client io 5864 B/s wr, 1 op/s
Reboot one of the Ceph storage nodes:
[stack@osp8-director scripts]$ ping 10.23.110.62
PING 10.23.110.62 (10.23.110.62) 56(84) bytes of data.
From 10.23.110.26 icmp_seq=1 Destination Host Unreachable
From 10.23.110.26 icmp_seq=2 Destination Host Unreachable
From 10.23.110.26 icmp_seq=3 Destination Host Unreachable
From 10.23.110.26 icmp_seq=4 Destination Host Unreachable
ceph -w reports the OSDs as down and the placement groups recovering:
[root@overcloud-controller-0 ~]# ceph -w
cluster e1fa36c0-7ed9-11e6-90fa-0025b5000000
health HEALTH_WARN
134 pgs stale
8/32 in osds are down
monmap e2: 3 mons at {overcloud-controller-0=10.23.120.52:6789/0,overcloud-controller-1=10.23.120.51:6789/0,overcloud-controller-2=10.23.120.61:6789/0}
election epoch 6, quorum 0,1,2 overcloud-controller-1,overcloud-controller-0,overcloud-controller-2
osdmap e75: 32 osds: 24 up, 32 in
pgmap v633: 448 pgs, 4 pools, 22926 MB data, 5865 objects
69516 MB used, 171 TB / 171 TB avail
314 active+clean
134 stale+active+clean
2016-09-19 21:18:41.668127 mon.0 [INF] pgmap v633: 448 pgs: 134 stale+active+clean, 314 active+clean; 22926 MB data, 69516 MB used, 171 TB / 171 TB avail
2016-09-19 21:18:45.783782 mon.0 [INF] pgmap v634: 448 pgs: 125 active+undersized+degraded, 88 stale+active+clean, 235 active+clean; 22926 MB data, 69512 MB used, 171 TB / 171 TB avail; 25071 kB/s rd, 3 op/s; 1668/17595 objects degraded (9.480%)
2016-09-19 21:18:46.973329 mon.0 [INF] pgmap v635: 448 pgs: 350 active+undersized+degraded, 98 active+clean; 22926 MB data, 69515 MB used, 171 TB / 171 TB avail; 117 MB/s rd, 7 op/s; 4997/17595 objects degraded (28.400%)
2016-09-19 21:18:50.736079 mon.0 [INF] pgmap v636: 448 pgs: 350 active+undersized+degraded, 98 active+clean; 22926 MB data, 69515 MB used, 171 TB / 171 TB avail; 99151 kB/s rd, 4 op/s; 4997/17595 objects degraded (28.400%)
2016-09-19 21:18:51.816130 mon.0 [INF] pgmap v637: 448 pgs: 350 active+undersized+degraded, 98 active+clean; 22926 MB data, 69515 MB used, 171 TB / 171 TB avail; 414 B/s wr, 0 op/s; 4997/17595 objects degraded (28.400%)
2016-09-19 21:18:55.730472 mon.0 [INF] pgmap v638: 448 pgs: 350 active+undersized+degraded, 98 active+clean; 22926 MB data, 69515 MB used, 171 TB / 171 TB avail; 409 B/s wr, 0 op/s; 4997/17595 objects degraded (28.400%)
ceph -s reports a warning, as shown below, during the reboot exercise:
[root@overcloud-controller-0 ~]# ceph -s
cluster e1fa36c0-7ed9-11e6-90fa-0025b5000000
health HEALTH_WARN
350 pgs degraded
323 pgs stuck unclean
350 pgs undersized
recovery 4997/17595 objects degraded (28.400%)
8/32 in osds are down
monmap e2: 3 mons at {overcloud-controller-0=10.23.120.52:6789/0,overcloud-controller-1=10.23.120.51:6789/0,overcloud-controller-2=10.23.120.61:6789/0}
election epoch 6, quorum 0,1,2 overcloud-controller-1,overcloud-controller-0,overcloud-controller-2
osdmap e75: 32 osds: 24 up, 32 in
pgmap v672: 448 pgs, 4 pools, 22926 MB data, 5865 objects
69515 MB used, 171 TB / 171 TB avail
4997/17595 objects degraded (28.400%)
350 active+undersized+degraded
98 active+clean
client io 16914 B/s wr, 5 op/s
After a few minutes the cluster recovers, as shown below:
[root@overcloud-cephstorage-0 ceph]# ceph -s
cluster e1fa36c0-7ed9-11e6-90fa-0025b5000000
health HEALTH_OK
monmap e2: 3 mons at {overcloud-controller-0=10.23.120.52:6789/0,overcloud-controller-1=10.23.120.51:6789/0,overcloud-controller-2=10.23.120.61:6789/0}
election epoch 6, quorum 0,1,2 overcloud-controller-1,overcloud-controller-0,overcloud-controller-2
osdmap e81: 32 osds: 32 up, 32 in
pgmap v765: 448 pgs, 4 pools, 22926 MB data, 5865 objects
69499 MB used, 171 TB / 171 TB avail
448 active+clean
client io 15035 B/s wr, 3 op/s
After recovery, ceph osd tree reports the OSDs as up again.
VM connectivity through the floating IPs continues without interruption.
[stack@osp8-director scripts]$ ./tenantips.sh
inet 10.20.155.6 netmask 255.255.255.0 broadcast 10.20.155.255
inet 10.20.155.5 netmask 255.255.255.0 broadcast 10.20.155.255
inet 10.20.105.6 netmask 255.255.255.0 broadcast 10.20.105.255
inet 10.20.105.5 netmask 255.255.255.0 broadcast 10.20.105.255
inet 10.20.154.6 netmask 255.255.255.0 broadcast 10.20.154.255
inet 10.20.154.5 netmask 255.255.255.0 broadcast 10.20.154.255
inet 10.20.104.6 netmask 255.255.255.0 broadcast 10.20.104.255
inet 10.20.104.5 netmask 255.255.255.0 broadcast 10.20.104.255
inet 10.20.153.6 netmask 255.255.255.0 broadcast 10.20.153.255
inet 10.20.153.5 netmask 255.255.255.0 broadcast 10.20.153.255
inet 10.20.103.6 netmask 255.255.255.0 broadcast 10.20.103.255
inet 10.20.103.5 netmask 255.255.255.0 broadcast 10.20.103.255
inet 10.20.152.6 netmask 255.255.255.0 broadcast 10.20.152.255
inet 10.20.152.5 netmask 255.255.255.0 broadcast 10.20.152.255
inet 10.20.102.6 netmask 255.255.255.0 broadcast 10.20.102.255
inet 10.20.102.5 netmask 255.255.255.0 broadcast 10.20.102.255
inet 10.20.151.6 netmask 255.255.255.0 broadcast 10.20.151.255
inet 10.20.151.5 netmask 255.255.255.0 broadcast 10.20.151.255
inet 10.20.101.6 netmask 255.255.255.0 broadcast 10.20.101.255
inet 10.20.101.5 netmask 255.255.255.0 broadcast 10.20.101.255
[stack@osp8-director scripts]$
The node comes back up after a few minutes, while the cluster shows warnings during the reboot period.
The status of the cluster returns to normal a few minutes after the reboot. The warning message persists until the recovery activity is complete.
The behavior on a full system power-off is very similar to what was observed in the controller and compute blade pull tests.
The system took around 6 minutes to return to OK status. The recovery time depends on the number of active placement groups and the copies the system has to move around.
A more detailed description of the symptoms observed during power-off is provided in the Node Replacement section below.
One of the storage servers was powered off completely (power cord pulled) and the server was decommissioned in Cisco UCS to simulate a complete failure of a storage server. An attempt was then made to remove this node from OpenStack and add a new one to the cloud. The task list and observations from this storage node replacement test follow.
Node replacement is a two-phase process: first remove the failed server from the system, then add a new one.
To delete a node, complete the following steps:
Decommission one of the storage nodes as below:
[root@overcloud-controller-0 ceph]# ceph -s
cluster e1fa36c0-7ed9-11e6-90fa-0025b5000000
health HEALTH_WARN
338 pgs degraded
313 pgs stuck unclean
338 pgs undersized
recovery 3741/17595 objects degraded (21.262%)
8/32 in osds are down
monmap e2: 3 mons at {overcloud-controller-0=10.23.120.52:6789/0,overcloud-controller-1=10.23.120.51:6789/0,overcloud-controller-2=10.23.120.61:6789/0}
election epoch 6, quorum 0,1,2 overcloud-controller-1,overcloud-controller-0,overcloud-controller-2
osdmap e83: 32 osds: 24 up, 32 in
pgmap v1119: 448 pgs, 4 pools, 22926 MB data, 5865 objects
69505 MB used, 171 TB / 171 TB avail
3741/17595 objects degraded (21.262%)
338 active+undersized+degraded
110 active+clean
Check the health of placement groups before removing the server completely from the cluster.
[root@overcloud-controller-0 ceph]# ceph pg dump_stuck stale
ok
[root@overcloud-controller-0 ceph]# ceph pg dump_stuck inactive
ok
[root@overcloud-controller-0 ceph]# ceph pg dump_stuck unclean
ok
[root@overcloud-controller-0 ceph]#
[root@overcloud-controller-0 ceph]# ceph -s
cluster e1fa36c0-7ed9-11e6-90fa-0025b5000000
health HEALTH_OK
monmap e2: 3 mons at {overcloud-controller-0=10.23.120.52:6789/0,overcloud-controller-1=10.23.120.51:6789/0,overcloud-controller-2=10.23.120.61:6789/0}
election epoch 6, quorum 0,1,2 overcloud-controller-1,overcloud-controller-0,overcloud-controller-2
osdmap e85: 32 osds: 24 up, 24 in
pgmap v1426: 448 pgs, 4 pools, 22927 MB data, 5865 objects
69434 MB used, 128 TB / 128 TB avail
448 active+clean
[root@overcloud-controller-0 ceph]# ceph health detail
HEALTH_OK
Run a Ceph PG dump to validate that the down OSDs no longer hold any placement group copies.
From the ceph osd tree output, the OSDs in overcloud-cephstorage-1 are down; make sure that no placement groups remain on those OSDs.
This confirms that there is nothing left on OSDs 2, 6, 10, 14, 18, 21, 25, and 29, which belong to the node being deleted; Ceph has moved all of the copies from this node to the other nodes.
Verifying with ceph pg dump or ceph osd stat that no placement groups are still mapped to these OSDs protects data integrity. Do not delete a node while any placement groups still reside on its OSDs; wait until the recovery activity is complete. Also, do not let the Ceph cluster reach its full ratio when removing nodes or OSDs, because removing OSDs at that point could cause data integrity issues. A minimal sketch of this verification follows.
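A minimal sketch of this verification, assuming the ceph pg ls-by-osd subcommand is available in the installed Ceph release. The OSD IDs are the ones that belonged to the removed node in this test bed; an empty listing for each OSD means no placement groups are mapped to it.
for i in 2 6 10 14 18 21 25 29
do
ceph pg ls-by-osd $i
done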
IPMI status.
As the node is switched off it is not reachable through IPMI.
[stack@osp8-director ~]$ ipmitool -I lanplus -H 10.23.10.75 -U admin -P <passwd> chassis power status
Error: Unable to establish IPMI v2 / RMCP+ session
[stack@osp8-director ~]$
Update the driver entries to work around the issue. Edit /etc/ironic/ironic.conf and temporarily update the enabled drivers as below:
#enabled_drivers=pxe_ipmitool,pxe_ssh,pxe_drac
enabled_drivers=fake
Restart openstack-ironic-conductor
sudo service openstack-ironic-conductor restart
ironic node-update NODE_UUID replace driver=fake
The node in ironic node-list should show provision-state=active and maintenance=false. If not, turn off maintenance mode first:
[stack@osp8-director ~]$ ironic node-set-maintenance cf100a8c-db6f-4873-808b-870ad324f94a false
[stack@osp8-director ~]$ ironic node-set-provision-state cf100a8c-db6f-4873-808b-870ad324f94a deleted
[stack@osp8-director ~]$ ironic node-delete cf100a8c-db6f-4873-808b-870ad324f94a
Deleted node cf100a8c-db6f-4873-808b-870ad324f94a
[stack@osp8-director ~]$
At this point nova list still has an entry for the deleted node; delete it as well.
[stack@osp8-director ~]$ nova delete 13b065c0-d416-4393-9789-17668d8266c4
Request to delete server 13b065c0-d416-4393-9789-17668d8266c4 has been accepted.
[stack@osp8-director ~]$ nova list | grep ACTIVE | wc -l
10
[stack@osp8-director ~]$ ironic node-list | grep None | wc -l
10
Revert the “fake” driver change by editing /etc/ironic/ironic.conf:
enabled_drivers=pxe_ipmitool,pxe_ssh,pxe_drac
#enabled_drivers=fake
Restart ironic-conductor to pick up the drivers again.
sudo service openstack-ironic-conductor restart
Storage node deletion differs from compute node deletion at this point. In both cases the node has been deleted from Cisco UCS and OpenStack; however, the Ceph entries still remain and have to be cleaned up.
To clean up Ceph after a node deletion, complete the following steps:
Check the details from ceph health and osd tree:
[root@overcloud-controller-0 ceph]# ceph osd stat
osdmap e85: 32 osds: 24 up, 24 in
Remove the OSDs from Ceph. Change the OSD IDs to match your setup, based on the output of ceph osd tree above.
for i in 2 6 10 14 18 21 25 29
do
ceph osd out $i
ceph osd crush remove osd.$i
ceph auth del osd.$i
ceph osd rm $i
done
osd.2 is already out.
removed item id 2 name 'osd.2' from crush map
updated
removed osd.2
osd.6 is already out.
removed item id 6 name 'osd.6' from crush map
updated
removed osd.6
osd.10 is already out.
removed item id 10 name 'osd.10' from crush map
updated
removed osd.10
osd.14 is already out.
removed item id 14 name 'osd.14' from crush map
updated
removed osd.14
osd.18 is already out.
removed item id 18 name 'osd.18' from crush map
updated
removed osd.18
osd.21 is already out.
removed item id 21 name 'osd.21' from crush map
updated
removed osd.21
osd.25 is already out.
removed item id 25 name 'osd.25' from crush map
updated
removed osd.25
osd.29 is already out.
removed item id 29 name 'osd.29' from crush map
updated
removed osd.29
[root@overcloud-controller-0 ceph]#
Clean up ceph crush host entries:
[root@overcloud-controller-0 ceph]# ceph osd crush remove overcloud-cephstorage-1
removed item id -4 name 'overcloud-cephstorage-1' from crush map
Health checks after deletion:
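A minimal sketch of the checks that can be run at this point; once the entries are removed, the cluster should report HEALTH_OK with 24 OSDs, and the removed host should no longer appear in the OSD tree.
[root@overcloud-controller-0 ~]# ceph -s
[root@overcloud-controller-0 ~]# ceph osd stat
[root@overcloud-controller-0 ~]# ceph osd tree
[root@overcloud-controller-0 ~]# ceph health detail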
When the storage node has been completely removed from OpenStack and the Ceph entries have been cleaned up, a new server can be added. The procedure for adding a new storage node is the same as the one described earlier for scaling up the storage pod.
RHOSP 8 supports a single Undercloud node as of the date this document was first published. In this test bed, the compute and storage nodes are also NATed through the Undercloud node. While this does not pose any challenges during Overcloud operation, any future heat stack updates or Overcloud deployments could be impacted if the Undercloud is lost.
The following backup and recovery method is documented on the Red Hat web site for reference. This procedure has not been validated in this CVD. It is strongly recommended to test the procedure in a test environment, document the process for restoring the Undercloud node from backup, and then take a backup of the Undercloud node and store it for easy retrieval in case of failures.
Hardware failures of blade servers are infrequent. Cisco stands behind its customers to support them in such conditions, and a Return Material Authorization (RMA) process is in place. Depending on the type of failure, either the failed parts or the entire blade may be replaced. This section covers, at a high level, the types of failures that can happen on Cisco UCS blades running OpenStack and how to get the system back up and running with little or no business interruption.
This section was validated specifically for controller blades. The replacement of compute and storage blades is covered earlier in the High Availability section.
· CPU Failures
· Memory or DIMM Failures
· Virtual Interface Card Failures
· Motherboard Failures
· Hard Disk Failures
· Chassis Slot Issues
Any such failure on a blade either leads to degraded performance while the system continues to operate (as with DIMM or disk failures) or causes the blade to fail completely. In the case of complete failures, OpenStack nova and ironic may also take the node offline, and the errors need to be fixed.
A compute node failure impacts only the VMs running on that node, and these can be evacuated to another node.
Ceph storage nodes are configured with a replication factor of 3, so the system continues to operate, though the recovery operation may cause slightly degraded performance of the storage cluster.
In the case of a total failure of a controller blade, the fencing packages will fence the failed node. You may need a fix for bug 1298430; instructions were provided earlier in this document on how to include the fix in the overcloud image.
From an OpenStack point of view, the following hardware identifiers are seeded into the system and may have to be restored in case of failure:
OpenStack uses the IPMI address to power the blades on and off. These values can be queried through the ironic APIs, as shown below.
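A minimal sketch of querying these values from the Undercloud; the UUID is a placeholder. For the pxe_ipmitool driver the IPMI address is stored in the node's driver_info, and ironic node-port-list shows the provisioning MAC address registered for the node.
[stack@osp8-director ~]$ ironic node-list
[stack@osp8-director ~]$ ironic node-show <UUID> | grep -i ipmi
[stack@osp8-director ~]$ ironic node-port-list <UUID>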
The controller Ethernet interfaces and MAC addresses are stored on the local disk of the failed blade, which is why hard disk failures are also covered here. In addition, the provisioning interface MAC address is stored on the Undercloud node.
Retain these addresses in case of failures.
The local hard disks hold all of the configuration information and must remain available. It is strongly recommended to have a pair of local disks in a RAID-10 configuration to protect against disk failures.
If all of the above are restored after a hardware failure, the system can be made operational again; that is what this section addresses.
As mentioned earlier, there can be several types of failures, including CPU or memory failures, and the system may continue to run in a degraded fashion. Not all of these are covered in this document; only the ones that have hooks into OpenStack are covered here.
Assuming you have followed the recommendation to configure the local disks in RAID-10, the failure of one disk will be handled by the RAID controller.
If the blade has to be replaced, the IPMI address, MAC addresses, and local disks have to be restored. It is assumed that there is no double failure.
IPMI addresses are allocated from the KVM pool. When a blade fails, the system holds the address until the blade is decommissioned; once decommissioned, the IP is released back to the KVM pool. This freed address can then be allocated to the new blade. The figure below shows, as an example, how to change the IPMI address in Cisco UCS.
Service profiles are like the SIM card of a phone: they store the hardware identity. Once the service profile is disassociated from the failed node and attached to the new node, all of the policies, such as the boot policy and the network interfaces along with their MAC addresses, are available to the new blade.
The two hard disks can be taken out of the failed blade and inserted into the new blade. Make sure that the new blade is identical and has been upgraded to the same firmware version as the failed blade. The local disks hold the controller binaries and the cluster configuration information. Associating the service profile brings up all of the hardware identities on the new blade, so the system will be in sync on both the hardware and software sides and should be up and running.
The following case study describes the step-by-step process for replacing a controller blade.
PCS Status
[root@controller-0 ~]# pcs status
Cluster name: tripleo_cluster
Last updated: Wed Aug 17 10:37:28 2016 Last change: Wed Aug 17 08:01:46 2016 by root via cibadmin on controller-0
Stack: corosync
Current DC: controller-0 (version 1.1.13-10.el7_2.2-44eb2dd) - partition with quorum
3 nodes and 113 resources configured
Online: [ controller-0 controller-1 controller-2 ]
Full list of resources:
ip-10.23.110.56 (ocf::heartbeat:IPaddr2): Started controller-0
Clone Set: haproxy-clone [haproxy]
Started: [ controller-0 controller-1 controller-2 ]
ip-10.23.120.50 (ocf::heartbeat:IPaddr2): Started controller-1
ip-10.23.150.50 (ocf::heartbeat:IPaddr2): Started controller-2
ip-10.23.100.51 (ocf::heartbeat:IPaddr2): Started controller-0
ip-172.22.215.16 (ocf::heartbeat:IPaddr2): Started controller-1
Quorum and CRM Node Information
[root@overcloud-controller-0 ~]# corosync-quorumtool
Quorum information
------------------
Nodes: 3
Node ID: 1
Ring ID: 24
Quorate: Yes
Votequorum information
----------------------
Expected votes: 3
Highest expected: 3
Total votes: 3
Quorum: 2
Flags: Quorate
Membership information
----------------------
Nodeid Votes Name
2 1 overcloud-controller-1
1 1 overcloud-controller-0 (local)
3 1 overcloud-controller-2
Tenants Availability and Checks
Make sure that the connectivity to the tenants work.
[stack@osp8-director scripts]$ ./tenantips.sh
inet 10.20.155.6 netmask 255.255.255.0 broadcast 10.20.155.255
inet 10.20.155.5 netmask 255.255.255.0 broadcast 10.20.155.255
inet 10.20.105.6 netmask 255.255.255.0 broadcast 10.20.105.255
inet 10.20.105.5 netmask 255.255.255.0 broadcast 10.20.105.255
inet 10.20.154.6 netmask 255.255.255.0 broadcast 10.20.154.255
inet 10.20.154.5 netmask 255.255.255.0 broadcast 10.20.154.255
Insert the new spare blade into the chassis.
Identify the overcloud-controller and UCS Service Profile mapping from /etc/neutron/plugin.ini on any other controller node.
PCSD Status:
controller-0: Online
controller-1: Offline
controller-2: Online
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
Nova and Ironic Status
After a few minutes, ironic marks the node as not available.
Tenant and VMs Status
Make sure that you can log into the dashboard and create new VMs, and that North-South and East-West traffic between VMs is uninterrupted. You may observe some slowness when creating VMs.
Dashboard
The dashboard starts working again after a few minutes. The VNC handshake was observed to be slow, but it lets you in.
Go to the Equipment tab and decommission the blade from the chassis.
Change the IPMI address of the newly inserted blade to that of the old blade. Because the failed blade was decommissioned from Cisco UCS, its IPMI address was released into the free pool; reuse this address on the new blade. This ensures that the IPMI address stored in OpenStack for the controller is back in sync with Cisco UCS.
Remove the boot disks from the failed blade and insert them into the new blade. Make sure that the old disks appear in the server inventory.
Associate the existing service profile that was disassociated earlier from the failed blade to the new or replacement blade.
Make sure the configuration is in progress and monitor the status in the FSM tab of the server.
The association should boot the server based on the desired power state; otherwise, boot it manually. It should show the login prompt as below.
Even though the server is up and running, the following steps may be needed for the server to rejoin the cluster.
The server is in maintenance mode in ironic; use the ironic commands below to bring it back to a normal state.
ironic node-set-power-state <UUID> on
ironic node-set-maintenance <UUID> false
Log into any one of the controllers and check the status from pacemaker. If any services are observed to be down, you may restart them with pcs resource cleanup.
· Can we use IO Modules 2208 instead of IOM 2204 as shown in the topology diagram?
Yes, both IOM 2204 and IOM 2208 are supported.
· When should we use C240M4S and when C240M4L for storage servers?
This boils down to a design question and depends on the requirements. The C240 M4 SFF (small form factor) offers more spindles and hence higher IOPS with reasonable bandwidth capacity. The C240 M4 LFF (large form factor) offers higher storage capacity but may not match the SFF on total IOPS per node. Validation was done and performance metrics are provided to help you choose the right hardware.
· Can I use different hardware like Cisco M3 blades and different VIC adapters in the solution?
Cisco hardware versions higher than those in the BOM are supported. While lower versions may still work, they have not been validated.
· How many chassis or blades and servers can I scale horizontally?
Usually the limits are imposed by the Fabric Interconnect ports and the scalability of the OpenStack controllers. Earlier validations did not reveal any issues with three fully loaded blade chassis and 12 storage nodes.
· How can I connect my OpenStack to an existing Ceph installation?
Please refer to the Red Hat documentation for external Ceph here.
· My network topology differs from what is mentioned in this document. What changes do I need to make to the configuration?
The network topology verified in this configuration is included in the Appendix. The test bed had a limited number of IPs and the floating network was used. It is not necessary to have the same settings; however, you may have to change the yaml files accordingly and some tweaks may be necessary. Please refer to the Red Hat documentation on how to accommodate these changes in the template files.
· Why have version lock directives been provided in this document?
OpenStack is continuously updated, and changes to binaries and configuration go hand in hand. The purpose of the version lock file is to pin the binaries as close as possible to the validated design. This ensures consistency, with minimal deviation from the validated design and its configuration files such as the yaml files. You can always install a higher version than mentioned, but the specifics needed in the configuration files may vary, and some of the validations done in this document may have to be redone to avoid regressions.
This section details some troubleshooting tips for Red Hat OpenStack Platform 8 on Cisco UCS servers. Troubleshooting OpenStack is an exhaustive topic, and this section covers only a subset.
· The provisioning interface should be enabled as native across all the blades and rack servers for successful introspection and Overcloud deploy.
· The native flag for the external network should not be enabled on the Overcloud nodes, as observed on the test bed.
· Specify the PCI order for network interfaces. This ensures that they are enumerated in the same way as specified in the templates.
· When using updating templates, make sure that the service profiles are unbound from the service profile template for successful operation of the Cisco UCS Manager plugin.
· Before applying service profiles, make sure that all the disks are in ‘Unconfigured Good’ status. The storage profile attached to these service profiles will then be applied successfully and will bring the boot LUN into an operable state.
The Undercloud install was observed to be straightforward and very few issues were encountered; most were human mistakes such as typos in the configuration file.
· Make sure that the server is registered with Red Hat Content Delivery Network for downloading the packages. In case the server is behind proxy, update /etc/rhsm/rhsm.conf file with appropriate proxy server values.
· Double-check the entries in the Undercloud configuration file. Provide enough room in discovery_iprange and the dhcp start/end range, taking into account future expansion or upscaling of the servers. Most of these parameters are explained in the sample file provided in /usr/share.
· Leave undercloud_debug=true at its default value to check for failures. The log file install-undercloud.log is created in /home/stack/.instack as part of the Undercloud install. This is handy for browsing through issues encountered during the install (a quick scan example follows this list).
· A repeat of the Undercloud install should preferably be done in a clean environment after reinstalling the base operating system.
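A quick way to scan the Undercloud install log for problems, as a minimal sketch; the log path is the one created by the installer under the stack user's home directory.
[stack@osp8-director ~]$ grep -iE "error|fail|trace" /home/stack/.instack/install-undercloud.log | less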
Node introspection can fail for many reasons. Make sure that you have verified all the post-Undercloud and pre-introspection steps mentioned earlier in this document.
· Verifying the IPMI and MAC address values and powering the nodes on and off with ipmitool, as mentioned earlier in this document, should isolate most issues. Check with ironic node-list and ironic node-show to ensure that the registered values are correct.
· The boot LUNs configured in Cisco UCS through the storage profile should be in an available state before starting introspection. The LUN size specified in the instackenv.json file should be equal to or less than the size of the LUN seen in Cisco UCS.
· The best way to debug introspection failures is to open the KVM console on the server and check for issues.
· If the system drops you to a shell prompt, the dump in /run/initramfs/sosreport.txt provides some insight as well.
· dnsmasq is the DHCP process that PXE uses for discovery. Within the configured provisioning subnet, only this dnsmasq process on the Undercloud node should be serving DHCP. Any overlap will cause discovery failures.
· Running sudo journalctl -u openstack-ironic-inspector -u openstack-ironic-inspector-dnsmasq will show issues encountered by the discovery process and dnsmasq.
· Monitoring introspection with ‘openstack baremetal introspection bulk status’ will show whether any servers have failed.
· At times, if the status of a node becomes ‘available’, you may have to set it back to ‘manageable’ with the ironic API before running introspection.
· The default introspection timeout is 60 minutes. This may have to be increased, as mentioned earlier, if introspection takes longer.
At times it may not be feasible to do a bulk introspection of all the nodes, for example because of a LUN issue on a single node, especially when there are a large number of nodes in the cloud. In that case, a single node can be deleted, re-registered, and introspected on its own:
ironic node-delete <uuid>
Create a json file for the failed node.
openstack baremetal import --json ~/add-node.json
openstack baremetal configure boot
ironic node-list
ironic node-set-maintenance <uuid> true
openstack baremetal introspection start <uuid>
openstack baremetal introspection status <uuid>
ironic node-set-maintenance <uuid> false
Debugging Overcloud failures can be a daunting task. The issues can be as simple as passing incorrect parameters to the Overcloud deploy command, while others may be actual bugs. It is difficult to cover all failure scenarios here; a few that were found during this validation are mentioned below. Start debugging from here, then move on to the Red Hat and OpenStack documentation.
· Check the pre-defined flavors and verify that they match correctly. Incorrect flavors and/or an incorrect node count may produce an ‘insufficient number of nodes’ error when running the Overcloud deploy command. Run instack-ironic-deployment --show-profile to confirm.
· Before running the Overcloud deployment, run openstack overcloud profiles list to confirm which nodes are available and attached to their respective profiles.
· Make sure that an NTP server is configured and check the drift with ntpdate -d <ntp server>; the drift should preferably be less than 20 ms for the Ceph monitors.
· Run in debug mode to capture the errors while running Overcloud deploy.
· The overcloud image was customized with a root password. This allows logging into a node directly through the KVM console in case of failures, even if the heat-admin user has not been set up yet.
· journalctl -u os-collect-config | egrep -i "error|trace|fail" should shed some light around any errors or failures happened during Overcloud deploy.
· Incorrect configuration of yaml files may result in network configuration issues. Run journalctl as above to start with. Validate the yaml files with online yaml parsers.
· Run ifconfig and ovs-vsctl show to check the interface and bridge mappings.
· In /etc/os-net-config, run jq . config.json to display the actual network parameters that were applied to that node.
· Log in from the director node to the other nodes and check the routes. There should be one static route, either to the external network or to the Undercloud node, depending on how masquerading was configured.
· Run journalctl as above and check for dmesg and /var/log/messages to reveal any failures related with partitioning and/or network.
· Ceph partitions are pre-created with wipe-disk.yaml file. Validate this with /root/wipe-disk.txt file and running cat /proc/partitions. Only the journal partitions are pre-created. The OSD partitions are created by Red Hat OpenStack Platform director.
· Checking the partitions in /proc/partitions and the existence of /var/log/ceph/*, /var/lib/ceph* and /etc/ceph/keyring and other files reveal at what stage it failed.
· The monitors should be set up before the Ceph OSDs are created. The existence of /etc/ceph/* on the controller nodes, followed by the storage nodes, reveals whether the monitor setup was successful.
· Run ceph -s to check the health and observe for how many total OSD’s, how many are up etc.
· Run ceph osd tree to reveal issues with any individual OSD’s.
· If you detect clock skew issues on the monitors, check the NTP daemon, sync the time on the monitors running on the controller nodes, and restart the monitors with /etc/init.d/ceph mon restart.
The following sequence may be followed to debug heat stack create/update issues.
heat stack-list
heat resource-list overcloud | grep -vi complete
heat resource-list -n5 overcloud | grep -vi complete
heat resource-show overcloud Controller
heat deployment-show <deployment id obtained above>
Sometimes logging into the KVM console directly will reveal the issues. Because the overcloud image was customized with a root password, you should be able to log in once the OS is deployed, even if failures occur thereafter.
Check for errors with pcs status on controller nodes. If some resources are not up or running, then this needs to be addressed first.
pcs resource cleanup will restart all the services.
pcs resource restart <resource name obtained from pcs status>
nova list, nova service-list, and keystone endpoint-list can be handy for debugging.
nova service-list and nova hypervisor-list (or nova hypervisor-show) reveal details of the hypervisors configured on the system. If any expected nodes are missing, that has to be addressed as well.
Validate entries in /etc/neutron/plugin.ini on all the controllers.
Any VMs created should have VLAN entries globally in the switch and also in both the port-channels and both the switches. Any missing entry will raise an alarm here.
Nexus Global VLANs
301 VLAN0301 active Po17, Po18, Eth1/17, Eth1/18
307 VLAN0307 active Po17, Po18, Eth1/17, Eth1/18
312 VLAN0312 active Po17, Po18, Eth1/17, Eth1/18
318 VLAN0318 active Po17, Po18, Eth1/17, Eth1/18
321 VLAN0321 active Po17, Po18, Eth1/17, Eth1/18
Nexus Port Channel VLANs
show running-config interface port-channel 17-18
interface port-channel18
description OSP8-FAB-B
switchport mode trunk
switchport trunk allowed vlan 1,10,100,110,120,150,160,215,252-253
switchport trunk allowed vlan add 260,265,274,284,293,301,307,312
switchport trunk allowed vlan add 318,321-322,347,362
spanning-tree port type edge trunk
mtu 9216
vpc 18
………………………………
………………………………
The above output is truncated for readability purposes.
Validate entries in /etc/neutron/plugin.ini on all the controllers.
Cisco UCS Manager Global VLANs
Log into UCS Manager and check for the VLANs both globally and on the hypervisor where the VM(s) is provisioned. Check the host names from CLI or horizon.
Hypervisor VLANs
Operational issues can take many forms; a brief overview of where to check in case of failures around VM creation is provided below:
Nova commands like nova list --all-tenants, nova-manage vm list and virsh list on compute nodes could be a starting point.
Check /var/log/neutron and run grep -i "error\|trace" server.log. A few of the entries may be informational and can probably be ignored.
Check the following files to spot any errors
/var/log/neutron/server.log
/etc/neutron/plugin.ini
/etc/neutron/neutron.conf
/var/log/nova/*
Execute the following on controller nodes.
ip netns
ip netns exec <ns> <arguments>
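A minimal sketch of using the Neutron network namespaces for debugging; the qdhcp-/qrouter- prefixes are the Neutron naming convention, and the UUIDs and VM IP are placeholders.
[root@overcloud-controller-0 ~]# ip netns
[root@overcloud-controller-0 ~]# ip netns exec qdhcp-<network-uuid> ip addr
[root@overcloud-controller-0 ~]# ip netns exec qdhcp-<network-uuid> ping -c 3 <vm-fixed-ip>
[root@overcloud-controller-0 ~]# ip netns exec qrouter-<router-uuid> ip route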
· While you will have business continuity, performance may be degraded during the recovery period. Hence, it is strongly recommended to keep one or two spare servers.
· Plan your networks beforehand, prepare a checklist of items, and make sure they are in place before starting the actual installation. It is suggested to proofread the complete document once before attempting the installation.
· Capacity planning is another important factor to consider for organic growth. This includes not only physical resources such as data center space and servers, but also network subnet sizing and similar items.
· Follow operational best practices such as housekeeping and purging logs and archives. In larger installations you may have to size /var/log separately.
This Cisco Validated Design is a joint contribution from Cisco Systems, Inc., Red Hat, Inc. and Intel Corporation. The solution combines the technologies, expertise, and contributions to the OpenStack community with experience from the field, and provides a rich experience to end users in both the installation and day-to-day operational aspects of OpenStack.
Sample Network and other yaml files can also be downloaded from https://communities.cisco.com/docs/DOC-70256
[stack@osp8-director ~]$ cat instackenv.json
{
"nodes": [
{
"pm_user": "admin",
"pm_password": "<passwd>",
"pm_type": "pxe_ipmitool",
"pm_addr": "10.23.10.57",
"mac": [
"00:25:b5:00:00:08"
],
"memory": "262144",
"disk": "250",
"arch": "x86_64",
"cpu": "32"
},
{
"pm_user": "admin",
"pm_password": "<passwd>",
"pm_type": "pxe_ipmitool",
"pm_addr": "10.23.10.79",
"mac": [
"00:25:b5:00:00:10"
],
"memory": "262144",
"disk": "250",
"arch": "x86_64",
"cpu": "32"
},
{
"pm_user": "admin",
"pm_password": "<passwd>",
"pm_type": "pxe_ipmitool",
"pm_addr": "10.23.10.76",
"mac": [
"00:25:b5:00:00:18"
],
"memory": "262144",
"disk": "250",
"arch": "x86_64",
"cpu": "32"
},
{
"pm_user": "admin",
"pm_password": "<passwd>",
"pm_type": "pxe_ipmitool",
"pm_addr": "10.23.10.69",
"mac": [
"00:25:b5:00:00:1e"
],
"memory": "262144",
"disk": "250",
"arch": "x86_64",
"cpu": "40"
},
{
"pm_user": "admin",
"pm_password": "<passwd>",
"pm_type": "pxe_ipmitool",
"pm_addr": "10.23.10.67",
"mac": [
"00:25:b5:00:00:23"
],
"memory": "262144",
"disk": "250",
"arch": "x86_64",
"cpu": "40"
},
{
"pm_user": "admin",
"pm_password": "<passwd>",
"pm_type": "pxe_ipmitool",
"pm_addr": "10.23.10.59",
"mac": [
"00:25:b5:00:00:28"
],
"memory": "262144",
"disk": "250",
"arch": "x86_64",
"cpu": "40"
},
{
"pm_user": "admin",
"pm_password": "<passwd>",
"pm_type": "pxe_ipmitool",
"pm_addr": "10.23.10.78",
"mac": [
"00:25:b5:00:00:2d"
],
"memory": "262144",
"disk": "250",
"arch": "x86_64",
"cpu": "40"
},
{
"pm_user": "admin",
"pm_password": "<passwd>",
"pm_type": "pxe_ipmitool",
"pm_addr": "10.23.10.66",
"mac": [
"00:25:b5:00:00:30"
],
"memory": "131072",
"disk": "250",
"arch": "x86_64",
"cpu": "24"
},
{
"pm_user": "admin",
"pm_password": "<passwd>",
"pm_type": "pxe_ipmitool",
"pm_addr": "10.23.10.74",
"mac": [
"00:25:b5:00:00:36"
],
"memory": "131072",
"disk": "250",
"arch": "x86_64",
"cpu": "24"
},
{
"pm_user": "admin",
"pm_password": "<passwd>",
"pm_type": "pxe_ipmitool",
"pm_addr": "10.23.10.75",
"mac": [
"00:25:b5:00:00:3a"
],
"memory": "131072",
"disk": "250",
"arch": "x86_64",
"cpu": "24"
},
{
"pm_user": "admin",
"pm_password": "<passwd>",
"pm_type": "pxe_ipmitool",
"pm_addr": "10.23.10.56",
"mac": [
"00:25:b5:00:00:33"
],
"memory": "131072",
"disk": "250",
"arch": "x86_64",
"cpu": "24"
}
]
}
[stack@osp8-director ~]$
resource_registry:
OS::TripleO::NodeUserData:
/home/stack/templates/wipe-disk.yaml
OS::TripleO::Compute::Net::SoftwareConfig:
/home/stack/templates/nic-configs/compute.yaml
OS::TripleO::Controller::Net::SoftwareConfig:
/home/stack/templates/nic-configs/controller.yaml
OS::TripleO::CephStorage::Net::SoftwareConfig:
/home/stack/templates/nic-configs/ceph-storage.yaml
parameter_defaults:
# This section is where deployment-specific configuration is done
# Customize the IP subnets to match the local environment
InternalApiNetCidr: 10.23.100.0/24
StorageNetCidr: 10.23.120.0/24
StorageMgmtNetCidr: 10.23.150.0/24
TenantNetCidr: 10.20.20.0/24
ExternalNetCidr: 172.22.215.0/24
# CIDR subnet mask length for provisioning network
ControlPlaneSubnetCidr: '24'
# Customize the IP ranges on each network to use for static IPs and VIPs
InternalApiAllocationPools: [{'start': '10.23.100.50', 'end': '10.23.100.250'}]
StorageAllocationPools: [{'start': '10.23.120.50', 'end': '10.23.120.250'}]
StorageMgmtAllocationPools: [{'start': '10.23.150.50', 'end': '10.23.150.250'}]
TenantAllocationPools: [{'start': '10.20.20.10', 'end': '10.20.20.250'}]
ExternalAllocationPools: [{'start': '172.22.215.16', 'end': '172.22.215.20'}]
ExternalInterfaceDefaultRoute: "172.22.215.1"
ControlPlaneDefaultRoute: 10.23.110.26
EC2MetadataIp: 10.23.110.26
DnsServers: ["8.8.8.8","8.8.4.4"]
StorageNetworkVlanID: 120
StorageMgmtNetworkVlanID: 150
InternalApiNetworkVlanID: 100
ExternalNetworkVlanID: 215
NeutronExternalNetworkBridge: "''"
parameters:
CinderEnableIscsiBackend: false
CinderEnableRbdBackend: true
NovaEnableRbdBackend: true
GlanceBackend: rbd
parameter_defaults:
TimeZone: 'US/Pacific'
heat_template_version: 2015-04-30
description: >
Management network. System administration, SSH, DNS, NTP, etc. This network
would usually be the default gateway for the non-controller nodes.
parameters:
# the defaults here work for static IP assignment (IPAM) only
ManagementNetCidr:
default: '10.23.10.0/24'
description: Cidr for the management network.
type: string
ManagementNetValueSpecs:
default: {'provider:physical_network': 'management', 'provider:network_type': 'flat'}
description: Value specs for the management network.
type: string
ManagementNetAdminStateUp:
default: true
description: The admin state of the network.
type: boolean
ManagementNetEnableDHCP:
default: false
description: Whether to enable DHCP on the associated subnet.
type: boolean
ManagementNetShared:
default: false
description: Whether this network is shared across all tenants.
type: boolean
ManagementNetName:
default: management
description: The name of the management network.
type: string
ManagementSubnetName:
default: management_subnet
description: The name of the management subnet in Neutron.
type: string
ManagementAllocationPools:
default: [{'start': '10.23.10.101', 'end': '10.23.10.150'}]
description: Ip allocation pool range for the management network.
type: json
resources:
ManagementNetwork:
type: OS::Neutron::Net
properties:
admin_state_up: {get_param: ManagementNetAdminStateUp}
name: {get_param: ManagementNetName}
shared: {get_param: ManagementNetShared}
value_specs: {get_param: ManagementNetValueSpecs}
ManagementSubnet:
type: OS::Neutron::Subnet
properties:
cidr: {get_param: ManagementNetCidr}
enable_dhcp: {get_param: ManagementNetEnableDHCP}
name: {get_param: ManagementSubnetName}
network: {get_resource: ManagementNetwork}
allocation_pools: {get_param: ManagementAllocationPools}
outputs:
OS::stack_id:
description: Neutron management network
value: {get_resource: ManagementNetwork}
heat_template_version: 2015-04-30
description: >
Software Config to drive os-net-config to configure VLANs for the
controller role.
parameters:
ControlPlaneIp:
default: ''
description: IP address/subnet on the ctlplane network
type: string
ExternalIpSubnet:
default: ''
description: IP address/subnet on the external network
type: string
InternalApiIpSubnet:
default: ''
description: IP address/subnet on the internal API network
type: string
TenantIpSubnet:
default: ''
description: IP address/subnet on the tenant network
type: string
StorageIpSubnet:
default: ''
description: IP address/subnet on the storage network
type: string
StorageMgmtIpSubnet:
default: ''
description: IP address/subnet on the storage mgmt network
type: string
ManagementIpSubnet: # Only populated when including environments/network-management.yaml
default: ''
description: IP address/subnet on the management network
type: string
ExternalNetworkVlanID:
default: 215
description: Vlan ID for the external network traffic.
type: number
InternalApiNetworkVlanID:
default: 100
description: Vlan ID for the internal_api network traffic.
type: number
StorageNetworkVlanID:
default: 120
description: Vlan ID for the storage network traffic.
type: number
StorageMgmtNetworkVlanID:
default: 150
description: Vlan ID for the storage mgmt network traffic.
type: number
ManagementNetworkVlanID:
default: 10
description: Vlan ID for the management network traffic.
type: number
ExternalInterfaceDefaultRoute:
default: '173.26.215.1'
description: default route for the external network
type: string
ControlPlaneSubnetCidr: # Override this via parameter_defaults
default: '24'
description: The subnet CIDR of the control plane network.
type: string
DnsServers: # Override this via parameter_defaults
default: ['8.8.8.8','8.8.4.4']
description: A list of DNS servers (2 max for some implementations) that will be added to resolv.conf.
type: comma_delimited_list
EC2MetadataIp: # Override this via parameter_defaults
description: The IP address of the EC2 metadata server.
type: string
resources:
OsNetConfigImpl:
type: OS::Heat::StructuredConfig
properties:
group: os-apply-config
config:
os_net_config:
network_config:
-
type: interface
name: nic1
use_dhcp: false
dns_servers: {get_param: DnsServers}
addresses:
-
ip_netmask:
list_join:
- '/'
- - {get_param: ControlPlaneIp}
- {get_param: ControlPlaneSubnetCidr}
routes:
-
ip_netmask: 169.254.169.254/32
next_hop: {get_param: EC2MetadataIp}
-
type: ovs_bridge
name: {get_input: bridge_name}
use_dhcp: false
members:
-
type: interface
name: nic4
use_dhcp: false
primary: true
-
type: vlan
vlan_id: {get_param: ExternalNetworkVlanID}
addresses:
-
ip_netmask: {get_param: ExternalIpSubnet}
routes:
-
ip_netmask: 0.0.0.0/0
next_hop: {get_param: ExternalInterfaceDefaultRoute}
-
type: ovs_bridge
name: br-intapi
use_dhcp: false
members:
-
type: interface
name: nic3
use_dhcp: false
primary: true
-
type: vlan
vlan_id: {get_param: InternalApiNetworkVlanID}
addresses:
-
ip_netmask: {get_param: InternalApiIpSubnet}
-
type: ovs_bridge
name: br-storage-pub
use_dhcp: false
mtu: 9000
members:
-
type: interface
name: nic5
use_dhcp: false
mtu: 9000
primary: true
-
type: vlan
mtu: 9000
vlan_id: {get_param: StorageNetworkVlanID}
addresses:
-
ip_netmask: {get_param: StorageIpSubnet}
-
type: ovs_bridge
name: br-storage-clus
use_dhcp: false
mtu: 9000
members:
-
type: interface
name: nic6
use_dhcp: false
mtu: 9000
primary: true
-
type: vlan
mtu: 9000
vlan_id: {get_param: StorageMgmtNetworkVlanID}
addresses:
-
ip_netmask: {get_param: StorageMgmtIpSubnet}
-
type: ovs_bridge
name: br-tenant
use_dhcp: false
members:
-
type: interface
name: nic2
use_dhcp: false
# force the MAC address of the bridge to this interface
primary: true
-
type: ovs_bridge
name: br-floating
use_dhcp: false
members:
-
type: interface
name: nic7
use_dhcp: false
primary: true
-
type: ovs_bridge
name: br-mgmt
use_dhcp: false
members:
-
type: interface
name: nic8
use_dhcp: false
primary: true
-
type: vlan
vlan_id: {get_param: ManagementNetworkVlanID}
addresses:
-
ip_netmask: {get_param: ManagementIpSubnet}
outputs:
OS::stack_id:
description: The OsNetConfigImpl resource.
value: {get_resource: OsNetConfigImpl}
heat_template_version: 2015-04-30
description: >
Software Config to drive os-net-config with 2 bonded nics on a bridge
with a VLANs attached for the compute role.
parameters:
ControlPlaneIp:
default: ''
description: IP address/subnet on the ctlplane network
type: string
ExternalIpSubnet:
default: ''
description: IP address/subnet on the external network
type: string
InternalApiIpSubnet:
default: ''
description: IP address/subnet on the internal API network
type: string
TenantIpSubnet:
default: ''
description: IP address/subnet on the tenant network
type: string
StorageIpSubnet:
default: ''
description: IP address/subnet on the storage network
type: string
StorageMgmtIpSubnet:
default: ''
description: IP address/subnet on the storage mgmt network
type: string
ManagementIpSubnet: # Only populated when including environments/network-management.yaml
default: ''
description: IP address/subnet on the management network
type: string
ExternalNetworkVlanID:
default: 215
description: Vlan ID for the external network traffic.
type: number
InternalApiNetworkVlanID:
default: 100
description: Vlan ID for the internal_api network traffic.
type: number
StorageNetworkVlanID:
default: 120
description: Vlan ID for the storage network traffic.
type: number
StorageMgmtNetworkVlanID:
default: 150
description: Vlan ID for the storage mgmt network traffic.
type: number
ManagementNetworkVlanID:
default: 10
description: Vlan ID for the management network traffic.
type: number
ExternalInterfaceDefaultRoute:
default: '173.26.215.1'
description: default route for the external network
type: string
ControlPlaneSubnetCidr: # Override this via parameter_defaults
default: '24'
description: The subnet CIDR of the control plane network.
type: string
ControlPlaneDefaultRoute:
default: '10.23.100.26'
description: default route for the control plane network
type: string
DnsServers: # Override this via parameter_defaults
default: ['8.8.8.8','8.8.4.4']
description: A list of DNS servers (2 max for some implementations) that will be added to resolv.conf.
type: comma_delimited_list
EC2MetadataIp: # Override this via parameter_defaults
description: The IP address of the EC2 metadata server.
type: string
resources:
OsNetConfigImpl:
type: OS::Heat::StructuredConfig
properties:
group: os-apply-config
config:
os_net_config:
network_config:
-
type: interface
name: nic1
use_dhcp: false
dns_servers: {get_param: DnsServers}
addresses:
-
ip_netmask:
list_join:
- '/'
- - {get_param: ControlPlaneIp}
- {get_param: ControlPlaneSubnetCidr}
routes:
-
ip_netmask: 169.254.169.254/32
next_hop: {get_param: EC2MetadataIp}
-
default: true
next_hop: {get_param: ControlPlaneDefaultRoute}
-
type: ovs_bridge
name: br-storage-pub
use_dhcp: false
mtu: 9000
members:
-
type: interface
name: nic4
use_dhcp: false
mtu: 9000
primary: true
-
type: vlan
mtu: 9000
vlan_id: {get_param: StorageNetworkVlanID}
addresses:
-
ip_netmask: {get_param: StorageIpSubnet}
-
type: ovs_bridge
name: br-mgmt
use_dhcp: false
members:
-
type: interface
name: nic5
use_dhcp: false
primary: true
-
type: vlan
vlan_id: {get_param: ManagementNetworkVlanID}
addresses:
-
ip_netmask: {get_param: ManagementIpSubnet}
-
type: ovs_bridge
name: br-intapi
use_dhcp: false
members:
-
type: interface
name: nic3
use_dhcp: false
primary: true
-
type: vlan
vlan_id: {get_param: InternalApiNetworkVlanID}
addresses:
-
ip_netmask: {get_param: InternalApiIpSubnet}
-
type: ovs_bridge
name: br-tenant
use_dhcp: false
members:
-
type: interface
name: nic2
use_dhcp: false
primary: true
outputs:
OS::stack_id:
description: The OsNetConfigImpl resource.
value: {get_resource: OsNetConfigImpl}
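The compute template above enables jumbo frames (mtu: 9000) on the storage bridge; the complete path, including the UCS vNICs and the upstream switches, must carry the same MTU. A quick post-deployment check, again assuming the default heat-admin login, could look like this:
ssh heat-admin@<compute-ctlplane-ip> "ip link show br-storage-pub | grep -o 'mtu [0-9]*'"
# Expected output: mtu 9000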
heat_template_version: 2015-04-30
description: >
Software Config to drive os-net-config with 2 bonded nics on a bridge
with VLANs attached for the ceph storage role.
parameters:
ControlPlaneIp:
default: ''
description: IP address/subnet on the ctlplane network
type: string
ExternalIpSubnet:
default: ''
description: IP address/subnet on the external network
type: string
InternalApiIpSubnet:
default: ''
description: IP address/subnet on the internal API network
type: string
TenantIpSubnet:
default: ''
description: IP address/subnet on the tenant network
type: string
StorageIpSubnet:
default: ''
description: IP address/subnet on the storage network
type: string
StorageMgmtIpSubnet:
default: ''
description: IP address/subnet on the storage mgmt network
type: string
ManagementIpSubnet: # Only populated when including environments/network-management.yaml
default: ''
description: IP address/subnet on the management network
type: string
ExternalNetworkVlanID:
default: 215
description: Vlan ID for the external network traffic.
type: number
InternalApiNetworkVlanID:
default: 100
description: Vlan ID for the internal_api network traffic.
type: number
StorageNetworkVlanID:
default: 120
description: Vlan ID for the storage network traffic.
type: number
StorageMgmtNetworkVlanID:
default: 150
description: Vlan ID for the storage mgmt network traffic.
type: number
ManagementNetworkVlanID:
default: 10
description: Vlan ID for the management network traffic.
type: number
ExternalInterfaceDefaultRoute:
default: '173.26.215.1'
description: default route for the external network
type: string
ControlPlaneSubnetCidr: # Override this via parameter_defaults
default: '24'
description: The subnet CIDR of the control plane network.
type: string
ControlPlaneDefaultRoute:
default: '10.23.100.26'
description: default route for the control plane network
type: string
DnsServers: # Override this via parameter_defaults
default: ['8.8.8.8','8.8.4.4']
description: A list of DNS servers (2 max for some implementations) that will be added to resolv.conf.
type: comma_delimited_list
EC2MetadataIp: # Override this via parameter_defaults
description: The IP address of the EC2 metadata server.
type: string
resources:
OsNetConfigImpl:
type: OS::Heat::StructuredConfig
properties:
group: os-apply-config
config:
os_net_config:
network_config:
-
type: interface
name: nic1
use_dhcp: false
dns_servers: {get_param: DnsServers}
addresses:
-
ip_netmask:
list_join:
- '/'
- - {get_param: ControlPlaneIp}
- {get_param: ControlPlaneSubnetCidr}
routes:
-
ip_netmask: 169.254.169.254/32
next_hop: {get_param: EC2MetadataIp}
-
default: true
next_hop: {get_param: ControlPlaneDefaultRoute}
-
type: ovs_bridge
name: br-storage-pub
use_dhcp: false
mtu: 9000
members:
-
type: interface
name: nic2
use_dhcp: false
mtu: 9000
primary: true
-
type: vlan
mtu: 9000
vlan_id: {get_param: StorageNetworkVlanID}
addresses:
-
ip_netmask: {get_param: StorageIpSubnet}
-
type: ovs_bridge
name: br-storage-clus
use_dhcp: false
mtu: 9000
members:
-
type: interface
name: nic3
use_dhcp: false
mtu: 9000
primary: true
-
type: vlan
mtu: 9000
vlan_id: {get_param: StorageMgmtNetworkVlanID}
addresses:
-
ip_netmask: {get_param: StorageMgmtIpSubnet}
outputs:
OS::stack_id:
description: The OsNetConfigImpl resource.
value: {get_resource: OsNetConfigImpl}
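The three role-specific NIC templates above (controller, compute, and ceph-storage) take effect only when they are mapped to the corresponding TripleO resources in the network environment file passed to the deploy command. The file names and paths below are examples; adjust them to wherever the templates are saved.
grep -A4 resource_registry: /home/stack/templates/network-environment.yaml
# Expected mappings (paths are illustrative):
# resource_registry:
#   OS::TripleO::Controller::Net::SoftwareConfig: /home/stack/templates/nic-configs/controller.yaml
#   OS::TripleO::Compute::Net::SoftwareConfig: /home/stack/templates/nic-configs/compute.yaml
#   OS::TripleO::CephStorage::Net::SoftwareConfig: /home/stack/templates/nic-configs/ceph-storage.yaml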
ceph::profile::params::osd_journal_size: 20000
ceph::profile::params::osd_pool_default_pg_num: 128
ceph::profile::params::osd_pool_default_pgp_num: 128
ceph::profile::params::osd_pool_default_size: 3
ceph::profile::params::osd_pool_default_min_size: 1
ceph::profile::params::manage_repo: false
ceph::profile::params::authentication_type: cephx
ceph::profile::params::osds:
'/dev/sdd':
journal: '/dev/sdb1'
'/dev/sde':
journal: '/dev/sdb2'
'/dev/sdf':
journal: '/dev/sdb3'
'/dev/sdg':
journal: '/dev/sdb4'
'/dev/sdh':
journal: '/dev/sdc1'
'/dev/sdi':
journal: '/dev/sdc2'
'/dev/sdj':
journal: '/dev/sdc3'
'/dev/sdk':
journal: '/dev/sdc4'
ceph_classes: []
ceph_osd_selinux_permissive: true
ceph::profile::params::osd_journal_size: 20000
ceph::profile::params::osd_pool_default_pg_num: 128
ceph::profile::params::osd_pool_default_pgp_num: 128
ceph::profile::params::osd_pool_default_size: 3
ceph::profile::params::osd_pool_default_min_size: 1
ceph::profile::params::manage_repo: false
ceph::profile::params::authentication_type: cephx
ceph::profile::params::osds:
'/dev/sdf':
journal: '/dev/sdb1'
'/dev/sdg':
journal: '/dev/sdb2'
'/dev/sdh':
journal: '/dev/sdb3'
'/dev/sdi':
journal: '/dev/sdb4'
'/dev/sdj':
journal: '/dev/sdb5'
'/dev/sdk':
journal: '/dev/sdc1'
'/dev/sdl':
journal: '/dev/sdc2'
'/dev/sdm':
journal: '/dev/sdc3'
'/dev/sdn':
journal: '/dev/sdc4'
'/dev/sdo':
journal: '/dev/sdc5'
'/dev/sdp':
journal: '/dev/sdd1'
'/dev/sdq':
journal: '/dev/sdd2'
'/dev/sdr':
journal: '/dev/sdd3'
'/dev/sds':
journal: '/dev/sdd4'
'/dev/sdt':
journal: '/dev/sde1'
'/dev/sdu':
journal: '/dev/sde2'
'/dev/sdv':
journal: '/dev/sde3'
'/dev/sdw':
journal: '/dev/sde4'
ceph_classes: []
ceph_osd_selinux_permissive: true
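The two hieradata blocks above map each Ceph data disk to a journal partition; the first matches the C240 M4L layout (8 data disks with journals on sdb and sdc) and the second the C240 M4S layout (18 data disks with journals on sdb through sde), consistent with the first-boot wipe-disk scripts shown later in this section. A minimal post-deployment sanity check, assuming the default heat-admin login, might be:
# On a Ceph storage node: confirm the journal partitions and data disks are in use
ssh heat-admin@<ceph-node-ctlplane-ip> "lsblk -o NAME,SIZE,TYPE,MOUNTPOINT"
# On a controller (or any node with the Ceph admin keyring): cluster-wide view
ssh heat-admin@<controller-ctlplane-ip> "sudo ceph osd tree && sudo ceph -s"
# All OSDs should report up/in, and the cluster should reach HEALTH_OK once backfilling completes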
resource_registry:
OS::TripleO::AllNodesExtraConfig: /usr/share/openstack-tripleo-heat-templates/puppet/extraconfig/all_nodes/neutron-ml2-cisco-nexus-ucsm.yaml
parameter_defaults:
NetworkUCSMIp: '10.23.10.5'
NetworkUCSMUsername: 'admin'
NetworkUCSMPassword: <passwd>
NetworkUCSMHostList: 00:25:b5:00:00:1a:org-root/org-osp8/ls-Openstack_Compute_Node1,00:25:b5:00:00:1f:org-root/org-osp8/ls-Openstack_Compute_Node2,00:25:b5:00:00:24:org-root/org-osp8/ls-Openstack_Compute_Node3,00:25:b5:00:00:29:org-root/org-osp8/ls-Openstack_Compute_Node4,00:25:b5:00:00:02:org-root/org-osp8/ls-Openstack_Controller_Node1,00:25:b5:00:00:0a:org-root/org-osp8/ls-Openstack_Controller_Node2,00:25:b5:00:00:12:org-root/org-osp8/ls-Openstack_Controller_Node3
NetworkNexusConfig: {
"UCSO-N9K-FAB-A": {
"ip_address": "10.23.10.3",
"nve_src_intf": 0,
"password": "<passwd>",
"physnet": "",
"servers": {
"00:25:b5:00:00:1a": {
"ports": "port-channel:17,port-channel:18"
},
"00:25:b5:00:00:1f": {
"ports": "port-channel:17,port-channel:18"
},
"00:25:b5:00:00:24": {
"ports": "port-channel:17,port-channel:18"
},
"00:25:b5:00:00:29": {
"ports": "port-channel:17,port-channel:18"
},
"00:25:b5:00:00:02": {
"ports": "port-channel:17,port-channel:18"
},
"00:25:b5:00:00:0a": {
"ports": "port-channel:17,port-channel:18"
},
"00:25:b5:00:00:12": {
"ports": "port-channel:17,port-channel:18"
},
},
"ssh_port": 22,
"username": "admin"
},
"UCSO-N9K-FAB-B": {
"ip_address": "10.23.10.4",
"nve_src_intf": 0,
"password": "<passwd>",
"physnet": "",
"servers": {
"00:25:b5:00:00:1a": {
"ports": "port-channel:17,port-channel:18"
},
"00:25:b5:00:00:1f": {
"ports": "port-channel:17,port-channel:18"
},
"00:25:b5:00:00:24": {
"ports": "port-channel:17,port-channel:18"
},
"00:25:b5:00:00:29": {
"ports": "port-channel:17,port-channel:18"
},
"00:25:b5:00:00:02": {
"ports": "port-channel:17,port-channel:18"
},
"00:25:b5:00:00:0a": {
"ports": "port-channel:17,port-channel:18"
},
"00:25:b5:00:00:12": {
"ports": "port-channel:17,port-channel:18"
},
},
"ssh_port": 22,
"username": "admin"
}
}
NetworkNexusManagedPhysicalNetwork: physnet-tenant
NetworkNexusVlanNamePrefix: 'q-'
NetworkNexusSviRoundRobin: 'false'
NetworkNexusProviderVlanNamePrefix: 'p-'
NetworkNexusPersistentSwitchConfig: 'false'
NetworkNexusSwitchHeartbeatTime: 30
NetworkNexusSwitchReplayCount: 1000
NetworkNexusProviderVlanAutoCreate: 'true'
NetworkNexusProviderVlanAutoTrunk: 'true'
NetworkNexusVxlanGlobalConfig: 'false'
NetworkNexusHostKeyChecks: 'false'
EnablePackageInstall: false
NeutronMechanismDrivers: 'openvswitch,cisco_nexus,cisco_ucsm'
NeutronServicePlugins: 'router'
NeutronTypeDrivers: 'vlan'
NeutronCorePlugin: 'ml2'
NeutronNetworkVLANRanges: 'physnet-tenant:250:700,floating:160:160'
NetworkNexusVxlanVniRanges: '0:0'
NetworkNexusVxlanMcastRanges: '0.0.0.0:0.0.0.0'
parameters:
controllerExtraConfig:
neutron::server::api_workers: 1
neutron::agents::metadata::metadata_workers: 1
neutron::server::rpc_workers: 1
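After the overcloud deploys with cisco-plugins.yaml, the mechanism drivers requested above should appear in the ML2 configuration on the controller nodes. A simple spot check, assuming the default heat-admin login and the standard RHEL OSP 8 file location:
ssh heat-admin@<controller-ctlplane-ip> "sudo grep mechanism_drivers /etc/neutron/plugins/ml2/ml2_conf.ini"
# Expected: mechanism_drivers = openvswitch,cisco_nexus,cisco_ucsm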
heat_template_version: 2015-04-30
#This configuration is only for the C240M4L server. Change it for the C240M4S.
#The first for loop zaps (wipes and re-labels) all of the storage disks.
#The second for loop creates aligned partitions for the journal disks only. Change the values of sdb and sdc accordingly.
#If the C240M4S uses more journal disks, add their device names to the second for loop.
#Do not create partitions for the data disks.
resources:
userdata:
type: OS::Heat::MultipartMime
properties:
parts:
- config: {get_resource: clean_disk}
clean_disk:
type: OS::Heat::SoftwareConfig
properties:
config: |
#!/bin/bash
DATA_DISKS="sdd sde sdf sdg sdh sdi sdj sdk"
JOURNAL_DISKS="sdb sdc"
JOURNAL_SIZE=20G
{ for disk in $DATA_DISKS $JOURNAL_DISKS
do
sgdisk -Z /dev/$disk
sgdisk -g /dev/$disk
done } > /root/wipe-disk.txt
{ for disk in $JOURNAL_DISKS
do
export ptype1=45b0969e-9b03-4f30-b4c6-b4b80ceff106
for i in $(seq 1 $(( ($(echo $DATA_DISKS|wc -w)+$(echo $JOURNAL_DISKS|wc -w)-1) / $(echo $JOURNAL_DISKS|wc -w) )) )
do
sgdisk --new=$i::+$JOURNAL_SIZE --change-name="$i:ceph journal" --typecode="$i:$ptype1" /dev/$disk
done
done } >> /root/wipe-disk.txt
outputs:
OS::stack_id:
value: {get_resource: userdata}
heat_template_version: 2015-04-30
#This configuration is for the C240M4S server; use the previous template for the C240M4L.
#The first for loop zaps (wipes and re-labels) all of the storage disks.
#The second for loop creates aligned partitions for the journal disks only. Change the journal device names (sdb through sde) accordingly.
#If more journal disks are used, add their device names to the second for loop.
#Do not create partitions for the data disks.
resources:
userdata:
type: OS::Heat::MultipartMime
properties:
parts:
- config: {get_resource: clean_disk}
clean_disk:
type: OS::Heat::SoftwareConfig
properties:
config: |
#!/bin/bash
DATA_DISKS="sdf sdg sdh sdi sdj sdk sdl sdm sdn sdo sdp sdq sdr sds sdt sdu sdv sdw"
JOURNAL_DISKS="sdb sdc sdd sde"
JOURNAL_SIZE=20G
{ for disk in $DATA_DISKS $JOURNAL_DISKS
do
sgdisk -Z /dev/$disk
sgdisk -g /dev/$disk
done } > /root/wipe-disk.txt
{ for disk in $JOURNAL_DISKS
do
export ptype1=45b0969e-9b03-4f30-b4c6-b4b80ceff106
for i in $(seq 1 $(( ($(echo $DATA_DISKS|wc -w)+$(echo $JOURNAL_DISKS|wc -w)-1) / $(echo $JOURNAL_DISKS|wc -w) )) )
do
sgdisk --new=$i::+$JOURNAL_SIZE --change-name="$i:ceph journal" --typecode="$i:$ptype1" /dev/$disk
done
done } >> /root/wipe-disk.txt
outputs:
OS::stack_id:
value: {get_resource: userdata}
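These first-boot scripts run only if they are registered as node user data. In TripleO this is done through the OS::TripleO::NodeUserData resource in an environment file that is passed to the deploy command with -e. The file names and path below are illustrative; in this solution the registration may already be carried in one of the environment files listed in the deploy script.
cat > /home/stack/templates/wipe-disk-env.yaml << 'EOF'
resource_registry:
  OS::TripleO::NodeUserData: /home/stack/templates/wipe-disk.yaml
EOF
# Then append -e /home/stack/templates/wipe-disk-env.yaml to the openstack overcloud deploy command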
#!/bin/bash
openstack overcloud deploy --templates \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /home/stack/templates/network-environment.yaml \
-e /home/stack/templates/network-management.yaml \
-e /home/stack/templates/storage-environment.yaml \
-e /home/stack/templates/timezone.yaml \
-e /home/stack/templates/cisco-plugins.yaml \
--control-flavor control --compute-flavor compute --ceph-storage-flavor ceph-storage \
--compute-scale 4 --control-scale 3 --ceph-storage-scale 4 \
--libvirt-type kvm \
--ntp-server 171.68.38.66 \
--neutron-network-type vlan \
--neutron-bridge-mappings datacentre:br-ex,physnet-tenant:br-tenant,floating:br-floating \
--neutron-network-vlan-ranges physnet-tenant:250:700,floating:160:160 \
--neutron-disable-tunneling --timeout 90 \
--verbose --debug --log-file overcloud_new.log
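Deployment progress can be followed from the undercloud in a second terminal. A minimal monitoring sketch, assuming the standard stack user and stackrc file:
source /home/stack/stackrc
heat stack-list                                   # the overcloud stack should end in CREATE_COMPLETE
nova list                                         # one entry per controller, compute, and ceph-storage node
heat resource-list overcloud | grep -vi complete  # anything failed or still in progress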
source /home/stack/overcloudrc
neutron net-create ext-net-160 --router:external true --provider:physical_network floating --provider:network_type vlan --shared --provider:segmentation_id 160
neutron subnet-create --name ext-subnet-160 --enable_dhcp=False \
--allocation-pool start=10.23.160.20,end=10.23.160.249 --gateway 10.23.160.1 ext-net-160 10.23.160.0/24
#Create public image
export IMAGE=$(ls /home/stack/images/rhel-guest-image-7.2-20160302.0.x86_64.qcow2)
echo $IMAGE
openstack image create --disk-format qcow2 --public --container-format bare --file ${IMAGE} rhel7
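A quick check that the image uploaded correctly and is visible to tenants:
openstack image list
# rhel7 should be listed with status 'active'
openstack image show rhel7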
The following script creates one tenant with two networks and four VMs for that tenant. It can be looped to create multiple tenants. Review the script before running it; a few values are hardcoded.
#!/bin/bash
export NW1=$1
export NW2=$(($1+50))
export id=$2
inst1=tenant${id}_${NW1}_inst1
inst2=tenant${id}_${NW1}_inst2
inst3=tenant${id}_${NW2}_inst3
inst4=tenant${id}_${NW2}_inst4
export TENANT_CIDR1=10.20.${NW1}.0/24
export TENANT_CIDR2=10.20.${NW2}.0/24
source /home/stack/overcloudrc
KEYSTONE_URL=$OS_AUTH_URL
if [[ ! -f keystonerc_tenant${id} ]]
then
# create tenantdemo environment
openstack user create --password tenant${id} tenant${id}
openstack project create tenant${id}
openstack role add --user tenant${id} --project tenant${id} _member_
cat > keystonerc_tenant${id} << EOF
export OS_USERNAME=tenant${id}
export OS_TENANT_NAME=tenant${id}
export OS_PASSWORD=tenant${id}
export OS_CLOUDNAME=overcloud
export OS_AUTH_URL=${KEYSTONE_URL}
EOF
fi
source keystonerc_tenant${id}
env | grep OS_
# create network
neutron net-list
neutron net-create tenant${id}-${NW1}
neutron net-create tenant${id}-${NW2}
neutron subnet-create --name tenant${id}-${NW1}-subnet tenant${id}-${NW1} ${TENANT_CIDR1}
neutron subnet-create --name tenant${id}-${NW2}-subnet tenant${id}-${NW2} ${TENANT_CIDR2}
neutron router-create tenant${id}
subID1=$(neutron subnet-list | awk "/tenant${id}-${NW1}-subnet/ {print \$2}")
neutron router-interface-add tenant${id} $subID1
subID2=$(neutron subnet-list | awk "/tenant${id}-${NW2}-subnet/ {print \$2}")
neutron router-interface-add tenant${id} $subID2
for i in $(neutron security-group-list | awk ' /default/ { print $2 } ')
do
# add ssh and icmp to default security groups
neutron security-group-rule-create --direction ingress --protocol icmp $i
#openstack security group rule create --proto icmp $i
neutron security-group-rule-create --direction ingress --protocol tcp --port_range_min 22 --port_range_max 22 $i
#neutron security group rule create --proto tcp --src-port 22 --dst-ip 22 $i
neutron security-group-show $i
openstack security group show $i
done
openstack keypair create tenant${id}kp > tenant${id}kp.pem
chmod 600 tenant${id}kp.pem
netname1=`neutron net-list | grep tenant${id}-${NW1} | awk '{print $2}'`
openstack server create --flavor m1.small --image rhel7 \
--key-name tenant${id}kp --nic net-id=${netname1} ${inst1}
openstack server create --flavor m1.small --image rhel7 \
--key-name tenant${id}kp --nic net-id=${netname1} ${inst2}
netname2=`neutron net-list | grep tenant${id}-${NW2} | awk '{print $2}'`
openstack server create --flavor m1.small --image rhel7 \
--key-name tenant${id}kp --nic net-id=${netname2} ${inst3}
openstack server create --flavor m1.small --image rhel7 \
--key-name tenant${id}kp --nic net-id=${netname2} ${inst4}
while [[ $(openstack server list | grep BUILD) ]]
do
sleep 3
done
openstack server list
source /home/stack/overcloudrc
netid=$(neutron net-list | awk "/ext-net-160/ { print \$2 }")
neutron router-gateway-set tenant${id} ${netid}
source keystonerc_tenant${id}
openstack ip floating create ext-net-160
sleep 10
float_ip=$(openstack ip floating list | grep ext-net-160 | grep None | awk '{print $6}' | sort -u | head -1)
openstack ip floating add ${float_ip} ${inst1}
openstack ip floating create ext-net-160
sleep 10
float_ip=$(openstack ip floating list | grep ext-net-160 | grep None | awk '{print $6}' | sort -u | head -1)
openstack ip floating add ${float_ip} ${inst2}
openstack ip floating create ext-net-160
sleep 10
float_ip=$(openstack ip floating list | grep ext-net-160 | grep None | awk '{print $6}' | sort -u | head -1)
openstack ip floating add ${float_ip} ${inst3}
openstack ip floating create ext-net-160
sleep 10
float_ip=$(openstack ip floating list | grep ext-net-160 | grep None | awk '{print $6}' | sort -u | head -1)
openstack ip floating add ${float_ip} ${inst4}
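Once the floating IPs are associated, basic reachability can be confirmed from a host on the external 10.23.160.0/24 network. The commands below are a sketch; cloud-user is the default login of the RHEL 7 guest image and may differ for other images.
# float_ip and the keypair file come from the script above
ping -c 3 ${float_ip}
ssh -i tenant${id}kp.pem cloud-user@${float_ip} "hostname; ip addr show eth0"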
Ramakrishna Nishtala, Cisco Systems, Inc.
Ramakrishna Nishtala is a Technical Leader in the Cisco UCS and Data Center Solutions group and has over 20 years of experience in IT infrastructure, automation, virtualization, and cloud computing. In his current role at Cisco Systems, he works on best practices, optimization, and performance tuning for OpenStack and other open-source storage solutions on Cisco UCS platforms. Prior to this, he worked on data center migration strategies, compute and storage consolidation, end-to-end performance optimization of databases, application servers, and web servers, and solutions engineering.
Steven Reichard, Red Hat
Steven Reichard is a consulting engineer and manager in Red Hat's Systems Design and Engineering group. The team's mission is to remove roadblocks to the wider adoption and ease of use of Red Hat's product portfolio in increasingly demanding customer and partner solutions. Most recently, Steve has focused on Red Hat OpenStack Platform, including enabling partner solutions based on Red Hat OpenStack Platform and Red Hat Ceph Storage. Steve is a Red Hat Certified Engineer (RHCE) with more than 20 years of computer industry experience.
Dariusz Komla, Intel
Dariusz Komła is a Data Center Engineer at Intel. He works in the Reference Implementation team, focusing on Software Defined Infrastructure and cloud solution design, and also works with Intel partners on building cloud architectures based on OpenStack. In his previous role as an IaaS architect, he was responsible for designing and implementing complex platforms based on OpenStack and Ceph storage in distributed data centers. Before that, he spent more than 10 years as a data center administrator, gaining broad experience with Linux and Windows systems, cloud solutions, and system design.
· Cisco Systems: Brian Bowen, Carol Bouchard, Muhammad Afzal, Sandhya Dasu, Shanthi Kumar Adloori, Timothy Swanson, Vijay Durairaj, Vishwanath Jakka
· Intel: Kamil Rogon, Patryk Wolsza
· Red Hat: Andrew Beekhof, Dan Sneddon, David Critch, Guil Barros, Marek Grac