Cisco UCS Integrated Infrastructure for Big Data and Analytics with Hortonworks Data Platform and Hortonworks DataFlow

Available Languages

Download Options

PDF (11.6 MB)
View with Adobe Reader on a variety of devices
ePub (13.8 MB)
View in various apps on iPhone, iPad, Android, Sony Reader, or Windows Phone
Mobi (Kindle) (8.2 MB)
View on Kindle device or Kindle app on multiple devices

Updated:July 28, 2016

Bias-Free Language

The documentation set for this product strives to use bias-free language. For the purposes of this documentation set, bias-free is defined as language that does not imply discrimination based on age, disability, gender, racial identity, ethnic identity, sexual orientation, socioeconomic status, and intersectionality. Exceptions may be present in the documentation due to language that is hardcoded in the user interfaces of the product software, language used based on RFP documentation, or language that is used by a referenced third-party product. Learn more about how Cisco is using Inclusive Language.

Cisco UCS Integrated Infrastructure for Big Data and Analytics with Hortonworks Data Platform and Hortonworks DataFlow

Building a 64 Node Hadoop Cluster

Last Updated: August 1, 2016

Cisco Validated Design

About Cisco Validated Designs

The CVD program consists of systems and solutions designed, tested, and documented to facilitate faster, more reliable, and more predictable customer deployments. For more information visit

http://www.cisco.com/go/designzone.

ALL DESIGNS, SPECIFICATIONS, STATEMENTS, INFORMATION, AND RECOMMENDATIONS (COLLECTIVELY, "DESIGNS") IN THIS MANUAL ARE PRESENTED "AS IS," WITH ALL FAULTS. CISCO AND ITS SUPPLIERS DISCLAIM ALL WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT OR ARISING FROM A COURSE OF DEALING, USAGE, OR TRADE PRACTICE. IN NO EVENT SHALL CISCO OR ITS SUPPLIERS BE LIABLE FOR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, OR INCIDENTAL DAMAGES, INCLUDING, WITHOUT LIMITATION, LOST PROFITS OR LOSS OR DAMAGE TO DATA ARISING OUT OF THE USE OR INABILITY TO USE THE DESIGNS, EVEN IF CISCO OR ITS SUPPLIERS HAVE BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

THE DESIGNS ARE SUBJECT TO CHANGE WITHOUT NOTICE. USERS ARE SOLELY RESPONSIBLE FOR THEIR APPLICATION OF THE DESIGNS. THE DESIGNS DO NOT CONSTITUTE THE TECHNICAL OR OTHER PROFESSIONAL ADVICE OF CISCO, ITS SUPPLIERS OR PARTNERS. USERS SHOULD CONSULT THEIR OWN TECHNICAL ADVISORS BEFORE IMPLEMENTING THE DESIGNS. RESULTS MAY VARY DEPENDING ON FACTORS NOT TESTED BY CISCO.

CCDE, CCENT, Cisco Eos, Cisco Lumin, Cisco Nexus, Cisco StadiumVision, Cisco TelePresence, Cisco WebEx, the Cisco logo, DCE, and Welcome to the Human Network are trademarks; Changing the Way We Work, Live, Play, and Learn and Cisco Store are service marks; and Access Registrar, Aironet, AsyncOS, Bringing the Meeting To You, Catalyst, CCDA, CCDP, CCIE, CCIP, CCNA, CCNP, CCSP, CCVP, Cisco, the Cisco Certified Internetwork Expert logo, Cisco IOS, Cisco Press, Cisco Systems, Cisco Systems Capital, the Cisco Systems logo, Cisco Unity, Collaboration Without Limitation, EtherFast, EtherSwitch, Event Center, Fast Step, Follow Me Browsing, FormShare, GigaDrive, HomeLink, Internet Quotient, IOS, iPhone, iQuick Study, IronPort, the IronPort logo, LightStream, Linksys, MediaTone, MeetingPlace, MeetingPlace Chime Sound, MGX, Networkers, Networking Academy, Network Registrar, PCNow, PIX, PowerPanels, ProConnect, ScriptShare, SenderBase, SMARTnet, Spectrum Expert, StackWise, The Fastest Way to Increase Your Internet Quotient, TransPath, WebEx, and the WebEx logo are registered trademarks of Cisco Systems, Inc. and/or its affiliates in the United States and certain other countries.

All other trademarks mentioned in this document or website are the property of their respective owners. The use of the word partner does not imply a partnership relationship between Cisco and any other company. (0809R)

Table of Contents

About Cisco Validated Designs. 2

Executive Summary. 8

Solution Overview.. 9

Lambda Architecture: Merging Batch and Real-time Data. 11

HDP Reference Architecture. 13

HDF Reference Architecture. 14

Technology Overview.. 16

Cisco UCS Integrated Infrastructure for Big Data and Analytics with HDP and HDF. 16

Cisco UCS 6200 Series Fabric Interconnects. 16

Cisco UCS 6300 Series Fabric Interconnects. 16

Cisco UCS C-Series Rack Mount Servers. 17

Cisco UCS Virtual Interface Cards (VICs) 17

Cisco UCS Manager 18

Hortonworks (HDP 2.4.2) 19

HDP for Data Access. 19

HDP Cluster Operations. 20

HDF. 20

Benefits of Hortonworks DataFlow.. 21

Features of Hortonworks DataFlow.. 22

Rack and PDU Configuration. 27

Port Configuration on Fabric Interconnects. 29

Server Configuration and Cabling for Cisco UCS C-Series M4. 29

Software Distributions and Versions. 30

Hortonworks Data Platform (HDP 2.4) 30

Red Hat Enterprise Linux (RHEL) 30

Software Versions. 31

Fabric Configuration. 31

Performing Initial Setup of Cisco UCS 6296 Fabric Interconnects. 32

Configure Fabric Interconnect A. 32

Configure Fabric Interconnect B. 33

Logging Into Cisco UCS Manager 33

Upgrading UCSM Software to Version 3.1(1g) 33

Adding a Block of IP Addresses for KVM Access. 33

Enabling Uplink Ports. 34

Configuring VLANs. 35

Enabling Server Ports. 37

Creating Pools for Service Profile Templates. 39

Creating an Organization. 39

Creating MAC Address Pools. 39

Creating a Server Pool 41

Creating Policies for Service Profile Templates. 43

Creating Host Firmware Package Policy. 43

Creating QoS Policies. 44

Creating the Local Disk Configuration Policy. 47

Creating Server BIOS Policy. 48

Creating the Boot Policy. 50

Creating Power Control Policy. 52

Creating a Service Profile Template. 54

Configuring the Storage Provisioning for the Template. 55

Configuring Network Settings for the Template. 56

Configuring the vMedia Policy for the Template. 62

Configuring Server Boot Order for the Template. 63

Configuring Server Assignment for the Template. 65

Configuring Operational Policies for the Template. 66

Installing Red Hat Enterprise Linux 7.2. 69

Post OS Install Configuration. 93

Setting Up Password-less Login. 93

Configuring /etc/hosts. 94

Creating a Red Hat Enterprise Linux (RHEL) 7.2 Local Repo. 95

Creating the Red Hat Repository Database. 97

Setting up ClusterShell 97

Installing httpd. 99

Set Up all Nodes to use the RHEL Repository. 99

Configuring DNS. 100

Upgrading the Cisco Network driver for VIC1227. 101

Installing xfsprogs. 102

Setting up JAVA. 102

NTP Configuration. 104

Enabling Syslog. 106

Setting ulimit 107

Disabling SELinux. 108

Set TCP Retries. 108

Disabling the Linux Firewall 109

Disable Swapping. 109

Disable Transparent Huge Pages. 109

Disable IPv6 Defaults. 110

Configuring Data Drives on Name Node and Other Management Nodes. 110

Configuring Data Drives on Data Nodes. 111

Configuring the Filesystem for NameNodes and Datanodes. 112

Cluster Verification. 114

Installing HDP 2.4. 118

Pre-Requisites for HDP. 119

Hortonworks Repo. 119

Downgrade Snappy on All Nodes. 122

HDP Installation. 122

Install and Setup Ambari Server on rhel1. 122

External Database PostgreSQL Installation. 123

Ambari 125

Setting up the Ambari Server on the Admin Node (rhel1) 129

Starting the Ambari Server 131

Confirm Ambari Server Startup. 131

Logging into the Ambari Server 132

Creating a Cluster 133

Select a Stack. 134

HDP Installation. 135

Hostname Pattern Expressions. 136

Confirm Hosts. 137

Choose Services. 137

Assign Masters. 138

Assign Slaves and Clients. 140

Customize Services. 141

HDFS. 142

HDFS JVM Settings. 142

Update Log Directory. 142

Summary of the Installation Process. 166

Installation of HDF. 167

Configuring Disk Drives for the Operating System on HDF Nodes. 169

Configuring Disk Drives for Data on HDF Nodes. 173

Installing OS on HDF Nodes. 174

Configuring the Filesystem for HDF Data Disks. 174

Post OS configuration of HDF Nodes. 176

Setting up HDF. 177

Starting HDF. 185

Bill of Materials. 187

About the Authors. 192

Acknowledgements. 192

Appendix A- Network Bonding. 193

Setting Up Network Bonding. 193

Appendix B – A Sample Application with HDF Using Live Twitter Stream with Apache Solr and Banana. 197

Create a Twitter Application. 197

Solr Installation. 201

Configuring Apache Nifi Project with Solr 202

Create a Data Flow with NiFi 206

Twitter Dashboard of the Demo Application. 209

Executive Summary

In the early days, most enterprises focused on batch processing to unlock the value of big data. Although enterprises benefited from batch processing, companies today want to extract value from their increasing volumes of data in real-time. Sensors, Internet of Things (IoT) devices, social networking and online transactions are all generating data that needs to be captured, monitored and rapidly processed to make data-based decisions instantly.

This strong demand for processing both data-in-motion and data-at-rest has been the main factor promoting the development of the Lambda architecture, which supports both batch and real-time processing of data. It includes support for complex event processing with applications such as Kafka and Storm, near real-time analytics with Spark Streaming, interactive SQL with Hive and HAWQ, as well as persistence and batch analytics with the Hadoop Distributed File System (HDFS) and MapReduce.

The Hortonworks Data Platform (HDP) is the industry’s enterprise-ready open-source Apache Hadoop framework that completely addresses the needs of data-at-rest processing, powers real-time customer applications and accelerates decision-making and innovation.

Hortonworks Dataflow (HDF) accelerates deployment of big data infrastructure and enables real-time analysis via an intuitive graphical user interface. HDF simplifies, streamlines and secures real-time data collection from distributed, heterogeneous data sources and provides a coding-free, off-the-shelf UI for on-time big data insights. HDF provides a better approach to data collection which is simple, secure, flexible and easily extensible.

Building next-generation big data architecture requires simplified and centralized management, high performance, and a linearly-scaling infrastructure and software platform. Cisco UCS® Integrated Infrastructure for Big Data and Analytics with HDP and HDF powers the next-generation architecture for big data systems, spanning a myriad of use cases including IoT, fraud analytics, and precision medicine via genome sequencing, among others. The configuration detailed in this document can be extended to clusters of various sizes depending on what the application demands. Up to 80 servers (5 racks) can be supported with no additional switching in a single UCS domain. Scaling beyond 5 racks (80 servers) can be implemented by interconnecting multiple UCS domains using Nexus 9000 Series switches or Application Centric Infrastructure (ACI), scalable to thousands of servers and to hundreds of petabytes of storage, all managed from a single pane using UCS Central.

Solution Overview

Introduction

Big Data is now all about data-in-motion, data-at-rest and analytic applications. This solution unlocks the value of big data while maximizing existing investments.

The speed of today’s processing systems has moved from classical data warehousing batch reporting to the realm of real-time processing and analytics. Real-time means near-zero latency, and access to information as it is generated to enable real-time decision-making from streaming data sources.

Big data, data science and analytics now have the potential to radically change how industries deliver services. They now empower consumers to make informed decisions with new business insights.

Apache Hadoop is the most popular big data framework. The technology is evolving rapidly to enable decisions for business strategy, technology architecture and implementation approaches for rapid success.

Solution

This solution is a simple and linearly scalable architecture that provides data processing on the HDP that caters to both batch and real-time processing with a centrally managed automated Hadoop deployment, providing all the benefits of the UCS Integrated Infrastructure for Big Data and Analytics.

With this solution you can deploy HDF on an existing Hadoop cluster or on a completely new cluster. This implementation addresses batch processing and stream processing combined with other technologies like NiFi, Kafka, etc.

Some of the features of this solution include:

· Infrastructure for both big data and agile analytics.

· Simplified infrastructure management via the Cisco UCS Manager.

· Flexible big data platform which works for both batch and real time processing.

· Architectural scalability - linear scaling based on data requirements.

· Usage of Hortonworks Data Platform (HDP) for comprehensive cluster monitoring and management.

This solution is based on the Cisco UCS Integrated Infrastructure for Big Data and Analytics which includes compute, storage, network and unified management capabilities to help companies manage the immense amount of data they collect today. It is built on the Cisco Unified Computing System (Cisco UCS) infrastructure using Cisco UCS 6200 Series Fabric Interconnects and Cisco UCS C-Series Rack Servers. This architecture is specifically designed for performance and linear scalability for big data workloads.

Audience

This document describes the architecture and deployment procedures for Hortonworks Data Platform (HDP) and Hortonworks Data Flow (HDF) on a 64 Cisco UCS C240 M4 node cluster based on Cisco UCS Integrated Infrastructure for Big Data and Analytics. The intended audience of this document includes, but is not limited to, sales engineers, field consultants, professional services, IT managers, partner engineering and customers who want to deploy the Hortonworks Connected Data Platform on Cisco UCS Integrated Infrastructure for Big Data and Analytics.

Solution Summary

This CVD describes in detail the process for installing Hortonworks 2.4.2 with Apache Spark, Kafka, Storm and NiFi including the configuration details of the cluster. It also details application configuration for the HDF libraries. The current version of Cisco UCS Integrated Infrastructure for Big Data and Analytics offers the following configurations depending on the compute and storage requirements as shown in Table 1.

Table 1 Cisco UCS Integrated Infrastructure for Big Data and Analytics Configuration Details

Performance Optimized Option 1 (UCS-SL-CPA4-P1)	Performance Optimized Option 2 (UCS-SL-CPA4-P2)	Performance Optimized Option 3 UCS-SL-CPA4-P3	Capacity Optimized Option 1 UCS-SL-CPA4-C1	Capacity Optimized Option 2 UCS-SL-CPA4-C2
2 Cisco UCS 6296 UP, 96 ports Fabric Interconnects. 16 Cisco UCS C240 M4 Rack Servers (SFF), each with: 2 Intel Xeon processors E5-2680 v4 CPUs (14 cores on each CPU) 256 GB of memory Cisco 12-Gbps SAS Modular Raid Controller with 2-GB flash-based write cache (FBWC) 24 1.2-TB 10K SFF SAS drives (460 TB total) 2 240-GB 6-Gbps 2.5-inch Enterprise Value SATA SSDs for Boot Cisco UCS VIC 1227 (with 2 10 GE SFP+ ports)	2 Cisco UCS 6296 UP, 96 ports Fabric Interconnects. 16 Cisco UCS C240 M4 Rack Servers (SFF), each with: 2 Intel Xeon processors E5-2680 v4 CPUs (14 cores on each CPU) 256 GB of memory Cisco 12-Gbps SAS Modular Raid Controller with 2-GB flash-based write cache (FBWC) 24 1.8-TB 10K SFF SAS drives (691 TB total) 2 240-GB 6-Gbps 2.5-inch Enterprise Value SATA SSDs for Boot Cisco UCS VIC 1227 (with 2 10 GE SFP+ ports)	2 Cisco UCS 6332 Fabric Interconnects. 16 Cisco UCS C240 M4 Rack Servers (SFF), each with: 2 Intel Xeon processors E5-2680 v4 CPUs (14 cores on each CPU) 256 GB of memory Cisco 12-Gbps SAS Modular Raid Controller with 2-GB flash-based write cache (FBWC) 24 1.8-TB 10K SFF SAS drives (460 TB total) 2 240-GB 6-Gbps 2.5-inch Enterprise Value SATA SSDs for Boot Cisco UCS VIC 1387 (with 2 40 GE SFP+ ports)	2 Cisco UCS 6296 UP, 96 ports Fabric Interconnects. 16 Cisco UCS C240 M4 Rack Servers (LFF), each with: 2 Intel Xeon processors E5-2620 v4 CPUs (8 cores each CPU) 128 GB of memory Cisco 12-Gbps SAS Modular Raid Controller with 2-GB flash-based write cache (FBWC) 12 6-TB 7.2K LFF SAS drives (1152 TB total) 2 240-GB 6-Gbps 2.5-inch Enterprise Value SATA SSDs for Boot Cisco UCS VIC 1227 (with 2 10 GE SFP+ ports)	2 Cisco UCS 6296 UP, 96 ports Fabric Interconnects. 16 Cisco UCS C240 M4 Rack Servers (LFF), each with: 2 Intel Xeon processors E5-2620 v4 CPUs (8 cores each CPU) 256 GB of memory Cisco 12-Gbps SAS Modular Raid Controller with 2-GB flash-based write cache (FBWC) 12 8-TB 7.2K LFF SAS drives (1536 TB total) 2 240-GB 6-Gbps 2.5-inch Enterprise Value SATA SSDs for Boot Cisco UCS VIC 1227 (with 2 10 GE SFP+ ports)

Performance Optimized Option 1

(UCS-SL-CPA4-P1)

Performance Optimized Option 2

(UCS-SL-CPA4-P2)

Performance Optimized Option 3 UCS-SL-CPA4-P3

Capacity Optimized

Option 1

UCS-SL-CPA4-C1

Capacity Optimized

Option 2

UCS-SL-CPA4-C2

2 Cisco UCS 6296 UP, 96 ports Fabric Interconnects.

16 Cisco UCS C240 M4 Rack Servers (SFF), each with:

2 Intel Xeon processors E5-2680 v4 CPUs (14 cores on each CPU)

256 GB of memory

Cisco 12-Gbps SAS Modular Raid Controller with 2-GB flash-based write cache (FBWC)

24 1.2-TB 10K SFF SAS drives (460 TB total)

2 240-GB 6-Gbps 2.5-inch Enterprise Value SATA SSDs for Boot

Cisco UCS VIC 1227 (with 2 10 GE SFP+ ports)

2 Cisco UCS 6296 UP, 96 ports Fabric Interconnects.

16 Cisco UCS C240 M4 Rack Servers (SFF), each with:

2 Intel Xeon processors E5-2680 v4 CPUs (14 cores on each CPU)

256 GB of memory

Cisco 12-Gbps SAS Modular Raid Controller with 2-GB flash-based write cache (FBWC)

24 1.8-TB 10K SFF SAS drives (691 TB total)

2 240-GB 6-Gbps 2.5-inch Enterprise Value SATA SSDs for Boot

Cisco UCS VIC 1227 (with 2 10 GE SFP+ ports)

2 Cisco UCS 6332 Fabric Interconnects.

16 Cisco UCS C240 M4 Rack Servers (SFF), each with:

2 Intel Xeon processors E5-2680 v4 CPUs (14 cores on each CPU)

256 GB of memory

Cisco 12-Gbps SAS Modular Raid Controller with 2-GB flash-based write cache (FBWC)

24 1.8-TB 10K SFF SAS drives (460 TB total)

2 240-GB 6-Gbps 2.5-inch Enterprise Value SATA SSDs for Boot

Cisco UCS VIC 1387 (with 2 40 GE SFP+ ports)

2 Cisco UCS 6296 UP, 96 ports Fabric Interconnects.

16 Cisco UCS C240 M4 Rack Servers (LFF), each with:

2 Intel Xeon processors E5-2620 v4 CPUs (8
cores each CPU)

128 GB of memory

Cisco 12-Gbps SAS Modular Raid Controller with 2-GB flash-based write cache (FBWC)

12 6-TB 7.2K LFF SAS drives (1152 TB total)

2 240-GB 6-Gbps 2.5-inch Enterprise Value SATA SSDs for Boot

Cisco UCS VIC 1227 (with 2 10 GE SFP+ ports)

2 Cisco UCS 6296 UP, 96 ports Fabric Interconnects.

16 Cisco UCS C240 M4 Rack Servers (LFF), each with:

2 Intel Xeon processors E5-2620 v4 CPUs (8
cores each CPU)

256 GB of memory

Cisco 12-Gbps SAS Modular Raid Controller with 2-GB flash-based write cache (FBWC)

12 8-TB 7.2K LFF SAS drives (1536 TB total)

2 240-GB 6-Gbps 2.5-inch Enterprise Value SATA SSDs for Boot

Cisco UCS VIC 1227 (with 2 10 GE SFP+ ports)

Lambda Architecture: Merging Batch and Real-time Data

Companies have realized the power of big data and are now collecting more data than ever before. They need to get value from this data, often in real-time. Sensors, IoT devices, social network data and online transactions are just a few examples. They are all generating data continuously, 24x7. This data needs to be captured, monitored and processed quickly in order to make informed, data-driven decisions in real-time.

In addition, real-time streaming data needs to be sent to the company’s enterprise-wide data store where it can be used for traditional analysis and reporting, data discovery and as input to sophisticated machine-learning algorithms.

Real-time data processing at scale has very specialized requirements and one of the most popular tools in use today is Apache Spark. By moving the computation into memory Spark enables a wide variety of processing, including: traditional batch jobs, interactive analysis, real-time streaming and machine learning. Spark accomplishes this through a powerful set of built-in libraries: Spark Core, Spark Streaming, Spark SQL and MLlib. The internal details of these libraries are described below in the Technology Overview section of this document.

Spark enables applications in Hadoop clusters to run faster by caching datasets. With the data now being available in RAM instead of on disk, performance is improved dramatically, especially for iterative algorithms that access the same dataset repeatedly. In addition to traditional Map and Reduce operations, Spark’s unique features enable interactive and batch operations with SQL-like queries, machine learning and graph data processing. Spark allows programmers to develop complex, multi-step data pipelines using directed acyclic graph (DAG) patterns. It also supports in-memory data sharing across DAGs so that different jobs can work with the same data.

Analyzing streaming data at the edge using fog nodes for data collection, Apache Kafka for transport and Spark for analysis are becoming very common in the industry. Dashboards and visualization software on top of these analysis platforms enable enterprises to visualize and monitor their business in real-time.

The data can also be ingested into the enterprise distributed data lake, traditional SQL databases and NoSQL databases where it can be used to power dashboards, reporting, interactive analysis, data mining and machine learning.

The Hortonworks Connected Data Platform enables the creation of systems that implement the Lambda Architecture, a data-processing architecture designed to handle massive quantities of data by taking advantage of both batch- and stream-processing methods. This approach to architecture balances latency, throughput and fault-tolerance by using batch processing to provide comprehensive and accurate views of batch data, while simultaneously using real-time stream processing to provide views of online data. The two view outputs may be joined before presentation providing historical, interactive and real-time views of the data.

Figure 1 below depicts the merging of batch and real-time data. Data-in-motion is handled by HDF which collects the real-time data from the source, then filters, analyzes and delivers it to the target data store. HDF efficiently collects, aggregates and transports large amounts of streaming event data, processing it in real-time as necessary before sending it on. HDP is the industry’s enterprise-ready open-source Apache Hadoop framework that completely addresses the needs of data-at-rest processing, powers real-time customer applications and accelerates decision-making and innovation. Note the complete integration of dynamic, disparate and distributed data sources all with different formats, schemas, protocols, processing speeds and data sizes.

Figure 1 Typical Data Flow for Real Time Processing

The Cisco UCS Integrated Infrastructure for Big Data and Analytics, Hortonworks Data Platform and Hortonworks DataFlow are designed to accelerate the return-on-investment from big data. This solution delivers data from anywhere it originates to anywhere it needs to go.

With the ability to react quickly and make informed decisions based on real-time data, more businesses are moving beyond traditional batch-processing methods to faster, targeted, more informed approaches. Now processing can happen in seconds to minutes instead of hours to days.

The next section details the relevant reference architectures for meeting these real world challenges.

HDP Reference Architecture

Figure 2 shows the base configuration of 64 nodes with SFF (1.8TB) drives. This also offers HA of the cluster with 3 management nodes.

Figure 2 Reference Architecture for HDP

Note: This CVD describes the installation process of HDP 2.4.2 for a 64 node (3 Management Nodes for HA + 61 Data Nodes) on the Performance Optimized Option 2 Cluster configuration. It also has details on how to add in HDF if needed as part of the same cluster.

Note: If a customer decides to use the 6300 series FI (40 G connectivity) for the configuration instead of the 6200 series FI in Performance Optimized Option 2, the only change will be to add in the Cisco VIC 1387, the rest of the configuration will be the same.

HDF Reference Architecture

Figure 3 shows the complete reference architecture with HDF included for streaming data into the Hadoop cluster. In this architecture additional Cisco UCS C220 M4 nodes are added to the HDP architecture shown above.

Figure 3 Reference Architecture for Spark Streaming with HDF & HDP

Table 2 Configuration Details

Component	Description
Connectivity	2 Cisco UCS 6296UP 96-Port Fabric Interconnects Up to 80 servers with no additional switching infrastructure
HDP Nodes	64 Cisco UCS C240 M4 Rack Servers Hadoop NameNode/Secondary NameNode and Resource Manager and Data Nodes. Spark Executors are collocated on a Data Node. *Please refer to the Service Assignment section for specific service assignment and configuration details.
HDF Nodes	8 Cisco UCS C220 M4 Rack Servers

Technology Overview

Cisco UCS Integrated Infrastructure for Big Data and Analytics with HDP and HDF

The Cisco UCS Integrated Infrastructure for Big Data and Analytics solution for Hortonworks, is based on Cisco UCS Integrated Infrastructure for Big Data and Analytics, a highly scalable architecture designed to meet a variety of scale-out application demands with seamless data integration and management integration capabilities built using the following components:

Cisco UCS 6200 Series Fabric Interconnects

Cisco UCS 6200 Series Fabric Interconnects, as shown in Figure 4, provide high-bandwidth, low-latency connectivity for servers, with integrated, unified management provided for all connected devices by Cisco UCS Manager. Deployed in redundant pairs, Cisco Fabric Interconnects offer the full active-active redundancy, performance and exceptional scalability needed to support the large number of nodes that are typical in clusters serving big data applications. Cisco UCS Manager enables rapid and consistent server configuration using service profiles, automating ongoing system maintenance activities such as firmware updates across the entire cluster as a single operation. Cisco UCS Manager also offers advanced monitoring with options to raise alarms and send notifications about the health of the entire cluster.

Figure 4 Cisco UCS 6296UP 96-Port Fabric Interconnect

Description: isco UCS 6296UP 96-Port Fabric Interconnect

Cisco UCS 6300 Series Fabric Interconnects

Cisco UCS 6300 Series Fabric Interconnects is the new series of Fabric Interconnects that Cisco introduced. The Cisco UCS 6300 Series Fabric Interconnects as shown in Figure 5, are a core part of Cisco UCS, providing low-latency, lossless 10 and 40 Gigabit Ethernet, Fiber Channel over Ethernet (FCoE), and Fiber Channel functions with management capabilities for the system. All servers attached to Fabric Interconnects become part of a single, highly available management domain.

Figure 5 Cisco UCS 6332 UP 32 -Port Fabric Interconnect

Cisco UCS C-Series Rack Mount Servers

Cisco UCS C-Series Rack Mount C220 M4 High-Density Rack Servers (Small Form Factor Disk Drive Model), and Cisco UCS C240 M4 High-Density Rack Servers (Small Form Factor Disk Drive Model), are enterprise-class systems that support a wide range of computing, I/O and storage-capacity demands in compact designs, as shown in Figure 6 and Figure 7. Cisco UCS C-Series Rack-Mount Servers are based on the Intel Xeon E5-2600 v4 series processor family that delivers the best combination of performance, flexibility and efficiency gains with 12-Gbps SAS throughput. The Cisco UCS C240/C220 M4 servers provide 24 DIMM (PCIe) 3.0 slots and can support up to 1.5 TB of main memory (128 or 256 GB is typical for big data applications). It can support a range of disk drive and SSD options. Specifically, Cisco UCS C240 M4 supports twenty-four Small Form Factor (SFF) disk drives plus two (optional) internal SATA boot drives for a total of 26 internal drives in the Performance Optimized option or twelve Large Form Factor (LFF) disk drives option plus two (optional) internal SATA boot drives for a total of 14 internal drives are supported in the Capacity Optimized option. Cisco UCS Virtual Interface cards 1227 (VICs) are designed for the M4 generation of Cisco UCS C-Series Rack Servers (both C240 and C220 servers) and are optimized for high-bandwidth and low-latency cluster connectivity, with support for up to 256 virtual devices that are configured on demand through Cisco UCS Manager.

Figure 6 Cisco UCS C240 M4 Rack Server

Description: ttp://www.cisco.com/c/dam/en/us/products/collateral/servers-unified-computing/ucs-c240-m4-rack-server/datasheet-c78-732455.doc/_jcr_content/renditions/datasheet-c78-732455_0.jpg

Figure 7 Cisco UCS C220 M4 Rack Server (Small Form Factor Disk Drive Model)

Description: http://www.cisco.com/c/dam/en/us/td/docs/unified_computing/ucs/UCS_CVDs/Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_IBM.docx/_jcr_content/renditions/Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_IBM_5.jpg

Cisco UCS Virtual Interface Cards (VICs)

Cisco UCS Virtual Interface Cards (VICs) are unique to Cisco. Cisco UCS Virtual Interface Cards as shown in Figure 8, incorporate next-generation converged network adapter (CNA) technology from Cisco and offer dual 10-Gbps ports designed for use with Cisco UCS C-Series Rack-Mount Servers. Optimized for virtualized networking, these cards deliver high performance and bandwidth utilization, and support up to 256 virtual devices. The Cisco UCS Virtual Interface Card (VIC) 1227 is a dual-port, Enhanced Small Form-Factor Pluggable (SFP+), 10 Gigabit Ethernet, and Fiber Channel over Ethernet (FCoE)-capable, PCI Express (PCIe) modular LAN on motherboard (mLOM) adapter.

Figure 8 Cisco UCS VIC 1227

Description: O71144-large

The Cisco UCS Virtual Interface Card 1387 offers dual-port Enhanced Quad Small Form-Factor Pluggable (QSFP+) 40 Gigabit Ethernet and Fiber Channel over Ethernet (FCoE) in a modular-LAN-on-motherboard (mLOM) form factor. The mLOM slot can be used to install a Cisco VIC without consuming a PCIe slot providing greater I/O expandability. Shown in Figure 9, below.

Figure 9 Cisco UCS VIC 1387

Cisco UCS Manager

Cisco UCS Manager resides within the Cisco UCS 6200 Series Fabric Interconnect, (shown in Figure 10). It makes the system self-aware and self-integrating, managing all of the system components as a single logical entity. Cisco UCS Manager can be accessed through an intuitive graphical user interface (GUI), a command-line interface (CLI) or an XML application-programming interface (API). Cisco UCS Manager uses service profiles to define the personality, configuration, and connectivity of all resources within Cisco UCS, radically simplifying provisioning of resources so that the process takes minutes instead of days. This simplification allows IT departments to shift their focus from constant maintenance to strategic business initiatives.

Figure 10 Cisco UCS Manager

Hortonworks (HDP 2.4.2)

The Hortonworks Data Platform (HDP) delivers essential capabilities in a completely open, integrated and tested platform that is ready for enterprise usage. With Hadoop YARN at its core, HDP provides flexible enterprise data processing across a range of data processing engines, paired with comprehensive enterprise capabilities for governance, security and operations.

All the integration of the entire solution is thoroughly tested and fully documented. By taking the guesswork out of building out a Hadoop deployment, HDP gives a streamlined path to success in solving real business problems.

HDP for Data Access

With YARN at its foundation, HDP provides a range of processing engines that allow users to interact with data in multiple and parallel ways, without the need to stand up individual clusters for each data set/application. Some applications require batch while others require interactive SQL or low-latency access with NoSQL. Other applications require search, streaming or in-memory analytics. Apache Solr, Storm and Spark fulfill those needs respectively.

To function as a true data platform, the YARN-based architecture of HDP enables the widest possible range of access methods to coexist within the same cluster avoiding unnecessary and costly data silos.

As shown in Figure 11, HDP Enterprise natively provides for the following data access types:

· Batch – Apache MapReduce has served as the default Hadoop processing engine for years. It is tested and relied upon by many existing applications.

· Interactive SQL Query - Apache Hive™ is the de facto standard for SQL interactions at petabyte scale within Hadoop. Hive delivers interactive and batch SQL querying across the broadest set of SQL semantics.

· Search - HDP integrates Apache Solr to provide high-speed indexing and sub-second search times across all your HDFS data.

· Scripting - Apache Pig is a scripting language for Hadoop that can run on MapReduce or Apache Tez, allowing you to aggregate, join and sort data.

· Low-latency access via NoSQL - Apache HBase provides extremely fast access to data as a columnar format, NoSQL database. Apache Accumulo also provides high-performance storage and retrieval, but with fine-grained access control to the data.

· Streaming - Apache Storm processes streams of data in real time and can analyze and take action on data as it flows into HDFS.

Figure 11 YARN

HDP Cluster Operations

HDP delivers a comprehensive set of completely open operational capabilities that provide both visibilities into cluster health as well as the ability to manage, monitor and configure resources.

· Apache Ambari – is a completely open framework to provision, manage and monitor Apache Hadoop clusters. It provides a simple, elegant UI that allows you to image a Hadoop cluster.

· Apache Oozie - provides a critical scheduling capability to organize and schedule jobs within Enterprise Hadoop across all data access points.

HDF

Hortonworks DataFlow is a single combined platform for data acquisition, simple event processing, transport and delivery, designed to accommodate the highly diverse and complicated data flows generated by a world of connected people, systems and things. HDF enables simple, fast data acquisition, secure data transport, prioritized data flow and clear traceability of data from the very edge of your network all the way to the core data center. Through a combination of an intuitive visual interface, a high fidelity access and authorization mechanism and an “always on” chain of custody (data provenance) framework, HDF is the perfect complement to HDP to bring together historical and perishable insights for your business. A single integrated platform for data acquisition, simple event processing, transport and delivery mechanism from source to storage. Figure 12 shows the dataflow.

Figure 12 Hortonworks DataFlow Process

Benefits of Hortonworks DataFlow

HDF was designed specifically to meet the practical challenges of collecting data from a wide range of disparate data sources securely, efficiently and over a geographically disperse and possibly fragmented network.

Hortonworks Dataflow enables enterprises to

· Accelerate big data ROI through a single data-source agnostic collection platform.

· Reduce cost and complexity through an intuitive, real-time visual user interface.

· Unprecedented yet simple to implement data security from source to storage.

· Better business decisions with highly granular data sharing policies.

· React in real time by leveraging bi-directional data flows and prioritized data feeds.

· Adapt to new data sources through an extremely scalable, extensible platform.

Features of Hortonworks DataFlow

· Data collection – Integrated collection from dynamic, disparate and distributed sources of differing formats, schemas, protocols, speeds and sizes such as machines, geo location devices, click streams, files, social feeds, log files and videos.

· Real time decisions - Real-time evaluation of perishable insights at the edge as being pertinent or not, and executing upon consequent decisions to send, drop or locally store data as needed.

· Operational efficiency - Fast, effective drag and drop interface for creation, management, tuning and troubleshooting of dataflows, enabling coding free creation and adjustments of dataflows in five minutes or less.

· Security and provenance- Secure end-to-end routing from source to destination, with discrete user authorization and detailed real-time visual chain of custody and metadata (data provenance).

· Bi-directional dataflow - Reliably prioritize and transport data in real-time leveraging bi-directional dataflows to dynamically adapt to fluctuations in data volume, network connectivity and source and endpoint capacity.

· Command and control - Immediate ability to create, change, tune, view, start, stop, trace, parse, filter, join, merge, transform, fork, clone or replay dataflows through a visual user interface with real time operational visibility and feedback.

Figure 13 below displays a Hortonworks DataFlow Workflow Diagram.

Figure 13 Hortonworks DataFlow Workflow Diagram

Apache Spark

Traditional servers are not designed to support the massive scalability, performance and efficiency requirements of big data solutions. These outdated and siloed computing solutions are difficult to integrate with network and storage resources, and are time-consuming to deploy and expensive to operate. Cisco UCS Integrated Infrastructure for Big Data and Analytics with Apache Spark takes a different approach, combining computing, networking, storage access and management capabilities into a unified, fabric-based architecture that is optimized for big data workloads.

Apache Spark enhances the existing big data environments by adding new capabilities to Hadoop or other big data deployments. The platform unifies a broad range of capabilities—batch processing, real-time stream processing, advanced analytic capabilities, and interactive exploration—that can intelligently optimize applications. Spark’s key advantage is speed, with an advanced DAG execution engine that supports cyclic data flow and in-memory computing. It can run programs much faster than Hadoop/Map-Reduce. Applications can be developed using built-in, high-level Apache Spark operations or they can be interactive applications with Python, R, and Scala shells or they can be in Java. These various options allow users to quickly and easily build new applications and explore data faster.

Spark provides programmers with an application interface centered on a data structure called the resilient distributed dataset (RDD), a read-only set of data items distributed over a cluster of machines that is maintained in a fault-tolerant way. Calculations are performed and results are delivered only when needed, and results can be configured to persist in memory which allows Apache Spark to deliver a new level of computing efficiency and computation performance to big data deployments.

Figure 14 Apache Spark Libraries

$Description: C:\Users\matrived\Desktop\spark-screen-shot1.png$

As shown in Figure 14 above, Apache Spark has a number of libraries:

· Apache Spark SQL/DataFrame API for querying structured data inside Spark programs.

· Apache Spark streaming offers Spark’s core API enabling real-time processing of streaming data, including web server log files, social media, and messaging queues.

· MLLib to take advantage of machine- learning algorithms and accelerate application performance across clusters.

Spark can access diverse data sources including HDFS, Cassandra, HBase, and S3. Spark with YARN is an optimal way to schedule and run Spark jobs on a Hadoop cluster alongside a variety of other data processing frameworks, leveraging existing clusters using queue-based placement policies, and enabling security by running on Kerberos enabled clusters.

Some common use cases that are popular in the field with Apache Spark:

Insurance	Optimize claims reimbursement processes by using Spark’s machine learning capabilities to process and analyze all claims.
Healthcare	Build a Patient Care System using Spark Core, Streaming and SQL.
Retail	Use Spark to analyze point-of-sale data and coupon usage.
Internet	Use Spark’s ML capability to identify fake profiles and enhance product matches to show their customers.
Banking	Use a machine learning model to predict the profile of retail banking customers for certain financial products.
Government	Analyze spending across geography, time and category.
Scientific Research	Analyze earthquake events by time, depth and geography to predict future events.
Investment Banking	Analyze intra-day stock prices to predict future price movements.
Geospatial Analysis	Analyze Uber trips by time and geography to predict future demand and pricing.
Twitter Sentiment Analysis	Analyze large volumes of Tweets to determine positive, negative or neutral sentiment for specific organizations and products.
Airlines	Build a model for predicting airline travel delays.
Devices	Predict likelihood of a building exceeding threshold temperatures.

Spark Streaming

Spark Streaming brings Spark's language-integrated API to stream processing. The API is provided in Java, Scala and Python. Spark’s single execution engine and unified programming model for batch and streaming lead to some unique benefits over other traditional streaming systems.

In Spark Streaming, batches of Resilient Distributed Datasets (RDDs) are passed to Spark Streaming which processes these batches using the Spark Engine and returns a processed stream of batches. This processed stream can be written to the file system.

Each batch of data is a Resilient Distributed Dataset (RDD), which is the basic abstraction of a fault-tolerant dataset in Spark. This common representation allows batch and streaming workloads to interoperate seamlessly. Users can apply arbitrary Spark functions on each batch of streaming data: for example, it’s easy to join a DStream (key programming abstraction in Spark Streaming) with a precomputed static dataset (an RDD). Spark interoperability extends to rich libraries like MLlib (machine learning), SQL and DataFrames.

Machine learning models generated offline with MLlib can be applied on streaming data. Fault tolerance in Spark Streaming is similar to fault tolerance in Spark. Spark Streaming is a streaming platform that allows reaching sub-second latency. The processing capability scales linearly with the size of the cluster. As a result, it is being used in production by many organizations.

Spark SQL

Spark SQL allows users to query structured data inside Spark programs, using either SQL or DataFrame API. DataFrames and SQL provide a common way to access a variety of data sources, including Hive, Avro, Parquet, ORC, JSON and JDBC. Spark SQL brings native support for SQL to Spark and streamlines the process of querying data stored both in RDDs and in external sources. Spark SQL conveniently blurs the lines between RDDs and relational tables.

Spark SQL provides, a programming abstraction DataFrame that acts as a distributed SQL query engine. In addition to the data sources API, Spark SQL now makes it easier to compute over structured data stored in a wide variety of formats, including Parquet, JSON and the Apache Avro library. A built-in JDBC server makes it easy to connect to the structured data stored in relational database tables and perform big data analytics using the traditional BI tools.

Apache Kafka

Apache Kafka is a distributed publish-subscribe messaging system that is designed to be fast, scalable and durable. Kafka maintains feeds of messages in topics. Producers write data to topics and consumers read from topics. Since Kafka is a distributed system, topics are partitioned and replicated across multiple nodes. Kafka is designed to allow a single cluster to serve as the central data backbone for a large organization. It can be elastically and transparently expanded without downtime. Data streams are partitioned and spread over a cluster of machines to allow data streams larger than the capability of any single machine and to allow clusters of coordinated consumers.

· Messages are simply byte arrays and developers can use them to store any object in any format with string, JSON, and Avro being the most common. It is possible to attach a key to each message, in which case the producer guarantees that all messages with the same key will arrive at the same partition.

· Messages are persisted on disk and replicated within the cluster to prevent data loss. Each broker can handle terabytes of messages without performance impact. When consuming from a topic, it is possible to configure a consumer group with multiple consumers. Each consumer in a consumer group will read messages from a unique subset of partitions in each topic they subscribe to, so each message is delivered to one consumer in the group and all messages with the same key arrive at the same consumer.

Common Use Cases include:

· Stream Processing

· Website Activity Tracking

· Metrics Collection and Monitoring

· Log Aggregation

Some of the important characteristics that make Kafka such an attractive option for these use cases include the following:

Feature	Description
Scalability	Distributed system scales easily with no downtime
Durability	Persists messages on disk, and provides intra-cluster replication
Reliability	Replicates data, supports multiple subscribers and automatically balances consumers in case of failure
Performance	High throughput for both publishing and subscribing, with disk structures that provide constant performance even with many terabytes of stored messages

Solution Design

Requirements

This CVD describes architecture and deployment procedures for Hortonworks Data Platform (HDP) 2.4.2 on a 64 Cisco UCS C240 M4SX node cluster based on Cisco UCS Integrated Infrastructure for Big Data and Analytics. The solution goes into detail configuring HDP 2.4.2 on the Cisco UCS Integrated infrastructure for Big Data. In addition it also details the configuration for Hortonworks Dataflow for various use cases.

The Performance cluster configuration consists of the following:

· Two Cisco UCS 6296UP Fabric Interconnects

· 64 Cisco UCS C240 M4 Rack-Mount servers (16 per rack)

· Four Cisco R42610 standard racks

· Eight Vertical Power distribution units (PDUs), (Country Specific)

Rack and PDU Configuration

Each rack consists of two vertical PDUs. The master rack consists of two Cisco UCS 6296UP Fabric Interconnects, sixteen Cisco UCS C240 M4 Servers connected to each of the vertical PDUs for redundancy; thereby, ensuring availability during power source failure. The expansion racks consists of sixteen Cisco UCS C240 M4 Servers connected to each of the vertical PDUs for redundancy; thereby, ensuring availability during power source failure, similar to the master rack. The Cisco UCS C220 M4 is included if installing the HDF with HDP.

Note: Please contact your Cisco representative for country specific information.

Table 3 describes the rack configurations of rack 1 (master rack) and racks 2-4 (expansion racks).

Table 3 Rack 1 (Master Rack) Racks 2-4 (Expansion Racks)

Cisco	Master Rack	Cisco	Expansion Rack
42URack		42URack
42	Cisco UCS FI 6296UP	42	Unused
41	Cisco UCS FI 6296UP	41	Unused
40	Cisco UCS FI 6296UP	40	Unused
39	Cisco UCS FI 6296UP	39	Unused
38	Cisco UCS C220 M4	38	Cisco UCS C220 M4
37	Cisco UCS C220 M4	37	Cisco UCS C220 M4
36	Unused	36	Unused
35	Unused	35	Unused
34	Unused	34	Unused
33	Unused	33	Unused
32	Cisco UCS C240 M4	32	Cisco UCS C240 M4
31	Cisco UCS C240 M4	31
30	Cisco UCS C240 M4	30	Cisco UCS C240 M4
29	Cisco UCS C240 M4	29
8	Cisco UCS C240 M4	28	Cisco UCS C240 M4
27	Cisco UCS C240 M4	27
26	Cisco UCS C240 M4	26	Cisco UCS C240 M4
25	Cisco UCS C240 M4	25
24	Cisco UCS C240 M4	24	Cisco UCS C240 M4
23	Cisco UCS C240 M4	23
22	Cisco UCS C240 M4	22	Cisco UCS C240 M4
21	Cisco UCS C240 M4	21
20	Cisco UCS C240 M4	20	Cisco UCS C240 M4
19	Cisco UCS C240 M4	19
18	Cisco UCS C240 M4	18	Cisco UCS C240 M4
17	Cisco UCS C240 M4	17	Cisco UCS C240 M4
16	Cisco UCS C240 M4	16
15	Cisco UCS C240 M4	15
14	Cisco UCS C240 M4	14	Cisco UCS C240 M4
13	Cisco UCS C240 M4	13
12	Cisco UCS C240 M4	12	Cisco UCS C240 M4
11	Cisco UCS C240 M4	11
10	Cisco UCS C240 M4	10	Cisco UCS C240 M4
9	Cisco UCS C240 M4	9
8	Cisco UCS C240 M4	8	Cisco UCS C240 M4
7	Cisco UCS C240 M4	7
6	Cisco UCS C240 M4	6	Cisco UCS C240 M4
5	Cisco UCS C240 M4	5
4	Cisco UCS C240 M4	4	Cisco UCS C240 M4
3	Cisco UCS C240 M4	3
2	Cisco UCS C240 M4	2	Cisco UCS C240 M4
1	Cisco UCS C240 M4	1

Port Configuration on Fabric Interconnects

Port Type	Port Number
Network	1
Server	2 to 65

Server Configuration and Cabling for Cisco UCS C-Series M4

The Cisco C-Series M4 rack server is equipped with Intel Xeon E5-2680 v4 processors; 256 GB of memory, Cisco UCS Virtual Interface Card 1227, Cisco 12-Gbps SAS Modular Raid Controller with 2-GB FBWC, Cisco UCS C240 M4 servers equipped with 24 1.8-TB 10K SFF SAS drives, 2 240-GB SATA SSD for Boot. The Cisco UCS C220 M4 servers have 8 960GB SFF SSDs

Figure 15 illustrates the port connectivity between the Fabric Interconnect, and Cisco UCS C240 M4 server. Sixteen Cisco UCS C240 M4 servers are used in Master rack configurations. Server configuration with Cisco UCS C220 M4 is similar as in Cisco UCS C240 M4.

Figure 15 Fabric Topology for Cisco C240 M4

For more information on physical connectivity and single-wire management see:

http://www.cisco.com/c/en/us/td/docs/unified_computing/ucs/c-series_integration/ucsm3-1/b_C-Series-Integration_UCSM3-1/b_C-Series-Integration_UCSM3-1_chapter_010.html

For more information on physical connectivity illustrations and cluster setup, see:

http://www.cisco.com/c/en/us/td/docs/unified_computing/ucs/c-series_integration/ucsm3-1/b_C-Series-Integration_UCSM3-1/b_C-Series-Integration_UCSM3-1_chapter_010.html

Figure 16 depicts a 64-node cluster. Each rack has 16 Cisco UCS C240 M4 servers. Each link in the figure represents 16 x 10 Gigabit Ethernet links from each of the 16 servers connecting to a Fabric Interconnect as a Direct Connect. Every server is connected to both Fabric Interconnects represented with a dual link.

Figure 16 64 Nodes Cluster Configuration

Software Distributions and Versions

The software distributions required versions are listed below.

Hortonworks Data Platform (HDP 2.4)

The Hortonworks Data Platform supported is HDP 2.4. For more information go to http://www.hortonworks.com.

Red Hat Enterprise Linux (RHEL)

The operating system supported is Red Hat Enterprise Linux 7.2. For more information visit http://www.redhat.com.

Software Versions

The software versions tested and validated in this document are shown in Table 4.

Table 4 Software Versions

Layer	Component	Version or Release
Compute	Cisco UCS C240-M4	C240M4.2.0.10c
Network	Cisco UCS 6296UP	UCS 3.1(1g) A
	Cisco UCS VIC1227 Firmware	4.1.1(d)
	Cisco UCS VIC1227 Driver	2.3.0.20
Storage	LSI SAS 3108	24.9.1-0011
	LSI MegaRAID SAS Driver	06.810.10.00
Software	Red Hat Enterprise Linux Server	7.2 (x86_64)
	Cisco UCS Manager	3.1(1g)
	HDP	2.4

Note: The latest drivers can be downloaded from the link below:
https://software.cisco.com/download/release.html?mdfid=283862063&flowid=25886&softwareid=283853158&release=1.5.7d&relind=AVAILABLE&rellifecycle=&reltype=latest

Note: The Latest Supported RAID controller Driver is already included with the RHEL 7.2 operating system.

Note: Cisco C240 M4 Rack Servers with Broadwell (E5 -2600 v4) CPUs are supported from Cisco UCS firmware 3.1(1g) onwards.

Fabric Configuration

This section provides details for configuring a fully redundant, highly available Cisco UCS 6296 fabric configuration.

· Initial setup of the Fabric Interconnect A and B.

· Connect to UCS Manager using virtual IP address of using the web browser.

· Launch UCS Manager.

· Enable server, uplink and appliance ports.

· Start discovery process.

· Create pools and polices for service profile template.

· Create Service Profile template and 64 Service profiles.

· Associate Service Profiles to servers.

Performing Initial Setup of Cisco UCS 6296 Fabric Interconnects

This section describes the initial setup of the Cisco UCS 6296 Fabric Interconnects A and B.

Configure Fabric Interconnect A

1. Connect to the console port on the first Cisco UCS 6296 Fabric Interconnect.

2. At the prompt to enter the configuration method, enter console to continue.

3. If asked to either perform a new setup or restore from backup, enter setup to continue.

4. Enter y to continue to set up a new Fabric Interconnect.

5. Enter y to enforce strong passwords.

6. Enter the password for the admin user.

7. Enter the same password again to confirm the password for the admin user.

8. When asked if this fabric interconnect is part of a cluster, answer y to continue.

9. Enter A for the switch fabric.

10. Enter the cluster name for the system name.

11. Enter the Mgmt0 IPv4 address.

12. Enter the Mgmt0 IPv4 netmask.

13. Enter the IPv4 address of the default gateway.

14. Enter the cluster IPv4 address.

15. To configure DNS, answer y.

16. Enter the DNS IPv4 address.

17. Answer y to set up the default domain name.

18. Enter the default domain name.

19. Review the settings that were printed to the console, and if they are correct, answer yes to save the configuration.

20. Wait for the login prompt to make sure the configuration has been saved.

Configure Fabric Interconnect B

1. Connect to the console port on the second Cisco UCS 6296 Fabric Interconnect.

2. When prompted to enter the configuration method, enter console to continue.

3. The installer detects the presence of the partner Fabric Interconnect and adds this fabric interconnect to the cluster. Enter y to continue the installation.

4. Enter the admin password that was configured for the first Fabric Interconnect.

5. Enter the Mgmt0 IPv4 address.

6. Answer yes to save the configuration.

7. Wait for the login prompt to confirm that the configuration has been saved.

For more information on configuring Cisco UCS 6200 Series Fabric Interconnect, see: http://www.cisco.com/en/US/docs/unified_computing/ucs/sw/gui/config/guide/2.0/b_UCSM_GUI_Configuration_Guide_2_0_chapter_0100.html.

Logging Into Cisco UCS Manager

To login to Cisco UCS Manager, complete the following steps:

1. Open a Web browser and navigate to the Cisco UCS 6296 Fabric Interconnect cluster address.

2. Click the Launch link to download the Cisco UCS Manager software.

3. If prompted to accept security certificates, accept as necessary.

4. When prompted, enter admin for the username and enter the administrative password.

5. Click Login to log in to the Cisco UCS Manager.

Upgrading UCSM Software to Version 3.1(1g)

This document assumes the use of UCS 3.1(1g) Refer to Cisco UCS 3.1 Release (upgrade the Cisco UCS Manager software and Cisco UCS 6296 Fabric Interconnect software to version 3.1(1g). Also, make sure the Cisco UCS C-Series version 3.1(1g) software bundles is installed on the Fabric Interconnects.

Adding a Block of IP Addresses for KVM Access

To create a block of KVM IP addresses for server access in the Cisco UCS environment, complete the following steps.

1. Select the LAN tab at the top of the left window.

2. Select Pools > IpPools > Ip Pool ext-mgmt.

3. Right-click IP Pool ext-mgmt.

4. Select Create Block of IPv4 Addresses.

Figure 17 below shows the initial screen for adding the block of IP addresses.

Figure 17 Adding a Block of IPv4 Addresses for KVM Access Part 1

5. Enter the starting IP address of the block and number of IPs needed, as well as the subnet and gateway information. Figure 18 below shows the window.

Figure 18 Adding a Block of IPv4 Addresses for KVM Access Part 2

6. Click OK to create the IP block.

7. Click OK in the message box.

Enabling Uplink Ports

To enable uplinks ports, complete the following steps:

1. Select the Equipment tab on the top left of the window.

2. Select Equipment > Fabric Interconnects > Fabric Interconnect A (primary) > Fixed Module.

3. Expand the Unconfigured Ethernet Ports section.

4. Select port 1 that is connected to the uplink switch, right-click, then select Reconfigure > Configure as Uplink Port.

5. Select Show Interface and select 10GB for Uplink Connection.

6. A pop-up window appears to confirm your selection. Click Yes then OK to continue.

7. Select Equipment > Fabric Interconnects > Fabric Interconnect B (subordinate) > Fixed Module.

8. Expand the Unconfigured Ethernet Ports section.

9. Select port number 1, which is connected to the uplink switch, right-click, then select Reconfigure > Configure as Uplink Port.

10. Select Show Interface and select 10GB for Uplink Connection.

11. A pop-up window appears to confirm your selection. Click Yes then OK to continue.

Figure 19 shows the access screen selections.

Figure 19 Enabling Uplink Ports

$Description: C:\Users\matrived\Desktop\C240-M4\UCSM c240M4 CVD Screenshots\Snap 2015-03-02 at 15.58.08.png$

Configuring VLANs

VLANs are configured as in shown in Table 5.

Table 5 VLAN Configurations

VLAN	NIC Port	Function
VLAN19	eth0	Data

The NIC will carry the data traffic from VLAN19. A single vNIC is used in this configuration and the Fabric Failover feature in Fabric Interconnects will take care of any physical port down issues. It will be a seamless transition from an application perspective.

To configure VLANs in the Cisco UCS Manager GUI, complete the following steps:

1. Select the LAN tab in the left pane in the UCSM GUI.

2. Select LAN > LAN Cloud > VLANs.

3. Right-click the VLANs under the root organization.

4. Select Create VLANs to create the VLAN.

Figure 20 shows the VLAN selection window.

Figure 20 Creating a VLAN

$Description: C:\Users\matrived\Desktop\C240-M4\UCSM c240M4 CVD Screenshots\Snap 2015-03-03 at 16.00.51.png$

5. Enter vlan19 for the VLAN Name.

6. Keep multicast policy as <not set>.

7. Select Common/Global for vlan19.

8. Enter 19 in the VLAN IDs field for the Create VLAN IDs.

9. Click OK and then, click Finish.

10. Click OK in the success message box.

Figure 21 below shows the Create VLANs window.

Figure 21 Creating VLAN for Data

11. Click OK and then, click Finish.

Enabling Server Ports

To enable server ports, complete the following steps:

1. Select the Equipment tab on the top left of the window.

2. Select Equipment > Fabric Interconnects > Fabric Interconnect A (primary) > Fixed Module.

3. Expand the Unconfigured Ethernet Ports section.

4. Select all the ports that are connected to the Servers right-click them, and select Reconfigure > Configure as a Server Port.

5. A pop-up window appears to confirm your selection. Click Yes then OK to continue.

6. Select Equipment > Fabric Interconnects > Fabric Interconnect B (subordinate) > Fixed Module.

7. Expand the Unconfigured Ethernet Ports section.

8. Select all the ports that are connected to the Servers right-click them, and select Reconfigure > Configure as a Server Port.

9. A pop-up window appears to confirm your selection. Click Yes, then OK to continue.

The screen for selecting and enabling server ports is shown in Figure 22 below.

Figure 22 Enabling Server Ports

After the Server Discovery, Port 1 will be a Network Port and 2-65 will be Server Ports.

Figure 23 below shows the list of Ethernet Ports created.

Figure 23 List of Ethernet Ports created.

Noy

Creating Pools for Service Profile Templates

Creating an Organization

Organizations are used to arrange and restrict access to various groups within the IT organization, thereby enabling multi-tenancy of the compute resources. This document does not assume the use of Organizations; however the necessary steps are provided for future reference.

To configure an organization within the Cisco UCS Manager GUI, complete the following steps:

1. Click New on the top left corner in the right pane in the UCS Manager GUI.

2. Select Create Organization from the options

3. Enter a name for the organization.

4. (Optional) Enter a description for the organization.

5. Click OK.

6. Click OK in the success message box.

Creating MAC Address Pools

To create MAC address pools, complete the following steps:

1. Select the LAN tab on the left of the window.

2. Select Pools > root.

3. Right-click MAC Pools under the root organization.

4. Select Create MAC Pool to create the MAC address pool. Enter ucs for the name of the MAC pool.

5. (Optional) Enter a description of the MAC pool.

6. Select Assignment Order Sequential.

7. Click Next.

8. Click Add.

9. Specify a starting MAC address.

10. Specify a size of the MAC address pool, which is sufficient to support the available server resources.

11. Click OK.

Figure 24 and Figure 25 show the windows for setting up the MAC Pool.

Figure 24 Creating a MAC Pool

$Description: C:\Users\matrived\Desktop\C240-M4\UCSM c240M4 CVD Screenshots\Snap 2015-03-03 at 15.04.15.png$

Figure 25 Specifying first MAC Address and Size

12. Click Finish.

Figure 26 below shows the screen to add the MAC address.

Figure 26 Adding MAC Addresses

Description: screenshot

13. When the message box displays, click OK.

Description: screenshot

Creating a Server Pool

A server pool contains a set of servers. These servers typically share the same characteristics. Those characteristics can be their location in the chassis, or an attribute such as server type, amount of memory, local storage, type of CPU, or local drive configuration. You can manually assign a server to a server pool, or use server pool policies and server pool policy qualifications to automate the assignment

To configure the server pool within the Cisco UCS Manager GUI, complete the following steps:

1. Select the Servers tab in the left pane in the UCS Manager GUI.

2. Select Pools > root.

3. Right-click the Server Pools.

4. Select Create Server Pool.

5. Enter your required name (ucs) for the Server Pool in the name text box.

6. (Optional) enter a description for the organization.

7. Click Next > to add the servers.

Figure 27 below show the Server Name and Description input window.

Figure 27 Server Name and Description

Description: screenshot

8. Select all the Cisco UCS C240M4SX servers to be added to the server pool that was previously created (ucs), then Click >> to add them to the pool.

9. Click Finish.

10. Click OK and then click Finish.

Figure 28 below displays the screen for adding servers.

Figure 28 Add Servers

$Description: C:\Users\matrived\Desktop\C240-M4\UCSM c240M4 CVD Screenshots\Snap 2015-03-03 at 16.19.48.png$

Creating Policies for Service Profile Templates

Creating Host Firmware Package Policy

Firmware management policies allow the administrator to select the corresponding packages for a given server configuration. These include adapters, BIOS, board controllers, FC adapters, HBA options, and storage controller properties as applicable.

To create a firmware management policy for a given server configuration using the Cisco UCS Manager GUI, complete the following steps:

1. Select the Servers tab in the left pane in the Cisco UCS Manager GUI.

2. Select Policies > root.

3. Right-click Host Firmware Packages.

4. Select Create Host Firmware Package.

5. Enter the required Host Firmware package name (ucs).

6. Select Simple radio button to configure the Host Firmware package.

7. Select the appropriate Rack package that has been installed.

8. Click OK to complete creating the management firmware package

9. Click OK.

Figure 29 shows the Create Host Firmware window.

Figure 29 Creating a Host Firmware Package

Creating QoS Policies

To create the QoS policy for a given server configuration using the Cisco UCS Manager GUI, complete the following steps:

Platinum Policy

1. Select the LAN tab in the left pane in the UCS Manager GUI.

2. Select Policies > root.

3. Right-click QoS Policies.

4. Select Create QoS Policy.

Figure 30 shows the selection process to create a QoS policy.

Figure 30 Create QoS Policies

$Description: C:\Users\matrived\Desktop\C240-M4\UCSM c240M4 CVD Screenshots\Snap 2015-03-03 at 16.28.09.png$

5. Enter Platinum as the name of the policy.

6. Select Platinum from the drop down menu.

7. Keep the Burst(Bytes) field set to default (10240).

8. Keep the Rate(Kbps) field set to default (line-rate).

9. Keep Host Control radio button set to default (none).

10. Once the pop-up window appears (Figure 31), click OK to complete the creation of the Policy.

Figure 31 QoS Policy Confirmation screen

Setting Jumbo Frames

To set Jumbo Frames and enable QoS, complete the following steps:

1. Select the LAN tab in the left pane in the Cisco UCSM GUI, as shown in Figure 32, below.

2. Select LAN Cloud > QoS System Class.

3. In the right pane, select the General tab.

4. In the Platinum row, enter 9216 for MTU.

5. Check the Enabled Check box next to Platinum.

6. In the Best Effort row, select none for weight.

7. In the Fiber Channel row, select none for weight.

8. Click Save Changes.

9. Click OK.

Figure 32 Cisco UCSM LAN Tab

Creating the Local Disk Configuration Policy

To create local disk configuration in the Cisco UCS Manager GUI, complete the following steps:

1. Select the Servers tab on the left pane in the UCS Manager GUI, as shown below in Figure 33.

2. Go to Policies > root.

3. Right-click Local Disk Config Policies.

4. Select Create Local Disk Configuration Policy.

5. Enter ucs as the local disk configuration policy name.

6. Change the Mode to Any Configuration. Check the Protect Configuration box.

7. Keep the FlexFlash State field as default (Disable).

8. Keep the FlexFlash RAID Reporting State field as default (Disable).

9. Click OK to complete the creation of the Local Disk Configuration Policy.

10. Click OK.

Figure 33 Cisco UCSM Servers Tab

$Description: C:\Users\matrived\Desktop\C240-M4\UCSM c240M4 CVD Screenshots\Snap 2015-03-03 at 16.40.35.png$

Creating Server BIOS Policy

The BIOS policy feature in Cisco UCS automates the BIOS configuration process. The traditional method of setting the BIOS is manually, and is often error-prone. By creating a BIOS policy and assigning the policy to a server or group of servers, can enable transparency within the BIOS settings configuration.

Note: BIOS settings can have a significant performance impact, depending on the workload and the applications. The BIOS settings listed in this section is for configurations optimized for best performance which can be adjusted based on the application, performance, and energy efficiency requirements.

To create a server BIOS policy using the Cisco UCS Manager GUI, complete the following steps:

1. Select the Servers tab in the left pane in the UCS Manager GUI, as shown in Figure 34 below.

2. Select Policies > root.

3. Right-click BIOS Policies.

4. Select Create BIOS Policy.

5. Enter your preferred BIOS policy name (ucs).

6. Change the BIOS settings as shown in the following figures.

7. Changes need to be made only to the Processor and RAS Memory settings.

Figure 34 Create Server BIOS Policy Processor screen

8. Set RAS Memory to Maximum Performance/enabled, and DRAM refresh rate to 1x, as shown in Error! Reference source not found. below.

Figure 35 RAS Memory screen

Creating the Boot Policy

To create boot policies within the Cisco UCS Manager GUI, complete the following steps:

1. Select the Servers tab in the left pane in the UCS Manager GUI.

2. Select Policies > root.

3. Right-click the Boot Policies.

4. Select Create Boot Policy.

Figure 36 shows the screen.

Figure 36 Create Server Boot Policy

$Description: C:\Users\matrived\Desktop\C240-M4\UCSM c240M4 CVD Screenshots\Snap 2015-03-03 at 17.13.08.png$

5. Enter ucs as the boot policy name.

6. (Optional) enter a description for the boot policy.

7. Keep the Reboot on Boot Order Change check box unchecked.

8. Keep Enforce vNIC/vHBA/iSCSI Name check box checked.

9. Keep Boot Mode Default (Legacy).

10. Expand Local Devices > Add CD/DVD and select Add Local CD/DVD.

11. Expand Local Devices and select Add Local Disk.

12. Expand vNICs and select Add LAN Boot and enter eth0.

13. Click OK to add the Boot Policy.

14. Click OK.

Figure 37 shows the Create Boot Policy screen to add the LAN Boot.

Figure 37 Create Boot Policy/Add LAN Boot screen

$Description: C:\Users\matrived\Desktop\C240-M4\UCSM c240M4 CVD Screenshots\Snap 2015-03-03 at 17.15.26.png$

Creating Power Control Policy

To create Power Control policies within the Cisco UCS Manager GUI, complete the following steps:

1. Select the Servers tab in the left pane in the UCS Manager GUI.

2. Select Policies > root.

3. Right-click the Power Control Policies.

4. Select Create Power Control Policy.

Figure 38 displays the Create Power Control Policy screen.

Figure 38 Create Power Control Policy

5. Enter ucs as the Power Control policy name.

6. (Optional) enter a description for the boot policy.

7. Select Performance for Fan Speed Policy.

8. Select No cap for Power Capping selection.

9. Click OK to create the Power Control Policy.

10. Click OK.

Figure 39 shows the Create Power Control Policy, Power Capping selection.

Figure 39 Create Power Control Policy

Creating a Service Profile Template

To create a Service Profile Template, complete the following steps:

1. Select the Servers tab in the left pane in the UCSM GUI.

2. Right-click Service Profile Templates.

3. Select Create Service Profile Template.

Figure 40 shows the Create Service Profile Template selection process.

Figure 40 Create Service Profile Template

$Description: C:\Users\matrived\Desktop\C240-M4\UCSM c240M4 CVD Screenshots\Snap 2015-03-03 at 17.18.26.png$

The Create Service Profile Template window appears.

To identify the service profile template, complete the following steps as in Figure 41:

1. Name the service profile template ucs. Select the Updating Template radio button.

2. In the UUID section, select Hardware Default as the UUID pool.

3. Click Next to continue to the next section.

Figure 41 Identifying the Service Profile Template

Configuring the Storage Provisioning for the Template

To configure storage policies, complete the following steps:

1. Go to the Local Disk Configuration Policy tab, and select ucs for the Local Storage as shown in Figure 42.

2. Click Next to continue to the next section.

Figure 42 Storage Provisioning

3. Click Next. The Networking window opens.

Configuring Network Settings for the Template

To configure network settings for the template, complete the following steps as shown in Figure 43.

1. Keep the Dynamic vNIC Connection Policy field at the default.

2. Select Expert radio button for the option how would you like to configure LAN connectivity?

3. Click Add to add a vNIC to the template.

Figure 43 Networking screen

The Create vNIC window opens, as shown in Figure 44.

4. Name the vNIC eth0.

5. Select ucs in the Mac Address Assignment pool.

6. Select the Fabric A radio button.

7. Check the Enable failover check box for the Fabric ID.

8. Check the VLAN19 check box for VLANs, and select the Native VLAN radio button.

9. Select MTU size as 9000.

Perform the next steps as shown in Figure 45 below.

10. Select Adapter Policy as Linux.

11. Select QoS Policy as Platinum.

12. Keep the Network Control Policy as Default.

13. Click OK.

Figure 44 Create vNIC window

Figure 45 Operational Parameters/Adapter Performance Profile

Figure 46 Networking LAN Configuration

Figure 46 above shows the Network vLAN 19 is set up.

Note: Optionally Network Bonding can be setup on the vNICs for each host for redundancy as well as for increased throughput; steps for this are captured in the Appendix 1.

14. Click Next to continue with SAN Connectivity as shown in Figure 47.

15. Select no vHBAs for How would you like to configure SAN Connectivity?

Figure 47 SAN Connectivity

16. Click Next to continue with Zoning as shown in Figure 48.

No changes need to be made to the Zoning screen.

Figure 48 Zoning

17. Click Next to continue with vNIC/vHBA placement and shown in Figure 49.

No changes are made to the vNIC/vHBA placement screen.

Figure 49 vNIC/vHBA Placement

18. Click Next to configure vMedia Policy.

Configuring the vMedia Policy for the Template

1. The vMedia Policy window appears as shown in Figure 50, click next to go to the next section.

Figure 50 vMedia Policy

Configuring Server Boot Order for the Template

To set the boot order for the servers, complete the following steps, as shown in Figure 51 below:

1. Select ucs in the Boot Policy name field.

2. Review to make sure that all of the boot devices were created and identified.

3. Verify that the boot devices are in the correct boot sequence.

4. Click OK.

5. Click Next to continue to the next section.

Figure 51 Server Boot Order Screen

6. In the Maintenance Policy window, (Figure 52), apply the maintenance policy.

7. Keep the Maintenance policy set to no policy used by default.

8. Click Next to continue to the next section.

Figure 52 Maintenance Policy Window

Configuring Server Assignment for the Template

In the Server Assignment window (Figure 53), to assign the servers to the pool, complete the following steps:

1. Select ucs for the Pool Assignment field.

2. Select the power state to be Up.

3. Keep the Server Pool Qualification field set to <not set>.

4. Check the Restrict Migration check box.

5. Select ucs in Host Firmware Package.

Figure 53 Server Assignment Window

Configuring Operational Policies for the Template

In the Operational Policies Window (Figure 54), complete the following steps:

1. Select ucs in the BIOS Policy field.

2. Select ucs in the Power Control Policy field.

Figure 54 Operational Policies Window

3. Click Finish to create the Service Profile template.

4. Click OK in the pop-up window to proceed.

5. Select the Servers tab in the left pane of the UCS Manager GUI as shown in Figure 55.

6. Go to Service Profile Templates > root.

7. Right-click Service Profile Templates ucs.

8. Select Create Service Profiles From Template.

Figure 55 Servers Tab/Create Service Profiles From Template

$Description: C:\Users\matrived\Desktop\C240-M4\UCSM c240M4 CVD Screenshots\Snap 2015-03-03 at 18.08.06.png$

The Create Service Profiles from Template window appears (Figure 56).

Figure 56 Create Service Profiles from Template Window

Association of the Service Profiles will take place automatically.

9. Click OK in the pop-up window to proceed.

The final Cisco UCS Manager window is shown in below in Figure 57.

Figure 57 Cisco UCS Manager Window

Installing Red Hat Enterprise Linux 7.2

The following section provides detailed procedures for installing Red Hat Enterprise Linux 7.2 using Software RAID (OS based Mirroring) on Cisco UCS C240 M4 servers. There are multiple ways to install the Red Hat Linux operating system. The installation procedure described in this deployment guide uses KVM console and virtual media from Cisco UCS Manager.

Note: This requires RHEL 7.2 DVD/ISO for the installation.

To install the Red Hat Linux 7.2 operating system, complete the following steps:

1. Log in to the Cisco UCS 6296 Fabric Interconnect and launch the Cisco UCS Manager application.

2. Select the Equipment tab as shown in Figure 58.

3. In the navigation pane expand Rack-Mounts and then Servers.

4. Right click on the server and select KVM Console.

5. In the KVM window, select the Virtual Media tab.

Figure 58 KVM Console

6. Click the Activate Virtual Devices found in the Virtual Media tab. (Figure 59 below.)

Figure 59 Virtual Media Tab

$Description: C:\Users\matrived\Desktop\C240-M4\UCSM c240M4 CVD Screenshots\Snap 2015-03-03 at 18.43.55.png$

7. In the KVM window (Figure 60), select the Virtual Media tab and click the Map CD/DVD.

Figure 60 KVM Console Window

$Description: C:\Users\matrived\Desktop\C240-M4\UCSM c240M4 CVD Screenshots\Snap 2015-03-03 at 18.45.13.png$

8. Browse to the Red Hat Enterprise Linux Server 7.2 installer ISO image file.

Note: The Red Hat Enterprise Linux 7.2 DVD is assumed to be on the client machine.

9. Click Open to add the image to the list of virtual media.

10. In the KVM window, select the KVM tab to monitor during boot.

11. In the KVM window, select the Macros > Static Macros > Ctrl-Alt-Del button in the upper left corner.

12. Click OK.

13. Click OK to reboot the system.

14. On reboot, the machine detects the presence of the Red Hat Enterprise Linux Server 7.2 install media.

15. Select the Install or Upgrade an Existing System.

16. Skip the Media test and start the installation.

17. Select language of installation (Figure 61), and click Continue.

Figure 61 Select Language Window

18. Select Date and time as shown in Figure 62.

Figure 62 Date and Time Window

19. Select the location on the map, set the time and click Done.

Figure 63 Installation Summary Window

20. Click on Installation Destination, shown above in Figure 63.

Figure 64 Installation Summary Window

A Caution symbols appears next to Installation Destination as shown in Figure 64 above.

21. This opens the Installation Destination window displaying the boot disks. This is shown in Figure 65 below.

22. Make the selection, and choose I will configure partitioning. Click Done.

Figure 65 Installation Destination Window

This opens the new window for creating the partitions, as shown in Figure 66.

23. Click on the + sign to add a new partition as shown below, boot partition of size 2048 MB.

Figure 66 Manual Partitioning

24. Click Add Mount Point to add the partition.

The screen refreshes to show the added Mount Point (Figure 67).

Figure 67 Manual Partitioning/Change Device Type

25. Change the Device type to RAID and make sure the RAID Level is RAID1 (Redundancy).

26. Click on Update Settings to save the changes.

27. Click on the + sign to create the swap partition of size 2048 MB as shown in Figure 68 below.

Figure 68 Manual Partitioning/Swap

28. Change the Device type to RAID and RAID level to RAID1 (Redundancy) and click on Update Settings.

Figure 69 Manual Partitioning/Swap

29. Click + to add the / partition. The size can be left empty so it uses the remaining capacity and click Add Mountpoint. (Figure 70).

Figure 70 Manual Partitioning/Add A New Mount Point

In the next window (Figure 71):

30. Change the Device type to RAID and RAID level to RAID1 (Redundancy). Click Update Settings.

Figure 71 / Partition/Change Device Type to RAID

31. Click Done to go back to the main screen and continue the Installation.

The Installation screen opens (Figure 72).

32. Click on Software Selection.

Figure 72 Installation Summary Window

The Software Selection screen opens (Figure 73).

33. Select Infrastructure Server and select the Add-Ons as noted below. Click Done.

Figure 73 Software Selection

The Installation Summary window returns (Figure 72).

34. Click on Network and Hostname.

Figure 74 Network and Host Name Window

Configure Hostname and Networking for the Host ().

35. Type in the hostname as shown below.

Figure 75 Network and Host Name

36. Click on Configure to open the Network Connectivity window (Figure 76).

37. Click on IPV4Settings.

Figure 76 Network Connectivity Window

38. Change the Method to Manual and click Add.

Figure 77 shows the Add Details pop up window.

39. Enter the IP Address, Netmask and Gateway details. Click Add after each addition.

Figure 77 Add IP Address, Netmask and Gateway Details

40. Click Save.

The Ethernet window will open (Figure 78).

41. Update the hostname and turn Ethernet ON. Click Done to return to the main menu.

Figure 78 Ethernet Window

The Installation Summary window opens (Figure 79).

42. Click Begin Installation in the main menu.

Figure 79 Installation Summary Window

A new window opens (Figure 80).

43. Select Root Password in the User Settings.

Figure 80 Select Root Password

44. On the next screen (Figure 81), enter the Root Password and click done.

Figure 81 Enter the Root Password

A progress window will open (Figure 82).

Figure 82 Progress Bar

45. Once the installation is complete reboot the system.

46. Repeat steps 1 to 45 to install Red Hat Enterprise Linux 7.2 on Servers 2 through 64.

Note: The OS installation and configuration of the nodes that is mentioned above can be automated through PXE boot or third party tools.

The hostnames and their corresponding IP addresses are shown in Table 6.

Table 6 Hostnames and IP Addresses

Hostname	eth0
rhel1	10.4.1.31
rhel2	10.4.1.32
rhel3	10.4.1.33
rhel4	10.4.1.34
rhel1	10.4.1.35
rhel6	10.4.1.36
rhel7	10.4.1.37
rhel8	10.4.1.38
rhel9	10.4.1.39
rhel10	10.4.1.40
rhel11	10.4.1.41
rhel12	10.4.1.42
rhel13	10.4.1.43
rhel14	10.4.1.44
rhel15	10.4.1.45
rhel16	10.4.1.46
…	…
rhel64	10.4.1.94

Note: Hortonworks Data Platform (HDP) does not recommend multi-homing configurations, so please assign only one network to each node.

Post OS Install Configuration

Choose one of the nodes of the cluster or a separate node as the Admin Node for management such as HDP installation, cluster parallel shell, creating a local Red Hat repo and others. In this document, we use rhel1 for this purpose.

Setting Up Password-less Login

To manage all of the clusters nodes from the admin node password-less login needs to be setup. It assists in automating common tasks with clustershell (clush, a cluster wide parallel shell), and shell-scripts without having to use passwords.

Once Red Hat Linux is installed across all the nodes in the cluster, follow the steps below in order to enable password-less login across all the nodes.

1. Login to the Admin Node (rhel1).

#ssh 10.4.1.31

2. Run the ssh-keygen command to create both public and private keys on the admin node.

3. Then run the following command from the admin node to copy the public key id_rsa.pub to all the nodes of the cluster. ssh-copy-id appends the keys to the remote-host’s .ssh/authorized_keys.

#for IP in {31..94}; do echo -n "$IP -> "; ssh-copy-id -i ~/.ssh/id_rsa.pub 10.4.1.$IP; done

4. Enter yes for Are you sure you want to continue connecting (yes/no)?

5. Enter the password of the remote host.

Configuring /etc/hosts

Setup /etc/hosts on the Admin node; this is a pre-configuration to setup DNS as shown in the next section.

To create the host file on the admin node, complete the following steps:

1. Populate the host file with IP addresses and corresponding hostnames on the Admin node (rhel1) and other nodes as follows:

2. On Admin Node (rhel1)

#vi /etc/hosts

127.0.0.1 localhost localhost.localdomain localhost4 \ localhost4.localdomain4

::1 localhost localhost.localdomain localhost6 \ localhost6.localdomain6

10.4.1.31 rhel1

10.4.1.32 rhel2

10.4.1.33 rhel3

10.4.1.34 rhel4

10.4.1.35 rhel5

10.4.1.36 rhel6

10.4.1.37 rhel7

10.4.1.38 rhel8

10.4.1.39 rhel9

10.4.1.40 rhel10

10.4.1.41 rhel11

10.4.1.42 rhel12

10.4.1.43 rhel13

10.4.1.44 rhel14

10.4.1.45 rhel15

10.4.1.46 rhel16

...

10.4.1.94 rhel64

Creating a Red Hat Enterprise Linux (RHEL) 7.2 Local Repo

To create a repository using RHEL DVD or ISO on the admin node (in this deployment rhel1 is used for this purpose), create a directory with all the required RPMs, run the createrepo command and then publish the resulting repository.

1. Log on to rhel1. Create a directory that would contain the repository.

2. #mkdir -p /var/www/html/rhelrepo

3. Copy the contents of the Red Hat DVD to /var/www/html/rhelrepo

4. Alternatively, if you have access to a Red Hat ISO Image, Copy the ISO file to rhel1.

5. And login back to rhel1 and create the mount directory.

#scp rhel-server-7.2-x86_64-dvd.iso rhel1:/root/

#mkdir -p /mnt/rheliso

#mount -t iso9660 -o loop /root/rhel-server-7.2-x86_64-dvd.iso /mnt/rheliso/

6. Copy the contents of the ISO to the /var/www/html/rhelrepo directory.

#cp -r /mnt/rheliso/* /var/www/html/rhelrepo

7. Now on rhel1 create a .repo file to enable the use of the yum command.

#vi /var/www/html/rhelrepo/rheliso.repo

[rhel7.2]

name=Red Hat Enterprise Linux 7.2

baseurl=http://10.4.1.31/rhelrepo

gpgcheck=0

enabled=1

8. Now copy rheliso.repo file from /var/www/html/rhelrepo to /etc/yum.repos.d on rhel1.

#cp /var/www/html/rhelrepo/rheliso.repo /etc/yum.repos.d/

Note: Based on this repo file yum requires httpd to be running on rhel1 for other nodes to access the repository.

9. To make use of repository files on rhel1 without httpd, edit the baseurl of repo file /etc/yum.repos.d/rheliso.repo to point repository location in the file system.

Note: This step is needed to install software on Admin Node (rhel1) using the repo (such as httpd, create-repo, etc.)

#vi /etc/yum.repos.d/rheliso.repo

[rhel7.2]

name=Red Hat Enterprise Linux 7.2

baseurl=file:///var/www/html/rhelrepo

gpgcheck=0

enabled=1

Creating the Red Hat Repository Database.

1. Install the createrepo package on admin node (rhel1). Use it to regenerate the repository database(s) for the local copy of the RHEL DVD contents.

#yum -y install createrepo

2. Run createrepo on the RHEL repository to create the repo database on admin node

#cd /var/www/html/rhelrepo

#createrepo .

$Description: C:\Users\matrived\Desktop\C240-M4\UCSM c240M4 CVD Screenshots\Snap 2015-03-05 at 12.00.19.png$

Setting up ClusterShell

ClusterShell (or clush) is the cluster-wide shell that runs commands on several hosts in parallel.

1. From the system connected to the Internet download Cluster shell (clush) and install it on rhel1. Cluster shell is available from EPEL (Extra Packages for Enterprise Linux) repository.

#wget http://rpm.pbone.net/index.php3/stat/4/idpl/31529309/dir/redhat_el_7/com/clustershell-1.7-1.el7.noarch.rpm.html

#scp clustershell-1.7-1.el7.noarch.rpm rhel1:/root/

2. Login to rhel1 and install cluster shell.

#yum –y install clustershell-1.7-1.el7.noarch.rpm

3. Edit /etc/clustershell/groups.d/local.cfg file to include hostnames for all the nodes of the cluster. This set of hosts is taken when running clush with the ‘-a’ option.

4. For 64 node cluster as in our CVD, set groups file as follows,

#vi /etc/clustershell/groups.d/local.cfg

all: rhel[1-64]

Note: For more information and documentation on ClusterShell, visit https://github.com/cea-hpc/clustershell/wiki/UserAndProgrammingGuide.

Note: clustershell will not work if not ssh to the machine earlier (as it requires to be in known_hosts file), for instance, as in the case below for rhel<host>.

Installing httpd

Setting up RHEL repo on the admin node requires httpd. To set up RHEL repository on the admin node, complete the following steps:

1. Install httpd on the admin node to host repositories.

The Red Hat repository is hosted using HTTP on the admin node, this machine is accessible by all the hosts in the cluster.

#yum –y install httpd

2. Add ServerName and make the necessary changes to the server configuration file.

#vi /etc/httpd/conf/httpd.conf

ServerName 10.4.1.31:80

3. Start httpd

#service httpd start

#chkconfig httpd on

Set Up all Nodes to use the RHEL Repository

Note: Based on this repo file yum requires httpd to be running on rhel1 for other nodes to access the repository.

1. Copy the rheliso.repo to all the nodes of the cluster.

#clush –w rhel[2-64] -c /var/www/html/rhelrepo/rheliso.repo --dest=/etc/yum.repos.d/

2. Also copy the /etc/hosts file to all nodes.

#clush –w rhel[2-64] –c /etc/hosts –-dest=/etc/hosts

3. Purge the yum caches after this

#clush -a -B yum clean all

#clush –a –B yum repolist

Note: While suggested configuration is to disable SELinux as shown below, if for any reason SELinux needs to be enabled on the cluster, then ensure to run the following to make sure that the httpd is able to read the Yum repofiles.

#chcon -R -t httpd_sys_content_t /var/www/html/

Configuring DNS

This section details setting up DNS using dnsmasq as an example based on the /etc/hosts configuration setup in the earlier section.

To create the host file across all the nodes in the cluster, complete the following steps:

1. Disable Network manager on all nodes:

#clush -a -b service NetworkManager stop

#clush -a -b chkconfig NetworkManager off

2. Update /etc/resolv.conf file to point to Admin Node:

#vi /etc/resolv.conf

nameserver 10.4.1.31

Note: This step is needed if setting up dnsmasq on Admin node. Otherwise this file should be updated with the correct nameserver.

Note: Alternatively #systemctl start NetworkManager.service can be used to start the service. #systemctl stop NetworkManager.service can be used to stop the service. Use #systemctl disable NetworkManager.service to stop a service from being automatically started at boot time.

3. Install and Start dnsmasq on Admin node:

#service dnsmasq start

#chkconfig dnsmasq on

4. Deploy /etc/resolv.conf from the admin node (rhel1) to all the nodes via the following clush command:

#clush -a -B -c /etc/resolv.conf

Note: A clush copy without –dest copies to the same directory location as the source-file directory

5. Ensure DNS is working fine by running the following command on Admin node and any data-node:

[root@rhel2 ~]# nslookup rhel1

Server: 10.4.1.31

Address: 10.4.1.31#53

Name: rhel1

Address: 10.4.1.31 ç

Note: yum install –y bind-utils will need to be run for nslookup to utility to run.

Upgrading the Cisco Network driver for VIC1227

The latest Cisco Network driver is required for performance and updates. The latest drivers can be downloaded from the link below:

https://software.cisco.com/download/release.html?mdfid=286281356&reltype=latest&relind=AVAILABLE&dwnld=true&softwareid=283853158&rellifecycle=&atcFlag=N&release=2.0%289b%29&dwldImageGuid=84C2FF3BB579A1BF32F7227C59F6DF886CEDBE99&flowid=71443

1. In the ISO image, the required driver kmod-enic-2.3.0.20-rhel7u2.el7.x86_64.rpm can be located at \Linux\Network\Cisco\VIC\RHEL\RHEL7.2.

2. From a node connected to the Internet, download, extract and transfer kmod-enic-2.3.0.20-rhel7u2.el7.x86_64.rpm to rhel1 (admin node).

3. Install the rpm on all nodes of the cluster using the following clush commands. For this example the rpm is assumed to be in present working directory of rhel1.

[root@rhel1 ~]# clush -a -b -c kmod-enic-2.3.0.20-rhel7u2.el7.x86_64.rpm

[root@rhel1 ~]# clush -a -b "rpm –ivh kmod-enic-2.3.0.20-rhel7u2.el7.x86_64.rpm"

4. Ensure that the above installed version of kmod-enic driver is being used on all nodes by running the command "modinfo enic" on all nodes:

[root@rhel1 ~]# clush -a -B "modinfo enic | head -5"

5. Also it is recommended to download the kmod-megaraid driver for higher performance , the RPM can be found in the same package at \Linux\Storage\LSI\Cisco_Storage_12G_SAS_RAID_controller\RHEL\RHEL7.2

Installing xfsprogs

From the admin node rhel1 run the command below to Install xfsprogs on all the nodes for xfs filesystem.

#clush -a -B yum -y install xfsprogs

$Description: C:\Users\matrived\Desktop\C240-M4\UCSM c240M4 CVD Screenshots\Snap 2015-03-05 at 12.23.12.png$

Setting up JAVA

HDP 2.4 requires JAVA 8.

1. Download jdk-7u75-linux-x64.rpm from oracle.com and scp the rpm to (http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html ) to admin node (rhel1).

2. Run the following commands on admin node (rhel1) to install and setup java on all nodes.

3. Copy JDK rpm to all nodes.

clush -a -b -c /root/jdk-8u91-linux-x64.rpm --dest=/root/

4. Extract and Install JDK on all nodes.

clush -a -b rpm -ivh /root/jdk-8u91-linux-x64.rpm

5. Create the following files java-set-alternatives.sh and java-home.sh on admin node (rhel1).

vi java-set-alternatives.sh

#!/bin/bash

for item in java javac javaws jar jps javah javap jcontrol jconsole jdb; do

rm -f /var/lib/alternatives/$item

alternatives --install /usr/bin/$item $item /usr/java/jdk1.8.0_91/bin/$item 9

alternatives --set $item /usr/java/jdk1.8.0_91/bin/$item

done

vi java-home.sh

export JAVA_HOME=/usr/java/jdk1.8.0_91

6. Make the two java scripts created above executable.

chmod 755 ./java-set-alternatives.sh ./java-home.sh

7. Copying java-set-alternatives.sh to all nodes.

clush -b -a -c ./java-set-alternatives.sh --dest=/root/

8. Setup Java Alternatives.

clush -b -a ./java-set-alternatives.sh

9. Ensure correct java is setup on all nodes (should point to newly installed java path).

clush -b -a "alternatives --display java | head -2"

10. Setup JAVA_HOME on all nodes.

clush -b -a -c ./java-home.sh --dest=/etc/profile.d

11. Display JAVA_HOME on all nodes.

clush -a -b "echo \$JAVA_HOME"

12. Display current java –version.

clush -B -a java -version

NTP Configuration

The Network Time Protocol (NTP) is used to synchronize the time of all the nodes within the cluster. The Network Time Protocol daemon (ntpd) sets and maintains the system time of day in synchronism with the timeserver located in the admin node (rhel1). Configuring NTP is critical for any Hadoop Cluster. If server clocks in the cluster drift out of sync, serious problems will occur with HBase and other services.

#clush –a –b "yum –y install ntp"

Note: Installing an internal NTP server keeps your cluster synchronized even when an outside NTP server is inaccessible.

1. Configure /etc/ntp.conf on the admin node only with the following contents:

#vi /etc/ntp.conf

driftfile /var/lib/ntp/drift

restrict 127.0.0.1

restrict -6 ::1

server 127.127.1.0

fudge 127.127.1.0 stratum 10

includefile /etc/ntp/crypto/pw

keys /etc/ntp/keys

2. Create /root/ntp.conf on the admin node and copy it to all nodes:

#vi /root/ntp.conf

server 10.4.1.31

driftfile /var/lib/ntp/drift

restrict 127.0.0.1

restrict -6 ::1

includefile /etc/ntp/crypto/pw

keys /etc/ntp/keys

3. Copy ntp.conf file from the admin node to /etc of all the nodes by executing the following command in the admin node (rhel1):

#for SERVER in {32..94}; do scp /root/ntp.conf 10.4.1.$SERVER:/etc/ntp.conf; done

$Description: C:\Users\matrived\Desktop\C240-M4\UCSM c240M4 CVD Screenshots\Snap 2015-03-05 at 12.52.42.png$

Note: Instead of the above for loop, this could be run as a clush command with "–w"option.

#clush -w rhel[2-94] –b –c /root/ntp.conf --dest=/etc

4. Run the following to synchronize the time and restart NTP daemon on all nodes.

#clush -a -b "service ntpd stop"

#clush -a -b "ntpdate rhel1"

#clush -a -b "service ntpd start"

5. Ensure restart of NTP daemon across reboots:

#clush –a –b "systemctl enable ntpd"

Alternatively, the new Chrony service can be installed, that is quicker to synchronize clocks in mobile and virtual systems.

1. Install the Chrony service:

# yum install -y chrony

2. Activate the Chrony service at boot:

# systemctl enable chronyd

3. Start the Chrony service:

# systemctl start chronyd

The Chrony configuration is in the /etc/chrony.conf file, configured similar to /etc/ntp.conf.

Enabling Syslog

Syslog must be enabled on each node to preserve logs regarding killed processes or failed jobs. Modern versions such as syslog-ng and rsyslog are possible, making it more difficult to be sure that a syslog daemon is present. One of the following commands should suffice to confirm that the service is properly configured:

#clush -B -a rsyslogd –v

#clush -B -a service rsyslog status

Setting ulimit

On each node, ulimit -n specifies the number of inodes that can be opened simultaneously. With the default value of 1024, the system appears to be out of disk space and shows no inodes available. This value should be set to 64000 on every node.

Higher values are unlikely to result in an appreciable performance gain.

1. For setting the ulimit on Redhat, edit /etc/security/limits.conf on admin node rhel1 and add the following lines:

root soft nofile 64000

root hard nofile 64000

2. Copy the /etc/security/limits.conf file from admin node (rhel1) to all the nodes using the following command.

#clush -a -b -c /etc/security/limits.conf --dest=/etc/security/

$Description: C:\Users\matrived\Desktop\C240-M4\UCSM c240M4 CVD Screenshots\Snap 2015-03-05 at 13.05.44.png$

3. Check that the /etc/pam.d/su file contains the following settings:

#%PAM-1.0

auth sufficient pam_rootOK.so

# Uncomment the following line to implicitly trust users in the "wheel" group.

#auth sufficient pam_wheel.so trust use_uid

# Uncomment the following line to require a user to be in the "wheel" group.

#auth required pam_wheel.so use_uid

auth include system-auth

account sufficient pam_succeed_if.so uid = 0 use_uid quiet

account include system-auth

password include system-auth

session include system-auth

session optional pam_xauth.so

Note: The ulimit values are applied on a new shell, running the command on a node on an earlier instance of a shell will show old values.

Disabling SELinux

SELinux must be disabled during the install procedure and cluster setup. SELinux can be enabled after installation and while the cluster is running.

1. SELinux can be disabled by editing /etc/selinux/config and changing the SELINUX line to SELINUX=disabled. The following command will disable SELINUX on all nodes.

#clush -a -b "sed –i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config"

$Description: C:\Users\matrived\AppData\Local\Microsoft\Windows\Temporary Internet Files\Content.Word\Snap 2014-10-27 at 17.45.41.png$

#clush –a –b "setenforce 0"

Note: The above command may fail if SELinux is already disabled.

2. Reboot the machine, if needed for SELinux to be disabled incase it does not take effect. It can checked using:

#clush –a –b sestatus

Set TCP Retries

Adjusting the tcp_retries parameter for the system network enables faster detection of failed nodes. Given the advanced networking features of UCS this is a safe and recommended change (failures observed at the operating system layer are most likely serious rather than transitory). On each node, set the number of TCP retries to 5 can help detect unreachable nodes with less latency.

1. Edit the file /etc/sysctl.conf and on admin node rhel1 and add the following lines:

net.ipv4.tcp_retries2=5

2. Copy the /etc/sysctl.conf file from admin node (rhel1) to all the nodes using the following command:

#clush -a -b -c /etc/sysctl.conf --dest=/etc/

3. Load the settings from default sysctl file /etc/sysctl.conf by running.

#clush -B -a sysctl -p

Disabling the Linux Firewall

The default Linux firewall settings are far too restrictive for any Hadoop deployment. Since the UCS Big Data deployment will be in its own isolated network there is no need for that additional firewall.

#clush -a -b " firewall-cmd --zone=public --add-port=80/tcp --permanent"

#clush -a -b "firewall-cmd --reload"

#clush –a –b “systemctl disable firewalld”

Disable Swapping

1. In order to reduce Swapping, run the following on all nodes. Variable vm.swappiness defines how often swap should be used, 60 is default.

#clush -a -b " echo 'vm.swappiness=1' >> /etc/sysctl.conf"

2. Load the settings from default sysctl file /etc/sysctl.conf.

#clush –a –b "sysctl –p"

Disable Transparent Huge Pages

Disabling Transparent Huge Pages (THP) reduces elevated CPU usage caused by THP.

#clush -a -b "echo never > /sys/kernel/mm/transparent_hugepage/enabled”

#clush -a -b "echo never > /sys/kernel/mm/transparent_hugepage/defrag"

1. The above commands must be run for every reboot, so copy this command to /etc/rc.local so they are executed automatically for every reboot.

2. On the Admin node, run the following commands

#rm –f /root/thp_disable

#echo "echo never > /sys/kernel/mm/transparent_hugepage/enabled" >>

/root/thp_disable

#echo "echo never > /sys/kernel/mm/transparent_hugepage/defrag " >>

/root/thp_disable

3. Copy file to each node:

#clush –a –b –c /root/thp_disable

4. Append the content of file thp_disable to /etc/rc.local:

#clush -a -b “cat /root/thp_disable >> /etc/rc.local”

Disable IPv6 Defaults

1. Disable IPv6 as the addresses used are IPv4.

#clush -a -b "echo 'net.ipv6.conf.all.disable_ipv6 = 1' >> /etc/sysctl.conf"

#clush -a -b "echo 'net.ipv6.conf.default.disable_ipv6 = 1' >> /etc/sysctl.conf"

#clush -a -b "echo 'net.ipv6.conf.lo.disable_ipv6 = 1' >> /etc/sysctl.conf"

2. Load the settings from default sysctl file /etc/sysctl.conf.

#clush –a –b "sysctl –p"

Configuring Data Drives on Name Node and Other Management Nodes

This section describes steps to configure non-OS disk drives as RAID1 using StorCli command as described below. All the drives are going to be part of a single RAID1 volume. This volume can be used for staging any client data to be loaded to HDFS. This volume won’t be used for HDFS data.

1. From the website download storcli http://www.lsi.com/downloads/Public/RAID%20Controllers/RAID%20Controllers%20Common%20Files/1.14.12_StorCLI.zip

2. Extract the zip file and copy storcli-1.14.12-1.noarch.rpm from the linux directory.

3. Download storcli and its dependencies and transfer to Admin node.

#scp storcli-1.14.12-1.noarch.rpm rhel1:/root/

4. Copy storcli rpm to all the nodes using the following commands:

#clush -a -b -c /root/storcli-1.14.12-1.noarch.rpm --dest=/root/

5. Run the below command to install storcli on all the nodes:

#clush -a -b “rpm -ivh storcli-1.14.12-1.noarch.rpm”

6. Run the below command to copy storcli64 to root directory.

#cd /opt/MegaRAID/storcli/

#cp storcli64 /root/

$Description: C:\Users\matrived\Desktop\C240-M4\UCSM c240M4 CVD Screenshots\Snap 2015-03-05 at 15.35.09.png$

7. Copy storcli64 to all the nodes using the following commands:

#clush -a -b -c /root/storcli64 --dest=/root/

8. Run the following script as root user on rhel1 to rhel3 to create the virtual drives for the management nodes.

#vi /root/raid1.sh

./storcli64 -cfgldadd r1[$1:1,$1:2,$1:3,$1:4,$1:5,$1:6,$1:7,$1:8,$1:9,$1:10,$1:11,$1:12,$1:13,$1:14,$1:15,$1:16,$1:17,$1:18,$1:19,$1:20,$1:21,$1:22,$1:23,$1:24] wb ra nocachedbadbbu strpsz1024 -a0

The script above requires an enclosure ID as a parameter.

9. Run the following command to get enclosure id.

#./storcli64 pdlist -a0 | grep Enc | grep -v 252 | awk '{print $4}' | sort | uniq -c | awk '{print $2}'

#chmod 755 raid1.sh

10. Run MegaCli script as follows:

#./raid1.sh <EnclosureID> obtained by running the command above

WB: Write back

RA: Read Ahead

NoCachedBadBBU: Do not write cache when the BBU is bad.

Strpsz1024: Strip Size of 1024K

Note: The command above will not override any existing configuration. To clear and reconfigure existing configurations refer to Embedded MegaRAID Software Users Guide available at www.lsi.com.

Configuring Data Drives on Data Nodes

This section describes steps to configure non-OS disk drives as individual RAID0 volumes using StorCli command as described below. These volumes are going to be used for HDFS Data.

1. Issue the following command from the admin node to create the virtual drives with individual RAID 0 configurations on all the data nodes.

#clush –w rhel[3-64] -B ./storcli64 -cfgeachdskraid0 WB RA direct NoCachedBadBBU strpsz1024 -a0

WB: Write back

RA: Read Ahead

NoCachedBadBBU: Do not write cache when the BBU is bad.

Strpsz1024: Strip Size of 1024K

Note: The command above will not override existing configurations. To clear and reconfigure existing configurations refer to Embedded MegaRAID Software Users Guide available at www.lsi.com.

Configuring the Filesystem for NameNodes and Datanodes

The following script will format and mount the available volumes on each node whether it is Namenode or Data node. OS boot partition is going to be skipped. All drives are going to be mounted based on their UUID as /data/disk1, /data/disk2, and so on.

1. On the Admin node, create a file containing the following script.

2. To create partition tables and file systems on the local disks supplied to each of the nodes, run the following script as the root user on each node.

Note: The script assumes there are no partitions already existing on the data volumes. If there are partitions, delete them before running the script. This process is documented in the "Note" section at the end of the section.

#vi /root/driveconf.sh

#!/bin/bash

#disks_count=`lsblk -id | grep sd | wc -l`

#if [ $disks_count -eq 24 ]; then

# echo "Found 24 disks"

#else

# echo "Found $disks_count disks. Expecting 24. Exiting.."

# exit 1

#fi

[[ "-x" == "${1}" ]] && set -x && set -v && shift 1

count=1

for X in /sys/class/scsi_host/host?/scan

echo '- - -' > ${X}

done

for X in /dev/sd?

echo "========"

echo $X

echo "========"

if [[ -b ${X} && `/sbin/parted -s ${X} print quit|/bin/grep -c boot` -ne 0

]]

then

echo "$X bootable - skipping."

continue

else

Y=${X##*/}1

echo "Formatting and Mounting Drive => ${X}"

/sbin/mkfs.xfs -f ${X}

(( $? )) && continue

#Identify UUID

UUID=`blkid ${X} | cut -d " " -f2 | cut -d "=" -f2 | sed 's/"//g'`

/bin/mkdir -p /data/disk${count}

(( $? )) && continue

echo "UUID of ${X} = ${UUID}, mounting ${X} using UUID on /data/disk${count}"

/bin/mount -t xfs -o inode64,noatime,nobarrier -U ${UUID} /data/disk${count}

(( $? )) && continue

echo "UUID=${UUID} /data/disk${count} xfs inode64,noatime,nobarrier 0 0" >> /etc/fstab

((count++))

done

3. Run the following command to copy driveconf.sh to all the nodes:

#chmod 755 /root/driveconf.sh

#clush –a -B –c /root/driveconf.sh

4. Run the following command from the admin node to run the script across all data nodes:

#clush –a –B /root/driveconf.sh

5. Run the following from the admin node to list the partitions and mount points:

#clush –a -B df –h

#clush –a -B mount

#clush –a -B cat /etc/fstab

Note: In-case there is a need to delete any partitions, it can be done so using the following.

6. Run the mount command (‘mount’) to identify which drive is mounted to which device /dev/sd<?>.

7. umount the drive for which partition is to be deleted and run fdisk to delete as shown below.

Note: Care should be taken not to delete the OS partition as this will wipe out the OS.

#mount

#umount /data/disk1 ç (disk1 shown as example)

#(echo d; echo w;) | sudo fdisk /dev/sd<?>

Cluster Verification

This section describes the steps to create the script cluster_verification.sh that helps to verify the CPU, memory, NIC, and storage adapter settings across the cluster on all nodes. This script also checks additional prerequisites such as NTP status, SELinux status, ulimit settings, JAVA_HOME settings and JDK version, IP address and hostname resolution, Linux version and firewall settings.

1. Create the script cluster_verification.sh as shown, on the Admin node (rhel1).

#vi cluster_verification.sh

#!/bin/bash

shopt -s expand_aliases,

# Setting Color codes

green='\e[0;32m'

red='\e[0;31m'

NC='\e[0m' # No Color

echo -e "${green} === Cisco UCS Integrated Infrastructure for Big Data and Analytics \ Cluster Verification === ${NC}"

echo ""

echo -e "${green} ==== System Information ==== ${NC}"

echo ""

echo -e "${green}System ${NC}"

clush -a -B " `which dmidecode` |grep -A2 '^System Information'"

echo ""

echo -e "${green}BIOS ${NC}"

clush -a -B " `which dmidecode` | grep -A3 '^BIOS I'"

echo ""

echo -e "${green}Memory ${NC}"

clush -a -B "cat /proc/meminfo | grep -i ^memt | uniq"

echo ""

echo -e "${green}Number of Dimms ${NC}"

clush -a -B "echo -n 'DIMM slots: '; `which dmidecode` |grep -c \ '^[[:space:]]*Locator:'"

clush -a -B "echo -n 'DIMM count is: '; `which dmidecode` | grep \ "Size"| grep -c "MB""

clush -a -B " `which dmidecode` | awk '/Memory Device$/,/^$/ {print}' |\ grep -e '^Mem' -e Size: -e Speed: -e Part | sort -u | grep -v -e 'NO \ DIMM' -e 'No Module Installed' -e Unknown"

echo ""

# probe for cpu info #

echo -e "${green}CPU ${NC}"

clush -a -B "grep '^model name' /proc/cpuinfo | sort -u"

echo ""

clush -a -B "`which lscpu` | grep -v -e op-mode -e ^Vendor -e family -e\ Model: -e Stepping: -e BogoMIPS -e Virtual -e ^Byte -e '^NUMA node(s)'"

echo ""

# probe for nic info #

echo -e "${green}NIC ${NC}"

clush -a -B "`which ifconfig` | egrep '(^e|^p)' | awk '{print \$1}' | \ xargs -l `which ethtool` | grep -e ^Settings -e Speed"

echo ""

clush -a -B "`which lspci` | grep -i ether"

echo ""

# probe for disk info #

echo -e "${green}Storage ${NC}"

clush -a -B "echo 'Storage Controller: '; `which lspci` | grep -i -e \ raid -e storage -e lsi"

echo ""

clush -a -B "dmesg | grep -i raid | grep -i scsi"

echo ""

clush -a -B "lsblk -id | awk '{print \$1,\$4}'|sort | nl"

echo ""

echo -e "${green} ================ Software ======================= ${NC}"

echo ""

echo -e "${green}Linux Release ${NC}"

clush -a -B "cat /etc/*release | uniq"

echo ""

echo -e "${green}Linux Version ${NC}"

clush -a -B "uname -srvm | fmt"

echo ""

echo -e "${green}Date ${NC}"

clush -a -B date

echo ""

echo -e "${green}NTP Status ${NC}"

clush -a -B "ntpstat 2>&1 | head -1"

echo ""

echo -e "${green}SELINUX ${NC}"

clush -a -B "echo -n 'SElinux status: '; grep ^SELINUX= \ /etc/selinux/config 2>&1"

echo ""

clush -a -B "echo -n 'CPUspeed Service: '; `which service` cpuspeed \ status 2>&1"

clush -a -B "echo -n 'CPUspeed Service: '; `which chkconfig` --list \ cpuspeed 2>&1"

echo ""

echo -e "${green}Java Version${NC}"

clush -a -B 'java -version 2>&1; echo JAVA_HOME is ${JAVA_HOME:-Not \ Defined!}'

echo ""

echo -e "${green}Hostname LoOKup${NC}"

clush -a -B " ip addr show"

echo ""

echo -e "${green}Open File Limit${NC}"

clush -a -B 'echo -n "Open file limit(should be >32K): "; ulimit -n'

2. Change Permissions to executable.

chmod 755 cluster_verification.sh

3. Run the Cluster Verification tool from the admin node. This can be run before starting Hadoop to identify any discrepancies in Post OS Configuration between the servers or during troubleshooting of any cluster / Hadoop issues.

#./cluster_verification.sh

Installing HDP 2.4

HDP is an enterprise grade, hardened Hadoop distribution. HDP combines Apache Hadoop and its related projects into a single tested and certified package. HPD 2.4 components are depicted in the figure below. The following section goes in to detail on how to install HDP 2.4 on the cluster.

Pre-Requisites for HDP

This section details the pre-requisites for HDP Installation such as setting up of HDP Repositories.

Hortonworks Repo

1. From a host connected to the Internet, download the Hortonworks repositories as shown below and transfer it to the admin node.

mkdir -p /tmp/Hortonworks

cd /tmp/Hortonworks/

2. Download the Hortonworks HDP Repo:

wget http://public-repo-1.hortonworks.com/HDP/centos7/2.x/updates/2.4.2.0/HDP-2.4.2.0-centos7-rpm.tar.gz

3. Download Hortonworks HDP-Utils Repo:

wget http://public-repo-1.hortonworks.com/HDP-UTILS-1.1.0.20/repos/centos7/HDP-UTILS-1.1.0.20-centos7.tar.gz

4. Download Ambari Repo:

wget http://public-repo-1.hortonworks.com/ambari/centos7/2.x/updates/2.2.2.0/ambari-2.2.2.0-centos7.tar.gz

5. Copy the repository directory to the admin node:

scp -r /tmp/Hortonworks/ rhel1:/var/www/html

6. Extract the files:

cd /var/www/html/Hortonworks

tar -zxvf HDP-2.4.2.0-centos7-rpm.tar.gz

tar -zxvf HDP-UTILS-1.1.0.20-centos7.tar.gz

tar -zxvf ambari-2.2.2.0-centos7.tar.gz

7. Create the hdp.repo file with following contents:

vi /etc/yum.repos.d/hdp.repo

[HDP-2.4.2.0]

name= Hortonworks Data Platform Version - HDP-2.4.2.0

baseurl= http://rhel1/Hortonworks/HDP/centos7/2.x/updates/2.4.2.0

gpgcheck=0

enabled=1

priority=1

[HDP-UTILS-1.1.0.20]

name=Hortonworks Data Platform Utils Version - HDP-UTILS-1.1.0.20

baseurl= http://rhel1/Hortonworks/HDP-UTILS-1.1.0.20/repos/centos7

gpgcheck=0

enabled=1

priority=1

8. Create the Ambari repo file with following contents:

vi /etc/yum.repos.d/ambari.repo

[Updates-ambari-2.2.2.0]

name=ambari-2.2.2.0 - Updates

baseurl=http://rhel1/Hortonworks/AMBARI-2.2.2.0/centos7/2.2.2.0-460

gpgcheck=0

enabled=1

priority=1

9. From the admin node copy the repo files to /etc/yum.repos.d/ of all the nodes of the cluster.

clush -a -b -c /etc/yum.repos.d/hdp.repo --dest=/etc/yum.repos.d/

clush -a -b -c /etc/yum.repos.d/ambari.repo --dest=/etc/yum.repos.d/

Downgrade Snappy on All Nodes

Downgrade snappy on all data nodes by running this command from admin node.

clush -a -b yum -y downgrade snappy

HDP Installation

To install HDP, complete the following the steps:

Install and Setup Ambari Server on rhel1

yum -y install ambari-server

External Database PostgreSQL Installation

The Postgresql database will be used by Ambari, Hive and Oozie services.

In this installation rhel2 will host the Hive and Oozie services and Ambari will be on rhel1.

1. Login to rhel2 and perform the following steps.

yum -y install postgresql-*

postgresql-setup initdb

/bin/systemctl start postgresql.service Or Service postgresql start

systemctl enable postgresql

chkconfig postgresql on

service postgresql status

2. Update these files on rhel2 in the location chosen to install the databases for Hive, Oozie and Ambari, using the host ip addresses.

vi /var/lib/pgsql/data/pg_hba.conf

vi /var/lib/pgsql/data/postgresql.conf

search for listen_address and replace with (*)

sudo -u postgres psql

http://docs.hortonworks.com/HDPDocuments/Ambari-2.2.2.0/bk_ambari_reference_guide/content/_using_ambari_with_postgresql.html

Ambari

To set up PostgreSQL to be used with Ambari, complete the following steps:

1. Create a user for Ambari and grant it permissions.

2. Using the PostgreSQL database admin utility:

3. Create the Ambari database.

4. Create an Ambari user with the password "Cisco_123".

5. Grant all privileges on the Ambari database to the Ambari user

\connect ambari

6. Create the Ambari schema authorization to Ambari user.

7. Change the Ambari schema owner to Ambari user.

8. Alter the Ambari role, set search_path to 'ambari', 'public'.

9. Log out using \q and log back in using this command:

[root@rhel2 ~]# service postgresql restart

[root@rhel2 ~]# psql -U ambari -d ambari

10. Please enter the password.

11. Load the Ambari Server database schema.

Pre-load the Ambari database schema into your PostgreSQL database using the schema script.

12. Find the Ambari-DDL-Postgres-CREATE.sql file in the /var/lib/ambari-server/resources/ directory of the Ambari Server host after you have installed Ambari Server.

13. Copy /var/lib/ambari-server/resources/ from rhel1 to rhel2:/tmp/.

Cd /tmp

# psql -U <AMBARIUSER> -d <AMBARIDATABASE>

\connect <AMBARIDATABASE>;

\i Ambari-DDL-Postgres-CREATE.sql;

14. Check the table is created by running \dt command.

15. Log out with \q.

16. Find the Ambari-DDL-Postgres-CREATE.sql file in the /var/lib/ambari-server/resources/ directory of the Ambari Server host after installing the Ambari Server.

[root@rhel2 ~]# service postgresql restart

For Hive

sudo -u postgres psql

1. Create the hive database.

2. Create a user hive with password "Cisco_123".

3. Grant all privileges on the hive database to the hive user.

Oozie

sudo -u postgres psql

1. Create the database oozie.

2. Create the user oozie with password "Cisco_123".

3. Grant all privileges on database oozie to oozie.

4. Connect to the admin node (rhel1) and run the commands described below.

yum -y install postgresql-jdbc*

ambari-server setup --jdbc-db=postgres --jdbc-driver=/usr/share/java/postgresql-jdbc.jar

Setting up the Ambari Server on the Admin Node (rhel1)

Ambary-server setup -j $JAVA_HOME

Note: Enter the advanced database configuration option – Yes and choose option 4 for the existing PostgreSQL Database selection.

Starting the Ambari Server

ambari-server start

Confirm Ambari Server Startup

ps –aef | grep ambari-server

Logging into the Ambari Server

Once the Ambari service has been started, access the Ambari Install Wizard through the browser.

1. Point the browser to http://<ip address for rhel1>:8080

The Ambari Login screen will open (Figure 83).

2. Log in to the Ambari Server using the default username/password: admin/admin. This can be changed at a later period of time.

Figure 83 Ambari Login Window

Once logged in, the “Welcome to Apache Ambari” window appears (Figure 84).

Figure 84 Welcome to Apache

$Description: C:\Users\matrived\Desktop\C240-M4\UCSM c240M4 CVD Screenshots\Snap 2015-03-06 at 18.30.24.png$

Creating a Cluster

To create a cluster, complete the following steps:

1. Click the Create a Cluster button to launch the install wizard as shown in Figure 84 above.

2. On the Get started page (Figure 85) type “Cisco_HDP” as the name for the cluster.

3. Click Next.

Figure 85 Creating a Cluster

Select a Stack

1. In the next screen (Figure 86), select the HDP 2.4 stack.

2. Expand “Advanced Repository Options”.

3. Under the advanced repository option:

4. Select the RedHat 7 checkbox.

5. Uncheck the rest of the checkboxes.

6. Update the Redhat 7 HDP-2.4 URL to http://rhel1/Hortonworks/HDP/centos7/2.x/updates/2.4.2.0.

7. Update the Redhat 7 HDP-UTILS-1.1.0.20 URL to http://rhel1/Hortonworks/HDP-UTILS-1.1.0.20/repos/centos7.

Figure 86 Select a Stack

Note: Make sure there are no trailing spaces after the URLs.

HDP Installation

To build up the cluster, the install wizard needs to know general information about how the cluster is to be set up. This requires providing the Fully Qualified Domain Name (FQDN) of each of the hosts. The wizard also needs to access the private key file that was created in Set Up Password-less SSH. It uses these to locate all the hosts in the system and to access and interact with them securely.

Figure 87 below shows the install wizard window.

1. Use the Target Hosts text box to enter the list of host names, one per line. Ranges inside brackets can also be used to indicate larger sets of hosts.

2. Select the option Provide your SSH Private Key in the Ambari cluster install wizard.

3. Copy the contents of the file /root/.ssh/id_rsa on rhel1 and paste it in the text area provided by the Ambari cluster install wizard.

Note: Make sure there is no extra white space after the text-----END RSA PRIVATE KEY-----

4. Click the Register and Confirm button to continue.

Figure 87 Install Options

Hostname Pattern Expressions

5. Click OK on the Host Name Pattern Expressions popup.

Figure 88 Host Name Pattern Expressions

Confirm Hosts

Figure 89 shows the Confirm Hosts screen This helps ensure that Ambari has located the correct hosts for the cluster and checks those hosts to make sure they have the correct directories, packages, and processes to continue the install.

1. If any host was selected in error, it can be removed by selecting the appropriate checkboxes and clicking the grey Remove Selected button.

2. To remove a single host, click the small white Remove button in the Action column.

3. When the lists of hosts are confirmed, click Next.

Figure 89 Confirm Hosts

Choose Services

HDP is made up of a number of components. See Understand the Basics for more information. The services are listed in Figure 90 below.

1. Select all to preselect all items.

2. When you have made your selections, click Next.

Figure 90 Choose Services

Assign Masters

The Ambari install wizard attempts to assign the master nodes for various services that have been selected to appropriate hosts in the cluster, as shown in Figure 91. The right column shows the current service assignments by host, with the hostname and its number of CPU cores and amount of RAM indicated.

Figure 91 Assign Masters

1. Reconfigure the service assignments to match Table 7 shown below.

Table 7 Reconfigure the service assignments

Service Name	Host
NameNode	rhel1
SNameNode	rhel2
History Server	rhel2
App Timeline Server	rhel2
Resource Manager	rhel2
Hive Metastore	rhel2
WebHCat Server	rhel2
HiveServer2	rhel2
HBase Master	rhel2
Oozie Server	rhel1
Zookeeper	rhel1, rhel2, rhel3
Falcon Server	rhel2
DRPC Server	rhel2
Nimbus	rhel2
Storm UI Server	rhel2

Spark History Server	rhel2
Accumulo Master	rhel2
Accumulo Monitor	rhel1
Accumulo Tracer	rhel1
SmartSense HST Server	rhel1
Grafana	rhel1
Kafka Broker	rhel1
Accumulo GC	rhel1
Atlas Metadata Server	rhel2
Knox Gateway	rhel1
Metrics Collector	rhel1

Note: On a small cluster (<16 nodes), consolidate all master services to run on a single node. For large clusters (> 64 nodes), deploy master services across 3 nodes.

2. Click Next.

Assign Slaves and Clients

The Ambari install wizard attempts to assign the slave components (DataNodes, NFSGateway, NodeManager, RegionServers, Phoenix Query Server, Supervisor, Flume, Accumulo TServer, Spark Thrift Server and Client) to appropriate hosts in the cluster as shown in Figure 92.

3. Reconfigure the service assignment to match the values shown in Table 8 below:

4. Assign DataNode, NodeManager, RegionServer, Supervisor and Flume on nodes rhel3- rhel64.

5. Assign Client to all nodes.

6. Click the Next button.

Table 8 Services and Hostnames

Client Service Name	Host
DataNode	rhel3-rhel64
NFSGateway	rhel1
NodeManager	rhel3-rhel64
RegionServer	rhel3-rhel64
Phoenix Query Server	rhel1
Supervisor	rhel3-rhel64
Flume	rhel3-rhel64
Accumulo TServer	rhel3-rhel64
Spark Thrift Server	rhel1
Client	All nodes, rhel1-rhel64

Figure 92 Assign Slaves and Clients

Customize Services

This section as shown in Figure 93 shows the tabs that manage configuration settings for Hadoop components. The wizard attempts to set reasonable defaults for each of the options here, but this can be modified to meet specific requirements. The following sections provide configuration guidance that should be refined to meet specific use case requirements.

The following changes are to be made:

· Memory and service level settings for each component and service level tuning.

· Customize the log locations of all the components to ensure growing logs do not cause the SSDs to run out of space.

HDFS

1. In Ambari, choose the HDFS Service tab and use the “Search” box on top to filter for the properties mentioned in Table 9 and update their values.

HDFS JVM Settings

1. Update the following HDFS configurations in Ambari.

Table 9 HDFS Configurations in Ambari

Property Name	Value
NameNode Java Heap Size	4096
Hadoop maximum Java heap size	4096
DataNode maximum Java heap size	4096
Datanode Volumes Failure Toleration	3

Update Log Directory

1. Change the default log location by finding the Log Dir property and modifying the /var prefix to /data/disk1 (Figure 93).

Figure 93 Customize Services

MapReduce2

Figure 94 shows the MapReduce2 Tab.

1. In Ambari, choose the MapReduce Service tab and use the “Search” box on top to filter for the properties mentioned in Table 10 and update their values.

2. Update the following MapReduce configurations.

Table 10 MapReduce Configurations

Property Name	Value
Default virtual memory for a job's map-task	4096
Default virtual memory for a job's reduce-task	8192
Map-side sort buffer memory	1638
yarn.app.mapreduce.am.resource.mb	4096
mapreduce.map.java.opts	-Xmx3276m
mapreduce.reduce.java.opts	-Xmx6552m
yarn.app.mapreduce.am.command-opts	-Xmx6552m

3. Under the MapReduce2 tab, change the default log location by finding the Log Dir property and modifying the /var prefix to /data/disk1.

Figure 94 MapReduce2 Tab

Figure 95 MapReduce2 Tab

YARN

1. In Ambari, choose the YARN Service from the tab as shown in Figure 96, and use the “Search” box on top to filter for the properties mentioned in Table 11 below to update their values.

2. Update the following YARN configurations.

Table 11 YARN Configuration Values

Property Name	Value
ResourceManager Java heap size	4096
NodeManager Java heap size	2048
yarn.nodemanager.resource.memory-mb	184320
YARN Java heap size	4096
yarn.scheduler.minimum-allocation-mb	4096
yarn.scheduler.maximum-allocation-mb	184320

3. Under YARN tab, change the default log location by finding the Log Dir property and modifying the /var prefix to /data/disk1.

Figure 96 YARN Tab

Tez

No changes are required.

Figure 97 Tez Tab

Hive

Choose Hive Service from the tab, as shown in Figure 98. Select the advanced tab and make the changes below:

1. Select Existing PostgreSQL Database.

2. Enter Database Host to 10.4.1.32.

3. Database Name hive.

4. Database Username hive.

5. Enter the Hive database password as per organizational policy.

6. Database password Cisco_123.

7. Please test connection.

8. Change the default log location by finding the Log Dir property and modifying the /var prefix to /data/disk1.

9. Change the WebHCat log directory by finding the Log Dir property and modifying the /var prefix to /data/disk1.

Figure 98 Hive Tab

HBase

In Ambari, choose HBASE Service from the tab (Figure 99) and use the “Search” box on top to filter for the properties mentioned in Table 12 to update their values.

1. Update the following HBASE configurations:

Table 12 HBASE Configurations

Property Name	Value
HBase Master Maximum Java Heap Size	4096
HBase RegionServers Maximum Java Heap Size	16384

Note: If you are not running HBase, keep the default value of 1024 for Java Heap size for HBase RegionServers and HBase Master

2. Under the HBase tab, change the default log location by finding the Log Dir property and modifying the /var prefix to /data/disk1.

Figure 99 HBase

Pig

No changes are required.

Figure 100 Pig Services

Sqoop

No changes are required.

Figure 101 Scoop Services

Oozie

Similarly, under the Oozie tab, (Figure 102), change the default log location by finding the Log Dir property and modifying the /var prefix to /data/disk1.

1. Select Existing PostgreSQL Database.

2. Enter Database Host to 10.4.1.32.

3. Database Name oozie.

4. Database Username oozie. Enter the oozie database password as per organizational policy.

5. Database password is Cisco_123.

6. Please test the connection.

Figure 102 Oozie Tab

Zookeeper

Under the Zookeeper tab, (Figure 103), change the default log location by finding the Log Dir property and modifying the /var prefix to /data/disk1.

Figure 103 ZooKeeper Tab

Falcon

Under the Falcon tab, (Figure 104), change the default log location by finding the Log Dir property and modifying the /var prefix to /data/disk1.

Figure 104 Falcon Tab

Storm

1. Under the Storm tab, (Figure 105), change the default log location by finding the Log Dir property and modifying the /var prefix to /data/disk1.

Figure 105 Storm Tab

Ambari Metrics

1. Choose the Ambari Metrics Service, (Figure 106), from the tab and expand the general tab and make the changes below:

2. Enter the Grafana Admin password as per organizational policy.

3. Change the default log location for Metrics Collector, Metrics Monitor and Metrics Grafana by finding the Log Dir property and modifying the /var prefix to /data/disk1.

4. Change the default data dir location for Metrics Grafana by finding the data Dir property and modifying the /var prefix to /data/disk1.

Figure 106 Ambari Metrics

Flume

Under the Flume tab, change the default log location by finding the Log Dir property and modifying the /var prefix to /data/disk1.

Figure 107 Flume Tab

Accumulo

Choose Accumulo Service (Figure 108), from the tab and expand the general tab and make the changes below:

1. Enter the Accumulo root password as per organizational policy.

2. Enter the Accumulo instance Secret password as per organizational policy.

3. change the default log location by finding the Log Dir property and modifying the /var prefix to /data/disk1.

Figure 108 Accumulo Service

Atlas

Under the Atlas tab, (Figure 109), change the default log location by finding the Log Dir property and modifying the /var prefix to /data/disk1.

Figure 109 Atlas Tab

Kafka

1. Under the Kafka tab, change the default log location by finding the Log Dir property and modifying the /var prefix to /data/disk1.

Figure 110 Kafka Tab

Knox

1. Choose Knox Service, (Figure 111), from the tab and expand the Knox gateway tab and make the changes below:

2. Enter the Knox Master Secret password as per organizational policy.

3. For Knox, change the gateway port to 8444 to ensure no conflicts with local HTTP server.

Figure 111 Knox Service

Mahout

No changes are required.

Figure 112 Mahout Tab

Slider

No changes are required.

Figure 113 Slider

SmartSense

Figure 114 shows the SmartSense tab. This requires the Hortonworks support subscription. Subscribers can populate the properties below.

Figure 114 SmartSense

Spark

1. Select the Spark tab, (Figure 115), change the default log location by finding the Log Dir property and modifying the /var prefix to /data/disk1.

Figure 115 Spark Tab

Misc

No changes are required.

Figure 116 Misc Tab

Review

The assignments that have been made are displayed, (Figure 117). Check to ensure everything is correct before clicking on the Deploy button. If any changes are to be made, use the left navigation bar to return to the appropriate screen.

Deploy

1. Once the review is complete, click the Deploy button.

Figure 117 Review the Configuration

The progress of the install is shown on the screen as shown in Figure 118. Each component is installed and started and a simple test is run on the component. The next screen displays the overall status of the install in the progress bar at the top of the screen and a host-by-host status in the main section.

2. To see specific information on what tasks have been completed per host, click the link in the Message column for the appropriate host.

3. In the Tasks pop-up, click the individual task to see the related log files.

4. Select filter conditions by using the Show dropdown list.

5. To see a larger version of the log contents, click the Open icon or to copy the contents to the clipboard, use the Copy icon.

Depending on which components are installing, the entire process may take 10 or more minutes.

6. When successfully installed and started the service appears, click Next.

Figure 118 displays the install progress screen.

Figure 118 Install Progress Screen

Summary of the Installation Process

Figure 119 shows a summary of the accomplished tasks.

7. Click Complete.

Figure 119 Summary Screen

Installation of HDF

Cisco UCS Integrated Infrastructure for Big Data and Analytics offers several configurations to meet a variety of computing and storage requirements for HDF. Ideally these include the Cisco UCS C220 M4 servers with the following configuration shown in Figure 120 and Table 13 below.

Figure 120 Reference Architecture for Spark Streaming with HDF & HDP

Table 13 Cisco UCS Reference Architecture for HDF

Starter	High Performance
8 Cisco UCS C220 M4 Rack Servers, each with: 2 Intel Xeon processor E5- 2620 v4 CPUs (8 cores each) 128 GB of memory 8 x 1.2-TB 10,000-RPM SFF SAS drives Total of 10 TB of storage capacity 1.4 GBps of I/O bandwidth Cisco UCS VIC 1227 Cisco 12-Gbps SAS Modular RAID Controller with 2-GB FBW	8 Cisco UCS C220 M4 Rack Servers, each with: 2 Intel Xeon processor E5- 2680 v4 CPUs (14 cores each) 256 GB of memory 8 x 960-GB SFF SSDs Total of 7.5 TB of flash storage 4 GBps of I/O bandwidth Cisco UCS VIC 1227 Cisco 12-Gbps SAS Modular RAID Controller with 2-GB FBWC

Starter

High Performance

8 Cisco UCS C220 M4 Rack Servers, each with:

2 Intel Xeon processor E5- 2620 v4 CPUs (8 cores each)

128 GB of memory

8 x 1.2-TB 10,000-RPM SFF SAS drives

Total of 10 TB of storage capacity

1.4 GBps of I/O bandwidth

Cisco UCS VIC 1227

Cisco 12-Gbps SAS Modular RAID Controller with 2-GB FBW

8 Cisco UCS C220 M4 Rack Servers, each with:

2 Intel Xeon processor E5- 2680 v4 CPUs (14 cores each)

256 GB of memory

8 x 960-GB SFF SSDs

Total of 7.5 TB of flash storage

4 GBps of I/O bandwidth

Cisco UCS VIC 1227

Cisco 12-Gbps SAS Modular RAID Controller with 2-GB FBWC

Configuring Disk Drives for the Operating System on HDF Nodes

This section describes in detail the RAID configuration of disk drives for OS on HDF nodes. The first two disk drives are configured as RAID1, read ahead cache is enabled and write cache is enabled while battery is present and as OS partition.

The remaining disk drives, 6 (in case of Cisco UCS C220 M4) are used for data and is configured as 3 two disk RAID1 volumes with read ahead cache is enabled and write cache is enabled while battery is present is described in the next section.

To configure Disk Drives for Operating System on Cisco UCS C220 M4 servers, complete the following steps.

1. Log in to the Cisco UCS 6296 Fabric Interconnect and launch the Cisco UCS Manager application.

2. Select the Equipment tab as in Figure 121.

3. In the navigation pane expand Rack-Mounts and then Servers.

4. Right click on the server and select KVM Console.

Figure 121 KVM Console

5. Restart the server by using KVM Console, Macros > Static Macros > Ctrl-Alt-Del as shown in Figure 122.

Figure 122 Restart The Server

$Description: C:\Users\asmanjun\AppData\Local\Temp\SNAGHTML54cdda.PNG$

6. Press <Ctrl>R to enter the Cisco SAS Modular Raid Controller BIOS Configuration Utility. This is shown in Figure 123 below.

7. Select the controller and press F2.

Clear the Configuration if previous configurations are present.

8. Select Create Virtual Drive.

Figure 123 Cisco SAS Modular Raid Controller BIOS Configuration Utility

$Description: C:\Users\jregula\Desktop\screenshots\HaaS Screenshots DN1\Snap 2015-09-11 at 15.41.22.png$

9. Select RAID level RAID-1, select the first two drives and choose Advanced.

Figure 124 Cisco SAS Modular Raid Controller BIOS Configuration Utility

$Description: C:\Users\jregula\Desktop\screenshots\HaaS Screenshots DN1\Snap 2015-09-11 at 15.42.50.png$

10. Select the following:

Strip Size is 64KB.

Read Policy is Normal.

Write Policy is Write Through.

I/O Policy is Direct.

Disk Cache Policy is unchanged.

Emulation is Default.

11. Select Initialize.

Figure 125 Create Virtual Drive

$Description: C:\Users\jregula\Desktop\screenshots\HaaS Screenshot CN1\New folder\Snap 2015-09-11 at 12.42.35.png$

Initialization will destroy data on the virtual drives.

12. Select OK to continue.

Figure 126 Cisco SAS Modular Raid Controller BIOS Configuration Utility

Figure 127 Cisco SAS Modular Raid Controller BIOS Configuration Utility

$Description: C:\Users\jregula\Desktop\screenshots\HaaS Screenshots DN1\Snap 2015-09-11 at 15.45.44.png$

13. Select OK to continue.

14. Press the ESC key, then select OK to exit the utility.

15. Press Ctrl-N to move to the Ctrl Mgmt tab.

16. Go to the Boot Device option and select the VD 0 930.39 GB for C220 M4.

17. Apply the settings and exit.

Configuring Disk Drives for Data on HDF Nodes

The remaining 6 disk drives are used for data and are configured as 3 two disk RAID1 volumes with read ahead cache is enabled and write back cache is enabled while battery is present is described in the next section.

1. Repeat the steps above 3 times to create 3 RAID1 volumes of two drives each.

2. Select the following:

Strip Size is 64KB.

Read Policy is Normal.

Write Policy is Write Back.

I/O Policy is Direct.

Disk Cache Policy is unchanged.

Emulation is Default.

The three separate volumes are needed for

· Flowfile

· Content Repository

· Provenance Repository

Installing OS on HDF Nodes

Follow the same steps to install RHEL 7.2 on the boot drives as shown in the C240 M4 server section.

Configuring the Filesystem for HDF Data Disks.

The following script will format and mount the available volumes on each node. OS boot partition is going to be skipped. All drives are going to be mounted based on their UUID as flowfile_repo, cont_repo, and prov_repo.

1. On the Admin node, create a file containing the following script.

2. To create partition tables and file systems on the local disks supplied to each of the nodes, run the following script as the root user on each node.

#vi /root/driveconf_nifi.sh

#!/bin/bash

#disks_count=`lsblk -id | grep sd | wc -l`

#if [ $disks_count -eq 24 ]; then

# echo "Found 24 disks"

#else

# echo "Found $disks_count disks. Expecting 24. Exiting.."

# exit 1

#fi

[[ "-x" == "${1}" ]] && set -x && set -v && shift 1

count=1

for X in /sys/class/scsi_host/host?/scan

echo '- - -' > ${X}

done

for X in /dev/sd?

echo "========"

echo $X

echo "========"

if [[ -b ${X} && `/sbin/parted -s ${X} print quit|/bin/grep -c boot` -ne 0

]]

then

echo "$X bootable - skipping."

continue

else

Y=${X##*/}1

echo "Formatting and Mounting Drive => ${X}"

/sbin/mkfs.xfs -f ${X}

(( $? )) && continue

#Identify UUID

UUID=`blkid ${X} | cut -d " " -f2 | cut -d "=" -f2 | sed 's/"//g'`

if [[ $count -eq 1 ]]; then

folder="flowfile_repo";

elif [[ $count -eq 2 ]]; then

folder="cont_repo";

else

folder="prov_repo";

echo ${folder}

/bin/mkdir -p /${folder}

(( $? )) && continue

echo "UUID of ${X} = ${UUID}, mounting ${X} using UUID on /${folder}"

/bin/mount -t xfs -o inode64,noatime -U ${UUID} /${folder}

(( $? )) && continue

echo "UUID=${UUID} /${folder} xfs inode64,noatime 0 0" >> /etc/fstab

((count++))

done

3. Run the following command to copy driveconf_nifi.sh to all the nodes:

#chmod 755 /root/driveconf_nifi.sh

#clush –a -B –c /root/driveconf_nifi.sh

4. Run the following command from the admin node to run the script across all data nodes

#clush –a –B /root/driveconf_nifi.sh

5. Run the following from the admin node to list the partitions and mount points

#clush –a -B df –h

#clush –a -B mount

#clush –a -B cat /etc/fstab

Post OS configuration of HDF Nodes

Please follow the similar steps as HDP nodes post OS configuration mentioned above with additional steps as described below.

Note: For more information please check https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#configuration-best-practices

Maximum File Handles

NiFi will at any one time potentially have a very large number of file handles open. Increase the limits by editing /etc/security/limits.conf to add something like:

* hard nofile 50000

* soft nofile 50000

Maximum Forked Processes

NiFi may be configured to generate a significant number of threads.

1. To increase the allowable number, edit /etc/security/limits.conf.

* hard nproc 10000

* soft nproc 10000

2. The distribution may require an edit to /etc/security/limits.d/90-nproc.conf by adding:

* soft nproc 10000

3. Increase the number of TCP socket ports available. This is particularly important if the flow will be setting up and tearing down a large number of sockets in a small period of time.

sudo sysctl -w net.ipv4.ip_local_port_range="10000 65000"

4. Set how long sockets stay in a TIMED_WAIT state when closed. Don’t let the sockets sit and linger too long. The flow will be setting up and tearing down a large number of sockets in a small period of time. Read more about it, but to adjust, try the following:

sudo sysctl -w net.ipv4.netfilter.ip_conntrack_tcp_timeout_time_wait="1"

5. Set Linux never to NiFi swap.

Swapping is fantastic for some applications. It is not good for NiFi that always wants to be running.

6. To turn swapping off, edit /etc/sysctl.conf and add the following line:

vm.swappiness = 0

Setting up HDF

1. On an Internet connected node, create directory called HDF.

mkdir –p /tmp/hdf

cd /tmp/hdf

We are using site to site connectivity for this exercise.

Node 1- distribution node (Internet connectivity)

Node 2-7 – HDF cluster

2. Based on Post OS settings, have the clustershell installed on a Distribution node (edge node) to run the clush command.

To setup HDF, complete the following steps:

1. Running as root, download HDF-1.2.0.1-1.zip using wget below command.

wget http://public-repo-1.hortonworks.com/HDF/centos6/1.x/updates/1.2.0.1/HDF-1.2.0.1-1.zip

2. Copy HDF zip file to /root/distribution on Distribution node and unzip.

mkdir distribution

$Description: C:\Users\matrived\Desktop\CVD & License\CPAv4\Hortonworks\HDF-June-30\Snap 2016-06-30 at 08.40.31.png$

3. Create user nifi on all nodes (distribution and HDF nodes).

clush -a -b adduser nifi

4. Open the following ports on the firewall using these commands.

firewall-cmd --zone=public --add-port=80/tcp --permanent

firewall-cmd --zone=public --add-port=8080/tcp --permanent

firewall-cmd --zone=public --add-port=9090/tcp --permanent

firewall-cmd --zone=public --add-port=9080/tcp --permanent

firewall-cmd --zone=public --add-port=9088/tcp --permanent

firewall-cmd --zone=public --add-port=7088/tcp --permanent

firewall-cmd --zone=public --add-port=6088/tcp --permanent

firewall-cmd --reload

5. From the distribution node run, create the following directories needed for HDF.

[root@hdf ~]# clush -a -b mkdir -p /opt/configuration-resources

[root@hdf ~]# clush -a -b mkdir -p /opt/database_repo

[root@hdf ~]# clush -a -b mkdir -p /nifi-log

[root@hdf ~]# clush -a -b chown -R nifi:nifi /opt/database_repo

[root@hdf ~]# clush -a -b chown -R nifi:nifi /opt/configuration-resources

[root@hdf ~]# clush -a -b chown -R nifi:nifi /nifi-log

6. Copy the files from the distribution node to all other nodes as follows:

cd /home/nifi/distribution/HDF-1.2.0.1-1/nifi/conf

[root@hdf conf]# clush -a -b -c authority-providers.xml --dest=/opt/configuration-resources

[root@hdf conf]# clush -a -b -c authorized-users.xml --dest=/opt/configuration-resources

[root@hdf conf]# clush -a -b -c bootstrap-notification-services.xml --dest=/opt/configuration-resources

[root@hdf conf]# clush -a -b -c login-identity-providers.xml --dest=/opt/configuration-resources

[root@hdf conf]# clush -a -b -c state-management.xml --dest=/opt/configuration-resources

[root@hdf conf]# clush -a -b -c zookeeper.properties --dest=/opt/configuration-resources

[root@hdf conf]# clush -a -b chown -R nifi:nifi /opt/configuration-resources/*

[root@hdf conf]# clush -a -b ls -l /opt/configuration-resources/

7. Create the following directories needed for HDF as follows:

[root@hdf conf]# clush -a -b mkdir /opt/configuration-resources/templates

[root@hdf conf]# clush -a -b mkdir /opt/configuration-resources/archive

[root@hdf conf]# clush -a -b chown -R nifi:nifi /opt/configuration-resources/*

8. On the distribution-node, delete the following files in the nifi conf directory:

cd /home/nifi/distribution/HDF-1.2.0.1-1/nifi/conf

[root@hdf conf]# rm -f authority-providers.xml authorized-users.xml bootstrap-notification-services.xml login-identity-providers.xml state-management.xml

zookeeper.properties

[root@hdf conf]# ls

bootstrap.conf , logback.xml, nifi.properties

[nifi@hdf conf]$ pwd

9. Modify Nifi.properties.

10. Update file nifi.properties with settings based on the directories created above.

cd /home/nifi/distribution/HDF-1.2.0.1-1/nifi/conf

vi nifi.properties

11. Modify file/directory location for the following as shown in the screenshots.

/conf in nifi.properties are updated to /opt/configuration-resources

12. Update the flowfile repo as /flowfile_repo.

13. Update the content repo directory as /cont_repo.

14. Set the http host to Change_host, this will be updated later as a script. Update the port number.

15. Set the http host to Change_host, this will be updated later as a script. Update the port number.

Modify Bootstrap.conf

16. Update bootstrap.conf as shown in the screenshot.

Modify logback.xml

1. Modify logback.xml to point to the right log file as shown below:

<file>logs/nifi-app.log</file> to

2. Search for <fileNamePattern> and change.log to .log.gz.

$Description: C:\Users\matrived\AppData\Local\Microsoft\Windows\Temporary Internet Files\Content.Word\Snap 2016-07-20 at 14.48.31.png$

3. Now copy the entire distribution directory to all the nifi nodes under the same location.

[root@hdf distribution]# clush -a -b -c HDF-1.2.0.1-1 --dest=/opt/

[root@hdf distribution]# clush -a -b ls /opt/

4. Change owner on all files in /opt/HDF-<x> to nifi:nifi as shown below

[root@hdf distribution]# clush -a -b chown -R nifi:nifi /opt/HDF-1.2.0.1-1

5. Change owner on all the following mount points as well

6. Run df –h command on all nodes to check mount points

[root@hdf ~]# clush -a -b chown -R nifi:nifi /flowfile_repo/

[root@hdf ~]# clush -a -b chown -R nifi:nifi /prov_repo/

[root@hdf ~]# clush -a -b chown -R nifi:nifi /cont_repo/

7. Write this script to make changes in nifi.properties on all nodes:

[root@hdf ~]# vi nifi-changes.sh

host=`hostname -f`

echo $host

sed -i.bak "s/change_host/$host/g" /opt/HDF-1.2.0.1-1/nifi/conf/nifi.properties

[root@hdf ~]# clush -a -b -c ./nifi-changes.sh

[root@hdf ~]# clush -a -b ./nifi-changes.sh

8. Run this command to identify changes are applied

[root@hdf ~]# clush -a -b "diff /opt/HDF-1.2.0.1-1/nifi/conf/nifi.*"

Starting HDF

There are 3 types of Nifi nodes:

1. Standalone - distribution node ( internet connectivity) Edge node

2. Master

3. Slave

To bring up each of these nodes, complete the following steps:

Starting HDF on the distribution-node

The Standalone system is ready, no changes are required.

1. Start the service on the standalone system:

[root@hdf ~]$ /opt/HDF-1.2.0.1-1/nifi/bin/nifi.sh start

Starting HDF on Master node

1. Connect to the master node and perform the following steps:

2. In nifi properties, update manager to true:

vi nifi.properites

nifi.cluster.is.manager = true

3. Start the service on the master system:

[root@hdf ~]$ /opt/HDF-1.2.0.1-1/nifi/bin/nifi.sh start

Starting HDF on Slave node

1. In nifi properties, update node to true to make this a slave node

nifi.cluster.is.node = true

2. Run the following command on all slave nodes:

sed -i.bak "s/nifi.cluster.is.node=false/nifi.cluster.is.node=true/g" /opt/HDF-1.2.0.1-1/nifi/conf/nifi.properties

3. Start service on slave nodes:

[root@hdf ~]$ /opt/HDF-1.2.0.1-1/nifi/bin/nifi.sh start

At this point the NiFi cluster should be up and running.

Bill of Materials

This section provides the BOM for the 64 nodes Performance Optimized Cluster. See Table 14 for BOM for the master rack, Table 15 for BOM for expansion racks (racks 2 to 4), Table 16 and Table 17 for software components. Table 18 lists Hortonworks Data Platform (HDP) SKUs available from Cisco. Table 19 and Table 20 list Hortonworks Dataflow Starter Configuration SKUs.

If UCSD-SL-CPA4-P2 is added to the BOM all the required components for 16 servers only are automatically added. If not customers can pick each of the individual components that are specified after this and build the BOM manually.

Table 14 Bill of Materials for C240M4SX Base Rack

Part Number	Description	Quantity
UCS-SL-CPA4-P2	Performance Optimized Option 2 Cluster	1
UCSC-C240-M4SX	UCS C240 M4 SFF 24 HD w/o CPU, memory, HD, PCIe, PS, rail kit w/expander	16
UCSC-MRAID12G	Cisco 12G SAS Modular Raid Controller	16
UCSC-MRAID12G-2GB	Cisco 12Gbps SAS 2GB FBWC Cache module (Raid 0/1/5/6)	16
UCSC-MLOM-CSC-02	Cisco UCS VIC1227 VIC MLOM - Dual Port 10Gb SFP+	16
CAB-9K12A-NA	Power Cord 125VAC 13A NEMA 5-15 Plug North America	32
UCSC-PSU2V2-1200W	1200W/800W V2 AC Power Supply for 2U C-Series Servers	32
UCSC-RAILB-M4	Ball Bearing Rail Kit for C240 M4 rack servers	16
UCSC-HS-C240M4	Heat Sink for UCS C240 M4 Rack Server	32
UCSC-SCCBL240	Supercap cable 250mm	16
UCS-CPU-E52680E	2.40 GHz E5-2680 v4/120W 14C/35MB Cache/DDR4 2400MHz	32
UCS-MR-1X161RV-A	16GB DDR4-2400-MHz RDIMM/PC4-19200/single rank/x4/1.2v	256
UCS-HD18TB10KS4K	1.8 TB 12G SAS 10K rpm SFF HDD (4K)	384
UCS-SD240GBKS4-EB	240 GB 2.5 inch Enterprise Value 6G SATA SSD (BOOT)	32
UCSC-PCI-1C-240M4	Right PCI Riser Bd (Riser 1) 2onbd SATA bootdrvs+ 2PCI slts	16
UCS-FI-6296UP-UPG	UCS 6296UP 2RU Fabric Int/No PSU/48 UP/ 18p LIC	2
CON-SNT-FI6296UP	SMARTNET 8X5XNBD UCS 6296UP 2RU Fabric Int/2 PSU/4 Fans	2
SFP-H10GB-CU3M	10GBASE-CU SFP+ Cable 3 Meter	34
UCS-ACC-6296UP	UCS 6296UP Chassis Accessory Kit	2
UCS-PSU-6296UP-AC	UCS 6296UP Power Supply/100-240VAC	4
N10-MGT014	UCS Manager v3.1	2
UCS-L-6200-10G-C	2rd Gen FI License to connect C-direct only	62
UCS-BLKE-6200	UCS 6200 Series Expansion Module Blank	6
UCS-FAN-6296UP	UCS 6296UP Fan Module	8
CAB-N5K6A-NA	Power Cord 200/240V 6A North America	4
UCS-FI-E16UP	UCS 6200 16-port Expansion module/16 UP/ 8p LIC	4
RACK-UCS2	Cisco R42610 standard rack w/side panels	1
RP208-30-1P-U-2=	Cisco RP208-30-U-2 Single Phase PDU 20x C13 4x C19 (Country Specific)	2
CON-UCW3-RPDUX	UC PLUS 24X7X4 Cisco RP208-30-U-X Single Phase PDU 2x (Country Specific)	6

If using the FI 6332 please refer to Table 1 for the SKU information.

Table 15 Bill of Materials for Expansion Racks

Part Number	Description	Quantity
UCSC-C240-M4SX	UCS C240 M4 SFF 24 HD w/o CPU, mem, HD, PCIe, PS, railkt w/expndr	48
UCSC-MRAID12G	Cisco 12G SAS Modular Raid Controller	48
UCSC-MRAID12G-2GB	Cisco 12Gbps SAS 2GB FBWC Cache module (Raid 0/1/5/6)	48
UCSC-MLOM-CSC-02	Cisco UCS VIC1227 VIC MLOM - Dual Port 10Gb SFP+	48
CAB-9K12A-NA	Power Cord 125VAC 13A NEMA 5-15 Plug North America	96
UCSC-PSU2V2-1200W	1200W V2 AC Power Supply for 2U C-Series Servers	96
UCSC-RAILB-M4	Ball Bearing Rail Kit for C240 M4 rack servers	48
UCSC-HS-C240M4	Heat Sink for UCS C240 M4 Rack Server	96
UCSC-SCCBL240	Supercap cable 250mm	48
UCS-CPU-E52680E	2.40 GHz E5-2680 v4/120W 14C/35MB Cache/DDR4 2400MHz	96
UCS-MR-1X161RV-A	16GB DDR4-2400-MHz RDIMM/PC4-19200/single rank/x4/1.2v	768
UCS-HD18TB10KS4K	1.8 TB 12G SAS 10K rpm SFF HDD (4K)	1152
UCS-SD240GBKS4-EB	240 GB 2.5 inch Enterprise Value 6G SATA SSD (BOOT)	96
UCSC-PCI-1C-240M4	Right PCI Riser Bd (Riser 1) 2onbd SATA boot drvs+ 2PCI slts	48
SFP-H10GB-CU3M=	10GBASE-CU SFP+ Cable 3 Meter	96
RACK-UCS2	Cisco R42610 standard rack w/side panels	3
RP208-30-1P-U-2=	Cisco RP208-30-U-2 Single Phase PDU 20x C13 4x C19 (Country Specific)	6
CON-UCW3-RPDUX	UC PLUS 24X7X4 Cisco RP208-30-U-X Single Phase PDU 2x (Country Specific)	18

Table 16 Red Hat Enterprise Linux License

Red Hat Enterprise Linux
RHEL-2S2V-3A	Red Hat Enterprise Linux	64
CON-ISV1-EL2S2V3A	3 year Support for Red Hat Enterprise Linux	64

Table 17 SKUS for Hortonworks Subscription

Cisco PID (TOP level)	Cisco Subscription PID	Description
UCS-BD-HDP-JSS=	UCS-BD-HDP-JSS-6M	HDP Data Platform Jumpstart Subscription - Up to 16 Nodes – 1 Business Day Response - 6 Months - sold to new customers only - max. Qty. to buy 1 SKU per customer
UCS-BD-HDP-ENT-ND=	UCS-BD-ENT-ND-1Y	HDP Enterprise Subscription - 4 Nodes - 24x7 Sev 1 Response - 1 Year - min of 3 SKUs required for new customers
UCS-BD-HDP-ENT-ND=	UCS-BD-ENT-ND-2Y	HDP Enterprise Subscription - 4 Nodes - 24x7 Sev 1 Response - 2 Year - min of 3 SKUs required for new customers
UCS-BD-HDP-ENT-ND=	UCS-BD-ENT-ND-3Y	HDP Enterprise Subscription - 4 Nodes - 24x7 Sev 1 Response - 3 Year - min of 3 SKUs required for new customers
UCS-BD-HDP-EPL-ND=	UCS-BD-EPL-ND-1Y	HDP Enterprise Plus Subscription - 4 Nodes - 24x7 Sev 1 Response - 1 Year - min of 3 SKUs required for new customers
UCS-BD-HDP-EPL-ND=	UCS-BD-EPL-ND-2Y	HDP Enterprise Plus Subscription - 4 Nodes - 24x7 Sev 1 Response - 2 Year - min of 3 SKUs required for new customers
UCS-BD-HDP-EPL-ND=	UCS-BD-EPL-ND-3Y	HDP Enterprise Plus Subscription - 4 Nodes - 24x7 Sev 1 Response - 3 Year - min of 3 SKUs required for new customers

Table 18 Hortonworks SKUs

Hortonworks SKU	Description
HDF-JSS-6M	Hortonworks DataFlow Jumpstart Subscription - 32 Cores and 10 Edge Devices – Business Day Response - 6 Months
HDF-ENT-1Y	Hortonworks DataFlow Enterprise Subscription - 16 Cores – 24x7 Sev 1 Response - 1 Year
HDF-EDG-1Y	Hortonworks DataFlow Edge Pack Subscription - 100 Edge Devices – 24x7 Sev 1 Response - 1 Year

Note: For the Hortonworks Dataflow, the following SKU’s are available, each with 8 x Cisco UCS C220M4 in the configuration.

1. Starter Configuration for test and development, and smaller deployments.

Table 19 Hortonworks Dataflow Starter Configuration SKUs

Base Solution SKU	UCS-SL-CPA4-S
Server SKU (C220 M4)	UCS-SPBD-C220M4-S1

2. High Performance Configuration with SSDs.

Table 20 Hortonworks Dataflow High Performance Configuration

Base Solution SKU	UCS-SL-CPA4-H
Server SKU (C220 M4)	UCS-SPBD-C220M4-H1

About the Authors

Manan Trivedi is a Big Data Solutions Architect in the Data Center Solutions Group, Cisco Systems Inc. Manan is part of the Big Data solution engineering team focusing on big data infrastructure and performance.

Ali Bajwa, Principal Partner Solutions Engineer, Technology Alliances Team, Hortonworks, Inc.
Ali is a Senior Partner Solutions Engineer at Hortonworks and works as part of the Technology Alliances team. His focus is to evangelize and assist partners integrate with Hortonworks Data Platform.

Acknowledgements

· Karthik Kulkarni, Big Data Solutions Architect, Data Center Solutions Group, Cisco Systems Inc.

· Barbara Dixon, Technical Writer, Data Center Solutions Group, Cisco Systems, Inc.

Appendix A- Network Bonding

Setting Up Network Bonding

Network bonding can be setup on the vNICs on the hosts for redundancy as well as for increased throughput. Below is example of how bonding can be configured on a 16-node cluster. See Figure 128 below.

Figure 128 vNIC Bonding Set Up diagram

Enabling bonding will need the Fabric Interconnects to connect to any L2/L3 switch in this example the N9K-9372PX switch is used. All the ports that will be used will need to be bundled in a port channel. In this example we bundled 8 ports into the port channel. In this configuration 3 vNICs are used on each FI, while one vNIC on each is sufficient the reason for this is multiple vNIC bonding works much better with bonding mode 6, because both sending and receiving frames are load balanced with different MAC addresses.

By the very definition of bond mode 6 (balance-alb),“when a link is reconnected or a new slave joins the bond the received traffic is redistributed among all active slaves in the bond by initiating ARP Replies with the selected MAC address to each of the clients.” This explains the performance improvement that is observed with addition of multiple vNICS.

VLANs are configured as shown in Table 21 below:

Table 21 VLan Configurations

VLAN	Fabric	NIC Port	Function
Vlan19		bond0	Data
	A	eth0	Data
	B	eth1	Data
	A	eth2	Data
	B	eth3	Data
	A	eth4	Data
	B	eth5	Data

All NICs are carrying data traffic in Vlan19.

$Description: C:\Users\matrived\Desktop\32 server audit\Snap-2015-10-15-09-39-15.png$

16 upstream ports bundled in Port channel created between Fabric Interconnect and upstream switch (Nexus N9K-C9372PX)

1. Configure eth0 as shown below in Figure 129, and click Apply to enable the changes.

Figure 129 Configure eth0

All vNICs (eth0-eth5) are the same as the above configuration. The only change will be the Fabric ID as noted below:

· eth0, eth2, and eth4 are on the link going to Fabric A

· eth1, eth3, and eth5 are on the link going to Fabric B

· All 6 vNICs are bonded together in Mode 6 to form bond0 interface.

To configure bonding on the OS Host machines, complete the following steps:

1. Configure /etc/modules.conf as follows:

alias bond0 bonding

Sample ifcfg-eth0 file

DEVICE="eth0"

BOOTPROTO="none"

MTU="9000"

ONBOOT="yes"

TYPE="Ethernet"

MASTER=bond0

SLAVE=yes

2. All the vNICs that are being bonded will need to be configured as SLAVES as shown above on all the hosts, with bond0 as the master.

Sample ifcfg-bond0 file

DEVICE=bond0

TYPE=Bond

BONDING_MASTER=yes

DEFROUTE=yes

IPV4_FAILURE_FATAL=no

NETMASK=255.255.255.0

GATEWAY=$GATEWAYIP

BONDING_OPTS="mode=6 miimon=100 xmit_hash_policy=0"

BOOTPROTO="none"

ONBOOT="yes"

MTU="9000"

IPADDR=”$HOSTIP”

Appendix B – A Sample Application with HDF Using Live Twitter Stream with Apache Solr and Banana

This section explains how to create a sample application which continuously feeds Twitter live -streams through Hortonworks DataFlow, and sends this to Apache Solr for indexing.

To create the sample application, complete the following steps:

· Create a Twitter Application (Twitter Credentials)

· Install Apache Solr and Banana dashboard

· Configure HDF to pull data from Twitter and store in Solr

Create a Twitter Application

To skip having to register a personal Twitter application, and use previous data, go to the next section to download the sample dataset.

To pull live data from Twitter in this tutorial, register a personal Twitter application.

1. Go to the Twitter Apps Website and sign in using a personal Twitter account

2. Click Create a New App.

Description: Creating Twitter App

3. Fill in details about the application.

Description: Twitter App Details

4. Click Create Your Twitter Application at the bottom of the screen after reading the developer agreement.

Note: Add your mobile phone to a personal Twitter account before creating the application

Shows the dashboard for the Twitter application.

5. Click the permissions tab and select the Read Only Option and Update the application.

Description: Changing App Permission

6. Generate the OAuth key.

7. Click Test OAuth on the top of the permissions page, or go to Keys and Access Tokens and find the option to generate the OAuth tokens.

The keys and access tokens should look similar to the following:

Description: Twitter Tokens

Make note of the Consumer Key, Consumer Secret, Access Token, and Access Token Secret. These are used to create the data flow in NiFi.

Solr Installation

1. To enable search, install the hdpsearch package as show below on the existing Hadoop cluster nodes.

clush –a –b yum install -y lucidworks-hdpsearch

2. From the internet connected node, download the default.json files and copy them to the admin node of the Hadoop cluster.

mkdir –p /tmp/solr

cd /tmp/solr

wget https://raw.githubusercontent.com/abajwa-hw/ambari-nifi-service/master/demofiles/default.json

3. Connect to rhel1 (Hadoop admin node) and perform the following steps to copy the default.json file from root to the banana dashboard.

[root@rhel1 ~]# cp default.json /opt/lucidworks-hdpsearch/solr/server/solr-webapp/webapp/banana/app/dashboards/

cp: overwrite ‘/opt/lucidworks-hdpsearch/solr/server/solr-webapp/webapp/banana/app/dashboards/default.json’? y

4. Edit the Solr configuration file to add the date format compatible with twitter data.

vi /opt/lucidworks-hdpsearch/solr/server/solr/configsets/data_driven_schema_configs/conf/solrconfig.xml

5. Search for ParseDateFieldUpdateProcessorFactory and the add the following line at the beginning:

6. Backup the solrconfig.xml file to root on all the nodes.

[root@rhel1 ~]# clush -a -b cp /opt/lucidworks-hdpsearch/solr/server/solr/configsets/data_driven_schema_configs/conf/solrconfig.xml /root/solrconfig.xml-bkp

7. Copy the solrconfig.xml file from rhel1 to the all nodes.

[root@rhel1 ~]# clush -a -b -c /opt/lucidworks-hdpsearch/solr/server/solr/configsets/data_driven_schema_configs/conf/solrconfig.xml

8. Add zookeeper node details to the solr configuration with the following commands

cd /opt/lucidworks-hdpsearch/solr/server/scripts/cloud-scripts

9. Edit this line with lists of the nodes which zookeeper is running on.

/zkcli.sh -zkhost rhel1,rhel2,rhel3 -cmd makepath /solr

10. Start Solr on all the nodes.

clush -a -b /opt/lucidworks-hdpsearch/solr/bin/solr start -c -z rhel1,rhel2,rhel3:2181/solr

11. Create Solr collection with 2 shards and replication factor 2.

/opt/lucidworks-hdpsearch/solr/bin/solr create -c tweets -d data_driven_schema_configs -s 2 -rf 2

$Description: C:\Users\matrived\AppData\Local\Microsoft\Windows\Temporary Internet Files\Content.Word\Snap 2016-07-20 at 17.16.33.png$

Configuring Apache Nifi Project with Solr

1. From your Hadoop cluster, copy core-site.xml and put the HDF cluster on all the nodes under /opt/configuration-resources.

clush -a -b -c /root/core-site.xml –dest=/opt/ configuration-resources/

2. Go to http://10.4.1.240:8080/nifi/ (Distribution Node)

The top left icon corresponds to a “processor” which is the building block used to create NiFi flows on the workspace.

3. Drag/drop the processor icon (top left of page) onto the workspace to display a list of available processors. To find the GetTwitter processor, type “twitter” in the search box (edit screenshot and show some pointer)

Note: To create a twitter application or credentials, please see information in Appendix B.

4. Right click on the GetTwitter Property, and update the twitter endpoint to the filter endpoint and provide the twitter application credentials. Allow terms to filter to the section with the tweet selected.

5. Drag the Process group icon from the left tab (the 5^th from the left) to the workspace.

6. Connect to the cluster node NIFI :

7. Go to the http://10.4.1.241:8080/nifi/ ( HDF cluster IP)

Create a Data Flow with NiFi

1. Click the link below to download the NiFi data flow template for the Twitter Dashboard:

Download

2. Make note of the download location of this file. It is used in the next step.

3. Open up the NiFi user interface found at http://10.4.1.241:8080/nifi/ .

4. Import the downloaded template into NiFi.

5. Import the template by clicking the Templates icon on the top right corner of the screen (Third from the right).

6. Click Browse and navigate to the CVD-Cluster-Twitter-Demo-Template.xml file that was previously downloaded.

7. Click Import.

The template appears as shown below.

Now that the template is imported into NiFi, to instantiate it:

8. Drag the template icon (the 7th from the left) onto the workspace.

Description: Drag Template Icon

9. A dialog box should appear. Make sure that CVD-Cluster-Twitter-Demo-Template is selected and click Add.

10. A screen similar to the following should appear:

The NiFi flow has been set up. The boxes are what NiFi calls processors. Each of the processors can be connected to one another to help make the data flow. Each processor can perform specific tasks and are at the very heart of NiFi’s functionality.

Note: To make the data flows look clean, double click a connection arrow to create a vertex.

Make the connections between the processors 90 degree angles with respect to one another.

Right-click on a few of the processors and look at their configuration. This shows how the Twitter flow works.

Note: Change the Solr location to the Hadoop zookeeper nodes IP addresses.

Twitter Dashboard of the Demo Application

$Description: C:\Users\matrived\AppData\Local\Microsoft\Windows\Temporary Internet Files\Content.Word\Snap 2016-07-20 at 17.16.12.png$

Was this Document Helpful?

Feedback

Contact Cisco

Open a Support Case
(Requires a Cisco Service Contract)