Cisco MDS 9000 Family Fabric Manager Configuration Guide, Release 2.x - Performance Monitoring

Table Of Contents

Performance Monitoring

Real-Time Performance Monitoring

Device Manager Real-Time Performance Monitoring

Fabric Manager Real-Time ISL Statistics

Historical Performance Monitoring

Creating a Flow with Performance Manager

Creating a Collection with Performance Manager

Using Performance Thresholds

Using the Performance Manager Configuration Wizard

Starting and Stopping Data Collection

Viewing Performance Manager Reports

Performance Summary

Performance Tables and Details Graphs

Viewing Performance of Host-Optimized Port Groups

Viewing Performance Manager Events

Generating Top10 Reports in Performance Manager

Generating Top10 Reports Using Scripts

Exporting Data Collections to XML Files

Exporting Data Collections in Readable Format

Configuring Performance Manager for Use with Cisco Traffic Analyzer

Performance Monitoring

Cisco Fabric Manager and Device Manager provide multiple tools for monitoring the performance of the overall fabric, SAN elements, and SAN links. These tools provide real-time statistics as well as historical performance monitoring.

This section contains the following topics:

•Real-Time Performance Monitoring

•Historical Performance Monitoring

Real-Time Performance Monitoring

Real-time performance statistics are a useful tool for dynamic troubleshooting and fault isolation within the fabric. Real-time statistics gather data on parts of the fabric at user-configurable intervals and display the results in Fabric Manager and Device Manager.

Device Manager Real-Time Performance Monitoring

Device Manager provides a simple tool for monitoring ports on the Cisco MDS 9000 Family switches. This tool gathers statistics at a configurable interval and displays the results in tables or charts. These statistics show the performance of the selected port in real time and can be used for performance monitoring and troubleshooting. For a selected port, you can monitor any of a number of statistics, including traffic in and out, errors, class 2 traffic, and FICON data. You can set the polling interval from ten seconds to one hour and display the results based on a number of selectable options, including absolute value, value per second, and minimum or maximum value per second.
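The per-second and utilization views can be reproduced from raw counter readings. The following is a minimal Python sketch, not Device Manager's implementation: it assumes cumulative byte counters read once per poll interval, ignores counter wrap, and treats the link speed you pass in (for example, 1 Gbps) as the nominal 100% capacity. The function names are hypothetical.

```python
def per_second_rates(counters, interval_s):
    """Turn cumulative byte counters read every interval_s seconds into bytes/s."""
    # Device Manager allows poll intervals from 10 seconds to 1 hour.
    if not 10 <= interval_s <= 3600:
        raise ValueError("poll interval must be between 10 and 3600 seconds")
    # Difference of successive readings divided by the interval (no wrap handling).
    return [(curr - prev) / interval_s for prev, curr in zip(counters, counters[1:])]


def util_percent(bytes_per_s, link_bps):
    """Percent utilization of a link whose nominal speed is link_bps bits/s."""
    return 100.0 * bytes_per_s * 8 / link_bps


rates = per_second_rates([0, 1_250_000_000, 1_750_000_000], interval_s=10)
print(rates)                    # value per second for each poll interval
print(min(rates), max(rates))   # minimum and maximum value per second
print([util_percent(r, 1_000_000_000) for r in rates])  # [100.0, 40.0]
```

A real counter source would also need to handle counter resets and wraps before taking differences.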

Device Manager in Cisco MDS SAN-OS Release 2.1(1) or later supports checking for oversubscription on the host-optimized four-port groups on relevant modules. Right-click the port group on a module and choose Check Oversubscription from the pop-up menu.

Device Manager provides two performance views: the Summary tab and the configurable per-port monitor option.

To configure the summary view in Device Manager, follow these steps:

Step 1 Click the Summary tab on the main display. You see all of the active ports on the switch, as well as the configuration options available from the Summary view.

Step 2 Choose the Poll Interval, and choose how the data should be interpreted from the Show Rx/Tx drop-down menu. The table updates each polling interval to show an overview of the receive and transmit data for each active port on the switch.

Step 3 Choose Show Rx/Tx > %Util/sec, and then choose the warning and critical threshold levels for event reporting. You can also display the percent utilization for a port by selecting the port and clicking the Monitor Selected Interface Traffic Util % icon.

Step 4 Click the Monitor Selected Interface Traffic Details icon if you want more detailed statistics on the port.

The configurable monitor option per port gives statistics for in and out traffic on that port, errors, class 2 traffic and other data that can be graphed over a period of time to give a real-time view into the performance of the port.

To configure per port monitoring in Device Manager, follow these steps:

Step 1 Right-click the port you are interested in and choose Monitor... from the options pop-up menu. You see the port real-time monitor dialog box.

Step 2 Choose the Poll Interval and how the data should be interpreted from the drop-down menu. The table updates each polling interval to show statistics for the selected port.

Step 3 Click a statistics value from the table and then click one of the graphing icons to display a running graph of that statistic over time. You see a graph window that contains options to change the graph type.

Tip You can open multiple graphs for statistics on any of the active ports on the switch.

Fabric Manager Real-Time ISL Statistics

You can configure Fabric Manager to gather ISL statistics in real time. These ISL statistics include receive and transmit utilization, bytes per second, as well as errors and discards per ISL.

To configure ISL statistics in Fabric Manager, follow these steps:

Step 1 Select Performance > ISL in Real-Time. ISL statistics display in the Information pane.

Step 2 Choose the Poll Interval and bandwidth utilization thresholds. The table updates each polling interval to show statistics for all configured ISLs in the fabric.

Step 3 Select a row in the table to highlight that ISL in blue on the Topology map.

Historical Performance Monitoring

Performance Manager gathers historical statistics from network devices and presents them graphically in a web browser. It presents recent statistics in detail and older statistics in summary. Performance Manager also integrates with external tools such as Cisco Traffic Analyzer.
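Keeping recent data in detail and older data in summary is the approach of the round-robin database (RRD) files that Performance Manager uses to store its statistics. As a minimal sketch of the idea only (the consolidation intervals Performance Manager actually uses are not given here, and `consolidate` is a hypothetical helper), fine-grained samples can be collapsed into coarser average and peak points:

```python
def consolidate(samples, factor):
    """Collapse every `factor` consecutive samples into one (average, peak) point.

    A trailing group shorter than `factor` is left out until it fills up,
    roughly as an RRD waits for a full consolidation interval.
    """
    if factor < 1:
        raise ValueError("factor must be at least 1")
    points = []
    for start in range(0, len(samples) - factor + 1, factor):
        group = samples[start:start + factor]
        points.append((sum(group) / factor, max(group)))
    return points


# Seven detailed samples consolidated three at a time.
print(consolidate([1, 2, 3, 4, 5, 6, 7], 3))  # [(2.0, 3), (5.0, 6)]
```

Keeping both the average and the peak per point is what lets summary views still show average and peak throughput for older periods.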

The Performance Manager has three operational stages:

•Definition—Uses two configuration wizards to create a collection configuration file.

•Collection—Reads the configuration file and collects the desired information.

•Presentation—Generates web pages to present the collected data.

See the "Performance Manager Architecture" section on page 6-1 for an overview of Performance Manager.

Creating a Flow with Performance Manager

Performance Manager has a Flow Configuration Wizard that steps you through the process of creating host-to-storage, storage-to-host, or bidirectional flows. Table 33-1 describes the options of the Flow Type radio button, which defines the type of traffic monitored.

Table 33-1 Performance Manager Flow Types

| Flow Type | Description |
|---|---|
| Host->Storage | Unidirectional flow, monitoring data from the host to the storage element |
| Storage->Host | Unidirectional flow, monitoring data from the storage element to the host |
| Both | Bidirectional flow, monitoring data to and from the host and storage elements |

Once defined, these flows can be added to a collection configuration file to monitor the traffic between a host/storage element pair.
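The relationship between a host/storage pair, its VSAN, and the Flow Type in Table 33-1 can be modeled as a small data structure. This is a hypothetical Python sketch for illustration, not Performance Manager's internal format: the port identifiers are placeholders, and the sketch simply pairs every host with every storage port, whereas the wizard itself may offer a narrower list of available flows.

```python
from dataclasses import dataclass
from enum import Enum


class FlowType(Enum):
    """The three Flow Type choices from Table 33-1."""
    HOST_TO_STORAGE = "Host->Storage"
    STORAGE_TO_HOST = "Storage->Host"
    BOTH = "Both"


@dataclass(frozen=True)
class Flow:
    vsan: int
    host: str      # host port identifier (placeholder, for example a pWWN)
    storage: str   # storage port identifier (placeholder)
    flow_type: FlowType

    def directions(self):
        """(source, destination) pairs whose traffic this flow counts."""
        to_storage = (self.host, self.storage)
        to_host = (self.storage, self.host)
        return {
            FlowType.HOST_TO_STORAGE: [to_storage],
            FlowType.STORAGE_TO_HOST: [to_host],
            FlowType.BOTH: [to_storage, to_host],
        }[self.flow_type]


def candidate_flows(vsan, hosts, storage, flow_type):
    """One flow per host/storage pair within a single VSAN (flows are per VSAN)."""
    return [Flow(vsan, h, s, flow_type) for h in hosts for s in storage]


flows = candidate_flows(10, ["host-a", "host-b"], ["array-1"], FlowType.BOTH)
print(len(flows), flows[0].directions())
```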

To create a flow in Fabric Manager, follow these steps:

Step 1 Choose Performance > Create Flows to launch the wizard.

Step 2 Choose the VSAN from which you want to create flows. Flows are defined per VSAN.

Step 3 Click the Type radio button for the flow type you want to define.

Step 4 Check the Clear old flows on modified switches check box if you want to remove old flow data.

Step 5 Click Next to review the available flows for the chosen VSAN. Remove any flows you are not interested in.

Step 6 Click Finish to create the flow.

The flows created become part of the collection options in the Performance Manager Configuration Wizard.

Creating a Collection with Performance Manager

The Performance Manager Configuration Wizard steps you through the process of creating collections using configuration files. Collections are defined for one or all VSANs in the fabric. Collections can include statistics from the SAN element types described in Table 33-2.

Table 33-2 Performance Manager Collection Types

| Collection Type | Description |
|---|---|
| ISLs | Collects link statistics for ISLs. |
| Host | Collects link statistics for SAN hosts. |
| Storage | Collects link statistics for storage elements. |
| Flows | Collects flow statistics defined by the Flow Configuration Wizard. |

Using Performance Thresholds

The Performance Manager Configuration Wizard allows you to set up two thresholds that will trigger events when the monitored traffic exceeds the percent utilization configured. These event triggers can be set as either Critical or Warning events that are reported on the Fabric Manager web client Events browser page.

You must choose either absolute value thresholds or baseline thresholds that apply to all transmit or receive traffic defined in the collection. Click the Use absolute values radio button on the last screen of the Performance Manager Configuration Wizard to configure thresholds that apply directly to the statistics gathered. These statistics, as a percent of the total link capacity, are compared to the percent utilization configured for the threshold type. If the statistics exceed either configured threshold, an event is shown on the Fabric Manager web client Events tab.

For example, suppose a collection has absolute value thresholds set at 60% utilization (warning) and 80% utilization (critical). If Performance Manager detects that the traffic on a 1-Gigabit link in its collection exceeds 600 Mbps, a warning event is triggered. If the traffic exceeds 800 Mbps, a critical event is triggered.
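The comparison in this example can be written out as a small function. This is a hedged Python sketch of the rule described above, not Performance Manager code; `absolute_event` is a hypothetical name, and it treats the nominal link speed as 100% capacity, as the 1-Gigabit example does.

```python
def absolute_event(traffic_bps, link_bps, warning_pct, critical_pct):
    """Return 'critical', 'warning', or None for one traffic sample."""
    utilization = 100.0 * traffic_bps / link_bps  # percent of link capacity
    # An event fires only when utilization exceeds (not equals) a threshold.
    if utilization > critical_pct:
        return "critical"
    if utilization > warning_pct:
        return "warning"
    return None


ONE_GIG = 1_000_000_000
for sample in (500e6, 650e6, 850e6):
    print(int(sample / 1e6), "Mbps ->", absolute_event(sample, ONE_GIG, 60, 80))
```

With the 60% and 80% settings, 500 Mbps produces no event, 650 Mbps a warning, and 850 Mbps a critical event.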

Baseline thresholds are defined for a configured time of day or week (1 day, 1 week, or 2 weeks). The baseline is created by calculating the average of the statistical results for the configured time each day, week, or every 2 weeks. Table 33-3 shows an example of the statistics used to create the baseline value for a collection defined at 4 pm on a Wednesday.

Table 33-3 Baseline Time Periods for a Collection Started on Wednesday at 4 pm

| Baseline Time Window | Statistics Used in Average Calculation |
|---|---|
| 1 day | Every prior day at 4 pm |
| 1 week | Every prior Wednesday at 4 pm |
| 2 weeks | Every other prior Wednesday at 4 pm |

Baseline thresholds create a threshold that adapts to the typical traffic pattern for each link for the same time window each day, week, or every 2 weeks. Baseline thresholds are set as a percent of the average (110% to 500%), where 100% equals the calculated average.
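The sample selection in Table 33-3 amounts to stepping back from the current time slot by whole baseline windows and averaging what was recorded at each step. A minimal Python sketch, assuming a hypothetical history keyed by exact sample timestamps (a real collection polls at intervals and would need to match the nearest sample); `baseline_average` is an illustrative name, not a Performance Manager function.

```python
from datetime import datetime, timedelta

# Baseline time windows from Table 33-3.
WINDOWS = {
    "1 day": timedelta(days=1),
    "1 week": timedelta(weeks=1),
    "2 weeks": timedelta(weeks=2),
}


def baseline_average(history, slot, window):
    """Average the values recorded at `slot` minus 1, 2, 3, ... windows.

    history: dict of datetime -> traffic value (hypothetical storage layout)
    slot:    the time being evaluated, for example Wednesday at 4 pm
    Returns None when no earlier value exists yet.
    """
    step = WINDOWS[window]
    earliest = min(history, default=slot)
    values = []
    t = slot - step
    while t >= earliest:
        if t in history:
            values.append(history[t])
        t -= step
    return sum(values) / len(values) if values else None


# Daily 4 pm readings for three weeks, starting Wednesday, January 3, 2024.
start = datetime(2024, 1, 3, 16, 0)
history = {start + timedelta(days=i): float(i) for i in range(21)}
now = start + timedelta(days=21)  # the following Wednesday at 4 pm
for window in WINDOWS:
    print(window, baseline_average(history, now, window))
```

The 1-day window averages all 21 prior readings, the 1-week window only the three prior Wednesdays, and the 2-week window every other prior Wednesday.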

For example, a collection is created at 4 pm on Wednesday, with baseline thresholds set for 1 week at 150% of the average (warning) and 200% of the average (critical). Performance Manager recalculates the average for each link at 4 pm every Wednesday by averaging the statistics gathered at that time each Wednesday since the collection started. Using this as the new average, Performance Manager compares each traffic statistic against it and sends a warning or critical event if the traffic on a link exceeds 150% or 200% of the average, respectively.

Table 33-4 shows, for two 1-Gigabit links with different averages in this example collection, the traffic levels at which Warning and Critical events are sent.

Table 33-4 Example of Events Generated for 1-Gigabit Links

| Average | Warning Event Sent at 150% | Critical Event Sent at 200% |
|---|---|---|
| 400 Mbps | 600 Mbps | 800 Mbps |
| 200 Mbps | 300 Mbps | 400 Mbps |
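The values in Table 33-4 follow directly from multiplying the baseline average by the configured percentages. A hedged Python sketch of that arithmetic (the function names are hypothetical, and traffic and averages are in Mbps):

```python
def baseline_thresholds(average, warning_pct, critical_pct):
    """Warning and critical traffic levels as a percent of the baseline average."""
    for pct in (warning_pct, critical_pct):
        # Baseline thresholds are configurable from 110% to 500% of the average.
        if not 110 <= pct <= 500:
            raise ValueError("baseline threshold must be 110-500% of the average")
    return average * warning_pct / 100, average * critical_pct / 100


def baseline_event(traffic, average, warning_pct=150, critical_pct=200):
    """Return 'critical', 'warning', or None for one traffic sample."""
    warning, critical = baseline_thresholds(average, warning_pct, critical_pct)
    if traffic > critical:
        return "critical"
    if traffic > warning:
        return "warning"
    return None


# Reproduce Table 33-4.
for avg in (400, 200):
    print(avg, baseline_thresholds(avg, 150, 200))
```

Because the thresholds scale with each link's own average, the same 700 Mbps sample is a warning on the 400 Mbps link but critical on the 200 Mbps link.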

Set these thresholds on the last screen of the Performance Manager Configuration Wizard by checking the Send event if traffic exceeds threshold check box.

Using the Performance Manager Configuration Wizard

To create a collection using the Performance Manager Configuration Wizard in Fabric Manager, follow these steps:

Step 1 Choose Performance > Create Collection to launch the Performance Manager Configuration Wizard.

Step 2 Choose the VSANs from which you want to collect data or choose All to collect statistics across all VSANs in the fabric.

Step 3 Check the Type check boxes for each type of link, flow, or SAN element that you want to include in your collection.

Step 4 Check the corresponding check box if you want to ignore flows with zero counter values.

Step 5 If you are using Cisco Traffic Analyzer, enter the URL where it is located on your network.

Step 6 Click Next to review the collection specification data. Remove any links, flows, or SAN elements you are not interested in.

Step 7 Click Next to configure other collection options.

Step 8 Check the appropriate check boxes if you want to include errors and discards in your collection, and if you want to interpolate data for missing statistics.

Step 9 Check the Send event if traffic exceeds threshold check box if you want to configure threshold events as explained in the "Using Performance Thresholds" section.

Step 10 Click the Use absolute values radio button if you want absolute value thresholds or click the Baseline values over radio button if you want baseline thresholds.

Step 11 Choose the time window for baseline calculations if baseline thresholds are configured.

Step 12 Choose the Critical and Warning threshold values as a percent of link capacity (for absolute value thresholds) or of the average (for baseline thresholds).

Step 13 Click Finish to create the collection configuration file. You see a dialog box asking if you want to restart Performance Manager.

Step 14 Click Yes to restart Performance Manager with this new configuration file, or click No to exit the Performance Manager Configuration Wizard without restarting Performance Manager. If you choose No, Performance Manager does not use the new configuration file until you restart it by choosing Performance > Collector > Restart.

Note If you reconfigure your fabric, you may need to update your Performance Manager collections and flows. Recreate them using the Flow Configuration Wizard and the Performance Manager Configuration Wizard.

Starting and Stopping Data Collection

After configuring the collection or iSCSI flows, you can start or restart the collection by choosing Performance > Collector > Start or Performance > Collector > Restart. You can verify that the collection has started by checking the PMCollector.log file in the main MDS9000 directory on your Fabric Manager Server, or by viewing the status of the collection on the Fabric Manager web client Admin tab.

You can manually stop a data collection process in Windows using the services panel. Right-click the Cisco Performance Manager service and choose Stop.

On a UNIX machine, enter the following command:
$HOME/.ciscomds9000/bin/pm.sh stop
You can also start, restart, or stop the collection using the Fabric Manager web client Admin tab.

Viewing Performance Manager Reports

You can view Performance Manager statistical data using preconfigured reports that are built on demand and displayed in a web browser. These reports provide summary information as well as detailed statistics that can be viewed for daily, weekly, monthly, or yearly results.

Choose Performance > Reports to access Performance Manager reports from Fabric Manager. This opens a web browser window showing the default Fabric Manager web client event summary report. Click the Performance tab to view the Performance Manager reports. Performance Manager begins reporting data ten minutes after the collection is started.

Performance Summary

The Performance Summary page presents a dashboard display of the throughput and link utilization for hosts, ISLs, storage, and flows for the last 24-hour period. The summary provides a quick overview of the fabric's bandwidth consumption and highlights any hotspots.

The report includes network throughput pie charts and link utilization pie charts. Use the navigation tree on the left to show summary reports for monitored fabrics or VSANs. The summary displays charts for all hosts, storage elements, ISLs, and flows. Each pie chart shows the percentage of entities (links, hosts, storage, ISLs, or flows) whose throughput or link utilization falls within each of six predefined ranges. Move the mouse over a pie chart section to see how many entities fall within that range. Double-click any pie chart to bring up a table of statistics for those hosts, storage elements, ISLs, or flows.
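Each pie chart is essentially a histogram of entities over fixed ranges, expressed as percentages. The following Python sketch shows that bucketing under a loud assumption: the six ranges Performance Manager actually uses are not listed in this guide, so the cut points below are hypothetical.

```python
from bisect import bisect_right

# Hypothetical range boundaries in percent utilization: five cut points give
# six ranges. The actual ranges Performance Manager uses are not listed here.
RANGE_CUTS = [10, 20, 40, 60, 80]


def pie_percentages(utilizations):
    """Percent of entities whose utilization falls in each of the six ranges."""
    counts = [0] * (len(RANGE_CUTS) + 1)
    for u in utilizations:
        # A value equal to a cut point is placed in the higher range.
        counts[bisect_right(RANGE_CUTS, u)] += 1
    total = len(utilizations)
    return [100.0 * c / total if total else 0.0 for c in counts]


print(pie_percentages([5, 15, 15, 95]))  # [25.0, 50.0, 0.0, 0.0, 0.0, 25.0]
```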

Performance Tables and Details Graphs

Click Host, Storage, ISL, or Flow to view traffic over the past day for all hosts, storage, ISLs, or flows respectively. A table lists all of the selected entities, showing transmit and receive traffic and errors and discards, if appropriate. The table can be sorted by any column heading. The table can also be filtered by day, week, month, or year. Tables for each category of statistics display average and peak throughput values and provide hot-links to more detailed information.

Clicking a link in any of the tables opens a details page that shows graphs for traffic by day, week, month, and year. If flows exist for that port, you can see which storage ports sent data. The details page also displays graphs for errors and discards if they are part of the statistics gathered and are not zero.

If you double-click a graph on a Detail report, it will launch the Cisco Traffic Analyzer for Fibre Channel, if configured. The aliases associated with hosts, storage devices, and VSANs in the fabric are passed to the Cisco Traffic Analyzer to provide consistent, easy identification.

Viewing Performance of Host-Optimized Port Groups

You can monitor the performance of host-optimized port groups by clicking Performance > End Devices and selecting Port Groups from the Type drop-down list.

Viewing Performance Manager Events

Performance Manager events are viewable through the Fabric Manager Web Client.

To view Performance Manager events, follow these steps:

Step 1 Choose Performance > Reports. You see a summary of all fabrics monitored by the Fabric Manager Server in a web browser.

Step 2 Choose a fabric and then click the Events tab to see a summary or detailed report of the events that have occurred in the selected fabric.

Generating Top10 Reports in Performance Manager

Cisco MDS SAN-OS Release 2.1(1a) introduces the ability to generate historical Top10 reports that can be saved for later review. These reports list the entities from the data collection, with the most active entities appearing first. This is a static, one-time only report that generates averages and graphs of the data collection as a snapshot at the time the report is generated.

Tip Name the reports with a timestamp so that you can easily find the report for a given day or week.

These Top10 reports differ from the other monitoring tables and graphs in Performance Manager in that the other data is continuously monitored and is sortable on any table column. The Top10 reports are a snapshot view at the time the report was generated.

Note Top10 reports require analyzing the existing data over an extended period of time and can take hours or more to generate on large fabrics.

To generate a Top10 report using Fabric Manager Web Services, follow these steps:

Step 1 Choose Performance from the main page and click the Reports tab. The list of existing reports displays.

Step 2 Enter a name for a new report and click Generate. The report may take hours to generate. When finished, the report appears by name in the left-hand navigation bar.

Step 3 Click the name of the generated report to see the Top10 tables for your fabric.

Step 4 Click the name of any entity in the Top10 tables to see a series of graphs for the transmit and receive data rates as well as errors and discards.

Generating Top10 Reports Using Scripts

You can generate Top10 reports manually by issuing the following commands:

•On UNIX, run the script:
/<user_directory>/.cisco_mds9000/bin/pm.sh display pm/pm.xml <output_directory>
•On Windows, run the script:
"c:\Program Files\Cisco Systems\MDS 9000\bin\pm.bat" display pm\pm.xml <output_directory>
On UNIX, you can automate the generation of the Top10 reports on your Fabric Manager Server host by adding the following cron entry to generate the reports once an hour:
0 * * * * /<user_directory>/.cisco_mds9000/bin/pm.sh display pm/pm.xml <output_directory>
If the cron job does not run, or Java reports an exception similar to Example 33-1, add "-Djava.awt.headless=true" to the JVMARGS setting in /<user_directory>/.cisco_mds9000/bin/pm.sh.

Example 33-1 Example Java Exception
Exception in thread "main" java.lang.InternalError: Can't connect to X11 window server using '0.0' as the value of the DISPLAY variable.
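Following the Tip earlier in this section about naming reports with a timestamp, a scheduled run can write each Top10 report into its own timestamped directory. This is a minimal Python sketch that only wraps the pm.sh display command shown above. The installation and output paths are placeholders, it assumes pm/pm.xml resolves relative to the installation directory (adjust if yours differs), and it does not change how pm.sh starts Java, so the headless setting still has to be made in pm.sh.

```python
import subprocess
from datetime import datetime
from pathlib import Path


def top10_command(install_dir, output_root, now=None):
    """Build the 'pm.sh display' command with a timestamped output directory."""
    now = now or datetime.now()
    out_dir = Path(output_root) / now.strftime("top10-%Y%m%d-%H%M")
    script = Path(install_dir) / "bin" / "pm.sh"
    return [str(script), "display", "pm/pm.xml", str(out_dir)]


def run_top10(install_dir, output_root):
    """Create the output directory, then run pm.sh display (raises on failure)."""
    cmd = top10_command(install_dir, output_root)
    Path(cmd[-1]).mkdir(parents=True, exist_ok=True)
    # Run from the installation directory so the relative pm/pm.xml resolves there.
    subprocess.run(cmd, cwd=install_dir, check=True)
```

For example, run_top10("/home/fmuser/.cisco_mds9000", "/var/tmp/pm-top10") (both paths hypothetical) would write a report under /var/tmp/pm-top10/top10-YYYYMMDD-HHMM; a cron entry can call such a script in place of pm.sh.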
Exporting Data Collections to XML Files

The RRD files used by Performance Manager can be exported for use with rrdtool, a free, open-source tool. The RRD files are located in pm/db on the Fabric Manager Server. To export a collection to an XML file, enter the following command at the operating system command-line prompt:
/bin/pm.bat xport xxx yyy
In this command, xxx is the RRD file and yyy is the XML file that is generated. This XML file is in a format that rrdtool is capable of reading with the command:
rrdtool restore filename.xml filename.rrd
You can import an XML file with the command:
bin/pm.bat pm restore <xmlFile> <rrdFile>
This reads the XML format that rrdtool writes with the command:
rrdtool dump filename.rrd > filename.xml
The pm xport and pm restore commands can be found on your Fabric Manager Server at bin\pm.bat on Windows platforms or bin/pm.sh on UNIX platforms. For more information on rrdtool, refer to the following website: http://www.rrdtool.org.
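To export every collection rather than one file at a time, the xport command shown above can be run once per RRD file. A minimal Python sketch, assuming the RRD files sit directly under pm/db (subdirectories are not searched) and that each XML file is named after its RRD file; on Windows, pass the path to bin\pm.bat as `script`. The helper names are hypothetical.

```python
import subprocess
from pathlib import Path


def xport_commands(rrd_files, out_dir, script="bin/pm.sh"):
    """One 'xport <rrd> <xml>' command per RRD file, mirroring 'pm.bat xport xxx yyy'."""
    commands = []
    for rrd in sorted(Path(p) for p in rrd_files):
        xml = Path(out_dir) / (rrd.stem + ".xml")
        commands.append([script, "xport", str(rrd), str(xml)])
    return commands


def export_all(install_dir, out_dir, script="bin/pm.sh"):
    """Export every *.rrd directly under <install_dir>/pm/db to out_dir."""
    out = Path(out_dir).resolve()  # absolute, since commands run from install_dir
    out.mkdir(parents=True, exist_ok=True)
    rrd_files = Path(install_dir, "pm", "db").resolve().glob("*.rrd")
    for cmd in xport_commands(rrd_files, out, script):
        subprocess.run(cmd, cwd=install_dir, check=True)
```

Building the command list separately from running it makes it easy to print and review the commands before exporting a large pm/db directory.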

Exporting Data Collections in Readable Format

Cisco MDS SAN-OS Release 2.1(1a) introduces the ability to export data collections in comma-separated values (CSV) format, which can be imported into various tools, including Microsoft Excel. You can export these readable data collections either from the Fabric Manager Web Services menus or in batch mode from the command line on Windows or UNIX. Using Fabric Manager Web Services, you can export one file at a time. Using batch mode, you can export all collections in the pm.xml file.

To export data collections using Fabric Manager Web Services, follow these steps:

Step 1 Choose Performance from the main page and select the category you want to export from (Hosts, Storage, ISLs, or Flows). You see the overview table.

Step 2 Double-click the Name of the entity you want to export. You see the detailed graph for that entity in a pop-up window.

Step 3 Click the Export icon in the lower left of the graph. You see the Open/Save dialog box.

Step 4 Click Save to save the CSV file or click Cancel to discard this operation.

To export data collections using command line batch mode, follow these steps:

Step 1 Go to the installation directory on your workstation and then go to the bin directory.

Step 2 On Windows, enter .\pm.bat export "C:\Program Files\Cisco Systems\MDS 9000\pm\pm.xml" <export directory>. This creates the CSV file export.csv in <export directory> on your workstation.

Step 3 On UNIX, enter ./pm.sh export /usr/local/cisco_mds9000/pm/pm.xml <export directory>. This creates the CSV file export.csv in <export directory> on your workstation.

When you open this exported file in Microsoft Excel, the following information displays:

•Title of the entity you exported and the address of the switch the information came from.

•The maximum speed seen on the link to or from this entity.

•The VSAN ID and maximum speed.

•The timestamp, followed by the receive and transmit data rates in bytes per second.
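Once exported, the data rows can be summarized outside Excel as well. This Python sketch computes average and peak throughput in Mbps under an assumed row layout of timestamp, receive bytes/s, transmit bytes/s; the exact layout of the title, speed, and VSAN lines is not specified here, so the row filter is a guess that you may need to adjust to match your file.

```python
import csv
import io


def summarize_export(text):
    """Average and peak Rx/Tx throughput in Mbps from an export.csv body.

    Assumed layout: data rows are "timestamp, rx bytes/s, tx bytes/s".
    Rows without two numeric fields after the first (title, speed, and
    header lines) are skipped; adjust this filter to match your file.
    """
    rx, tx = [], []
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 3:
            continue
        try:
            rx_bps, tx_bps = float(row[1]), float(row[2])
        except ValueError:
            continue
        rx.append(rx_bps * 8 / 1e6)  # bytes/s -> Mbps
        tx.append(tx_bps * 8 / 1e6)
    if not rx:
        return None
    return {
        "rx_avg": sum(rx) / len(rx), "rx_peak": max(rx),
        "tx_avg": sum(tx) / len(tx), "tx_peak": max(tx),
    }
```

For example, pass it the contents of export.csv read with open("export.csv", newline="").read().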

Configuring Performance Manager for Use with Cisco Traffic Analyzer

Performance Manager works in conjunction with the Cisco Traffic Analyzer to allow you to monitor and manage the traffic on your fabric. Using Cisco Traffic Analyzer with Performance Manager requires the following components:

•A configured Fibre Channel Switched Port Analyzer (SPAN) destination (SD) port to forward Fibre Channel traffic.

•A Port Analyzer Adapter 2 (PAA-2) to convert the Fibre Channel traffic to Ethernet traffic.

•Cisco Traffic Analyzer software to analyze the traffic from the PAA-2.

To configure Performance Manager to work with the Cisco Traffic Analyzer, follow these steps:

Step 1 Set up the Cisco Traffic Analyzer according to the instructions in the Cisco MDS 9000 Family Port Analyzer Adapter 2 Installation and Configuration Note.

Step 2 Get the following three pieces of information:

•The IP address of the management workstation on which you are running Performance Manager and Cisco Traffic Analyzer.

•The path to the directory where Cisco Traffic Analyzer is installed.

•The port that is used by Cisco Traffic Analyzer (the default is 3000).

Step 3 Start the Cisco Traffic Analyzer.

a. Choose Performance > Traffic Analyzer > Open.

b. Enter the URL for the Cisco Traffic Analyzer, in the format
http://<ip address>:<port number>
where:

<ip address> is the address of the management workstation on which you have installed the Cisco Traffic Analyzer, and

:<port number> is the port that is used by Cisco Traffic Analyzer (the default is :3000).

c. Click OK.

d. Choose Performance > Traffic Analyzer > Start.

e. Enter the location of the Cisco Traffic Analyzer, in the format
D:\<directory>\ntop.bat
where:

D: is the drive letter for the disk drive where the Cisco Traffic Analyzer is installed, and

<directory> is the directory containing the ntop.bat file.

f. Click OK.

Step 4 Create the flows you want Performance Manager to monitor, using the Flow Configuration Wizard.

Step 5 Define the data collection you want Performance Manager to gather, using the Performance Manager Configuration Wizard.

a. Choose the VSAN you want to collect information for or choose All VSANs.

b. Check the types of items you want to collect information for (hosts, ISLs, storage devices, and flows).

c. Enter the URL for the Cisco Traffic Analyzer in the format
http://<ip address>/<directory>
where:

<ip address> is the address of the management workstation on which you have installed the Cisco Traffic Analyzer, and <directory> is the path to the directory where the Cisco Traffic Analyzer is installed.

d. Click Next.

e. Review the data collection on this screen and the next to make sure this is the data you want to collect.

f. Click Finish to begin collecting data.

Note Data is not collected for JBOD or for virtual ports. If you change the data collection configuration parameters during a data collection, you must stop and restart the collection process for your changes to take effect.

Step 6 Choose Performance > Reports to generate a report.

Note It takes at least five minutes to start collecting data for a report. Do not attempt to generate a report in Performance Manager during the first five minutes of collection.

Step 7 Click the Cisco Traffic Analyzer at the top of the Host or Storage detail pages to view the Cisco Traffic Analyzer information, or choose Performance > Traffic Analyzer > Open. The Cisco Traffic Analyzer page will not open unless ntop has been started already.

Note For information on capturing a SPAN session and starting a Cisco Traffic Analyzer session to view it, refer to the Cisco MDS 9000 Family Port Analyzer Adapter 2 Installation and Configuration Note.

Note For information on viewing and interpreting your Performance Manager data, see the "Historical Performance Monitoring" section.

For information on viewing and interpreting your Cisco Traffic Analyzer data, refer to the Cisco MDS 9000 Family Port Analyzer Adapter 2 Installation and Configuration Note.
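The procedure above uses two URL formats for Cisco Traffic Analyzer: http://<ip address>:<port number> when opening it (default port 3000), and http://<ip address>/<directory> in the Performance Manager Configuration Wizard. A small Python sketch that builds both from the values gathered in Step 2; the function names are hypothetical, and an IPv4 management address is assumed.

```python
import ipaddress

DEFAULT_PORT = 3000  # Cisco Traffic Analyzer default port


def _checked_ip(ip):
    # Raises ValueError for a malformed address; IPv4 is assumed.
    return str(ipaddress.ip_address(ip))


def open_url(ip, port=DEFAULT_PORT):
    """URL for Performance > Traffic Analyzer > Open: http://<ip address>:<port number>."""
    return f"http://{_checked_ip(ip)}:{port}"


def collection_url(ip, directory):
    """URL for the Configuration Wizard: http://<ip address>/<directory>."""
    return f"http://{_checked_ip(ip)}/{directory.strip('/')}"


print(open_url("192.0.2.10"))
print(collection_url("192.0.2.10", "/ntop/"))
```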

For performance drill-down, Fabric Manager Server can launch the Cisco Traffic Analyzer in-context from the Performance Manager graphs. The aliases associated with hosts, storage devices, and VSANs are passed to the Cisco Traffic Analyzer to provide consistent, easy identification.