Overview
CPS collects system and application statistics and Key Performance Indicators (KPIs), which can be displayed using a browser-based graphical metrics tool. This chapter provides a high-level overview of the tools CPS uses to collect and display these statistics.
The list of statistics available in CPS is consolidated in an Excel spreadsheet. After CPS is installed, this spreadsheet can be found in the following location on the Cluster Manager VM:
/var/qps/install/current/scripts/documents/QPS_statistics.xlsx
Prometheus
Prometheus is an application that is part of the monitoring solution in CPS. It actively gathers statistics from the running virtual machines and application services.
The Prometheus application resides on both pcrfclient VMs. It scrapes statistics from the collectd exporter at each configured interval and stores them in the /var/data/Prometheus directory on the pcrfclient VMs.
To learn more about Prometheus, refer to: https://prometheus.io/docs/introduction/overview/.
Enable Prometheus
The following sections describe how to enable Prometheus on a CPS system.
- By default, Prometheus is disabled on the system. You must configure Prometheus before it starts operating.
- You can configure Prometheus using CSV-based or API-based configuration.
- By default, the statistics granularity is set to 10 seconds. To change it, configure the statistics granularity parameter. Both CSV-based and API-based installations support this setting.
Note
It is recommended to keep the default statistics granularity. If you want to change the value, contact your Cisco Technical representative.
- After enabling Prometheus, you must add a Prometheus data source in Grafana.
- When Prometheus is enabled on the system, existing dashboards created with Graphite do not work. You must use Prometheus queries to create new dashboards on the system.
CSV Based Installation Configuration Parameters
| Parameter | Description |
|---|---|
| enable_prometheus | Enables or disables Prometheus in CPS. Default: disabled. Possible values: enabled, disabled. |
| stats_granularity | Configures the statistics granularity in seconds. Default: 10 seconds. Possible values: any positive number. |
For example, for CSV-based installations, you can add the following parameters to Configuration.csv on the Cluster Manager to enable Prometheus:
cat /var/qps/config/deploy/csv/Configuration.csv | tail -5
db_authentication_admin_passwd,72261348A44594381D2E84ADDD1E6D9A,
db_authentication_passwd_encryption,true,
db_authentication_readonly_passwd,72261348A44594381D2E84ADDD1E6D9A,
enable_prometheus,enabled,
stats_granularity,10,
After configuring the parameters, run the following commands to import the new configuration to VMs:
/var/qps/install/current/scripts/import/import_deploy.sh
/var/qps/install/current/scripts/upgrade/reinit.sh
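Before running the import scripts, the two Prometheus values could be sanity-checked against the table above. The following is a minimal sketch, not a CPS tool: the helper names `read_settings` and `check_prometheus_settings` are hypothetical, and the sketch assumes each row has the `key,value,` shape shown in the example.

```python
import csv
import io

# Allowed values taken from the parameter table above.
ALLOWED_PROMETHEUS_VALUES = {"enabled", "disabled"}


def read_settings(csv_text):
    """Return a dict of key -> value from Configuration.csv-style text."""
    settings = {}
    for row in csv.reader(io.StringIO(csv_text)):
        # Skip blank lines; each data row is assumed to look like "key,value,".
        if len(row) >= 2 and row[0].strip():
            settings[row[0].strip()] = row[1].strip()
    return settings


def check_prometheus_settings(settings):
    """Return a list of problems; an empty list means the values look valid."""
    problems = []
    state = settings.get("enable_prometheus", "disabled")
    if state not in ALLOWED_PROMETHEUS_VALUES:
        problems.append(f"enable_prometheus must be enabled or disabled, got {state!r}")
    granularity = settings.get("stats_granularity", "10")
    if not granularity.isdecimal() or int(granularity) <= 0:
        problems.append(f"stats_granularity must be a positive number, got {granularity!r}")
    return problems


sample = "enable_prometheus,enabled,\nstats_granularity,10,\n"
print(check_prometheus_settings(read_settings(sample)))  # prints []
```

To check a real file, the text of /var/qps/config/deploy/csv/Configuration.csv can be read and passed to `read_settings` in place of `sample`.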
API Based Installation Parameters
| Parameter | Description |
|---|---|
| enablePrometheus | Enables or disables Prometheus in CPS. Default: disabled. Possible values: enabled, disabled. |
| statsGranularity | Configures the statistics granularity in seconds. Default: 10 seconds. Possible values: any positive number. |
In case of API-based installations, you need to use the api/system/config/config PATCH API from the Cluster Manager.
For example:
cat prom.yaml
enablePrometheus: "enabled"
statsGranularity: "10"
curl -i -X PATCH http://installer:8458/api/system/config/config -H "Content-Type: application/yaml" --data-binary @prom.yaml
HTTP/1.1 200 OK
Date: Fri, 20 Apr 2018 08:38:20 GMT
Content-Length: 0
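The same PATCH request could also be prepared from a script. The sketch below only builds the request with the Python standard library and does not send it; the URL, header, and YAML keys are copied from the curl example above, while the function name `build_prometheus_patch` is hypothetical.

```python
import urllib.request

# URL taken from the curl example above; adjust host and port for your deployment.
CONFIG_URL = "http://installer:8458/api/system/config/config"


def build_prometheus_patch(enabled=True, granularity=10):
    """Build (but do not send) the PATCH request that configures Prometheus."""
    if granularity <= 0:
        raise ValueError("granularity must be a positive number of seconds")
    state = "enabled" if enabled else "disabled"
    # Same two-line YAML body as prom.yaml in the example above.
    body = f'enablePrometheus: "{state}"\nstatsGranularity: "{granularity}"\n'
    return urllib.request.Request(
        CONFIG_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/yaml"},
        method="PATCH",
    )


request = build_prometheus_patch()
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To send it from a host that can reach the installer:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

Keeping request construction separate from sending makes it possible to inspect the body before it is applied to the system.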
Add Datasource in Grafana for Prometheus
Procedure
| Step | Action |
|---|---|
| Step 1 | Log in to Grafana with admin credentials. |
| Step 2 | Click the Grafana logo to open the sidebar menu. |
| Step 3 | Click Data Sources in the sidebar. |
| Step 4 | Click Add data source. |
| Step 5 | From the Type drop-down list, select Prometheus. |
| Step 6 | Set the appropriate Prometheus server URL (for example, http://localhost:9090/). |
| Step 7 | Click Add to save the new data source. |
| Step 8 | Create a graph with Prometheus as the data source, for example, a graph that shows the 1-minute load average of the VMs. |
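The query behind a graph such as the one in Step 8 can also be tried directly against the Prometheus HTTP API, which serves instant queries at `/api/v1/query`. The sketch below only builds the URL. The metric name `collectd_load_shortterm` is an assumption based on common collectd exporter naming and is not confirmed by this guide; check the metric names exposed on your system before using it.

```python
from urllib.parse import urlencode

# Assumptions: the server URL is the example from Step 6, and the metric name
# follows typical collectd exporter naming. Verify both on your system.
PROMETHEUS_URL = "http://localhost:9090/"
LOAD_1MIN_QUERY = "collectd_load_shortterm"


def instant_query_url(base_url, promql):
    """Return the Prometheus HTTP API URL for an instant query."""
    query_string = urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{query_string}"


print(instant_query_url(PROMETHEUS_URL, LOAD_1MIN_QUERY))
```

The same expression can be pasted into the query field of a Grafana panel that uses the Prometheus data source.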
Graphite
Collectd clients running on all CPS Virtual Machines (such as Policy Server (QNS), Policy Director (LB), and sessionmgr) push data to the collectd master on pcrfclient01. The collectd master node in turn forwards the collected data to the Graphite database on pcrfclient01.
The Graphite database stores system-related statistics such as CPU usage, memory usage, and Ethernet interface statistics, as well as application message counters such as Gx, Gy, and Sp.
Pcrfclient01 and pcrfclient02 collect and store these bulk statistics independently.
As a best practice, always use the bulk statistics collected from pcrfclient01. Pcrfclient02 can be used as a backup if pcrfclient01 fails.
In the event that pcrfclient01 becomes unavailable, statistics will still be gathered on pcrfclient02. Statistics data is not synchronized between pcrfclient01 and pcrfclient02, so a gap would exist in the collected statistics while pcrfclient01 is down.
Note
It is normal to have slight differences between the data on pcrfclient01 and pcrfclient02. For example, pcrfclient01 generates a file at time t and pcrfclient02 generates a file at time t +/- the clock drift between the two machines.
Note
Depending on the retention period configured in /etc/carbon/storage-schemas.conf on the pcrfclient VMs, a graph for the same time period may look different when viewed later. The default retention period is 10s:1d,60s:60d (10-second data points for 1 day and 60-second data points for 60 days). With the default retention, six data points are available for each minute during the first day; after that, those six data points are aggregated into a single data point per minute in Graphite.
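The arithmetic behind the default retention can be worked through in a few lines. This is a simplified sketch that handles only the s, m, h, and d unit suffixes; the function names are hypothetical and the parser is not the one Graphite itself uses.

```python
# Seconds per unit suffix (simplified; only the units needed here).
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def to_seconds(token):
    """Convert a token such as '10s' or '60d' to seconds."""
    return int(token[:-1]) * UNIT_SECONDS[token[-1]]


def parse_retentions(spec):
    """Return (step_seconds, duration_seconds) pairs from a retention string."""
    archives = []
    for part in spec.split(","):
        step, duration = part.strip().split(":")
        archives.append((to_seconds(step), to_seconds(duration)))
    return archives


for step, duration in parse_retentions("10s:1d,60s:60d"):
    print(f"{step}s step: {60 // step} point(s) per minute, {duration // step} points kept")
# 10s step: 6 point(s) per minute, 8640 points kept
# 60s step: 1 point(s) per minute, 86400 points kept
```

The first archive holds six points per minute for one day, and the second holds one point per minute for 60 days, which is why a graph older than one day shows coarser data.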
Additional Graphite Documentation
To learn more about Graphite, refer to: http://graphite.readthedocs.org/en/latest/
For a list of all functions that can be used to transform, combine and perform computations on data stored in Graphite, refer to: http://graphite.readthedocs.org/en/latest/functions.html
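As an illustration of how such a function is applied, the sketch below builds a Graphite render API URL that wraps a series in `movingAverage`. The host name and the metric path are hypothetical placeholders, not values from CPS; replace them with a Graphite host and a metric path that exist on your system.

```python
from urllib.parse import urlencode

# Hypothetical placeholders: not real CPS host names or metric paths.
GRAPHITE_URL = "http://graphite.example.com"
METRIC = "example.host.load"


def render_url(base_url, target, start="-1h"):
    """Return a Graphite render API URL that requests JSON output."""
    query_string = urlencode({"target": target, "from": start, "format": "json"})
    return f"{base_url.rstrip('/')}/render?{query_string}"


# movingAverage over the last 6 data points of the series.
target = f"movingAverage({METRIC}, 6)"
print(render_url(GRAPHITE_URL, target))
```

With the default 10-second granularity, a window of 6 points corresponds to roughly one minute of data.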