Introduction
This document describes the procedure to reduce Prometheus disk space usage in a Cisco Policy Suite (CPS) - Diameter Routing Agent (DRA).
Prerequisites
Requirements
Cisco recommends that you have knowledge of these topics:
Note: Cisco recommends that you must have privilege admin and cps user access to CPS-DRA CLI.
Components Used
The information in this document is based on these software and hardware versions:
- CPS-DRA 21.1.
- Unified Computing System (UCS)-B
The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, ensure that you understand the potential impact of any command.
Background Information
CPS uses Prometheus and Grafana to monitor Key Performance Indicators (KPI) and performance.
Prometheus is a free software application used to monitor events and to alert. It records real-time metrics in a time series database (which allows high dimensionality) built by the use of a Hypertext Transfer Protocol (HTTP) pull model, with flexible queries and real-time alerts. Prometheus is not intended as a dashboard solution. This needs to be hooked up with Grafana to generate dashboards.
Simply, Prometheus is a solution to monitor that stores time-series data like metrics. Grafana allows to visualize the data stored in Prometheus.
Problem
There can be a situation of high Prometheus disk space usage due to a surge in traffic or some other reason. Whenever Prometheus disk usage goes beyond 70% there must be an alert triggered and if the usage touches 100%, grafana stops the display of data in dashboards.
The disk partition that stores Prometheus data is /stats.
Filesystem Size Used Avail Use% Mounted on
udev 32G 0 32G 0% /dev
tmpfs 6.3G 3.1M 6.3G 1% /run
/dev/sda3 97G 13G 81G 14% /
tmpfs 32G 0 32G 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/sda1 180M 55M 113M 33% /boot
/dev/sdb2 128G 82G 41G 68% /stats
/dev/sdb1 69G 16G 50G 25% /data
Procedure to Reduce Prometheus Disk Space Usage
Method 1. Temporary workaround to modify the retention time gracefully which can automatically remove the oldest data in accordance with the time interval given.
Here is an example, when you modify the retention time from 8760 Hr to 1 Hr.
Step 1. Run this command from DRA Primary Orchestrator to stop supervisorctl in all prometheus-planning containers.
admin@orchestrator[pn-master-0]# docker exec prometheus-planning- "supervisorctl stop all"
==========output from container prometheus-planning-s101===========
haproxy: stopped
prometheus-old: stopped
prometheus: stopped
consul: stopped
==========output from container prometheus-planning-s102===========
haproxy: stopped
prometheus: stopped
prometheus-old: stopped
consul: stopped
==========output from container prometheus-planning-s103===========
haproxy: stopped
prometheus: stopped
prometheus-old: stopped
consul: stopped
Step 2. Run these commands from DRA Primary Orchestrator to modify the retention period gracefully.
admin@orchestrator[pn-master-0]# docker connect prometheus-planning-s101
root@prometheus-planning-s101:/# sudo sed -i 's/8760h/1h/g' /etc/supervisor/conf.d/supervisord.conf
root@prometheus-planning-s101:/# supervisorctl update all
prometheus: stopped
prometheus: updated process group
prometheus-old: stopped
prometheus-old: updated process group
root@prometheus-planning-s101:/# exit
exit
Similarly, repeat for prometheus-planning-s102 and prometheus-planning-s103.
Step 3. Run this command from DRA Primary Orchestrator to start supervisorctl in all prometheus-planning containers.
admin@orchestrator[pn-master-0]# docker exec prometheus-planning- "supervisorctl restart all"
==========output from container prometheus-planning-s101===========
haproxy: started
consul: started
prometheus: started
prometheus-old: started
==========output from container prometheus-planning-s102===========
haproxy: started
consul: started
prometheus: started
prometheus-old: started
==========output from container prometheus-planning-s103===========
haproxy: started
consul: started
prometheus: started
prometheus-old: started
Step 4. Verify the healthchecks from DRA Primary Orchestrator.
#show system status
#show system diagnostics | tab | exclude pass
#docker logs prometheus-planning-s101, docker logs prometheus-planning-s102, docker logs prometheus-planning-s103
Method 2. Workaround to remove the data gracefully from /stats.
Step 1. Run this command from DRA Primary Orchestrator to stop supervisorctl in all prometheus-planning containers.
admin@orchestrator[pn-master-0]# docker exec prometheus-planning- "supervisorctl stop all"
==========output from container prometheus-planning-s101===========
haproxy: stopped
prometheus: stopped
prometheus-old: stopped
consul: stopped
==========output from container prometheus-planning-s102===========
haproxy: stopped
prometheus: stopped
prometheus-old: stopped
consul: stopped
==========output from container prometheus-planning-s103===========
haproxy: stopped
prometheus: stopped
prometheus-old: stopped
consul: stopped
admin@orchestrator[pn-master-0]#
Step 2. Remove the required old data (remove the oldest data) from Primary, control-0, control-1.
Here, you can remove the data till July-2021 from Primary, control-0, and control-1:
Log in to the Primary node as a cps user and proceed to the directory /stats/prometheus-planning/2.0
Primary:
cps@pn-master-0:/stats/prometheus-planning/2.0$
drwxr-xr-x 3 root root 4096 Jul 4 2021 01F9RJSYA74WMD4PR1GW4PD16S
drwxr-xr-x 3 root root 4096 Jul 24 2021 01FBCGFSX6WH8QGRDD1EFQFJTV
drwxr-xr-x 3 root root 4096 Aug 13 2021 01FD0N12YBF3VB5F37XBVW2MM5
drwxr-xr-x 3 root root 4096 Sep 3 2021 01FEMSJSEKS8ZQQSCHPSSXNBQD
drwxr-xr-x 3 root root 4096 Sep 23 09:00 01FG8Y466EPGF3BBS1W3JXQA30
drwxr-xr-x 3 root root 4096 Oct 13 15:00 01FHX2P112B5DCQYKNTNB7GB2F
drwxr-xr-x 3 root root 4096 Nov 2 21:00 01FKH77D8TMDW3XV8MQMB0641N
drwxr-xr-x 3 root root 4096 Nov 23 03:00 01FN5BRXQ5ERW24CEK7GRFJXZW
drwxr-xr-x 3 root root 4096 Dec 13 09:00 01FPSGAEST7K8K96KT616T3FTX
drwxr-xr-x 3 root root 4096 Jan 2 15:00 01FRDMW8Z9J48QM3VQ8AG2PJ63
drwxr-xr-x 3 root root 4096 Jan 22 21:00 01FT1SDKXW2E4KWJ7WFB4R80YP
drwxr-xr-x 3 root root 4096 Feb 14 05:08 01FVVA4C33QPC9P0WTH3VYXBWW
drwxr-xr-x 3 root root 4096 Mar 4 09:00 01FXA2GP1GAHHMY6GAW2SPHZMV
drwxr-xr-x 3 root root 4096 Mar 11 03:00 01FXVEPF1VK6RBKXZ3T8MX5PPY
drwxr-xr-x 3 root root 4096 Mar 13 09:00 01FY1832D9C31CYQKTZV2W17CZ
drwxr-xr-x 3 root root 4096 Mar 14 03:00 01FY35WKHJAC5AH4EZH9YWFK5B
drwxr-xr-x 3 root root 4096 Mar 14 21:00 01FY53P4QAQEDDE4DSFHRPSCM2
drwxr-xr-x 3 root root 4096 Mar 15 03:00 01FY5R9A5PA7H0M30HC44RVQBH
drwxr-xr-x 3 root root 4096 Mar 15 03:00 01FY5R9AAQBEH9PMEXF7M508XK
drwxr-xr-x 3 root root 4096 Mar 15 05:00 01FY5Z51DNMGHPRFSQ20G43ETW
drwxr-xr-x 23 root root 4096 Mar 15 05:00 .
drwxr-xr-x 2 root root 4096 Mar 15 05:00 wal
Run this command to remove the required data.
cps@pn-master-0:/stats/prometheus-planning/2.0$ sudo rm -rf 01FBCGFSX6WH8QGRDD1EFQFJTV 01F9RJSYA74WMD4PR1GW4PD16S
Log in to the Control-0 node as a cps user and proceed to the directory /stats/prometheus-planning/2.0
CONTROL-0:
cps@pn-control-0:/stats/prometheus-planning/2.0$ ls -lrt
drwxr-xr-x 3 root root 4.0K Apr 14 2021 01F380KG1TEDJ0A4XRNDA8K8M3
drwxr-xr-x 3 root root 4.0K May 4 2021 01F4VY9BP3RMK3J2BPXSP2VQA5
drwxr-xr-x 3 root root 4.0K May 24 2021 01F6G2TTJHR1XAQBC095WN9E30
drwxr-xr-x 3 root root 4.0K Jun 14 2021 01F847CDQRHPC6SCPC659ZEFH8
drwxr-xr-x 3 root root 4.0K Jul 4 2021 01F9RBYVJSBRKAC5VFGPT9JVS1
drwxr-xr-x 3 root root 4.0K Jul 24 2021 01FBCGGD141BRR4D59Y3G9855D
drwxr-xr-x 3 root root 4.0K Aug 13 2021 01FD0N17Z3R2R2MJSG6HE1YNAT
drwxr-xr-x 3 root root 4.0K Sep 3 2021 01FEMSJZFJPCFJNENFV3JHCHCY
drwxr-xr-x 3 root root 4.0K Sep 23 09:00 01FG8Y4A0HCNVVCVNJJNGXT84Y
drwxr-xr-x 3 root root 4.0K Oct 13 15:00 01FHX2P1F8EZVC8EDQVXH4C3VW
drwxr-xr-x 3 root root 4.0K Nov 2 21:00 01FKH77FAJ0DS4P8J8QBQKTC75
drwxr-xr-x 3 root root 4.0K Nov 23 03:00 01FN5BRV1XR7NZ0SFT63M9BR4F
drwxr-xr-x 3 root root 4.0K Dec 13 09:00 01FPSGAE6ZYXQNYZYS56SFNGFX
drwxr-xr-x 3 root root 4.0K Jan 2 15:00 01FRDMW8K61Y2AK96BR5N2C1YK
drwxr-xr-x 3 root root 4.0K Jan 22 21:00 01FT1SDKW7X7HYPEKR5P76NWNB
drwxr-xr-x 3 root root 4.0K Feb 14 05:14 01FVVAEZPBT17H5XCDFC18YPQK
drwxr-xr-x 3 root root 4.0K Mar 4 09:00 01FXA2GNCMTB54N66W18NPJDBR
drwxr-xr-x 3 root root 4.0K Mar 11 03:00 01FXVEPEY54S4YEGJBPC42K01C
drwxr-xr-x 3 root root 4.0K Mar 13 09:00 01FY18329GPWH49HX61GNSDYT6
drwxr-xr-x 3 root root 4.0K Mar 14 03:00 01FY35WKDY9AZRW7DEKYS0T382
drwxr-xr-x 3 root root 4.0K Mar 14 21:00 01FY53P4P6FS9TGA1P1V1GJEQK
drwxr-xr-x 3 root root 4.0K Mar 15 03:00 01FY5R9A7KMN03DHNTW4ME1DA7
drwxr-xr-x 3 root root 4.0K Mar 15 03:00 01FY5R9ABDNV3D6P7S63S9JKF2
drwxr-xr-x 3 root root 4.0K Mar 15 05:00 01FY5Z51FQHV4GJ24D6S1G6QW9
drwxr-xr-x 27 root root 4.0K Mar 15 05:00 .
drwxr-xr-x 2 root root 4.0K Mar 15 05:00 wal
Run this command to remove the required data.
cps@pn-control-0:/stats/prometheus-planning/2.0$ sudo rm -rf 01F380KG1TEDJ0A4XRNDA8K8M3 01F4VY9BP3RMK3J2BPXSP2VQA5 01F6G2TTJHR1XAQBC095WN9E30 01F847CDQRHPC6SCPC659ZEFH8 01F9RBYVJSBRKAC5VFGPT9JVS1 01FBCGGD141BRR4D59Y3G9855D
Log in to the Control-1 node as a cps user and proceed to the directory /stats/prometheus-planning/2.0.
CONTROL-1:
cps@pn-control-1:/stats/prometheus-planning/2.0$ ls -lart
total 108
drwxr-xr-x 3 root root 4096 Apr 14 2021 01F380KRDADD3MMVD0VZXNZWS1
drwxr-xr-x 3 root root 4096 May 4 2021 01F4VY9MBSJ68NW8V5JR4908G0
drwxr-xr-x 3 root root 4096 May 24 2021 01F6G2V1A5Z97G6TV5G9R8MPXR
drwxr-xr-x 3 root root 4096 Jun 14 2021 01F847CE2CV31EDHFJ594FE739
drwxr-xr-x 3 root root 4096 Jul 4 2021 01F9RBY6RASP2T9G2TQJSAHKK8
drwxr-xr-x 3 root root 4096 Jul 24 2021 01FBCGFRD30XYPKRRSWA4HNPE9
drwxr-xr-x 3 root root 4096 Aug 13 2021 01FD0N14GD5B5GQJ6NVBJB6E80
drwxr-xr-x 3 root root 4096 Sep 3 2021 01FEMSJVNJ1N71YPEVB41K6AFJ
drwxr-xr-x 3 root root 4096 Sep 23 09:00 01FG8Y4FJX9K27YJX4KF7FDJAF
drwxr-xr-x 3 root root 4096 Oct 13 15:00 01FHX2P2PWCZ9B665G83WG3ZNE
drwxr-xr-x 3 root root 4096 Nov 2 21:00 01FKH77M15JFZS6TAVMRP82H8T
drwxr-xr-x 3 root root 4096 Nov 23 03:00 01FN5BS5ZE2WFMJF5PE7J41GM7
drwxr-xr-x 3 root root 4096 Dec 13 09:00 01FPSGACKH8EW951GJ04RSGACD
drwxr-xr-x 3 root root 4096 Jan 2 15:00 01FRDMW9W0110PS4NTGZ7VETV7
drwxr-xr-x 3 root root 4096 Jan 22 21:00 01FT1SDKV8X6103294D4A5KDBY
drwxr-xr-x 3 root root 4096 Feb 14 05:14 01FVVAF1FN79BZNF25665987AV
drwxr-xr-x 3 root root 4096 Mar 4 09:00 01FXA2GWSKN955BW5FJ71C8D0A
drwxr-xr-x 3 root root 4096 Mar 11 03:00 01FXVEPGAHGS4JA55VF6RZTJWX
drwxr-xr-x 3 root root 4096 Mar 13 09:00 01FY1833FXKQGEAF2MNXWTW6J7
drwxr-xr-x 3 root root 4096 Mar 14 03:00 01FY35WMH602J2XTVNCFSNKSA1
drwxr-xr-x 3 root root 4096 Mar 14 21:00 01FY53P5SKC3P95XA721ANTVPC
drwxr-xr-x 3 root root 4096 Mar 15 03:00 01FY5R9B8QZF64AENGQSNWSCHH
drwxr-xr-x 3 root root 4096 Mar 15 03:00 01FY5R9BDMRZHK5K3194JGARC4
drwxr-xr-x 3 root root 4096 Mar 15 05:00 01FY5Z52GS6937933QWXJ4CHVK
drwxr-xr-x 27 root root 4096 Mar 15 05:00 .
drwxr-xr-x 2 root root 4096 Mar 15 05:00 wal
Run this command to remove the required data.
cps@pn-control-1:/stats/prometheus-planning/2.0$ sudo rm -rf 01F380KRDADD3MMVD0VZXNZWS1 01F4VY9MBSJ68NW8V5JR4908G0 01F6G2V1A5Z97G6TV5G9R8MPXR 01F847CE2CV31EDHFJ594FE739 01F9RBY6RASP2T9G2TQJSAHKK8 01FBCGFRD30XYPKRRSWA4HNPE9
3. Run this command from DRA Primary Orchestrator to start supervisorctl in all prometheus-planning containers.
admin@orchestrator[pn-master-0]# docker exec prometheus-planning- "supervisorctl start all"
==========output from container prometheus-planning-s101===========
haproxy: started
consul: started
prometheus: started
prometheus-old: started
==========output from container prometheus-planning-s102===========
haproxy: started
consul: started
prometheus: started
prometheus-old: started
==========output from container prometheus-planning-s103===========
haproxy: started
consul: started
prometheus: started
prometheus-old: started
admin@orchestrator[pn-master-0]#
4. Verify the healthchecks of DRA Primary Orchestrator.
#show system status
#show system diagnostics | tab | exclude pass