Introduction
This document describes the procedure to reduce Prometheus disk space usage in Cisco Policy Suite (CPS) - Diameter Routing Agent (DRA).
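Prerequisites
Requirements
Cisco recommends that you have knowledge of these topics:
Note: Cisco recommends that you have privileged admin and cps user access to the CPS-DRA CLI.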
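Components Used
The information in this document is based on these software and hardware versions:
- CPS-DRA 21.1
- Unified Computing System (UCS)-B
The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, ensure that you understand the potential impact of any command.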
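Background Information
CPS uses Prometheus and Grafana to monitor key performance indicators (KPIs) and performance.
Prometheus is a free software application used for event monitoring and alerting. It records real-time metrics in a time series database (which allows high dimensionality) built on a Hypertext Transfer Protocol (HTTP) pull model, with flexible queries and real-time alerting. Prometheus is not a dashboard solution; it must be connected to Grafana to generate dashboards.
In short, Prometheus is a monitoring solution that stores time series data such as metrics, and Grafana visualizes the data stored in Prometheus.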
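Problem
Prometheus disk space usage can become high because of a traffic surge or for other reasons. An alert must be triggered whenever Prometheus disk usage exceeds 70%, and if usage reaches 100%, Grafana stops displaying data in its dashboards.
The disk partition that stores Prometheus data is /stats.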
Filesystem      Size  Used Avail Use% Mounted on
udev             32G     0   32G   0% /dev
tmpfs           6.3G  3.1M  6.3G   1% /run
/dev/sda3        97G   13G   81G  14% /
tmpfs            32G     0   32G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs            32G     0   32G   0% /sys/fs/cgroup
/dev/sda1       180M   55M  113M  33% /boot
/dev/sdb2       128G   82G   41G  68% /stats
/dev/sdb1        69G   16G   50G  25% /data
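As a minimal sketch of the 70% check described above, the shell functions below parse `df -P` output and report whether a mount point is above a threshold. The function names `pct_for_mount` and `mount_exceeds` are illustrative and are not part of CPS; the 70% value and the /stats mount point come from this document.

```shell
#!/bin/sh
# Read `df -P` style output on stdin and print the Use% (digits only)
# of the line whose last field equals the given mount point.
pct_for_mount() {
    awk -v m="$1" '$NF == m { sub(/%/, "", $(NF-1)); print $(NF-1) }'
}

# Succeed when the mount point's usage is strictly above the threshold.
# $1 = mount point, $2 = threshold in percent (hypothetical helper).
mount_exceeds() {
    pct=$(df -P | pct_for_mount "$1")
    [ -n "$pct" ] && [ "$pct" -gt "$2" ]
}

# Example: compare /stats with the 70% alert level.
# Nothing is printed when /stats is absent or below the threshold.
if mount_exceeds /stats 70; then
    echo "/stats is above 70% used"
fi
```

With the sample output shown above, /stats is at 68%, so the check would not yet report a problem.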
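Procedure to Reduce Prometheus Disk Space Usage
Method 1. A temporary workaround that gracefully modifies the retention time, so that the oldest data is deleted automatically according to the configured interval.
This example changes the retention time from 8760 hours to 1 hour.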
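For orientation, 8760 hours corresponds to 365 days, so the example shortens the retention window from one year to one hour. A small sketch of that arithmetic, using a hypothetical helper name:

```shell
# Convert an hour duration such as "8760h" to whole days.
retention_hours_to_days() {
    hours=${1%h}            # strip the trailing "h" unit, if present
    echo $(( hours / 24 ))  # integer division, fractions of a day are dropped
}

retention_hours_to_days 8760h   # prints 365
retention_hours_to_days 1h      # prints 0 (less than one day)
```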
Step 1. Run this command from the DRA Primary Orchestrator to stop all processes managed by supervisorctl in all prometheus-planning containers.
admin@orchestrator[pn-master-0]# docker exec prometheus-planning- "supervisorctl stop all"
==========output from container prometheus-planning-s101===========
haproxy: stopped
prometheus-old: stopped
prometheus: stopped
consul: stopped
==========output from container prometheus-planning-s102===========
haproxy: stopped
prometheus: stopped
prometheus-old: stopped
consul: stopped
==========output from container prometheus-planning-s103===========
haproxy: stopped
prometheus: stopped
prometheus-old: stopped
consul: stopped
Step 2. Run these commands from the DRA Primary Orchestrator to gracefully modify the retention period.
admin@orchestrator[pn-master-0]# docker connect prometheus-planning-s101
root@prometheus-planning-s101:/# sudo sed -i 's/8760h/1h/g' /etc/supervisor/conf.d/supervisord.conf
root@prometheus-planning-s101:/# supervisorctl update all
prometheus: stopped
prometheus: updated process group
prometheus-old: stopped
prometheus-old: updated process group
root@prometheus-planning-s101:/# exit
exit
Repeat this step for prometheus-planning-s102 and prometheus-planning-s103.
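Before running the in-place `sed` edit in a container, it can help to preview which lines would change, and to confirm afterwards that no old value remains. The sketch below does this with two hypothetical helper functions; the actual contents of supervisord.conf in your containers are not shown in this document, so treat the result as something to inspect rather than a guaranteed match.

```shell
# Print only the lines that the substitution would change; the file is not modified.
# $1 = old retention value, $2 = new retention value, $3 = config file
preview_retention_change() {
    sed -n "s/$1/$2/gp" "$3"
}

# Print how many lines still contain the given value (0 after a complete edit).
count_remaining() {
    grep -c -- "$1" "$2" || true
}

# Usage inside a prometheus-planning container (path taken from Step 2):
#   preview_retention_change 8760h 1h /etc/supervisor/conf.d/supervisord.conf
#   count_remaining 8760h /etc/supervisor/conf.d/supervisord.conf
```

Note that the pattern is a plain substring match, the same as in the `sed` command above, so a value such as 18760h would also be rewritten.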
Step 3. Run this command from the DRA Primary Orchestrator to start all processes managed by supervisorctl in all prometheus-planning containers.
admin@orchestrator[pn-master-0]# docker exec prometheus-planning- "supervisorctl restart all"
==========output from container prometheus-planning-s101===========
haproxy: started
consul: started
prometheus: started
prometheus-old: started
==========output from container prometheus-planning-s102===========
haproxy: started
consul: started
prometheus: started
prometheus-old: started
==========output from container prometheus-planning-s103===========
haproxy: started
consul: started
prometheus: started
prometheus-old: started
Step 4. Verify the health checks from the DRA Primary Orchestrator.
#show system status
#show system diagnostics | tab | exclude pass
#docker logs prometheus-planning-s101
#docker logs prometheus-planning-s102
#docker logs prometheus-planning-s103
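When the container logs are long, a quick count of warning and error lines can show whether a container came back cleanly. This is a rough sketch that assumes the logs use the `level=...` key of Prometheus's logfmt output and that the log text has been saved to a file; the helper name and the file name in the usage comment are hypothetical.

```shell
# Count lines on stdin whose logfmt level is warn or error.
count_problem_lines() {
    grep -c -E 'level=(warn|error)' || true
}

# Usage, with log text previously saved to a file:
#   count_problem_lines < prometheus-planning-s101.log
```

A result of 0 only means that no line matched the pattern; it does not replace the system status and diagnostics checks above.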
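Method 2. A workaround that gracefully deletes data from /stats.
Step 1. Run this command from the DRA Primary Orchestrator to stop all processes managed by supervisorctl in all prometheus-planning containers.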
admin@orchestrator[pn-master-0]# docker exec prometheus-planning- "supervisorctl stop all"
==========output from container prometheus-planning-s101===========
haproxy: stopped
prometheus: stopped
prometheus-old: stopped
consul: stopped
==========output from container prometheus-planning-s102===========
haproxy: stopped
prometheus: stopped
prometheus-old: stopped
consul: stopped
==========output from container prometheus-planning-s103===========
haproxy: stopped
prometheus: stopped
prometheus-old: stopped
consul: stopped
admin@orchestrator[pn-master-0]#
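Step 2. Delete the required old data (the oldest data) from the Primary, control-0, and control-1 nodes.
In this example, data dated July 2021 and earlier is deleted from the Primary, control-0, and control-1 nodes:
Log in to the Primary node as the cps user and go to the directory /stats/prometheus-planning/2.0.
PRIMARY: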
cps@pn-master-0:/stats/prometheus-planning/2.0$
drwxr-xr-x 3 root root 4096 Jul 4 2021 01F9RJSYA74WMD4PR1GW4PD16S
drwxr-xr-x 3 root root 4096 Jul 24 2021 01FBCGFSX6WH8QGRDD1EFQFJTV
drwxr-xr-x 3 root root 4096 Aug 13 2021 01FD0N12YBF3VB5F37XBVW2MM5
drwxr-xr-x 3 root root 4096 Sep 3 2021 01FEMSJSEKS8ZQQSCHPSSXNBQD
drwxr-xr-x 3 root root 4096 Sep 23 09:00 01FG8Y466EPGF3BBS1W3JXQA30
drwxr-xr-x 3 root root 4096 Oct 13 15:00 01FHX2P112B5DCQYKNTNB7GB2F
drwxr-xr-x 3 root root 4096 Nov 2 21:00 01FKH77D8TMDW3XV8MQMB0641N
drwxr-xr-x 3 root root 4096 Nov 23 03:00 01FN5BRXQ5ERW24CEK7GRFJXZW
drwxr-xr-x 3 root root 4096 Dec 13 09:00 01FPSGAEST7K8K96KT616T3FTX
drwxr-xr-x 3 root root 4096 Jan 2 15:00 01FRDMW8Z9J48QM3VQ8AG2PJ63
drwxr-xr-x 3 root root 4096 Jan 22 21:00 01FT1SDKXW2E4KWJ7WFB4R80YP
drwxr-xr-x 3 root root 4096 Feb 14 05:08 01FVVA4C33QPC9P0WTH3VYXBWW
drwxr-xr-x 3 root root 4096 Mar 4 09:00 01FXA2GP1GAHHMY6GAW2SPHZMV
drwxr-xr-x 3 root root 4096 Mar 11 03:00 01FXVEPF1VK6RBKXZ3T8MX5PPY
drwxr-xr-x 3 root root 4096 Mar 13 09:00 01FY1832D9C31CYQKTZV2W17CZ
drwxr-xr-x 3 root root 4096 Mar 14 03:00 01FY35WKHJAC5AH4EZH9YWFK5B
drwxr-xr-x 3 root root 4096 Mar 14 21:00 01FY53P4QAQEDDE4DSFHRPSCM2
drwxr-xr-x 3 root root 4096 Mar 15 03:00 01FY5R9A5PA7H0M30HC44RVQBH
drwxr-xr-x 3 root root 4096 Mar 15 03:00 01FY5R9AAQBEH9PMEXF7M508XK
drwxr-xr-x 3 root root 4096 Mar 15 05:00 01FY5Z51DNMGHPRFSQ20G43ETW
drwxr-xr-x 23 root root 4096 Mar 15 05:00 .
drwxr-xr-x 2 root root 4096 Mar 15 05:00 wal
Run this command to delete the required data.
cps@pn-master-0:/stats/prometheus-planning/2.0$ sudo rm -rf 01FBCGFSX6WH8QGRDD1EFQFJTV 01F9RJSYA74WMD4PR1GW4PD16S
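The directory names in the listing are Prometheus TSDB block IDs in ULID format, whose first 10 characters encode a creation time in milliseconds since the Unix epoch (Crockford Base32). As a cross-check on the listing dates before deleting anything, this sketch decodes that timestamp. `ulid_to_epoch` is a hypothetical helper, not a CPS or Prometheus command.

```shell
# Decode the timestamp part of a ULID and print it as seconds since the Unix epoch.
ulid_to_epoch() {
    alphabet=0123456789ABCDEFGHJKMNPQRSTVWXYZ   # Crockford Base32 digits
    ts=$(printf '%s' "$1" | cut -c1-10)         # first 10 characters hold the time
    ms=0
    while [ -n "$ts" ]; do
        rest=${ts#?}                            # everything after the first character
        ch=${ts%"$rest"}                        # the first character itself
        prefix=${alphabet%%"$ch"*}              # digits that precede ch in the alphabet
        [ "${#prefix}" -lt 32 ] || return 1     # ch is not a valid Base32 digit
        ms=$(( ms * 32 + ${#prefix} ))
        ts=$rest
    done
    echo $(( ms / 1000 ))
}

ulid_to_epoch 01F9RJSYA74WMD4PR1GW4PD16S   # prints 1625396410
```

With GNU date, `date -u -d @1625396410` renders this value as 4 July 2021, which agrees with the date shown for that directory in the listing.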
Log in to the Control-0 node as the cps user and go to the directory /stats/prometheus-planning/2.0.
CONTROL-0:
cps@pn-control-0:/stats/prometheus-planning/2.0$ ls -lrt
drwxr-xr-x 3 root root 4.0K Apr 14 2021 01F380KG1TEDJ0A4XRNDA8K8M3
drwxr-xr-x 3 root root 4.0K May 4 2021 01F4VY9BP3RMK3J2BPXSP2VQA5
drwxr-xr-x 3 root root 4.0K May 24 2021 01F6G2TTJHR1XAQBC095WN9E30
drwxr-xr-x 3 root root 4.0K Jun 14 2021 01F847CDQRHPC6SCPC659ZEFH8
drwxr-xr-x 3 root root 4.0K Jul 4 2021 01F9RBYVJSBRKAC5VFGPT9JVS1
drwxr-xr-x 3 root root 4.0K Jul 24 2021 01FBCGGD141BRR4D59Y3G9855D
drwxr-xr-x 3 root root 4.0K Aug 13 2021 01FD0N17Z3R2R2MJSG6HE1YNAT
drwxr-xr-x 3 root root 4.0K Sep 3 2021 01FEMSJZFJPCFJNENFV3JHCHCY
drwxr-xr-x 3 root root 4.0K Sep 23 09:00 01FG8Y4A0HCNVVCVNJJNGXT84Y
drwxr-xr-x 3 root root 4.0K Oct 13 15:00 01FHX2P1F8EZVC8EDQVXH4C3VW
drwxr-xr-x 3 root root 4.0K Nov 2 21:00 01FKH77FAJ0DS4P8J8QBQKTC75
drwxr-xr-x 3 root root 4.0K Nov 23 03:00 01FN5BRV1XR7NZ0SFT63M9BR4F
drwxr-xr-x 3 root root 4.0K Dec 13 09:00 01FPSGAE6ZYXQNYZYS56SFNGFX
drwxr-xr-x 3 root root 4.0K Jan 2 15:00 01FRDMW8K61Y2AK96BR5N2C1YK
drwxr-xr-x 3 root root 4.0K Jan 22 21:00 01FT1SDKW7X7HYPEKR5P76NWNB
drwxr-xr-x 3 root root 4.0K Feb 14 05:14 01FVVAEZPBT17H5XCDFC18YPQK
drwxr-xr-x 3 root root 4.0K Mar 4 09:00 01FXA2GNCMTB54N66W18NPJDBR
drwxr-xr-x 3 root root 4.0K Mar 11 03:00 01FXVEPEY54S4YEGJBPC42K01C
drwxr-xr-x 3 root root 4.0K Mar 13 09:00 01FY18329GPWH49HX61GNSDYT6
drwxr-xr-x 3 root root 4.0K Mar 14 03:00 01FY35WKDY9AZRW7DEKYS0T382
drwxr-xr-x 3 root root 4.0K Mar 14 21:00 01FY53P4P6FS9TGA1P1V1GJEQK
drwxr-xr-x 3 root root 4.0K Mar 15 03:00 01FY5R9A7KMN03DHNTW4ME1DA7
drwxr-xr-x 3 root root 4.0K Mar 15 03:00 01FY5R9ABDNV3D6P7S63S9JKF2
drwxr-xr-x 3 root root 4.0K Mar 15 05:00 01FY5Z51FQHV4GJ24D6S1G6QW9
drwxr-xr-x 27 root root 4.0K Mar 15 05:00 .
drwxr-xr-x 2 root root 4.0K Mar 15 05:00 wal
Run this command to delete the required data.
cps@pn-control-0:/stats/prometheus-planning/2.0$ sudo rm -rf 01F380KG1TEDJ0A4XRNDA8K8M3 01F4VY9BP3RMK3J2BPXSP2VQA5 01F6G2TTJHR1XAQBC095WN9E30 01F847CDQRHPC6SCPC659ZEFH8 01F9RBYVJSBRKAC5VFGPT9JVS1 01FBCGGD141BRR4D59Y3G9855D
Log in to the Control-1 node as the cps user and go to the directory /stats/prometheus-planning/2.0.
CONTROL-1:
cps@pn-control-1:/stats/prometheus-planning/2.0$ ls -lart
total 108
drwxr-xr-x 3 root root 4096 Apr 14 2021 01F380KRDADD3MMVD0VZXNZWS1
drwxr-xr-x 3 root root 4096 May 4 2021 01F4VY9MBSJ68NW8V5JR4908G0
drwxr-xr-x 3 root root 4096 May 24 2021 01F6G2V1A5Z97G6TV5G9R8MPXR
drwxr-xr-x 3 root root 4096 Jun 14 2021 01F847CE2CV31EDHFJ594FE739
drwxr-xr-x 3 root root 4096 Jul 4 2021 01F9RBY6RASP2T9G2TQJSAHKK8
drwxr-xr-x 3 root root 4096 Jul 24 2021 01FBCGFRD30XYPKRRSWA4HNPE9
drwxr-xr-x 3 root root 4096 Aug 13 2021 01FD0N14GD5B5GQJ6NVBJB6E80
drwxr-xr-x 3 root root 4096 Sep 3 2021 01FEMSJVNJ1N71YPEVB41K6AFJ
drwxr-xr-x 3 root root 4096 Sep 23 09:00 01FG8Y4FJX9K27YJX4KF7FDJAF
drwxr-xr-x 3 root root 4096 Oct 13 15:00 01FHX2P2PWCZ9B665G83WG3ZNE
drwxr-xr-x 3 root root 4096 Nov 2 21:00 01FKH77M15JFZS6TAVMRP82H8T
drwxr-xr-x 3 root root 4096 Nov 23 03:00 01FN5BS5ZE2WFMJF5PE7J41GM7
drwxr-xr-x 3 root root 4096 Dec 13 09:00 01FPSGACKH8EW951GJ04RSGACD
drwxr-xr-x 3 root root 4096 Jan 2 15:00 01FRDMW9W0110PS4NTGZ7VETV7
drwxr-xr-x 3 root root 4096 Jan 22 21:00 01FT1SDKV8X6103294D4A5KDBY
drwxr-xr-x 3 root root 4096 Feb 14 05:14 01FVVAF1FN79BZNF25665987AV
drwxr-xr-x 3 root root 4096 Mar 4 09:00 01FXA2GWSKN955BW5FJ71C8D0A
drwxr-xr-x 3 root root 4096 Mar 11 03:00 01FXVEPGAHGS4JA55VF6RZTJWX
drwxr-xr-x 3 root root 4096 Mar 13 09:00 01FY1833FXKQGEAF2MNXWTW6J7
drwxr-xr-x 3 root root 4096 Mar 14 03:00 01FY35WMH602J2XTVNCFSNKSA1
drwxr-xr-x 3 root root 4096 Mar 14 21:00 01FY53P5SKC3P95XA721ANTVPC
drwxr-xr-x 3 root root 4096 Mar 15 03:00 01FY5R9B8QZF64AENGQSNWSCHH
drwxr-xr-x 3 root root 4096 Mar 15 03:00 01FY5R9BDMRZHK5K3194JGARC4
drwxr-xr-x 3 root root 4096 Mar 15 05:00 01FY5Z52GS6937933QWXJ4CHVK
drwxr-xr-x 27 root root 4096 Mar 15 05:00 .
drwxr-xr-x 2 root root 4096 Mar 15 05:00 wal
Run this command to delete the required data.
cps@pn-control-1:/stats/prometheus-planning/2.0$ sudo rm -rf 01F380KRDADD3MMVD0VZXNZWS1 01F4VY9MBSJ68NW8V5JR4908G0 01F6G2V1A5Z97G6TV5G9R8MPXR 01F847CE2CV31EDHFJ594FE739 01F9RBY6RASP2T9G2TQJSAHKK8 01FBCGFRD30XYPKRRSWA4HNPE9
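Picking block directories by eye is error-prone when the same cleanup is repeated on three nodes. As a hedged sketch, the function below lists the block directories whose modification time is at or before a cutoff, skipping `wal`, so that the candidates can be reviewed before any `rm`. It only prints names and deletes nothing. The helper name and the cutoff format are assumptions for illustration, and directory modification time is only a proxy for data age; each block's `meta.json` records the actual `minTime` and `maxTime` of its samples.

```shell
# List block directories under $1 that were last modified at or before the cutoff $2.
# $2 uses the `touch -t` format CCYYMMDDhhmm, for example 202108010000.
list_old_blocks() {
    ref=$(mktemp) || return 1
    touch -t "$2" "$ref"                        # reference file stamped with the cutoff
    find "$1" -mindepth 1 -maxdepth 1 -type d \
        ! -name wal ! -newer "$ref" | sed 's|.*/||' | sort
    rm -f "$ref"
}

# Usage (review the output before deleting anything):
#   list_old_blocks /stats/prometheus-planning/2.0 202108010000
```

Run against the listings above with a cutoff of 1 August 2021, this would be expected to select the same directories that the `rm -rf` commands name, provided the modification times on disk match the dates shown.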
Step 3. Run this command from the DRA Primary Orchestrator to start all processes managed by supervisorctl in all prometheus-planning containers.
admin@orchestrator[pn-master-0]# docker exec prometheus-planning- "supervisorctl start all"
==========output from container prometheus-planning-s101===========
haproxy: started
consul: started
prometheus: started
prometheus-old: started
==========output from container prometheus-planning-s102===========
haproxy: started
consul: started
prometheus: started
prometheus-old: started
==========output from container prometheus-planning-s103===========
haproxy: started
consul: started
prometheus: started
prometheus-old: started
admin@orchestrator[pn-master-0]#
Step 4. Verify the health checks from the DRA Primary Orchestrator.
#show system status
#show system diagnostics | tab | exclude pass