Introduction
This document describes how to use the Grafana dashboard to troubleshoot alerts generated on the Common Execution Environment (CEE).
Alerts on CEE
Alert rules can be configured in the CEE Ops Center. For example:
alerts rules group Pod
 interval-seconds 300
 rule Memory_Major
  expression "(go_memstats_heap_inuse_bytes{pod=~\"rest-ep.*|smf-service.*|gtpc-ep.*|protocol.*|udp-proxy.*|cache-pod.*\"} /16000000000) >= 0.5"
  duration 15m
  severity major
  type "Processing Error Alarm"
  annotation summary
   value "\"POD {{ $labels.pod }} in Namespace: {{ $labels.namespace }} has reached 50% of utilization\""
  exit
 exit
exit
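The expression is written in PromQL. In this example, it monitors go_memstats_heap_inuse_bytes (heap memory in use) for each of the specified pods and computes memory utilization against a total of 16 GB (16000000000 bytes). If utilization reaches 50% (>= 0.5) and stays there for the configured duration (15m), an alert is generated. Generated alerts can be viewed with the show alerts history or show alerts active CLI commands.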
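To make the arithmetic concrete, the following Python sketch evaluates the Memory_Major condition for a single pod: heap bytes in use divided by 16000000000, compared against 0.5. It assumes that `duration 15m` behaves like the Prometheus `for` clause (the condition must hold on every evaluation for at least 15 minutes before the alert fires) and that one sample is evaluated per `interval-seconds` (300 s). The function names and sample data are illustrative only and are not part of CEE.

```python
from typing import Optional, Sequence, Tuple

TOTAL_MEMORY_BYTES = 16_000_000_000  # denominator used in the rule (16 GB)
THRESHOLD = 0.5                      # ">= 0.5" in the rule, i.e. 50%
FOR_SECONDS = 15 * 60                # "duration 15m", assumed to act like Prometheus "for"


def utilization(heap_inuse_bytes: float) -> float:
    """Ratio the rule computes for one pod: heap in use / 16 GB."""
    return heap_inuse_bytes / TOTAL_MEMORY_BYTES


def is_firing(evaluations: Sequence[Tuple[int, float]]) -> bool:
    """evaluations: (unix_seconds, heap_inuse_bytes) per rule evaluation, oldest first.

    The alert is pending while the condition holds and fires once it has held
    continuously for FOR_SECONDS; any evaluation below the threshold resets it.
    """
    pending_since: Optional[int] = None
    for ts, heap in evaluations:
        if utilization(heap) >= THRESHOLD:
            if pending_since is None:
                pending_since = ts  # first evaluation that met the condition
        else:
            pending_since = None  # condition broken, start over
    if pending_since is None:
        return False
    return evaluations[-1][0] - pending_since >= FOR_SECONDS


if __name__ == "__main__":
    steady = [(0, 9e9), (300, 9e9), (600, 9e9), (900, 9e9)]
    dipped = [(0, 9e9), (300, 7e9), (600, 9e9), (900, 9e9)]
    print(is_firing(steady), is_firing(dipped))  # True False
```

A pod holding 9 GB of heap (about 56%) on four evaluations 300 s apart meets the condition for the full 15 minutes and fires; a single dip below 8 GB in between resets the timer.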
[unknown] cee# show alerts active summary | include Memory_Major
Memory_Major 68e812264ed6 major 10-28T02:23:44 worker1 POD cache-pod-0 in Namespace: smf-data has reached 50% of utilization
Memory_Major 627af1cdd01c major 10-28T02:23:44 worker1 POD cache-pod-1 in Namespace: smf-data has reached 50% of utilization
Memory_Major 394d713e294b major 10-28T02:23:44 worker1 POD gtpc-ep-n0-0 in Namespace: smf-data has reached 50% of utilization
Memory_Major bd95b1a35ef5 major 10-28T02:23:44 worker1 POD smf-rest-ep-n0-0 in Namespace: smf-data has reached 50% of utilization
Memory_Major 57254fd42f1a major 10-28T02:23:44 worker1 POD smf-udp-proxy-0 in Namespace: smf-data has reached 50% of utilization
Memory_Major 56135a34c635 major 10-28T02:23:44 worker1 POD smf-service-n0-0 in Namespace: smf-data has reached 50% of utilization
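When many pods raise the same alert, it can help to extract the pod and namespace from each line before building Grafana queries. This sketch assumes the column order seen in the output above (alert name, ID, severity, start time, source, then a free-text summary) and that the summary follows the `POD <pod> in Namespace: <namespace>` text defined by the rule's annotation. The field names are hypothetical labels, not CEE's own.

```python
import re
from typing import Dict, List, NamedTuple


class ActiveAlert(NamedTuple):
    # Hypothetical field names for the columns in "show alerts active summary"
    name: str
    uid: str
    severity: str
    starts_at: str
    source: str
    summary: str


# Matches the text produced by the rule's "summary" annotation
SUMMARY_RE = re.compile(r"POD (\S+) in Namespace: (\S+)")


def parse_line(line: str) -> ActiveAlert:
    """Split one output line: five single-token columns, then the free-text summary."""
    name, uid, severity, starts_at, source, summary = line.split(maxsplit=5)
    return ActiveAlert(name, uid, severity, starts_at, source, summary)


def pods_by_namespace(lines: List[str]) -> Dict[str, List[str]]:
    """Group the pods named in alert summaries by namespace, skipping blank lines."""
    result: Dict[str, List[str]] = {}
    for line in lines:
        if not line.strip():
            continue
        match = SUMMARY_RE.search(parse_line(line).summary)
        if match:
            pod, namespace = match.groups()
            result.setdefault(namespace, []).append(pod)
    return result


if __name__ == "__main__":
    sample = [
        "Memory_Major 68e812264ed6 major 10-28T02:23:44 worker1 POD cache-pod-0 in Namespace: smf-data has reached 50% of utilization",
        "Memory_Major 394d713e294b major 10-28T02:23:44 worker1 POD gtpc-ep-n0-0 in Namespace: smf-data has reached 50% of utilization",
    ]
    print(pods_by_namespace(sample))  # {'smf-data': ['cache-pod-0', 'gtpc-ep-n0-0']}
```

Fed all six lines above, pods_by_namespace would list every pod under smf-data.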
For further details on alerts, refer to this document:
Application-based Alerts
https://www.cisco.com/c/en/us/td/docs/wireless/ucc/smf/b_SMF/b_SMF_chapter_0110101.html
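How to Troubleshoot
The CLI does not provide the actual measured values or trend data. The best way to troubleshoot further is with the Grafana dashboard. As described above, the alert is defined in PromQL, so the same syntax can be used to create a graph in Grafana.
Using the rule above as an example, the following query can be used to create the graph: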
(go_memstats_heap_inuse_bytes{pod=~"rest-ep.*|smf-service.*|gtpc-ep.*|protocol.*|udp-proxy.*|cache-pod.*"}/16000000000)*100
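Note:
1. Remove the "\" characters that the rule syntax uses as escape sequences.
2. Multiply by 100 to express the value as a percentage.
3. Omit the ">= 0.5" comparison so that the graph plots the utilization itself rather than only the samples above the threshold.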
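These adjustments can be applied mechanically. The Python sketch below, an illustration rather than a CEE or Grafana feature, unescapes `\"`, drops a trailing numeric comparison such as `>= 0.5` (the query above omits it too), and multiplies by 100. It wraps the expression in parentheses before multiplying so operator precedence cannot change the result; redundant parentheses are harmless in PromQL. The regular expression assumes a plain `<operator> <number>` suffix and does not handle the PromQL `bool` modifier.

```python
import re

# A trailing "<op> <number>" comparison, e.g. ">= 0.5" (the "bool" modifier is not handled)
TRAILING_COMPARISON = re.compile(r"\s*(?:>=|<=|==|!=|>|<)\s*[0-9.]+\s*$")


def rule_to_grafana_query(rule_expr: str) -> str:
    """Turn an alert-rule expression into a percentage query for a Grafana panel."""
    # Remove the backslashes used to escape double quotes in the CLI
    expr = rule_expr.replace('\\"', '"')
    # Drop the threshold so the graph shows the ratio itself, not only the breaching samples
    expr = TRAILING_COMPARISON.sub("", expr)
    # Multiply by 100; the outer parentheses keep operator precedence intact
    return f"({expr})*100"


if __name__ == "__main__":
    rule = r'(go_memstats_heap_inuse_bytes{pod=~\"rest-ep.*|smf-service.*|gtpc-ep.*|protocol.*|udp-proxy.*|cache-pod.*\"} /16000000000) >= 0.5'
    print(rule_to_grafana_query(rule))
```

For the Memory_Major rule, the result matches the query above apart from the extra outer parentheses and the space before /16000000000.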