Introduction
This document describes how to use Grafana dashboards to troubleshoot alerts generated by the Common Execution Environment (CEE).
Alerts in CEE
Alert rules are configured in the CEE Ops Center. Here is an example:
alerts rules group Pod
interval-seconds 300
rule Memory_Major
expression "(go_memstats_heap_inuse_bytes{pod=~\"rest-ep.*|smf-service.*|gtpc-ep.*|protocol.*|udp-proxy.*|cache-pod.*\"} /16000000000) >= 0.5"
duration 15m
severity major
type "Processing Error Alarm"
annotation summary
value "\"POD {{ $labels.pod }} in Namespace: {{ $labels.namespace }} has reached 50% of utilization\""
exit
exit
exit
The expression is written in PromQL. In this example, it monitors go_memstats_heap_inuse_bytes (the heap memory in use) for each of the specified pods. It calculates the memory utilization against a total of 16 GB (16000000000 bytes) and generates an alert if the utilization reaches 50% (>= 0.5). Generated alerts can be viewed with the show alerts history or show alerts active CLI commands.
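The arithmetic behind the rule can be checked offline with a minimal Python sketch. The constants are taken from the example rule; the function names and the sample value are hypothetical and only serve as illustration.

```python
# Minimal sketch of the arithmetic in the Memory_Major rule expression.
# The function names and the sample value are illustrative assumptions.

TOTAL_MEMORY_BYTES = 16_000_000_000  # divisor used in the rule (16 GB)
THRESHOLD = 0.5                      # ">= 0.5" in the rule expression


def heap_utilization(heap_inuse_bytes: float) -> float:
    """Return the heap in use as a fraction of the assumed total memory."""
    return heap_inuse_bytes / TOTAL_MEMORY_BYTES


def breaches_threshold(heap_inuse_bytes: float) -> bool:
    """Return True if the value satisfies the rule's comparison."""
    return heap_utilization(heap_inuse_bytes) >= THRESHOLD


sample_bytes = 8_400_000_000  # hypothetical go_memstats_heap_inuse_bytes sample
print(f"{heap_utilization(sample_bytes):.1%}", breaches_threshold(sample_bytes))
```

A pod that reports 8000000000 bytes or more in go_memstats_heap_inuse_bytes satisfies the expression, because 8000000000 / 16000000000 = 0.5.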
[unknown] cee# show alerts active summary | include Memory_Major
Memory_Major 68e812264ed6 major 10-28T02:23:44 worker1 POD cache-pod-0 in Namespace: smf-data has reached 50% of utilization
Memory_Major 627af1cdd01c major 10-28T02:23:44 worker1 POD cache-pod-1 in Namespace: smf-data has reached 50% of utilization
Memory_Major 394d713e294b major 10-28T02:23:44 worker1 POD gtpc-ep-n0-0 in Namespace: smf-data has reached 50% of utilization
Memory_Major bd95b1a35ef5 major 10-28T02:23:44 worker1 POD smf-rest-ep-n0-0 in Namespace: smf-data has reached 50% of utilization
Memory_Major 57254fd42f1a major 10-28T02:23:44 worker1 POD smf-udp-proxy-0 in Namespace: smf-data has reached 50% of utilization
Memory_Major 56135a34c635 major 10-28T02:23:44 worker1 POD smf-service-n0-0 in Namespace: smf-data has reached 50% of utilization
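When many alerts are active, it can help to extract the affected pods from the CLI output before moving on to graphs. The sketch below is one possible approach, not a CEE tool: it assumes the summary text follows the annotation template of the example rule ("POD ... in Namespace: ..."), and the function name is hypothetical.

```python
import re

# Matches the summary text produced by the annotation of the example rule.
# A rule with a different annotation template needs a different pattern.
POD_IN_SUMMARY = re.compile(r"POD (?P<pod>\S+) in Namespace: (?P<namespace>\S+)")


def affected_pods(cli_output: str) -> list[tuple[str, str]]:
    """Return (namespace, pod) pairs found in 'show alerts active summary' output."""
    pods = []
    for line in cli_output.splitlines():
        match = POD_IN_SUMMARY.search(line)
        if match:
            pods.append((match.group("namespace"), match.group("pod")))
    return pods


# Two lines copied from the output shown above.
sample = """\
Memory_Major 68e812264ed6 major 10-28T02:23:44 worker1 POD cache-pod-0 in Namespace: smf-data has reached 50% of utilization
Memory_Major 394d713e294b major 10-28T02:23:44 worker1 POD gtpc-ep-n0-0 in Namespace: smf-data has reached 50% of utilization
"""
print(affected_pods(sample))
```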
Further explanation of alerts can be found in this document:
Application-based Alerts
https://www.cisco.com/c/en/us/td/docs/wireless/ucc/smf/b_SMF/b_SMF_chapter_0110101.html
How to troubleshoot
The CLI output does not show the actual measured value or trend data. The best way to troubleshoot further is to use a Grafana dashboard. Because alerts are defined in PromQL, the same syntax can be used to create graphs in Grafana.
Taking the rule above as an example, this query can be used to create a graph:
(go_memstats_heap_inuse_bytes{pod=~"rest-ep.*|smf-service.*|gtpc-ep.*|protocol.*|udp-proxy.*|cache-pod.*"}/16000000000)*100
Notes:
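1. Remove the backslashes ("\"), which are used as escape characters in the rule configuration.
2. Multiply by 100 to express the value as a percentage.
3. Omit the threshold comparison (>= 0.5) so that the graph shows the measured value for every pod, not only for the pods that meet the threshold.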
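The conversion from a rule expression to a Grafana query can be sketched in Python. This is a minimal illustration, not a CEE or Grafana tool: the function name is hypothetical, and it assumes the expression ends with a single numeric comparison such as ">= 0.5", which is dropped in the same way as in the query above.

```python
import re


def rule_to_grafana_query(rule_expression: str) -> str:
    """Turn an escaped alert rule expression into a percentage query."""
    # Drop the backslashes that escape the double quotes.
    unescaped = rule_expression.replace('\\"', '"')
    # Remove a trailing numeric comparison such as ">= 0.5" so the graph
    # shows the measured value instead of only the matching series.
    value_expr = re.sub(r"\s*(>=|<=|==|!=|>|<)\s*[0-9.]+\s*$", "", unescaped)
    # Multiply by 100 to express the value as a percentage.
    return f"{value_expr}*100"


# Shortened version of the expression from the example rule.
rule = r'(go_memstats_heap_inuse_bytes{pod=~\"cache-pod.*\"} /16000000000) >= 0.5'
print(rule_to_grafana_query(rule))
```

The result keeps the whitespace of the original expression, which PromQL ignores.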