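Introduction
This document describes a node-exporter disk-full issue found in a user network.
Background Information
When an audit is performed on the Cluster Manager Common Execution Environment (CEE), the audit result indicates that the node-exporter disk is full.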
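Problem
A critical-severity alert is raised on the CEE because a disk-full condition is projected to occur within the next 24 hours:
"Device /dev/sda3 of node-exporter cee03/node-exporter-4dd4a4dd4a is projected to be full within the next 24 hours"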
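Analysis
The reported alert is raised on the CEE, which tracks hardware issues for the rack, and it predicts that a disk-full condition occurs within the next 24 hours.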
cisco@deployer-cm-primary:~$ kubectl get pods -A -o wide | grep node
cee03 node-exporter-4dd4a4dd4a 1/1 Running 1 111d 10.10.1.1 deployer-cm-primary <none> <none>
root@deployer-cm-primary:/# df -h
Filesystem      Size  Used Avail Use% Mounted on
overlay         568G  171G  368G  32% /
tmpfs            64M     0   64M   0% /dev
tmpfs           189G     0  189G   0% /sys/fs/cgroup
tmpfs           189G     0  189G   0% /host/sys/fs/cgroup
/dev/sda1       9.8G  3.5G  5.9G  37% /host/root
udev            189G     0  189G   0% /host/root/dev
tmpfs           189G     0  189G   0% /host/root/dev/shm
tmpfs            38G   15M   38G   1% /host/root/run
tmpfs           5.0M     0  5.0M   0% /host/root/run/lock
/dev/sda3        71G   67G  435M 100% /host/root/var/log
The check shows that /dev/sda3, mounted at /host/root/var/log, is full, and auditing appears to be what filled it.
root@deployer-cm-primary:/host/root/var/log# du -h --max-depth=1
76M ./sysstat
16K ./lost+found
4.0K ./containers
4.0K ./landscape
9.3M ./calico
1.1G ./apiserver
808K ./pods
5.6G ./journal
60G ./audit
36K ./apt
67G .
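When the directory list is long, sorting the du output by size makes the largest consumer easier to spot. The sketch below is one way to do that. The function name largest_dirs and its defaults are illustrative assumptions, not part of the CEE tooling, and it assumes GNU du (for --max-depth).

```shell
#!/bin/sh
# largest_dirs is a hypothetical helper name, not part of CEE or any Cisco tool.
# Prints first-level directory sizes in KiB under a path, largest first.
largest_dirs() {
  dir="${1:-/var/log}"   # path to inspect
  count="${2:-5}"        # number of lines to print
  du -k --max-depth=1 "$dir" 2>/dev/null | sort -rn | head -n "$count"
}
```

For example, largest_dirs /host/root/var/log 5 prints the five largest entries. The total for the path itself typically comes first, followed by its subdirectories.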
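The audit directory accounts for 60G of the 67G in use. A check of the auditd configuration shows that it is set to keep logs, which is why the node-exporter disk-full condition can occur.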
cisco@deployer-cm-primary:~$ sudo cat /etc/audit/auditd.conf
#
# This file controls the configuration of the audit daemon
#
local_events = yes
write_logs = yes
log_file = /var/log/audit/audit.log
log_group = adm
log_format = RAW
flush = INCREMENTAL_ASYNC
freq = 50
max_log_file = 8
num_logs = 5
priority_boost = 4
disp_qos = lossy
dispatcher = /sbin/audispd
name_format = NONE
##name = mydomain
max_log_file_action = keep_logs
space_left = 75
space_left_action = email
verify_email = yes
action_mail_acct = root
admin_space_left = 50
admin_space_left_action = halt
disk_full_action = SUSPEND
disk_error_action = SUSPEND
use_libwrap = yes
##tcp_listen_port = 60
tcp_listen_queue = 5
tcp_max_per_addr = 1
##tcp_client_ports = 1024-65535
tcp_client_max_idle = 0
enable_krb5 = no
krb5_principal = auditd
##krb5_key_file = /etc/audit/audit.key
distribute_network = no
cisco@deployer-cm-primary:~$
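The settings that matter here are max_log_file (the size of one log file, in megabytes), num_logs, and max_log_file_action. With keep_logs, auditd rotates the log file but ignores num_logs, so old files are never overwritten and the directory keeps growing. With rotate, roughly max_log_file × num_logs megabytes are retained (8 × 5 = 40 MB for the values above). The sketch below reads those three settings and reports the resulting ceiling. The function names auditd_setting and audit_log_cap_mb are illustrative assumptions, not auditd commands.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not part of auditd).

# Print the value of one "key = value" setting from an auditd.conf-style file.
auditd_setting() {
  awk -F'=' -v key="$1" '
    /^[ \t]*#/ { next }                       # skip comment lines
    {
      k = $1; gsub(/[ \t]/, "", k)            # strip whitespace around the key
      if (k == key) {
        v = $2; gsub(/^[ \t]+|[ \t]+$/, "", v)
        print v; exit
      }
    }
  ' "${2:-/etc/audit/auditd.conf}"
}

# Print the approximate on-disk ceiling for audit logs in MB,
# or "unbounded" when rotated files are never removed (keep_logs).
audit_log_cap_mb() {
  conf="${1:-/etc/audit/auditd.conf}"
  action=$(auditd_setting max_log_file_action "$conf" | tr 'A-Z' 'a-z')
  size=$(auditd_setting max_log_file "$conf")
  num=$(auditd_setting num_logs "$conf")
  case "$action" in
    rotate)    echo $(( ${size:-0} * ${num:-0} )) ;;
    keep_logs) echo "unbounded" ;;
    *)         echo "unknown" ;;   # other actions are not modeled in this sketch
  esac
}
```

Reading /etc/audit/auditd.conf normally requires root, so run audit_log_cap_mb from a root shell or pass it a readable copy of the file.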
Solution
Run the commands listed below on both deployer-cm-primary and deployer-cm-secondary to fix the potential node-exporter disk-full condition.
sudo vim /etc/audit/auditd.conf
Then change the max_log_file_action setting in the file from keep_logs to rotate:
max_log_file_action = rotate
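Because the same edit has to be made on two nodes, a non-interactive alternative to the editor may be convenient. The sketch below uses sed to rewrite only the max_log_file_action line and keeps a .bak copy of the previous file. The function name set_audit_rotate is an illustrative assumption, and the approach is an alternative to the manual edit, not a documented Cisco procedure.

```shell
#!/bin/sh
# set_audit_rotate is a hypothetical helper name.
# Rewrites max_log_file_action to "rotate" in the given file and
# keeps the previous version next to it with a .bak suffix.
set_audit_rotate() {
  conf="${1:-/etc/audit/auditd.conf}"
  sed -i.bak \
    's/^[[:space:]]*max_log_file_action[[:space:]]*=.*/max_log_file_action = rotate/' \
    "$conf"
}
```

The pattern is anchored to the start of the line, so similarly named settings such as admin_space_left_action are left untouched. Writing to /etc/audit/auditd.conf requires root privileges.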
After the change, restart the service.
sudo systemctl restart auditd.service
Verify that the critical alert is cleared.
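In addition to the alert itself, the usage of the log filesystem can be checked directly on each deployer node. The sketch below is one possible check. The function names usage_pct and disk_below and the default threshold of 90 are illustrative assumptions, not values taken from the CEE alert rule.

```shell
#!/bin/sh
# usage_pct and disk_below are hypothetical helper names.

# Print the Use% of the filesystem that holds a path, without the "%" sign.
usage_pct() {
  df -P "${1:-/var/log}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeed when usage is below a threshold (default 90, an arbitrary example).
disk_below() {
  [ "$(usage_pct "$1")" -lt "${2:-90}" ]
}
```

For example, disk_below /var/log 90 && echo "ok" prints ok only when the filesystem that holds /var/log is less than 90% used.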