This document describes the next steps to remediate the following faults:
"Code" : "F0321",
"Description" : "Controller <id> is unhealthy because: Data Layer Partially Degraded Leadership",
"Dn" : "topology/pod-<POD-ID>/node-<NODE-ID>/av/node-<NODE-ID>/fault-F0321",
"Code" : "F0321",
"Description" : "Controller 3 is unhealthy because: Data Layer Partially Diverged"
"Dn" : "topology/pod-<POD-ID>/node-<NODE-ID>/av/node-<NODE-ID>/fault-F0321",
"Code" : "F0325",
"Description" : "Connectivity has been lost to the leader for some data subset(s) of a service on <node >, the service may have unexpectedly restarted or failed",
"Dn" : "topology/pod-<POD-ID>/node-<NODE-ID>/av/node-<NODE-ID>/fault-F0325",
"Code" : "F0323",
"Description" : "Lost connectivity to leader for some data subset(s) of Access <Service> on <controller >",
"Dn" : "topology/pod-<POD-ID>/node-<NODE-ID>/av/node-<NODE-ID>/fault-F0323",
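If you have an Intersight-connected ACI fabric, a service request is generated on your behalf to indicate that an instance of this fault was found in that fabric.
This fault is raised when the APIC cluster is not healthy. "Data Layer Partially Diverged" appears when one of the replicas of a shard is down (shown as "\" in the acidiag rvread output). The fault is also raised when a replica or database is entirely missing on an APIC, which is shown as "X". Any underlying issue must be resolved and the cluster restored to a healthy state.
If the fabric is in production, do not attempt intrusive steps to troubleshoot cluster issues, such as powering off, reloading, or decommissioning an APIC. Collect the TS files and upload them to the TAC case to learn the exact steps to recover the APIC cluster.
Run the acidiag cluster command. It performs several checks, including connectivity to the APICs. All checks should come back healthy. If anything other than OK (or the equivalent healthy values shown in the first sample) appears, investigate the cause.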
######## Sample output on a healthy cluster ########
apic1# acidiag cluster
Admin password:
Running...
Checking Wiring and UUID: OK
Checking AD Processes: Running
Checking All Apics in Commission State: OK
Checking All Apics in Active State: OK
Checking Fabric Nodes: OK
Checking Apic Fully-Fit: OK
Checking Shard Convergence: OK
Checking Leadership Degration: Optimal leader for all shards
Ping OOB IPs:
APIC-1: 10.197.204.149 - OK
APIC-2: 10.197.204.150 - OK
APIC-3: 10.197.204.151 - OK
Ping Infra IPs:
APIC-1: 10.0.0.1 - OK
APIC-2: 10.0.0.2 - OK
APIC-3: 10.0.0.3 - OK
Checking APIC Versions: Same (5.2(4d))
Checking SSL: OK
Full file system(s): None
Done!
######## Sample output on an unhealthy cluster ########
apic1# acidiag cluster
Admin password:
Running...
Checking Wiring and UUID: switch(302) reports apic(3) has wireIssue: unapproved-ctrlr
Checking AD Processes: Running
Checking All Apics in Commission State: OK
Checking All Apics in Active State: OK
Checking Fabric Nodes: OK
Checking Apic Fully-Fit: OK
Checking Shard Convergence: OK
Checking Leadership Degration: Non optimal leader for shards : 3:1,3:2,3:4,3:5,3:7,3:8,3:10,3:11,3:13,3:14,3:16,3:17,3:19,3:20,3:22,3:23,3:25,3:26,3:28,3:29,3:31,3:32,6:1,6:2,6:4,6:5,6:7,6:8,6:10,6:11,6:13,6:14,6:16,6:17,6:19,6:20,6:22,6:23,6:25,6:26,6:28,6:29,6:31,6:32,9:1,9:2,9:4,9:5,9:7,9:8,9:10,9:11,9:13,9:14,9:16,9:17,9:19,9:20,9:22,9:23,9:25,9:26,9:28,9:29,9:31,9:32,10:1,10:2,10:4,10:5,10:7,10:8,10:10,10:11,10:13,10:14,10:16,10:17,10:19,10:20,10:22,10:23,10:25,10:26,10:28,10:29,10:31,10:32,11:1,11:2,11:4,11:5,11:7,11:8,11:10,11:11,11:13,11:14,11:16,11:17,11:19,11:20,11:22,11:23,11:25,11:26,11:28,11:29,11:31,11:32,14:1,14:2,14:4,14:5,14:7,14:8,14:10,14:11,14:13,14:14,14:16,14:17,14:19,14:20,14:22,14:23,14:25,14:26,14:28,14:29,14:31,14:32,16:1,16:2,16:4,16:5,16:7,16:8,16:10,16:11,16:13,16:14,16:16,16:17,16:19,16:20,16:22,16:23,16:25,16:26,16:28,16:29,16:31,16:32,22:1,22:2,22:4,22:5,22:7,22:8,22:10,22:11,22:13,22:14,22:16,22:17,22:19,22:20,22:22,22:23,22:25,22:26,22:28,22:29,22:31,22:32,23:1,23:2,23:4,23:5,23:7,23:8,23:10,23:11,23:13,23:14,23:16,23:17,23:19,23:20,23:22,23:23,23:25,23:26,23:28,23:29,23:31,23:32,33:1,34:1,34:2,34:4,34:5,34:7,34:8,34:10,34:11,34:13,34:14,34:16,34:17,34:19,34:20,34:22,34:23,34:25,34:26,34:28,34:29,34:31,34:32,35:1,35:2,35:4,35:5,35:7,35:8,35:10,35:11,35:13,35:14,35:16,35:17,35:19,35:20,35:22,35:23,35:25,35:26,35:28,35:29,35:31,35:32,36:1,39:1,39:2,39:4,39:5,39:7,39:8,39:10,39:11,39:13,39:14,39:16,39:17,39:19,39:20,39:22,39:23,39:25,39:26,39:28,39:29,39:31,39:32
Ping OOB IPs:
APIC-1: 10.197.204.184 - OK
APIC-2: 10.197.204.185 - OK
APIC-3: 10.197.204.186 - OK
Ping Infra IPs:
APIC-1: 10.0.0.1 - OK
APIC-2: 10.0.0.2 - OK
APIC-3: 10.0.0.3 - OK
Checking APIC Versions: Same (5.2(3e))
Checking SSL: OK
Full file system(s): None
Done!
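The comparison between the two samples can be automated. The following is a minimal Python sketch, not a Cisco tool, that scans pasted acidiag cluster output and reports every check whose value differs from the healthy sample. The function name find_unhealthy_checks and the list of healthy values are assumptions made for this illustration; the healthy values are taken only from the first sample above.

```python
import re

# Values that the healthy sample reports. Treating every other value as a
# finding is an assumption of this sketch, not documented APIC behavior.
HEALTHY_PATTERNS = [
    re.compile(r"^OK$"),
    re.compile(r"^Running$"),
    re.compile(r"^Optimal leader for all shards$"),
    re.compile(r"^Same \(.*\)$"),
    re.compile(r"^None$"),
]

# Shortened copy of the unhealthy sample (shard list truncated for brevity).
SAMPLE = """\
Checking Wiring and UUID: switch(302) reports apic(3) has wireIssue: unapproved-ctrlr
Checking AD Processes: Running
Checking Shard Convergence: OK
Checking Leadership Degration: Non optimal leader for shards : 3:1,3:2
Ping OOB IPs:
APIC-1: 10.197.204.184 - OK
Checking APIC Versions: Same (5.2(3e))
Checking SSL: OK
Full file system(s): None
Done!
"""


def find_unhealthy_checks(output):
    """Return a list of (check name, reported value) for non-healthy lines."""
    findings = []
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("Checking ") or line.startswith("Full file system"):
            # Split on the first colon only; the value may contain colons.
            name, _, value = line.partition(":")
            value = value.strip()
        elif line.startswith("APIC-"):
            # Ping lines look like "APIC-1: 10.0.0.1 - OK".
            name, _, rest = line.partition(":")
            value = rest.rpartition(" - ")[2].strip()
        else:
            continue  # headers such as "Ping OOB IPs:", prompts, "Done!"
        if not any(p.match(value) for p in HEALTHY_PATTERNS):
            findings.append((name.strip(), value))
    return findings


if __name__ == "__main__":
    for name, value in find_unhealthy_checks(SAMPLE):
        print(f"{name}: {value}")
```

With the shortened sample, the sketch flags the wiring check and the leadership check, which are the two lines that differ between the healthy and unhealthy outputs.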
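Ensure that the APIC SSDs are healthy and that none of these faults are raised on the ACI fabric: F2730, F2731, and F2732. The example that follows shows how these faults appear when queried on the APIC CLI. They can also be verified in the GUI (System > Faults).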
##### Example:
# faultRecord
ack : no
cause : equipment-wearout
changeSet : available:unspecified, blocks:unspecified, capUtilized:0, device:Solid State Device, fileSystem:/dev/sdb, firmwareVersion:Dxxxxxxx, mediaWearout:1, model:INTEL SSDSC2BB120G4, mount:/dev/sdb, name:/dev/sdb, operSt:ok, serial:ABCDxxxxxxxxxxxXYZ, used:unspecified
childAction :
code : F2730
created : 2022-01-10T03:13:08.026+00:00
delegated : no
descr : Storage unit /dev/sdb on Node 3 with hostname apic1.cisco.com mounted at /dev/sdb has 1% life remaining
dn : topology/pod-2/node-3/sys/ch/p-[/dev/sdb]-f-[/dev/sdb]/fault-F2730
domain : infra
highestSeverity : warning
lastTransition : 2022-01-10T03:13:08.026+00:00
lc : raised
occur : 1
origSeverity : warning
prevSeverity : warning
rule : eqpt-storage-wearout-warning
severity : warning
status :
subject : equipment-wearout
type : operational
# faultRecord
ack : no
cause : equipment-wearout
changeSet : available:unspecified, blocks:unspecified, capUtilized:0, device:Solid State Device, fileSystem:/dev/sdb, firmwareVersion:Dxxxxxxx, mediaWearout:1, model:INTEL SSDSC2BB120G4, mount:/dev/sdb, name:/dev/sdb, operSt:ok, serial:ABCDxxxxxxxxxxxXYZ, used:unspecified
childAction :
code : F2731
created : 2022-01-10T03:13:08.026+00:00
delegated : no
descr : Storage unit /dev/sdb on Node 3 mounted at /dev/sdb has 1% life remaining
dn : topology/pod-2/node-3/sys/ch/p-[/dev/sdb]-f-[/dev/sdb]/fault-F2731
domain : infra
highestSeverity : major
lastTransition : 2022-01-10T03:13:08.026+00:00
lc : raised
occur : 1
origSeverity : major
prevSeverity : major
rule : eqpt-storage-wearout-major
severity : major
status :
subject : equipment-wearout
type : operational
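When many fault records are returned, it can help to filter them by code. The following Python sketch assumes the "key : value" layout and the "# faultRecord" separator shown in the example above; the function names parse_fault_records and ssd_wearout_faults are made up for this illustration.

```python
# SSD wear-out fault codes named in this document.
SSD_WEAROUT_CODES = {"F2730", "F2731", "F2732"}

# Shortened copy of the two records above (most attributes omitted).
SAMPLE = """\
# faultRecord
code : F2730
created : 2022-01-10T03:13:08.026+00:00
dn : topology/pod-2/node-3/sys/ch/p-[/dev/sdb]-f-[/dev/sdb]/fault-F2730
severity : warning
status :
# faultRecord
code : F2731
created : 2022-01-10T03:13:08.026+00:00
dn : topology/pod-2/node-3/sys/ch/p-[/dev/sdb]-f-[/dev/sdb]/fault-F2731
severity : major
status :
"""


def parse_fault_records(text):
    """Split the pasted output into one dict per '# faultRecord' block."""
    records = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# "):
            # A new record starts; attributes that follow belong to it.
            current = {}
            records.append(current)
        elif current is not None and ":" in line:
            # Split on the first colon only, because values such as
            # timestamps and changeSet contain colons themselves.
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return records


def ssd_wearout_faults(records):
    """Keep only the records whose code is one of the SSD wear-out codes."""
    return [r for r in records if r.get("code") in SSD_WEAROUT_CODES]


if __name__ == "__main__":
    for fault in ssd_wearout_faults(parse_fault_records(SAMPLE)):
        print(fault["code"], fault["severity"], fault["dn"])
```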
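Check whether all DME processes are running.
Run ps -ef | egrep "svc|nginx.bin|dhcp" or ps -aux | egrep "svc|nginx.bin|dhcp"; both list the same processes.
The expected output is as follows: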
apic1# ps -ef | egrep "svc|nginx.bin|dhcp"
root 3063 1 5 22:08 ? 00:04:40 /mgmt//bin/nginx.bin -p /data//nginx/
root 8889 1 7 21:53 ? 00:06:43 /mgmt//bin/svc_ifc_appliancedirector.bin --x
ifc 8891 1 1 21:53 ? 00:01:29 /mgmt//bin/svc_ifc_policydist.bin --x
root 8893 1 2 21:53 ? 00:02:28 /mgmt//bin/svc_ifc_bootmgr.bin --x
ifc 8894 1 1 21:53 ? 00:01:41 /mgmt//bin/svc_ifc_vmmmgr.bin --x
ifc 8895 1 2 21:53 ? 00:02:14 /mgmt//bin/svc_ifc_topomgr.bin --x
ifc 8901 1 2 21:53 ? 00:02:22 /mgmt//bin/svc_ifc_observer.bin --x
root 8903 1 1 21:53 ? 00:01:40 /mgmt//bin/svc_ifc_plgnhandler.bin --x
ifc 8914 1 1 21:53 ? 00:01:34 /mgmt//bin/svc_ifc_domainmgr.bin --x
ifc 8915 1 2 21:53 ? 00:02:04 /mgmt//bin/svc_ifc_dbgr.bin --x
ifc 8917 1 1 21:53 ? 00:01:34 /mgmt//bin/svc_ifc_edmgr.bin --x
ifc 8918 1 1 21:53 ? 00:01:22 /mgmt//bin/svc_ifc_vtap.bin --x
ifc 8922 1 2 21:53 ? 00:02:09 /mgmt//bin/svc_ifc_eventmgr.bin --x
ifc 8925 1 3 21:53 ? 00:03:15 /mgmt//bin/svc_ifc_reader.bin --x
ifc 8929 1 1 21:53 ? 00:01:34 /mgmt//bin/svc_ifc_idmgr.bin --x
ifc 8930 1 1 21:53 ? 00:01:26 /mgmt//bin/svc_ifc_licensemgr.bin --x
ifc 8937 1 3 21:53 ? 00:03:18 /mgmt//bin/svc_ifc_policymgr.bin --x
ifc 8941 1 1 21:53 ? 00:01:34 /mgmt//bin/svc_ifc_scripthandler.bin --x
root 11157 1 1 21:54 ? 00:01:29 /mgmt//bin/dhcpd.bin -f -4 -cf /data//dhcp/dhcpd.conf -lf /data//dhcp/dhcpd.lease -pf /var/run//dhcpd.pid --no-pid bond0.3902
root 11170 1 4 21:54 ? 00:04:15 /mgmt//bin/svc_ifc_ae.bin --x
admin 17094 16553 0 23:27 pts/0 00:00:00 grep -E svc|nginx.bin|dhcp
You can check for fault code F1419 to identify a failed DME.
apic1# show faults code F1419 history
ID : 4294971876
Description : Service policymgr failed on apic bgl-aci02-apic1 of fabric
POD02 with a hostname bgl-aci02-apic1
Severity : major
DN : subj-[topology/pod-1/node-1/sys/proc/proc-
policymgr]/fr-4294971876
Created : 2022-03-21T18:29:20.570+12:00
Code : F1419
Type : operational
Cause : service-failed
Change Set : id (Old: 5152, New: 0), maxMemAlloc (Old: 1150246912, New:
0), operState (Old: up, New: down)
Action : creation
Domain : infra
Life Cycle : soaking
Count Fault Occurred : 1
Acknowledgement Status : no
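The fields that matter most in this output are the service name in the Description and the operState transition in the Change Set. The following Python sketch shows one way to pull them out of pasted text. It assumes the "Key : value" layout above, including the wrapped continuation lines; the helper names are hypothetical, and the regular expressions are fitted to this one sample only.

```python
import re

# A field line starts with a key made of letters and spaces, then a colon.
FIELD_RE = re.compile(r"^([A-Za-z][A-Za-z ]*?)\s*:\s*(.*)$")
# Change Set entries look like "operState (Old: up, New: down)".
CHANGE_RE = re.compile(r"(\w+) \(Old: ([^,]*), New: ([^)]*)\)")

# Shortened copy of the output above, keeping the wrapped lines.
SAMPLE = """\
ID : 4294971876
Description : Service policymgr failed on apic bgl-aci02-apic1 of fabric
POD02 with a hostname bgl-aci02-apic1
Severity : major
Code : F1419
Cause : service-failed
Change Set : id (Old: 5152, New: 0), maxMemAlloc (Old: 1150246912, New:
0), operState (Old: up, New: down)
Life Cycle : soaking
"""


def parse_fault(text):
    """Collect fields; lines that are not 'Key : value' extend the last field."""
    fields = {}
    last_key = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = FIELD_RE.match(line)
        if match:
            last_key = match.group(1).strip()
            fields[last_key] = match.group(2)
        elif last_key is not None:
            # Wrapped text is joined with a space, so a value that was
            # wrapped mid-token (such as a DN) would gain a stray space.
            fields[last_key] += " " + line
    return fields


def failed_service(fields):
    """Return the service named in 'Service <name> failed ...', if any."""
    match = re.search(r"Service (\S+) failed", fields.get("Description", ""))
    return match.group(1) if match else None


def state_changes(fields):
    """Map each Change Set attribute to its (old, new) pair."""
    changes = CHANGE_RE.findall(fields.get("Change Set", ""))
    return {name: (old, new) for name, old, new in changes}


if __name__ == "__main__":
    fault = parse_fault(SAMPLE)
    print(fault["Code"], failed_service(fault), state_changes(fault).get("operState"))
```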
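If connectivity between APICs is lost, one possible cause is a wiring issue. The acidiag cluster command also shows which type of wiring issue exists on the link. The following are all the possible wiring issues:
ctrlr-uuid-mismatch - APIC UUID mismatch (duplicate APIC ID)
fabric-domain-mismatch - Adjacent node belongs to a different fabric
wiring-mismatch - Invalid connection (leaf to leaf, spine to non-leaf, leaf fabric port to non-spine, and so on)
adjacency-not-detected - No LLDP adjacency on the fabric port
infra-vlan-mismatch - Infra VLAN mismatch between the leaf and the APIC
pod-id-mismatch - Pod ID mismatch between the APIC and the leaf
unapproved-ctrlr - SSL handshake not completed between the APIC and the connected leaf
unapproved-serialnumber - A node was detected that is not in the APIC database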
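As an illustration, the list can be kept as a lookup table and applied to the "Checking Wiring and UUID" line of the acidiag cluster output. This Python sketch is not part of any Cisco tooling: the function name explain_wiring_issue is hypothetical, and the regular expression is modeled only on the single sample line shown earlier in this document, so other line layouts are not covered.

```python
import re

# Wiring issue types and meanings, as listed in this document.
WIRING_ISSUES = {
    "ctrlr-uuid-mismatch": "APIC UUID mismatch (duplicate APIC ID)",
    "fabric-domain-mismatch": "Adjacent node belongs to a different fabric",
    "wiring-mismatch": "Invalid connection between node types",
    "adjacency-not-detected": "No LLDP adjacency on the fabric port",
    "infra-vlan-mismatch": "Infra VLAN mismatch between leaf and APIC",
    "pod-id-mismatch": "Pod ID mismatch between APIC and leaf",
    "unapproved-ctrlr": "SSL handshake not completed between APIC and leaf",
    "unapproved-serialnumber": "Node is not in the APIC database",
}

# Modeled on: "switch(302) reports apic(3) has wireIssue: unapproved-ctrlr"
WIRE_ISSUE_RE = re.compile(
    r"switch\((\d+)\) reports apic\((\d+)\) has wireIssue: ([\w-]+)"
)


def explain_wiring_issue(line):
    """Return (switch id, APIC id, issue, meaning), or None if no issue is reported."""
    match = WIRE_ISSUE_RE.search(line)
    if match is None:
        return None
    switch_id, apic_id, issue = match.groups()
    meaning = WIRING_ISSUES.get(issue, "unknown wiring issue type")
    return int(switch_id), int(apic_id), issue, meaning


if __name__ == "__main__":
    line = ("Checking Wiring and UUID: switch(302) reports apic(3) "
            "has wireIssue: unapproved-ctrlr")
    print(explain_wiring_issue(line))
```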
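If the output does not match the expected output shown in the DME process status section, try to start the missing DME with 'acidiag start <DME>'. For example, if svc_ifc_eventmgr is missing, try 'acidiag start eventmgr'.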
apic1# ps -aux | egrep "svc|nginx.bin|dhcp"
root 5112 7.3 0.4 1033952 323180 ? Ssl Mar10 3073:27 /mgmt//bin/svc_ifc_appliancedirector.bin --x
ifc 5117 1.7 0.6 1062664 439876 ? Ssl Mar10 720:52 /mgmt//bin/svc_ifc_topomgr.bin --x
ifc 5118 2.1 2.2 2164512 1468200 ? Ssl Mar10 884:11 /mgmt//bin/svc_ifc_policymgr.bin --x
ifc 5119 1.5 0.3 1115984 256904 ? Ssl Mar10 664:51 /mgmt//bin/svc_ifc_licensemgr.bin --x
ifc 5120 1.5 0.5 1088252 356760 ? Ssl Mar10 666:26 /mgmt//bin/svc_ifc_edmgr.bin --x
root 5121 1.6 0.6 1125948 423392 ? Ssl Mar10 698:11 /mgmt//bin/svc_ifc_bootmgr.bin --x
ifc 5123 2.3 1.2 1474388 800564 ? Ssl Mar10 994:15 /mgmt//bin/svc_ifc_eventmgr.bin --x
ifc 5126 1.5 8.2 6032524 5363184 ? Ssl Mar10 635:58 /mgmt//bin/svc_ifc_reader.bin --x
root 5130 4.6 0.6 1092480 439580 ? Ssl Mar10 1927:08 /mgmt//bin/svc_ifc_ae.bin --x
ifc 5132 1.6 0.8 1312136 567420 ? Ssl Mar10 689:43 /mgmt//bin/svc_ifc_vmmmgr.bin --x
ifc 5133 1.5 0.5 1064176 346760 ? Ssl Mar10 659:03 /mgmt//bin/svc_ifc_domainmgr.bin --x
ifc 5135 1.8 1.6 1736876 1099924 ? Ssl Mar10 770:39 /mgmt//bin/svc_ifc_observer.bin --x
root 5141 1.5 0.7 1092948 458156 ? Ssl Mar10 663:41 /mgmt//bin/svc_ifc_plgnhandler.bin --x
ifc 5146 2.0 0.6 1037676 397236 ? Ssl Mar10 857:43 /mgmt//bin/svc_ifc_idmgr.bin --x
ifc 5148 1.3 0.3 650596 222336 ? Ssl Mar10 580:25 /mgmt//bin/svc_ifc_vtap.bin --x
ifc 5160 1.6 0.6 1098280 453492 ? Ssl Mar10 669:17 /mgmt//bin/svc_ifc_scripthandler.bin --x
root 7089 1.4 0.4 856360 315016 ? Ssl Mar10 592:04 /mgmt//bin/dhcpd.bin -f -4 -cf /data//dhcp/dhcpd.conf -lf /data//dhcp/dhcpd.lease -pf /var/run//dhcpd.pid --no-pid bond0.3903
admin 29834 0.0 0.0 112800 1780 pts/1 S+ 17:22 0:00 grep -E svc|nginx.bin|dhcp
ifc 30432 1.4 0.6 894088 405968 ? Ssl Mar17 473:45 /mgmt//bin/svc_ifc_policydist.bin --x
root 31215 2.8 5.2 4503880 3397276 ? Ssl Apr05 124:08 /mgmt//bin/nginx.bin -p /data//nginx/
In the output above, svc_ifc_dbgr.bin is missing when compared with the expected output in the DME process status section. The process can be started with 'acidiag start dbgr'.
apic1# acidiag start dbgr
apic1# ps -aux | egrep "svc|nginx.bin|dhcp"
root 5112 7.3 0.4 1033952 323240 ? Ssl Mar10 3073:43 /mgmt//bin/svc_ifc_appliancedirector.bin --x
ifc 5117 1.7 0.6 1062664 439876 ? Ssl Mar10 720:56 /mgmt//bin/svc_ifc_topomgr.bin --x
ifc 5118 2.1 2.2 2164512 1468200 ? Ssl Mar10 884:16 /mgmt//bin/svc_ifc_policymgr.bin --x
ifc 5119 1.5 0.3 1115984 256904 ? Ssl Mar10 664:55 /mgmt//bin/svc_ifc_licensemgr.bin --x
ifc 5120 1.5 0.5 1088252 356760 ? Ssl Mar10 666:30 /mgmt//bin/svc_ifc_edmgr.bin --x
root 5121 1.6 0.6 1125948 423392 ? Ssl Mar10 698:15 /mgmt//bin/svc_ifc_bootmgr.bin --x
ifc 5123 2.3 1.2 1474388 800784 ? Ssl Mar10 994:21 /mgmt//bin/svc_ifc_eventmgr.bin --x
ifc 5126 1.5 8.2 6032524 5363184 ? Ssl Mar10 636:01 /mgmt//bin/svc_ifc_reader.bin --x
root 5130 4.6 0.6 1092480 439580 ? Ssl Mar10 1927:18 /mgmt//bin/svc_ifc_ae.bin --x
ifc 5132 1.6 0.8 1312136 567420 ? Ssl Mar10 689:46 /mgmt//bin/svc_ifc_vmmmgr.bin --x
ifc 5133 1.5 0.5 1064176 346760 ? Ssl Mar10 659:07 /mgmt//bin/svc_ifc_domainmgr.bin --x
ifc 5135 1.8 1.6 1736876 1099924 ? Ssl Mar10 770:43 /mgmt//bin/svc_ifc_observer.bin --x
root 5141 1.5 0.7 1092948 458156 ? Ssl Mar10 663:45 /mgmt//bin/svc_ifc_plgnhandler.bin --x
ifc 5146 2.0 0.6 1037676 397236 ? Ssl Mar10 857:48 /mgmt//bin/svc_ifc_idmgr.bin --x
ifc 5148 1.3 0.3 650596 222336 ? Ssl Mar10 580:28 /mgmt//bin/svc_ifc_vtap.bin --x
ifc 5160 1.6 0.6 1098280 453492 ? Ssl Mar10 669:21 /mgmt//bin/svc_ifc_scripthandler.bin --x
root 7089 1.4 0.4 856360 315016 ? Ssl Mar10 592:07 /mgmt//bin/dhcpd.bin -f -4 -cf /data//dhcp/dhcpd.conf -lf /data//dhcp/dhcpd.lease -pf /var/run//dhcpd.pid --no-pid bond0.3903
ifc 7609 126 0.5 987404 362824 ? Ssl 17:25 0:02 /mgmt//bin/svc_ifc_dbgr.bin --x <=====
admin 7762 0.0 0.0 112800 1668 pts/1 S+ 17:26 0:00 grep -E svc|nginx.bin|dhcp
ifc 30432 1.4 0.6 894088 405968 ? Ssl Mar17 473:48 /mgmt//bin/svc_ifc_policydist.bin --x
root 31215 2.8 5.2 4503880 3397252 ? Ssl Apr05 124:13 /mgmt//bin/nginx.bin -p /data//nginx/
After 'acidiag start dbgr' was run, the process started again. If the process does not start, contact TAC for further troubleshooting.
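The comparison against the expected process list can be scripted. The following Python sketch assumes the expected DME set is the set of svc_ifc_*.bin processes shown in the expected output earlier in this document. It also assumes that the argument to acidiag start is the process name without the svc_ifc_ prefix and .bin suffix; this document confirms that pattern only for eventmgr and dbgr. The function names are hypothetical.

```python
import re

# svc_ifc_* processes that appear in the expected output of this document.
EXPECTED_DMES = {
    "appliancedirector", "policydist", "bootmgr", "vmmmgr", "topomgr",
    "observer", "plgnhandler", "domainmgr", "dbgr", "edmgr", "vtap",
    "eventmgr", "reader", "idmgr", "licensemgr", "policymgr",
    "scripthandler", "ae",
}

# Captures the short name, for example "dbgr" from "svc_ifc_dbgr.bin".
PROC_RE = re.compile(r"svc_ifc_(\w+)\.bin")


def missing_dmes(ps_output):
    """Return the expected DME names that do not appear in the ps output."""
    running = set(PROC_RE.findall(ps_output))
    return sorted(EXPECTED_DMES - running)


def start_commands(missing):
    """Build the acidiag start command for each missing DME (assumed naming)."""
    return [f"acidiag start {name}" for name in missing]


# Synthetic ps output for the demonstration: every expected DME except dbgr.
# The PID and resource columns are placeholders, not real measurements.
SAMPLE = "\n".join(
    f"ifc 5117 1.7 0.6 1062664 439876 ? Ssl Mar10 720:52 /mgmt//bin/svc_ifc_{name}.bin --x"
    for name in sorted(EXPECTED_DMES - {"dbgr"})
)

if __name__ == "__main__":
    for command in start_commands(missing_dmes(SAMPLE)):
        print(command)
```

The sketch only prints the suggested commands. Whether to run them on a production fabric remains a decision for the operator or TAC.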
Run show core to check for core files. If any exist, upload them to the SR.
apic1# show core
Node Module Creation-Time File-Size Service Process Original-Location Exit-Code Death-Reason Last-Heartbeat
---- ------ ------------- --------- ------------ ------- ------------------ --------- ------------ --------------
Ctrlr-Id Creation-Time File-Size Service Process Original-Location Exit-Code
-------- --------------------- --------- ------------ ------- ---------------------------------------- ---------
1 2021-10-05T21:19:55.0 204534444 eventmgr 22453 /dmecores/svc_ifc_eventmgr.bin_log.22453 134
00-07:00 .tar.gz
Capture the APIC TS logs and upload them to the SR for further troubleshooting: https://www.cisco.com/c/en/us/support/docs/cloud-systems-management/application-policy-infrastructure-controller-apic/214520-guide-to-collect-tech-support-and-tac-re.html
Revision | Publish Date | Comments |
---|---|---|
1.0 | 06-Apr-2022 | Initial Release |