Nexus 3500 Series Switch Platform System Health Check Process

Updated: February 14, 2014

Document ID: 116699

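Introduction

This document describes the general procedure used to perform a system health check on Cisco Nexus 3500 Series switch platforms that run Nexus Operating System (NX-OS) Release 6.0(2).

Monitor CPU and Memory Usage

To get an overview of the CPU and memory usage of the system, enter the show system resources command: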

switch# show system resources 

Load average:   1 minute: 0.32   5 minutes: 0.13   15 minutes: 0.10
Processes   :   366 total, 2 running
CPU states  :   5.5% user,   12.0% kernel,   82.5% idle
        CPU0 states  :   10.0% user,   18.0% kernel,   72.0% idle
        CPU1 states  :   1.0% user,   6.0% kernel,   93.0% idle
Memory usage:   4117064K total,   2614356K used,   1502708K free
Switch#
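
As a rough illustration, the summary lines of this output can be reduced to two figures that are easy to track over time: overall CPU idle percentage and memory-used percentage. The sketch below assumes the output format shown above; the function name summarize_resources is hypothetical and is not part of NX-OS or any Cisco tool.

```python
import re

# Lines copied from the `show system resources` output above.
SAMPLE = """\
Load average:   1 minute: 0.32   5 minutes: 0.13   15 minutes: 0.10
Processes   :   366 total, 2 running
CPU states  :   5.5% user,   12.0% kernel,   82.5% idle
Memory usage:   4117064K total,   2614356K used,   1502708K free
"""


def summarize_resources(text):
    """Return overall CPU idle % and memory used % (hypothetical helper)."""
    cpu = re.search(
        r"^CPU states\s*:\s*([\d.]+)% user,\s*([\d.]+)% kernel,\s*([\d.]+)% idle",
        text,
        re.M,
    )
    mem = re.search(
        r"^Memory usage:\s*(\d+)K total,\s*(\d+)K used,\s*(\d+)K free",
        text,
        re.M,
    )
    if cpu is None or mem is None:
        raise ValueError("unexpected 'show system resources' format")
    total_kb, used_kb = int(mem.group(1)), int(mem.group(2))
    return {
        "cpu_idle_pct": float(cpu.group(3)),
        "mem_used_pct": round(100.0 * used_kb / total_kb, 1),
    }


print(summarize_resources(SAMPLE))
```

Anchoring the CPU pattern at the start of the line makes it match only the aggregate "CPU states" row, not the indented per-core rows.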

If you need more detail about the processes that consume CPU cycles or memory, enter the show process cpu sort and show system internal kernel memory usage commands:

switch# show process cpu sort
PID    Runtime(ms)  Invoked   uSecs  1Sec    Process
-----  -----------  --------  -----  ------  -----------
 3239     55236684  24663045   2239    6.3%  mtc_usd
 3376          776      7007    110    2.7%  netstack
   15     26592500 178719270    148    0.9%  kacpid
 3441      4173060  29561656    141    0.9%  cfs
 3445      7646439   6391217   1196    0.9%  lacp
 3507     13646757  34821232    391    0.9%  hsrp_engine
    1        80564    596043    135    0.0%  init
    2            6       302     20    0.0%  kthreadd
    3         1064    110904      9    0.0%  migration/0
<snip>

switch# show system internal kernel memory usage 
MemTotal:      4117064 kB
MemFree:       1490120 kB
Buffers:           332 kB
Cached:        1437168 kB
ShmFS:         1432684 kB
Allowed:       1029266 Pages
Free:           372530 Pages
Available:      375551 Pages
SwapCached:          0 kB
Active:        1355724 kB
Inactive:       925400 kB
HighTotal:     2394400 kB
HighFree:       135804 kB
LowTotal:      1722664 kB
LowFree:       1354316 kB
SwapTotal:           0 kB
SwapFree:            0 kB
Dirty:              12 kB
Writeback:           0 kB
AnonPages:      843624 kB
Mapped:         211144 kB
Slab:            98524 kB
SReclaimable:     7268 kB
SUnreclaim:      91256 kB
PageTables:      19604 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
WritebackTmp:        0 kB
CommitLimit:   2058532 kB
Committed_AS: 10544480 kB
VmallocTotal:   284664 kB
VmallocUsed:    174444 kB
VmallocChunk:   108732 kB
HugePages_Total:     0
HugePages_Free:      0
HugePages_Rsvd:      0
HugePages_Surp:      0
Hugepagesize:     2048 kB
DirectMap4k:      2048 kB
DirectMap2M:   1787904 kB
switch#

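The output shows that the High memory region is used by NX-OS and the Low memory region is used by the kernel. The MemTotal and MemFree values give the total memory and the free memory available to the switch.

To generate memory usage alerts, configure the switch similar to this: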

switch(config)# system memory-thresholds minor 50 severe 70 critical 90

Note: In this document, the values 50, 70, and 90 are used only as examples; choose threshold limits that suit your needs.
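
To reason about which alert level a given usage figure would fall into, the three thresholds can be modeled as a simple classifier. This is only a sketch: the function name is hypothetical, and whether NX-OS raises an alert at exactly the threshold value or only above it is an assumption here (the sketch uses "at or above").

```python
def memory_alert_level(used_pct, minor=50, severe=70, critical=90):
    """Map a memory-used percentage to the highest threshold reached.

    Defaults mirror the example values 50/70/90 from the configuration
    above. The ">=" comparison is an assumption, not documented behavior.
    """
    if not (minor < severe < critical):
        raise ValueError("thresholds must satisfy minor < severe < critical")
    if used_pct >= critical:
        return "critical"
    if used_pct >= severe:
        return "severe"
    if used_pct >= minor:
        return "minor"
    return "normal"


# 2614356K used of 4117064K total is about 63.5 percent.
print(memory_alert_level(63.5))
```

With the example thresholds, the switch in the earlier output (about 63.5 percent used) would sit in the minor band.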

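Check the Hardware Diagnostic Status

To check the hardware diagnostic status, enter the show diagnostic result all command. Ensure that all tests pass and that the overall diagnostic result is PASS.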

switch# show diagnostic result all 

Current bootup diagnostic level: complete
Module 1: 48x10GE Supervisor  SerialNo : <serial #>
  Overall Diagnostic Result for Module 1 : PASS
  Diagnostic level at card bootup: complete

  Test results: (. = Pass, F = Fail, I = Incomplete, U = Untested, A = Abort)

     1) TestUSBFlash ------------------------> .
     2) TestSPROM ---------------------------> .
     3) TestPCIe ----------------------------> .
     4) TestLED -----------------------------> .
     5) TestOBFL ----------------------------> .
     6) TestNVRAM ---------------------------> .
     7) TestPowerSupply ---------------------> .
     8) TestTemperatureSensor ---------------> .
     9) TestFan -----------------------------> .
    10) TestVoltage -------------------------> .
    11) TestGPIO ----------------------------> .
    12) TestInbandPort ----------------------> .
    13) TestManagementPort ------------------> .
    14) TestMemory --------------------------> .
    15) TestForwardingEngine ----------------> .
<snip>
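
When the test list is long, it can help to filter for anything that is not a pass. The following sketch assumes the line layout shown above and uses the result legend from the output itself; the helper name non_passing_tests is hypothetical.

```python
import re

# A few lines copied from the `show diagnostic result all` output above.
SAMPLE = """\
  Overall Diagnostic Result for Module 1 : PASS
     1) TestUSBFlash ------------------------> .
     2) TestSPROM ---------------------------> .
    14) TestMemory --------------------------> .
"""

# Legend taken from the "Test results:" line in the output.
CODES = {".": "Pass", "F": "Fail", "I": "Incomplete", "U": "Untested", "A": "Abort"}

TEST_LINE = re.compile(r"^[ \t]*\d+\)[ \t]+(\S+)[ \t]+-+>[ \t]+(\S)[ \t]*$", re.M)


def non_passing_tests(text):
    """Return (test name, result) pairs for every test that did not pass."""
    return [
        (name, CODES.get(code, code))
        for name, code in TEST_LINE.findall(text)
        if code != "."
    ]


print(non_passing_tests(SAMPLE))
```

An empty list, together with an overall result of PASS, corresponds to the healthy state described above.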

View the Hardware Profile

Enter the show hardware profile status command to check the hardware profile currently configured on the switch and the hardware table usage:

switch# show hardware profile status 
 Hardware table usage:
Max Host Entries = 65535, Used = 341
Max Unicast LPM Entries = 24576, Used = 92
Max Multicast LPM Entries = 8192, Used (L2:L3) = 1836 (1:1835)
Switch#

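Ensure that the usage of the Host entries and the Unicast/Multicast Longest Prefix Match (LPM) entries is within the specified limits.

Note: For optimal performance of the switch, it is important to choose the appropriate hardware profile template.

If you want the switch to generate a syslog at a specific threshold level, configure the switch similar to this: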

switch(config)# hardware profile multicast syslog-threshold ?
  <1-100>  Percentage

switch(config)# hardware profile unicast syslog-threshold ?
  <1-100>  Percentage

Note: The default threshold is 90 percent for both unicast and multicast.
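
To see how close each table is to its limit, the usage lines from show hardware profile status can be turned into percentages and compared against a threshold. This sketch assumes the output format shown earlier; the helper name is hypothetical, and applying the same 90 percent figure to the Host table is an assumption made only for illustration.

```python
import re

# Lines copied from the `show hardware profile status` output above.
SAMPLE = """\
Max Host Entries = 65535, Used = 341
Max Unicast LPM Entries = 24576, Used = 92
Max Multicast LPM Entries = 8192, Used (L2:L3) = 1836 (1:1835)
"""

USAGE_LINE = re.compile(
    r"^Max (.+?) Entries = (\d+), Used(?: \(L2:L3\))? = (\d+)", re.M
)


def table_usage(text, threshold_pct=90):
    """Return (table, percent used, at-or-over-threshold) for each table."""
    rows = []
    for name, maximum, used in USAGE_LINE.findall(text):
        pct = 100.0 * int(used) / int(maximum)
        rows.append((name, round(pct, 2), pct >= threshold_pct))
    return rows


for row in table_usage(SAMPLE):
    print(row)
```

In the sample output all three tables are far below the default threshold, so no syslog would be expected.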

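For more details, refer to the Configuring PIM Cisco article, which provides configuration details based on the installed license and the enabled features. Additionally, if you want to optimize the forwarding table, refer to the Cisco Nexus 3000 Series Switches: Understand, Configure and Tune the Forwarding Table Cisco article.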

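Active Buffer Monitoring

Active Buffer Monitoring (ABM) provides granular buffer occupancy data, which gives better insight into congestion hot spots. This feature supports two modes of operation: Unicast and Multicast.

In Unicast mode, ABM monitors and maintains buffer usage data per buffer block, and the unicast buffer utilization for all 48 ports. In Multicast mode, it monitors and maintains buffer usage data per buffer block, and the multicast buffer utilization per buffer block.

Note: For more information, refer to the Cisco Nexus 3548 Active Buffer Monitoring Cisco article. Figure 4 of that article shows buffer usage that peaks at 22:15:32 and lasts until 22:15:37. The histogram also provides evidence of sudden spikes in usage and shows how fast the buffer drains. If there is a slow receiver (such as a 1-Gbps receiver among 10-Gbps receivers), then in order to avoid packet drops you must include a configuration similar to this: hardware profile multicast slow-receiver port <x>.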

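Monitor Interface Counters/Statistics

To monitor traffic loss, enter the show interface ethernet x/y command. The output of this command provides basic traffic rate information as well as port-level drops and errors.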

switch# show interface eth1/10
Ethernet1/10 is up
 Dedicated Interface 
  Belongs to Po1
  Hardware: 100/1000/10000 Ethernet, address: 30f7.0d9c.3b51
  (bia 30f7.0d9c.3b51)
  MTU 1500 bytes, BW 10000000 Kbit, DLY 10 usec
  reliability 255/255, txload 1/255, rxload 1/255
  Encapsulation ARPA
  Port mode is trunk
  full-duplex, 10 Gb/s, media type is 10G
  Beacon is turned off
  Input flow-control is off, output flow-control is off
  Rate mode is dedicated
  Switchport monitor is off 
  EtherType is 0x8100 
  Last link flapped 3d21h
  Last clearing of "show interface" counters never
  14766 interface resets
  30 seconds input rate 47240 bits/sec, 68 packets/sec
  30 seconds output rate 3120720 bits/sec, 3069 packets/sec
  Load-Interval #2: 5 minute (300 seconds)
    input rate 50.18 Kbps, 52 pps; output rate 3.12 Mbps, 3.05 Kpps
  RX
    4485822 unicast packets  175312538 multicast packets  388443 broadcast
    packets
    180186040 input packets  9575683853 bytes
    0 jumbo packets  0 storm suppression bytes
    1 runts  0 giants  1 CRC  0 no buffer
    2 input error  0 short frame  0 overrun   0 underrun  0 ignored
    0 watchdog  0 bad etype drop  0 bad proto drop  0 if down drop
    0 input with dribble  260503 input discard
    0 Rx pause
  TX
    159370439 unicast packets  6366799906 multicast packets  1111 broadcast
    packets
    6526171456 output packets  828646014117 bytes
    0 jumbo packets
    0 output errors  0 collision  0 deferred  0 late collision
    0 lost carrier  0 no carrier  0 babble 0 output discard
    0 Tx pause

switch#
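
The two counters that matter most for the next step are "input discard" and "output discard". As a minimal sketch, assuming the counter wording shown in the output above, they can be pulled out like this (the helper name is hypothetical):

```python
import re

# The two relevant lines copied from the `show interface eth1/10` output above.
SAMPLE = """\
    0 input with dribble  260503 input discard
    0 lost carrier  0 no carrier  0 babble 0 output discard
"""


def discard_counters(text):
    """Return the input and output discard counters, or None if absent."""
    found = {}
    for direction in ("input", "output"):
        match = re.search(r"(\d+) " + direction + r" discard", text)
        found[direction] = int(match.group(1)) if match else None
    return found


print(discard_counters(SAMPLE))
```

A nonzero value in either direction is the trigger for the queuing and hardware statistics checks.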

If the input or output discards show nonzero values, determine whether the discarded packets are unicast, multicast, or both:

switch# show queuing interface ethernet 1/10
Ethernet1/10 queuing information:
  TX Queuing
    qos-group  sched-type  oper-bandwidth
        0       WRR            100

  RX Queuing
    Multicast statistics:
        Mcast pkts dropped                      : 0
    Unicast statistics:
    qos-group 0
    HW MTU: 1500 (1500 configured)
    drop-type: drop, xon: 0, xoff: 0
    Statistics:
        Ucast pkts dropped                      : 0
switch#

The output indicates that the dropped traffic is not due to Quality of Service (QoS). Now you must check the hardware MAC address statistics:

switch# show hardware internal statistics device mac ?
  all         Show all stats
  congestion  Show congestion stats
  control     Show control stats
  errors      Show error stats
  lookup      Show lookup stats
  pktflow     Show packetflow stats
  qos         Show qos stats
  rates       Show packetflow stats
  snmp        Show snmp stats

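When you troubleshoot traffic drops, the key options to check are congestion, errors, and qos. The pktflow option provides traffic statistics in the RX and TX directions, broken down by specific packet-size ranges.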

switch# show hardware internal statistics device mac errors port 10
|------------------------------------------------------------------------|
| Device: L2/L3 forwarding ASIC   Role:MAC                               |
|------------------------------------------------------------------------|
Instance:0
ID   Name                                          Value              Ports
--   ----                                          -----              -----
198  MTC_MB_CRC_ERR_CNT_PORT9                      0000000000000002   10 -
508  MTC_PP_CNT_PORT1_RCODE_CHAIN3                 0000000000000002   10 -
526  MTC_RW_EG_PORT1_EG_CLB_DROP_FCNT_CHAIN3       000000000054da5a   10 -
3616 MTC_NI515_P1_CNT_TX                           0000000000000bed   10 -
6495 TTOT_OCT                                      000000000005f341   10 -
7365 RTOT                                          0000000000000034   10 -
7366 RCRC                                          0000000000000001   10 -
7374 RUNT                                          0000000000000001   10 -
9511 ROCT                                          00000000000018b9   10 -
10678 PORT_EXCEPTION_ICBL_PKT_DROP                 000000000003f997   10 -

Note: The hexadecimal value 0x3f997 equals 260503 in decimal.

switch# show interface eth1/10

Ethernet1/10 is up
<snip>
    0 input with dribble  260503 input discard
<snip>

In the output, the PORT_EXCEPTION_ICBL_PKT_DROP counter indicates that traffic received on the port carries a Dot1Q tag for a VLAN that is not enabled on the switch.
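
Because the hardware statistics are printed in hexadecimal and the interface counters in decimal, comparing them by eye is error-prone. The sketch below converts the Value column to decimal so the two can be matched directly. It assumes the column layout shown above (ID, name, 16-digit hex value, port); the helper name is hypothetical.

```python
# Rows copied from the `show hardware internal statistics device mac errors
# port 10` output above.
SAMPLE = """\
7366 RCRC                                          0000000000000001   10 -
7374 RUNT                                          0000000000000001   10 -
10678 PORT_EXCEPTION_ICBL_PKT_DROP                 000000000003f997   10 -
"""


def decode_counters(text):
    """Return {counter name: decimal value} for each data row."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        # Expected fields: ID, name, 16-digit hex value, port, trailing "-".
        if len(fields) < 4 or not fields[0].isdigit() or len(fields[2]) != 16:
            continue
        try:
            counters[fields[1]] = int(fields[2], 16)
        except ValueError:
            continue  # Value column was not hexadecimal; skip the row.
    return counters


print(decode_counters(SAMPLE)["PORT_EXCEPTION_ICBL_PKT_DROP"])
```

The decoded PORT_EXCEPTION_ICBL_PKT_DROP value is the same figure that show interface reports as input discard.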

Here is another example, where the traffic drops are due to QoS:

switch# show interface ethernet 1/11

Ethernet1/11 is up
<snip>
  TX

<snip>
    0 output errors  0 collision  0 deferred  0 late collision
    0 lost carrier  0 no carrier  0 babble 6153699 output discard
    0 Tx pause
switch#

switch# show queuing interface ethernet 1/11

Ethernet1/11 queuing information:
  TX Queuing
    qos-group  sched-type  oper-bandwidth
        0       WRR            100

  RX Queuing
    Multicast statistics:
        Mcast pkts dropped                      : 0
    Unicast statistics:
    qos-group 0
    HW MTU: 1500 (1500 configured)
    drop-type: drop, xon: 0, xoff: 0
    Statistics:
        Ucast pkts dropped                      : 6153699

Note: The output shows 6153699 packets dropped in the receive direction, which is misleading. Refer to Cisco bug ID CSCuj20713.

switch# show hardware internal statistics device mac all | i 11|Port

(result filtered for relevant port)
ID   Name           Value              Ports
<snip>
5596 TX_DROP        00000000005de5e3   11 -  <--- 6153699 Tx Drops in Hex
<snip>
10253 UC_DROP_VL0   00000000005de5e3   11 -  <--- Drops for QoS Group 0 in Hex
<snip>

In summary, here are the commands used to capture packet drops:

show interface ethernet x/y
show queuing interface ethernet x/y
show hardware internal statistics device mac errors port <port #>
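
If these checks are run often, the three commands can be generated for a given interface. This is a small sketch with a hypothetical helper name; it assumes the port argument of the hardware statistics command is the port number from the interface name, as in the eth1/10 and port 10 example above.

```python
def drop_triage_commands(slot, port):
    """Build the three drop-triage commands for interface ethernet slot/port.

    Assumption: the 'port <n>' argument equals the port part of slot/port,
    as in the Ethernet1/10 example in this document.
    """
    interface = f"ethernet {slot}/{port}"
    return [
        f"show interface {interface}",
        f"show queuing interface {interface}",
        f"show hardware internal statistics device mac errors port {port}",
    ]


for command in drop_triage_commands(1, 10):
    print(command)
```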

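Monitor Control Plane Policing Statistics

Control Plane Policing (CoPP) protects the control plane in order to ensure network stability. For more details, refer to the Configuring Control Plane Policing Cisco article.

To monitor CoPP statistics, enter the show policy-map interface control-plane command: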

switch# show policy-map interface control-plane 

Control Plane
  service-policy  input: copp-system-policy

    class-map copp-s-ping (match-any)
      match access-group name copp-system-acl-ping
      police pps 100 , bc 0 packets
        HW Matched Packets   30
        SW Matched Packets   30
    class-map copp-s-l3destmiss (match-any)
      police pps 100 , bc 0 packets
        HW Matched Packets   76
        SW Matched Packets   74
    class-map copp-s-glean (match-any)
      police pps 500 , bc 0 packets
        HW Matched Packets   103088
        SW Matched Packets   51544
<snip>

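In the output, the Hardware (HW) and Software (SW) Matched Packets for copp-s-ping are the same. This means that HW counted 30 packets (all sent to the inband CPU driver), and SW counted the same number of packets before it sent them to the CPU. This indicates that CoPP dropped no packets, because the rate is within the configured limit of 100 p/s.

For the copp-s-glean class, which matches packets destined to IP addresses that have no Address Resolution Protocol (ARP) cache entry, HW saw 103,088 packets while SW matched only 51,544. This indicates that CoPP dropped 51,544 (103,088 - 51,544) packets, because the rate of these packets exceeded 500 p/s.

The SW counters are obtained from the CPU inband driver, and the HW counters are obtained from the Access Control List (ACL) programmed in HW. If HW Matched Packets is zero while SW Matched Packets is nonzero, then no ACL exists in HW for that particular class-map, which can be normal. It is also important to note that the two counters might not be polled at the same time, so use the counter values for troubleshooting only when the difference is significant.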
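
The HW-minus-SW comparison described above can be sketched as a small parser over the show policy-map interface control-plane output. The helper name and the min_gap cutoff of 100 packets are assumptions chosen only to illustrate "ignore small differences"; pick a cutoff that suits your environment. Classes with a HW count of zero are skipped, in line with the explanation above.

```python
import re

# Class entries copied from the output above.
SAMPLE = """\
    class-map copp-s-ping (match-any)
      match access-group name copp-system-acl-ping
      police pps 100 , bc 0 packets
        HW Matched Packets   30
        SW Matched Packets   30
    class-map copp-s-l3destmiss (match-any)
      police pps 100 , bc 0 packets
        HW Matched Packets   76
        SW Matched Packets   74
    class-map copp-s-glean (match-any)
      police pps 500 , bc 0 packets
        HW Matched Packets   103088
        SW Matched Packets   51544
"""


def copp_gaps(text, min_gap=100):
    """Return {class-map: HW - SW} where the gap is at least min_gap."""
    gaps = {}
    current, hw = None, None
    for line in text.splitlines():
        match = re.match(r"\s*class-map (\S+)", line)
        if match:
            current, hw = match.group(1), None
            continue
        match = re.match(r"\s*(HW|SW) Matched Packets\s+(\d+)", line)
        if not match or current is None:
            continue
        if match.group(1) == "HW":
            hw = int(match.group(2))
        elif hw is not None:
            gap = hw - int(match.group(2))
            # HW == 0 means no ACL in hardware for this class; not a drop.
            if hw > 0 and gap >= min_gap:
                gaps[current] = gap
    return gaps


print(copp_gaps(SAMPLE))
```

With the sample output, only copp-s-glean is reported; the two-packet difference for copp-s-l3destmiss falls under the cutoff and is treated as polling skew.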

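CoPP statistics might not relate directly to HW-switched packets, but they are still relevant if packets that should be switched through the switch are punted to the CPU. A packet punt can have several causes, such as a glean adjacency.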

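Note that there are three types of CoPP policies: Default, Layer 2 (L2), and Layer 3 (L3). Choose the appropriate policy based on the deployment scenario, and modify the CoPP policy based on your observations. To fine-tune CoPP, check it regularly, and check it again after you introduce new services or applications or redesign the network.

Note: To clear the counters, enter the clear copp statistics command.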

Perform a Bootflash File System Health Check

To perform a health check on the bootflash file system, enter the system health check bootflash command:

switch# system health check bootflash 
Unmount successful...
Checking any file system errors...Please be patient...
Result: bootflash filesystem has no errors
done.
Remounting bootflash ...done.
switch#

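Caution: The file system is unmounted while the test runs and is remounted after the test completes. Ensure that the file system is not accessed while the test runs.

Collect System Cores and Process Logs

Caution: Ensure that the system does not experience any process resets or crashes, and that it does not generate any core files or process logs, when you attempt to use the commands mentioned in this section.

Enter these commands to collect the system cores and process logs: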

switch# show cores

Module  Instance  Process-name     PID       Date(Year-Month-Day Time)
------  --------  ---------------  --------  -------------------------
switch# 

switch# show process log
Process          PID     Normal-exit  Stack  Core   Log-create-time
---------------  ------  -----------  -----  -----  ---------------
ethpc            4217              N      N      N  Tue Jun  4 01:57:54 2013
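
To record which processes have logged an exit and whether a core file was produced, the show process log table can be parsed into records. This is a sketch that assumes the column order shown above; the helper name and dictionary keys are hypothetical.

```python
# Output copied from the `show process log` example above.
SAMPLE = """\
Process          PID     Normal-exit  Stack  Core   Log-create-time
---------------  ------  -----------  -----  -----  ---------------
ethpc            4217              N      N      N  Tue Jun  4 01:57:54 2013
"""


def process_log_entries(text):
    """Return one dict per data row of `show process log`."""
    entries = []
    for line in text.splitlines():
        # Split into at most six fields so the timestamp stays in one piece.
        fields = line.split(None, 5)
        if len(fields) != 6 or not fields[1].isdigit():
            continue  # Header, separator, or unrelated line.
        name, pid, normal_exit, stack, core, created = fields
        entries.append(
            {
                "process": name,
                "pid": int(pid),
                "normal_exit": normal_exit == "Y",
                "stack": stack == "Y",
                "core": core == "Y",
                "created": created.strip(),
            }
        )
    return entries


print(process_log_entries(SAMPLE))
```

An empty result here, together with an empty show cores table, matches the healthy state described in this section.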

Note: For more details about this process, refer to the Retrieving Core Files from Cisco Nexus Switching Platforms Cisco article.