This document describes a specific hardware problem on the Fabric and Storage Card (FSC) of the Cisco Aggregation Services Router (ASR) 5500.
******** Show alarm outstanding verbose *******
Severity Object     Timestamp                          Alarm ID
-------- ---------- ---------------------------------- ---------------------
Alarm Details
------------------------------------------------------------------------------------------------------------------------------------
Minor    Card 14    Sunday August 21 07:16:34 A        3610743104839221248
The Fabric & 2x200GB Storage Card in slot 14 is a single point of failure. Another Fabric & 2x200GB Storage Card of the same type is needed.
Minor    Card 15    Sunday August 21 07:16:34 A        3610743104839221249
The Fabric & 2x200GB Storage Card in slot 15 is a single point of failure. Another Fabric & 2x200GB Storage Card of the same type is needed.
Minor    Card 17    Sunday August 21 07:16:34 A        3610743104839221250
The Fabric & 2x200GB Storage Card in slot 17 is a single point of failure. Another Fabric & 2x200GB Storage Card of the same type is needed.
******** show card table all *******
Slot         Card Type                               Oper State     SPOF  Attach
-----------  --------------------------------------  -------------  ----  ------
 1: DPC      Data Processing Card                    Active         No
 2: DPC      Data Processing Card                    Active         No
 3: DPC      Data Processing Card                    Active         No
 4: DPC      Data Processing Card                    Active         No
 5: MMIO     Management & 20x10Gb I/O Card           Active         No
 6: MMIO     Management & 20x10Gb I/O Card           Standby        -
 7: DPC      Data Processing Card                    Active         No
 8: DPC      Data Processing Card                    Active         No
 9: DPC      Data Processing Card                    Active         No
10: DPC      Data Processing Card                    Standby        -
11: SSC      System Status Card                      Active         No
12: SSC      System Status Card                      Active         No
13: FSC      None                                    -              -
14: FSC      Fabric & 2x200GB Storage Card           Active         Yes
15: FSC      Fabric & 2x200GB Storage Card           Active         Yes
16: FSC      Fabric & 2x200GB Storage Card           Active         No
17: FSC      Fabric & 2x200GB Storage Card           Active         Yes
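The card table can also be checked programmatically when many chassis need to be reviewed. The following is a minimal sketch, not a Cisco tool: the helper name `spof_slots` and the row pattern are assumptions based only on the sample output above, so the pattern may need adjusting for other card types or software releases.

```python
import re

# Hypothetical helper: the row layout (slot, short type, description,
# oper state, SPOF) is assumed from the sample "show card table all" output.
ROW = re.compile(
    r"^\s*(\d+):\s+(\S+)\s+(.*?)\s+(Active|Standby|-)\s+(Yes|No|-)\s*$"
)

def spof_slots(table_text):
    """Return the slot numbers whose SPOF column reads 'Yes'."""
    slots = []
    for line in table_text.splitlines():
        m = ROW.match(line)
        if m and m.group(5) == "Yes":
            slots.append(int(m.group(1)))
    return slots

sample = """\
13: FSC      None                                    -              -
14: FSC      Fabric & 2x200GB Storage Card           Active         Yes
15: FSC      Fabric & 2x200GB Storage Card           Active         Yes
16: FSC      Fabric & 2x200GB Storage Card           Active         No
17: FSC      Fabric & 2x200GB Storage Card           Active         Yes
"""
print(spof_slots(sample))
```

With the sample rows, slots 14, 15, and 17 are reported. Slot 16, the card that holds the faulty drive, is not itself flagged as a single point of failure.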
******** show hd raid verbose *******
HD RAID:
 State                : Available (active)
 Degraded             : Yes <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
 UUID                 : 59a7ebf0:7f798af6:68869614:3210b2c6
 Size                 : 1.2TB (1200000073728 bytes)
 Action               : Idle
 Card 16
    State             : Faulty card <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
    Description       : FSC16 SAD17160089
    Size              : 400GB (400096755712 bytes)
    Disk hd16a
       State          : In-sync component
       Created        : Fri May 23 23:05:53 2014
       Updated        : Fri May 23 23:05:53 2014
       Events         : 0
       Model          : STEC Z16IZF2E-200UCU E4TC
       Serial Number  : STM000171E75
       Size           : 200GB (200049647616 bytes)
    Disk hd16b
       State          : In-sync component
       Created        : Fri May 23 23:05:53 2014
       Updated        : Fri May 23 23:05:53 2014
       Events         : 0
       Model          : STEC Z16IZF2E-200UCU E4TC
       Serial Number  : STM000171E8B
       Size           : 200GB (200049647616 bytes)
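The two fields that matter in the RAID output above are `Degraded` and the per-card `State`. The following is a rough sketch for pulling them out of captured text. It assumes that each `Card N` or `Disk hdNx` heading appears on its own line ahead of its fields; `raid_states` is a hypothetical name, not a StarOS or Cisco API.

```python
def raid_states(text):
    """Return (degraded, states) parsed from 'show hd raid verbose' text.

    'states' maps a section name ('RAID', 'Card 16', 'Disk hd16a', ...)
    to the value of its State field. The layout is assumed, not documented.
    """
    section = "RAID"
    states = {}
    degraded = None
    for raw in text.splitlines():
        # Drop annotation arrows such as '<<<<<' added by whoever captured the log.
        line = raw.split("<<<")[0].strip()
        if not line:
            continue
        if ":" not in line:
            if line.startswith(("Card ", "Disk ")):
                section = line
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        if key == "State":
            states[section] = value
        elif key == "Degraded":
            degraded = value
    return degraded, states

sample = """\
HD RAID:
 State : Available (active)
 Degraded : Yes <<<<<<<<
 Card 16
    State : Faulty card <<<<<<<<
    Disk hd16a
       State : In-sync component
"""
degraded, states = raid_states(sample)
print(degraded, states["Card 16"])
```

In the sample, the array is degraded and card 16 is reported as faulty, even though the individual disk still shows `In-sync component`.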
When you start to troubleshoot with the syslogs, these errors are reported.
[local-60sec34.188] [hdctrl 132011 critical] [5/0/7135 <hdctrl:0> rl_fsm_mirror.c:5938] [software internal system critical-info syslog] hd16, FSC16 SAD17160089 failed from RAID 59a7ebf0:7f798af6:68869614:3210b2c6, MIO5 SAD1716021N.
[local-60sec34.188] [hdctrl 132016 error] [5/0/7135 <hdctrl:0> rl_fsm_mirror.c:3399] [software internal system critical-info syslog] Error detected on hd16a (STEC Z16IZF2E-200UCU E4TC STM000171E75), FSC16 SAD17160089: ioerr_cnt increased from 25 to 27
[local-60sec34.221] [alarmctrl 65201 info] [5/0/7072 <evlogd:0> alarmctrl.c:192] [software internal system critical-info syslog] Alarm condition: id 321befb92b220000 (Minor): The Fabric & 2x200GB Storage Card in slot 14 is a single point of failure. Another Fabric & 2x200GB Storage Card of the same type is needed.
[local-60sec34.222] [alarmctrl 65201 info] [5/0/7072 <evlogd:0> alarmctrl.c:192] [software internal system critical-info syslog] Alarm condition: id 321befb92b220002 (Minor): The Fabric & 2x200GB Storage Card in slot 17 is a single point of failure. Another Fabric & 2x200GB Storage Card of the same type is needed.
[local-60sec34.222] [alarmctrl 65201 info] [5/0/7072 <evlogd:0> alarmctrl.c:192] [software internal system critical-info syslog] Alarm condition: id 321befb92b220001 (Minor): The Fabric & 2x200GB Storage Card in slot 15 is a single point of failure. Another Fabric & 2x200GB Storage Card of the same type is needed.
If you see this error log:
Aug 21 07:16:34 evlogd: [local-60sec34.188] [hdctrl 132016 error] [5/0/7135 <hdctrl:0> rl_fsm_mirror.c:3399] [software internal system critical-info syslog] Error detected on hd16a (STEC Z16IZF2E-200UCU E4TC STM000171E75), FSC16 SAD17160089: ioerr_cnt increased from 25 to 27
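To find this message across a large syslog archive, a regular expression can extract the disk, the card, and the error counters. This is a minimal sketch: the pattern is derived only from the single sample line above, and `parse_ioerr` is a hypothetical helper name, so other releases may need a different pattern.

```python
import re

# Pattern assumed from the one hdctrl 132016 sample line in this document.
IOERR = re.compile(
    r"Error detected on (?P<disk>hd\d+[ab]) .*?"
    r"(?P<card>FSC\d+) (?P<serial>\S+): "
    r"ioerr_cnt increased from (?P<old>\d+) to (?P<new>\d+)"
)

def parse_ioerr(line):
    """Return the fields of an ioerr_cnt message, or None if the line differs."""
    m = IOERR.search(line)
    if m is None:
        return None
    return {
        "disk": m["disk"],
        "card": m["card"],
        "serial": m["serial"],
        "old": int(m["old"]),
        "new": int(m["new"]),
    }

log_line = (
    "Aug 21 07:16:34 evlogd: [local-60sec34.188] [hdctrl 132016 error] "
    "[5/0/7135 <hdctrl:0> rl_fsm_mirror.c:3399] "
    "[software internal system critical-info syslog] "
    "Error detected on hd16a (STEC Z16IZF2E-200UCU E4TC STM000171E75), "
    "FSC16 SAD17160089: ioerr_cnt increased from 25 to 27"
)
result = parse_ioerr(log_line)
print(result)
```

A counter that keeps rising for the same disk across successive messages points at that drive, here hd16a on FSC16.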
It is recommended that you run a SMART test and check whether the problem is related to high temperature.
[local]# show hd smart hd16a
smartctl 6.1 2013-03-16 r3800 [x86_64-linux-2.6.38-staros-v3-hw-64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Vendor:               STEC
Product:              Z16IZF2E-200UCU
Revision:             E4TC
User Capacity:        200,049,647,616 bytes [200 GB]
Logical block size:   512 bytes
LU is resource provisioned, LBPRZ=1
Rotation Rate:        Solid State Device
Form Factor:          2.5 inches
Logical Unit id:      0x5000a72030079304
Serial number:        STM000171E75
Device type:          disk
Transport protocol:   SAS
Local Time is:        Mon Aug 22 16:49:11 2016 AST
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled
=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK
SS Media used endurance indicator: 0%
Current Drive Temperature:     52 C
Drive Trip Temperature:        75 C

[local]SAE-G168A-1# show hd smart hd16b
smartctl 6.1 2013-03-16 r3800 [x86_64-linux-2.6.38-staros-v3-hw-64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Vendor:               XXXX
Product:              Z16IZF2E-200UCU
Revision:             E4TC
User Capacity:        200,049,647,616 bytes [200 GB]
Logical block size:   512 bytes
LU is resource provisioned, LBPRZ=1
Rotation Rate:        Solid State Device
Form Factor:          2.5 inches
Logical Unit id:      0x5000a7203007932c
Serial number:        STM000171E8B
Device type:          disk
Transport protocol:   SAS
Local Time is:        Mon Aug 22 16:49:21 2016 AST
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled
=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK
SS Media used endurance indicator: 0%
Current Drive Temperature:     50 C
Drive Trip Temperature:        75 C
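Because the SMART test results are OK and both drives run well below their trip temperature, you can conclude that the problem is not related to high temperature.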
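The temperature check can be expressed as a small script that reads the health status and computes the margin between the current and trip temperatures. This is only a sketch: `smart_summary` is a hypothetical helper, and the field labels are taken from the `show hd smart` output in this document rather than from any formal specification.

```python
import re

def smart_summary(text):
    """Summarize health and temperature margin from 'show hd smart' text.

    Returns None if any of the three expected fields is missing.
    """
    health = re.search(r"SMART Health Status:\s*(\S+)", text)
    current = re.search(r"Current Drive Temperature:\s*(\d+)\s*C", text)
    trip = re.search(r"Drive Trip Temperature:\s*(\d+)\s*C", text)
    if not (health and current and trip):
        return None
    cur, lim = int(current.group(1)), int(trip.group(1))
    return {
        "health": health.group(1),
        "current_c": cur,
        "trip_c": lim,
        "margin_c": lim - cur,  # degrees left before the drive trips
    }

sample = """\
SMART Health Status: OK
SS Media used endurance indicator: 0%
Current Drive Temperature:     52 C
Drive Trip Temperature:        75 C
"""
summary = smart_summary(sample)
print(summary)
```

For hd16a the margin is 23 degrees, which is consistent with the conclusion that temperature is not the cause.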
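At times, one or both drives on a single FSC move to an invalid partition or mirror state.
This can happen when the super capacitor fails in a drive of that FSC. The failure cannot be recovered because it is a permanent drive failure; either the drive or the FSC must be replaced.
For further assistance, contact Cisco TAC.
If this condition is seen for any drive, it indicates a super capacitor failure on that drive.
These logs are seen in the debug console logs of the active Management Input/Output (MIO) card.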
2016-Aug-21+07:16:34.038 card 5-cpu0: [974499.606697] sd 0:0:2:0: [sde] Unhandled sense code
2016-Aug-21+07:16:34.139 card 5-cpu0: [974499.611565] sd 0:0:2:0: [sde] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
2016-Aug-21+07:16:34.139 card 5-cpu0: [974499.618877] sd 0:0:2:0: [sde] Sense Key : Data Protect [current]
2016-Aug-21+07:16:34.139 card 5-cpu0: [974499.625214] sd 0:0:2:0: [sde] Add. Sense: Write protected
2016-Aug-21+07:16:34.139 card 5-cpu0: [974499.630840] sd 0:0:2:0: [sde] CDB: Write(10): 2a 00 00 00 08 08 00 00 08 00
2016-Aug-21+07:16:34.139 card 5-cpu0: [974499.638173] end_request: I/O error, dev sde, sector 2056
2016-Aug-21+07:16:34.139 card 5-cpu0: [974499.643558] md: super_written gets error=-5, uptodate=0
2016-Aug-21+07:16:34.139 card 5-cpu0: [974499.648872] md/raid:md0: Disk failure on md16, disabling device.
2016-Aug-21+07:16:34.139 card 5-cpu0: [974499.648873] md/raid:md0: Operation continuing on 3 devices.
2016-Aug-21+07:16:34.238 card 5-cpu0: [974499.712047] sd 0:0:2:0: [sde] Sense Key : Recovered Error [current]
2016-Aug-21+07:16:34.238 card 5-cpu0: [974499.718637] sd 0:0:2:0: [sde] <<vendor>> ASC=0x80 ASCQ=0x0ASC=0x80 ASCQ=0x0
2016-Aug-21+07:21:37.112 card 5-cpu0: [974802.605115] sd 0:0:2:0: [sde] Sense Key : Recovered Error [current]
2016-Aug-21+07:21:37.112 card 5-cpu0: [974802.611712] sd 0:0:2:0: [sde] <<vendor>> ASC=0x80 ASCQ=0x0ASC=0x80 ASCQ=0x0
2016-Aug-21+07:26:37.099 card 5-cpu0: [975102.612464] sd 0:0:2:0: [sde] Sense Key : Recovered Error [current]
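Check the logs for the specific vendor sense error (ASC=0x80 ASCQ=0x0).
This error confirms the failure of the super capacitor.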
**** debug console card <Active MIO card> cpu 0 tail 10000 only *****
[sde] <<vendor>> ASC=0x80 ASCQ=0x0ASC=0x80 ASCQ=0x0
******** debug hdctrl history *******
Thursday March 31 02:29:03 EDT 2016
Primary HDCTRL:
2016-Mar-31+02:13:38.632 move fsm=hd15#6 src=DISK_CHECK dst=DISK_FAILED arg=disk I/O error # hdctrl/hdctrl_fsm_disk.c : 4354 @ disk_fsm_enter()
This output shows the mapping of the disks to device names:
******** debug hdctrl mapping *******
Local card (5):
Disk       Device Number SCSI    Size
---------- ------ ------ ------- ------
hd14a      sdb    8:16   0:0:0:0 186 GB
hd15a      sdd    8:48   0:0:1:0 186 GB
hd16a      sde    8:64   0:0:2:0 186 GB
hd17a      sdg    8:96   0:0:3:0 186 GB
hd14b      sda    8:0    1:0:0:0 186 GB
hd15b      sdc    8:32   1:0:1:0 186 GB
hd16b      sdf    8:80   1:0:2:0 186 GB
hd17b      sdh    8:112  1:0:3:0 186 GB
These logs indicate a failure of the super capacitor on solid state drive hd16a of the FSC in slot 16.
hd16a <==>sde
-------------
2016-Aug-21+07:21:37.112 card 5-cpu0: [974802.611712] sd 0:0:2:0: [sde] <<vendor>> ASC=0x80 ASCQ=0x0ASC=0x80 ASCQ=0x0
2016-Aug-21+07:26:37.099 card 5-cpu0: [975102.612464] sd 0:0:2:0: [sde] Sense Key : Recovered Error [current]
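The correlation described here (disk name to Linux device, then device to vendor sense lines) can be sketched in a few lines. The function names are hypothetical, and the column order of the mapping rows and the `ASC=0x80 ASCQ=0x0` signature are taken from the outputs in this document; treat the script as an aid for reading captured logs, not as a definitive diagnostic.

```python
# Signature that this document associates with a super capacitor failure.
VENDOR_SENSE = "ASC=0x80 ASCQ=0x0"

def device_for_disk(mapping_text, disk):
    """Return the device name listed for a disk in 'debug hdctrl mapping' rows."""
    for line in mapping_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == disk:
            return fields[1]
    return None

def vendor_sense_lines(console_text, device):
    """Return console log lines for the device that carry the vendor sense code."""
    tag = "[" + device + "]"
    return [ln for ln in console_text.splitlines()
            if tag in ln and VENDOR_SENSE in ln]

mapping = """\
hd15a sdd 8:48 0:0:1:0 186 GB
hd16a sde 8:64 0:0:2:0 186 GB
"""
console = """\
2016-Aug-21+07:21:37.112 card 5-cpu0: [974802.611712] sd 0:0:2:0: [sde] <<vendor>> ASC=0x80 ASCQ=0x0ASC=0x80 ASCQ=0x0
2016-Aug-21+07:26:37.099 card 5-cpu0: [975102.612464] sd 0:0:2:0: [sde] Sense Key : Recovered Error [current]
"""
dev = device_for_disk(mapping, "hd16a")
hits = vendor_sense_lines(console, dev)
print(dev, len(hits))
```

Here hd16a resolves to sde, and one of the two sample console lines carries the vendor sense code.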