This document describes a specific hardware problem on the Fabric and Storage Card (FSC) of the Cisco Aggregation Services Router (ASR) 5500.
******** Show alarm outstanding verbose *******
Severity Object     Timestamp                          Alarm ID
-------- ---------- ---------------------------------- ---------------------
Alarm Details
------------------------------------------------------------------------------------------------------------------------------------
Minor    Card 14    Sunday August 21 07:16:34 A        3610743104839221248
The Fabric & 2x200GB Storage Card in slot 14 is a single point of failure. Another Fabric & 2x200GB Storage Card of the same type is needed.
Minor    Card 15    Sunday August 21 07:16:34 A        3610743104839221249
The Fabric & 2x200GB Storage Card in slot 15 is a single point of failure. Another Fabric & 2x200GB Storage Card of the same type is needed.
Minor    Card 17    Sunday August 21 07:16:34 A        3610743104839221250
The Fabric & 2x200GB Storage Card in slot 17 is a single point of failure. Another Fabric & 2x200GB Storage Card of the same type is needed.
******** show card table all *******
Slot        Card Type                              Oper State    SPOF Attach
----------- -------------------------------------- ------------- ---- ------
 1: DPC     Data Processing Card                   Active        No
 2: DPC     Data Processing Card                   Active        No
 3: DPC     Data Processing Card                   Active        No
 4: DPC     Data Processing Card                   Active        No
 5: MMIO    Management & 20x10Gb I/O Card          Active        No
 6: MMIO    Management & 20x10Gb I/O Card          Standby       -
 7: DPC     Data Processing Card                   Active        No
 8: DPC     Data Processing Card                   Active        No
 9: DPC     Data Processing Card                   Active        No
10: DPC     Data Processing Card                   Standby       -
11: SSC     System Status Card                     Active        No
12: SSC     System Status Card                     Active        No
13: FSC     None                                   -             -
14: FSC     Fabric & 2x200GB Storage Card          Active        Yes
15: FSC     Fabric & 2x200GB Storage Card          Active        Yes
16: FSC     Fabric & 2x200GB Storage Card          Active        No
17: FSC     Fabric & 2x200GB Storage Card          Active        Yes
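As a rough aid for reading the card table above, the following Python sketch lists the slots of a given card type that are flagged as a single point of failure. It is not a Cisco tool: the function name `spof_slots` and the row layout in the regular expression are assumptions based only on the sample output, with an empty Attach column.

```python
import re

# Row layout assumed from the sample "show card table all" output:
# "<slot>: <type>  <description>  <oper state>  <SPOF>"
ROW = re.compile(
    r"^\s*(?P<slot>\d+):\s+(?P<kind>\S+)\s+(?P<desc>.+?)\s+"
    r"(?P<state>\S+)\s+(?P<spof>Yes|No|-)\s*$"
)

def spof_slots(card_table, kind="FSC"):
    """Return slot numbers of cards of the given type with SPOF = Yes."""
    slots = []
    for line in card_table.splitlines():
        m = ROW.match(line)
        if m and m.group("kind") == kind and m.group("spof") == "Yes":
            slots.append(int(m.group("slot")))
    return slots

sample_table = """\
 6: MMIO    Management & 20x10Gb I/O Card          Standby       -
13: FSC     None                                   -             -
14: FSC     Fabric & 2x200GB Storage Card          Active        Yes
15: FSC     Fabric & 2x200GB Storage Card          Active        Yes
16: FSC     Fabric & 2x200GB Storage Card          Active        No
17: FSC     Fabric & 2x200GB Storage Card          Active        Yes
"""

print(spof_slots(sample_table))
```

In this sample the card in slot 16 is not itself flagged; it is the three remaining FSCs (14, 15, and 17) that are reported as single points of failure.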
******** show hd raid verbose *******
HD RAID:
 State                : Available (active)
 Degraded             : Yes <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
 UUID                 : 59a7ebf0:7f798af6:68869614:3210b2c6
 Size                 : 1.2TB (1200000073728 bytes)
 Action               : Idle
 Card 16
  State               : Faulty card <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
  Description         : FSC16 SAD17160089
  Size                : 400GB (400096755712 bytes)
  Disk hd16a
   State              : In-sync component
   Created            : Fri May 23 23:05:53 2014
   Updated            : Fri May 23 23:05:53 2014
   Events             : 0
   Model              : STEC Z16IZF2E-200UCU E4TC
   Serial Number      : STM000171E75
   Size               : 200GB (200049647616 bytes)
  Disk hd16b
   State              : In-sync component
   Created            : Fri May 23 23:05:53 2014
   Updated            : Fri May 23 23:05:53 2014
   Events             : 0
   Model              : STEC Z16IZF2E-200UCU E4TC
   Serial Number      : STM000171E8B
   Size               : 200GB (200049647616 bytes)
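The two fields marked in the RAID output above (Degraded and the card State) can be pulled out with a short script when many chassis need to be checked. This is a minimal sketch under the assumption that the output is available one field per line, as on the device console; the function name `raid_summary` is hypothetical.

```python
import re

def raid_summary(raid_text):
    """Report whether the array is degraded and which cards are faulty."""
    degraded = re.search(r"Degraded\s*:\s*(\w+)", raid_text)
    faulty_cards = []
    current_card = None
    for line in raid_text.splitlines():
        card = re.match(r"\s*Card (\d+)\s*$", line)
        if card:
            current_card = int(card.group(1))
            continue
        # Only a "State : Faulty card" line inside a Card section counts.
        if current_card is not None and re.match(r"\s*State\s*:\s*Faulty card", line):
            faulty_cards.append(current_card)
    return {
        "degraded": bool(degraded and degraded.group(1) == "Yes"),
        "faulty_cards": faulty_cards,
    }

sample_raid = """\
HD RAID:
 State                : Available (active)
 Degraded             : Yes
 Card 16
  State               : Faulty card
  Description         : FSC16 SAD17160089
  Disk hd16a
   State              : In-sync component
  Disk hd16b
   State              : In-sync component
"""

print(raid_summary(sample_raid))
```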
When you begin troubleshooting with the syslogs, these errors are reported:
[local-60sec34.188] [hdctrl 132011 critical] [5/0/7135 <hdctrl:0> rl_fsm_mirror.c:5938] [software internal system critical-info syslog] hd16, FSC16 SAD17160089 failed from RAID 59a7ebf0:7f798af6:68869614:3210b2c6, MIO5 SAD1716021N.
[local-60sec34.188] [hdctrl 132016 error] [5/0/7135 <hdctrl:0> rl_fsm_mirror.c:3399] [software internal system critical-info syslog] Error detected on hd16a (STEC Z16IZF2E-200UCU E4TC STM000171E75), FSC16 SAD17160089: ioerr_cnt increased from 25 to 27
[local-60sec34.221] [alarmctrl 65201 info] [5/0/7072 <evlogd:0> alarmctrl.c:192] [software internal system critical-info syslog] Alarm condition: id 321befb92b220000 (Minor): The Fabric & 2x200GB Storage Card in slot 14 is a single point of failure. Another Fabric & 2x200GB Storage Card of the same type is needed.
[local-60sec34.222] [alarmctrl 65201 info] [5/0/7072 <evlogd:0> alarmctrl.c:192] [software internal system critical-info syslog] Alarm condition: id 321befb92b220002 (Minor): The Fabric & 2x200GB Storage Card in slot 17 is a single point of failure. Another Fabric & 2x200GB Storage Card of the same type is needed.
[local-60sec34.222] [alarmctrl 65201 info] [5/0/7072 <evlogd:0> alarmctrl.c:192] [software internal system critical-info syslog] Alarm condition: id 321befb92b220001 (Minor): The Fabric & 2x200GB Storage Card in slot 15 is a single point of failure. Another Fabric & 2x200GB Storage Card of the same type is needed.
If you see this error log:
Aug 21 07:16:34 evlogd: [local-60sec34.188] [hdctrl 132016 error] [5/0/7135 <hdctrl:0> rl_fsm_mirror.c:3399] [software internal system critical-info syslog] Error detected on hd16a (STEC Z16IZF2E-200UCU E4TC STM000171E75), FSC16 SAD17160089: ioerr_cnt increased from 25 to 27
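To find every occurrence of this error in a collected syslog file, a short filter such as the one below can help. It is only a sketch: the pattern is derived from the single hdctrl 132016 line shown above, and the function name `ioerr_events` is hypothetical.

```python
import re

# Pattern assumed from the sample hdctrl 132016 error line.
IOERR = re.compile(
    r"Error detected on (?P<disk>hd\d+[ab]) .*?"
    r"ioerr_cnt increased from (?P<old>\d+) to (?P<new>\d+)"
)

def ioerr_events(log_text):
    """Return (disk, old_count, new_count) for each I/O error report."""
    events = []
    for line in log_text.splitlines():
        m = IOERR.search(line)
        if m:
            events.append(
                (m.group("disk"), int(m.group("old")), int(m.group("new")))
            )
    return events

sample_log = (
    "Aug 21 07:16:34 evlogd: [local-60sec34.188] [hdctrl 132016 error] "
    "[5/0/7135 <hdctrl:0> rl_fsm_mirror.c:3399] "
    "[software internal system critical-info syslog] "
    "Error detected on hd16a (STEC Z16IZF2E-200UCU E4TC STM000171E75), "
    "FSC16 SAD17160089: ioerr_cnt increased from 25 to 27"
)

print(ioerr_events(sample_log))
```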
It is always recommended that you run a SMART test and verify that the problem is not related to high temperature.
[local]# show hd smart hd16a
smartctl 6.1 2013-03-16 r3800 [x86_64-linux-2.6.38-staros-v3-hw-64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               STEC
Product:              Z16IZF2E-200UCU
Revision:             E4TC
User Capacity:        200,049,647,616 bytes [200 GB]
Logical block size:   512 bytes
LU is resource provisioned, LBPRZ=1
Rotation Rate:        Solid State Device
Form Factor:          2.5 inches
Logical Unit id:      0x5000a72030079304
Serial number:        STM000171E75
Device type:          disk
Transport protocol:   SAS
Local Time is:        Mon Aug 22 16:49:11 2016 AST
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK
SS Media used endurance indicator: 0%
Current Drive Temperature:     52 C
Drive Trip Temperature:        75 C

[local]SAE-G168A-1# show hd smart hd16b
smartctl 6.1 2013-03-16 r3800 [x86_64-linux-2.6.38-staros-v3-hw-64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               XXXX
Product:              Z16IZF2E-200UCU
Revision:             E4TC
User Capacity:        200,049,647,616 bytes [200 GB]
Logical block size:   512 bytes
LU is resource provisioned, LBPRZ=1
Rotation Rate:        Solid State Device
Form Factor:          2.5 inches
Logical Unit id:      0x5000a7203007932c
Serial number:        STM000171E8B
Device type:          disk
Transport protocol:   SAS
Local Time is:        Mon Aug 22 16:49:21 2016 AST
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK
SS Media used endurance indicator: 0%
Current Drive Temperature:     50 C
Drive Trip Temperature:        75 C
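Because the SMART health status is OK and both drive temperatures (52 C and 50 C) are well below the 75 C trip temperature, you can conclude that the problem is not related to high temperature.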
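The temperature check described above amounts to comparing two fields of the SMART output. The sketch below shows that comparison; the helper names `temperature_margin` and `health_ok` are hypothetical, and the field labels are taken from the smartctl output shown in this document.

```python
import re

def temperature_margin(smart_text):
    """Return trip temperature minus current temperature, in degrees C."""
    current = re.search(r"Current Drive Temperature:\s+(\d+)\s*C", smart_text)
    trip = re.search(r"Drive Trip Temperature:\s+(\d+)\s*C", smart_text)
    if not current or not trip:
        raise ValueError("temperature fields not found in SMART output")
    return int(trip.group(1)) - int(current.group(1))

def health_ok(smart_text):
    """Return True if the SMART Health Status line reports OK."""
    status = re.search(r"SMART Health Status:\s+(\S+)", smart_text)
    return bool(status and status.group(1) == "OK")

sample_smart = """\
=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK
SS Media used endurance indicator: 0%
Current Drive Temperature:     52 C
Drive Trip Temperature:        75 C
"""

print(health_ok(sample_smart), temperature_margin(sample_smart))
```

A positive margin together with a health status of OK matches the conclusion drawn for hd16a: the drive is not overheating, so the I/O errors have another cause.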
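Sometimes one or both drives on a single FSC go into an invalid partition or image state.
This can happen if the super capacitor in a drive on that FSC fails. The failure is not recoverable: it is a permanent drive failure, so the drive or the FSC must be replaced.
Contact Cisco TAC for further assistance.
Any drive that shows this condition has a super capacitor failure.
These logs appear in the debug console logs of the active Management Input/Output (MIO) card: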
2016-Aug-21+07:16:34.038 card 5-cpu0: [974499.606697] sd 0:0:2:0: [sde] Unhandled sense code
2016-Aug-21+07:16:34.139 card 5-cpu0: [974499.611565] sd 0:0:2:0: [sde] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
2016-Aug-21+07:16:34.139 card 5-cpu0: [974499.618877] sd 0:0:2:0: [sde] Sense Key : Data Protect [current]
2016-Aug-21+07:16:34.139 card 5-cpu0: [974499.625214] sd 0:0:2:0: [sde] Add. Sense: Write protected
2016-Aug-21+07:16:34.139 card 5-cpu0: [974499.630840] sd 0:0:2:0: [sde] CDB: Write(10): 2a 00 00 00 08 08 00 00 08 00
2016-Aug-21+07:16:34.139 card 5-cpu0: [974499.638173] end_request: I/O error, dev sde, sector 2056
2016-Aug-21+07:16:34.139 card 5-cpu0: [974499.643558] md: super_written gets error=-5, uptodate=0
2016-Aug-21+07:16:34.139 card 5-cpu0: [974499.648872] md/raid:md0: Disk failure on md16, disabling device.
2016-Aug-21+07:16:34.139 card 5-cpu0: [974499.648873] md/raid:md0: Operation continuing on 3 devices.
2016-Aug-21+07:16:34.238 card 5-cpu0: [974499.712047] sd 0:0:2:0: [sde] Sense Key : Recovered Error [current]
2016-Aug-21+07:16:34.238 card 5-cpu0: [974499.718637] sd 0:0:2:0: [sde] <<vendor>> ASC=0x80 ASCQ=0x0ASC=0x80 ASCQ=0x0
2016-Aug-21+07:21:37.112 card 5-cpu0: [974802.605115] sd 0:0:2:0: [sde] Sense Key : Recovered Error [current]
2016-Aug-21+07:21:37.112 card 5-cpu0: [974802.611712] sd 0:0:2:0: [sde] <<vendor>> ASC=0x80 ASCQ=0x0ASC=0x80 ASCQ=0x0
2016-Aug-21+07:26:37.099 card 5-cpu0: [975102.612464] sd 0:0:2:0: [sde] Sense Key : Recovered Error [current]
Check the logs for this specific error.
Its presence confirms the failure of the super capacitor.
**** debug console card <Active MIO card> cpu 0 tail 10000 only *****
[sde] <<vendor>> ASC=0x80 ASCQ=0x0ASC=0x80 ASCQ=0x0

******** debug hdctrl history *******
Thursday March 31 02:29:03 EDT 2016
Primary HDCTRL:
2016-Mar-31+02:13:38.632 move fsm=hd15#6 src=DISK_CHECK dst=DISK_FAILED arg=disk I/O error
# hdctrl/hdctrl_fsm_disk.c : 4354 @ disk_fsm_enter()
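When the debug console capture is long, the vendor-specific sense data that this document associates with a super capacitor failure can be located with a simple search. The sketch below matches the literal text as it appears in the console log lines in this document; the function name `devices_with_signature` is hypothetical, and the pattern has not been validated against other StarOS releases or drive models.

```python
import re

# Literal signature as printed in the debug console log in this document.
SIGNATURE = re.compile(
    r"\[(?P<dev>sd[a-z]+)\]\s+<<vendor>>\s+ASC=0x80 ASCQ=0x0"
)

def devices_with_signature(console_log):
    """Return the sorted SCSI device names that logged the signature."""
    return sorted({m.group("dev") for m in SIGNATURE.finditer(console_log)})

sample_console = """\
2016-Aug-21+07:16:34.139 card 5-cpu0: [974499.638173] end_request: I/O error, dev sde, sector 2056
2016-Aug-21+07:16:34.238 card 5-cpu0: [974499.712047] sd 0:0:2:0: [sde] Sense Key : Recovered Error [current]
2016-Aug-21+07:16:34.238 card 5-cpu0: [974499.718637] sd 0:0:2:0: [sde] <<vendor>> ASC=0x80 ASCQ=0x0ASC=0x80 ASCQ=0x0
2016-Aug-21+07:21:37.112 card 5-cpu0: [974802.611712] sd 0:0:2:0: [sde] <<vendor>> ASC=0x80 ASCQ=0x0ASC=0x80 ASCQ=0x0
"""

print(devices_with_signature(sample_console))
```

The result is a Linux device name such as sde, which still has to be translated to an FSC disk name before a card can be identified for replacement.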
This output shows the mapping of the disks to device names:
******** debug hdctrl mapping *******
Local card (5):
Disk       Device Number SCSI    Size
---------- ------ ------ ------- ------
hd14a      sdb    8:16   0:0:0:0 186 GB
hd15a      sdd    8:48   0:0:1:0 186 GB
hd16a      sde    8:64   0:0:2:0 186 GB
hd17a      sdg    8:96   0:0:3:0 186 GB
hd14b      sda    8:0    1:0:0:0 186 GB
hd15b      sdc    8:32   1:0:1:0 186 GB
hd16b      sdf    8:80   1:0:2:0 186 GB
hd17b      sdh    8:112  1:0:3:0 186 GB
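The mapping table above is what ties a Linux device name from the console log back to a physical FSC disk. A minimal sketch of that lookup follows; the function name `device_to_disk` is hypothetical, and the column order (Disk, Device, Number, SCSI, Size) is assumed from the sample output.

```python
def device_to_disk(mapping_text):
    """Build a {device: disk} lookup from 'debug hdctrl mapping' output."""
    table = {}
    for line in mapping_text.splitlines():
        fields = line.split()
        # Data rows look like: hd16a sde 8:64 0:0:2:0 186 GB
        if (len(fields) >= 4
                and fields[0].startswith("hd")
                and fields[1].startswith("sd")):
            table[fields[1]] = fields[0]
    return table

sample_mapping = """\
Local card (5):
Disk       Device Number SCSI    Size
---------- ------ ------ ------- ------
hd14a      sdb    8:16   0:0:0:0 186 GB
hd15a      sdd    8:48   0:0:1:0 186 GB
hd16a      sde    8:64   0:0:2:0 186 GB
hd16b      sdf    8:80   1:0:2:0 186 GB
"""

lookup = device_to_disk(sample_mapping)
print(lookup["sde"])
```

Here sde resolves to hd16a, that is, the first drive on the FSC in slot 16.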
The logs highlighted here indicate a failure of the super capacitor on solid state drive hd16a of the FSC in slot 16.
hd16a <==>sde
-------------
2016-Aug-21+07:21:37.112 card 5-cpu0: [974802.611712] sd 0:0:2:0: [sde] <<vendor>> ASC=0x80 ASCQ=0x0ASC=0x80 ASCQ=0x0
2016-Aug-21+07:26:37.099 card 5-cpu0: [975102.612464] sd 0:0:2:0: [sde] Sense Key : Recovered Error [current]