El conjunto de documentos para este producto aspira al uso de un lenguaje no discriminatorio. A los fines de esta documentación, "no discriminatorio" se refiere al lenguaje que no implica discriminación por motivos de edad, discapacidad, género, identidad de raza, identidad étnica, orientación sexual, nivel socioeconómico e interseccionalidad. Puede haber excepciones en la documentación debido al lenguaje que se encuentra ya en las interfaces de usuario del software del producto, el lenguaje utilizado en función de la documentación de la RFP o el lenguaje utilizado por un producto de terceros al que se hace referencia. Obtenga más información sobre cómo Cisco utiliza el lenguaje inclusivo.
Cisco ha traducido este documento combinando la traducción automática y los recursos humanos a fin de ofrecer a nuestros usuarios en todo el mundo contenido en su propio idioma. Tenga en cuenta que incluso la mejor traducción automática podría no ser tan precisa como la proporcionada por un traductor profesional. Cisco Systems, Inc. no asume ninguna responsabilidad por la precisión de estas traducciones y recomienda remitirse siempre al documento original escrito en inglés (insertar vínculo URL).
Este documento describe un problema específico de hardware en la tarjeta de almacenamiento y fabric (FSC) en los routers de servicios agregados (ASR) 5500 de Cisco.
******** Show alarm outstanding verbose ******* Severity Object Timestamp Alarm ID -------- ---------- ---------------------------------- --------------------- Alarm Details ------------------------------------------------------------------------------------------------------------------------------------ Minor Card 14 Sunday August 21 07:16:34 A 3610743104839221248 The Fabric & 2x200GB Storage Card in slot 14 is a single point of failure. Another Fabric & 2x200GB Storage Card of the same type is needed. Minor Card 15 Sunday August 21 07:16:34 A 3610743104839221249 The Fabric & 2x200GB Storage Card in slot 15 is a single point of failure. Another Fabric & 2x200GB Storage Card of the same type is needed. Minor Card 17 Sunday August 21 07:16:34 A 3610743104839221250 The Fabric & 2x200GB Storage Card in slot 17 is a single point of failure. Another Fabric & 2x200GB Storage Card of the same type is needed.
******** show card table all ******* Slot Card Type Oper State SPOF Attach ----------- -------------------------------------- ------------- ---- ------ 1: DPC Data Processing Card Active No 2: DPC Data Processing Card Active No 3: DPC Data Processing Card Active No 4: DPC Data Processing Card Active No 5: MMIO Management & 20x10Gb I/O Card Active No 6: MMIO Management & 20x10Gb I/O Card Standby - 7: DPC Data Processing Card Active No 8: DPC Data Processing Card Active No 9: DPC Data Processing Card Active No 10: DPC Data Processing Card Standby - 11: SSC System Status Card Active No 12: SSC System Status Card Active No 13: FSC None - - 14: FSC Fabric & 2x200GB Storage Card Active Yes 15: FSC Fabric & 2x200GB Storage Card Active Yes 16: FSC Fabric & 2x200GB Storage Card Active No 17: FSC Fabric & 2x200GB Storage Card Active Yes
******** show hd raid verbose ******* HD RAID: State : Available (active) Degraded : Yes <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< UUID : 59a7ebf0:7f798af6:68869614:3210b2c6 Size : 1.2TB (1200000073728 bytes) Action : Idle Card 16 State : Faulty card <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< Description : FSC16 SAD17160089 Size : 400GB (400096755712 bytes) Disk hd16a State : In-sync component Created : Fri May 23 23:05:53 2014 Updated : Fri May 23 23:05:53 2014 Events : 0 Model : STEC Z16IZF2E-200UCU E4TC Serial Number : STM000171E75 Size : 200GB (200049647616 bytes) Disk hd16b State : In-sync component Created : Fri May 23 23:05:53 2014 Updated : Fri May 23 23:05:53 2014 Events : 0 Model : STEC Z16IZF2E-200UCU E4TC Serial Number : STM000171E8B Size : 200GB (200049647616 bytes)
Estos errores se informan cuando inicia el troubleshooting con syslogs.
[local-60sec34.188] [hdctrl 132011 critical] [5/0/7135 <hdctrl:0> rl_fsm_mirror.c:5938] [software internal system critical-info syslog] hd16, FSC16 SAD17160089 failed from RAID 59a7ebf0:7f798af6:68869614:3210b2c6, MIO5 SAD1716021N. [local-60sec34.188] [hdctrl 132016 error] [5/0/7135 <hdctrl:0> rl_fsm_mirror.c:3399] [software internal system critical-info syslog] Error detected on hd16a (STEC Z16IZF2E-200UCU E4TC STM000171E75), FSC16 SAD17160089: ioerr_cnt increased from 25 to 27 [local-60sec34.221] [alarmctrl 65201 info] [5/0/7072 <evlogd:0> alarmctrl.c:192] [software internal system critical-info syslog] Alarm condition: id 321befb92b220000 (Minor): The Fabric & 2x200GB Storage Card in slot 14 is a single point of failure. Another Fabric & 2x200GB Storage Card of the same type is needed. [local-60sec34.222] [alarmctrl 65201 info] [5/0/7072 <evlogd:0> alarmctrl.c:192] [software internal system critical-info syslog] Alarm condition: id 321befb92b220002 (Minor): The Fabric & 2x200GB Storage Card in slot 17 is a single point of failure. Another Fabric & 2x200GB Storage Card of the same type is needed. [local-60sec34.222] [alarmctrl 65201 info] [5/0/7072 <evlogd:0> alarmctrl.c:192] [software internal system critical-info syslog] Alarm condition: id 321befb92b220001 (Minor): The Fabric & 2x200GB Storage Card in slot 15 is a single point of failure. Another Fabric & 2x200GB Storage Card of the same type is needed.
Si ve este registro de errores:
Aug 21 07:16:34 evlogd: [local-60sec34.188] [hdctrl 132016 error] [5/0/7135 <hdctrl:0> rl_fsm_mirror.c:3399] [software internal system critical-info syslog] Error detected on hd16a (STEC Z16IZF2E-200UCU E4TC STM000171E75), FSC16 SAD17160089: ioerr_cnt increased from 25 to 27
Siempre es recomendable ejecutar la prueba inteligente y comprobar si este problema no está relacionado con la temperatura alta.
[local]# show hd smart hd16a smartctl 6.1 2013-03-16 r3800 [x86_64-linux-2.6.38-staros-v3-hw-64] (local build) Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org === START OF INFORMATION SECTION === Vendor: STEC Product: Z16IZF2E-200UCU Revision: E4TC User Capacity: 200,049,647,616 bytes [200 GB] Logical block size: 512 bytes LU is resource provisioned, LBPRZ=1 Rotation Rate: Solid State Device Form Factor: 2.5 inches Logical Unit id: 0x5000a72030079304 Serial number: STM000171E75 Device type: disk Transport protocol: SAS Local Time is: Mon Aug 22 16:49:11 2016 AST SMART support is: Available - device has SMART capability. SMART support is: Enabled Temperature Warning: Enabled === START OF READ SMART DATA SECTION === SMART Health Status: OK SS Media used endurance indicator: 0% Current Drive Temperature: 52 C Drive Trip Temperature: 75 C local]SAE-G168A-1# show hd smart hd16b smartctl 6.1 2013-03-16 r3800 [x86_64-linux-2.6.38-staros-v3-hw-64] (local build) Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org === START OF INFORMATION SECTION === Vendor: XXXX Product: Z16IZF2E-200UCU Revision: E4TC User Capacity: 200,049,647,616 bytes [200 GB] Logical block size: 512 bytes LU is resource provisioned, LBPRZ=1 Rotation Rate: Solid State Device Form Factor: 2.5 inches Logical Unit id: 0x5000a7203007932c Serial number: STM000171E8B Device type: disk Transport protocol: SAS Local Time is: Mon Aug 22 16:49:21 2016 AST SMART support is: Available - device has SMART capability. SMART support is: Enabled Temperature Warning: Enabled === START OF READ SMART DATA SECTION === SMART Health Status: OK SS Media used endurance indicator: 0% Current Drive Temperature: 50 C Drive Trip Temperature: 75 C
Dado que la prueba inteligente está bien, puede concluir que no está relacionada con problemas de alta temperatura.
A veces, una o ambas unidades de una sola FSC pasan al estado Partición o imagen no válida.
Esto podría ocurrir si el Super Capacitor falla, que está en la unidad de un solo FSC. Esta falla no se puede recuperar porque es una falla permanente de la unidad donde se reemplaza la unidad o el FSC.
Póngase en contacto con el TAC de Cisco para obtener más ayuda.
Si esto se ve para cualquier unidad, hay una falla supercondensadora en esa unidad.
Estos registros se ven en los registros de la consola de depuración de la tarjeta de entrada/salida de administración (MIO) activa.
2016-Aug-21+07:16:34.038 card 5-cpu0: [974499.606697] sd 0:0:2:0: [sde] Unhandled sense code^M 2016-Aug-21+07:16:34.139 card 5-cpu0: [974499.611565] sd 0:0:2:0: [sde] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE^M 2016-Aug-21+07:16:34.139 card 5-cpu0: [974499.618877] sd 0:0:2:0: [sde] Sense Key : Data Protect [current] ^M 2016-Aug-21+07:16:34.139 card 5-cpu0: [974499.625214] sd 0:0:2:0: [sde] Add. Sense: Write protected^M 2016-Aug-21+07:16:34.139 card 5-cpu0: [974499.630840] sd 0:0:2:0: [sde] CDB: Write(10): 2a 00 00 00 08 08 00 00 08 00^M 2016-Aug-21+07:16:34.139 card 5-cpu0: [974499.638173] end_request: I/O error, dev sde, sector 2056^M 2016-Aug-21+07:16:34.139 card 5-cpu0: [974499.643558] md: super_written gets error=-5, uptodate=0^M 2016-Aug-21+07:16:34.139 card 5-cpu0: [974499.648872] md/raid:md0: Disk failure on md16, disabling device.^M 2016-Aug-21+07:16:34.139 card 5-cpu0: [974499.648873] md/raid:md0: Operation continuing on 3 devices.^M 2016-Aug-21+07:16:34.238 card 5-cpu0: [974499.712047] sd 0:0:2:0: [sde] Sense Key : Recovered Error [current] ^M 2016-Aug-21+07:16:34.238 card 5-cpu0: [974499.718637] sd 0:0:2:0: [sde] <<vendor>> ASC=0x80 ASCQ=0x0ASC=0x80 ASCQ=0x0^M 2016-Aug-21+07:21:37.112 card 5-cpu0: [974802.605115] sd 0:0:2:0: [sde] Sense Key : Recovered Error [current] ^M 2016-Aug-21+07:21:37.112 card 5-cpu0: [974802.611712] sd 0:0:2:0: [sde] <<vendor>> ASC=0x80 ASCQ=0x0ASC=0x80 ASCQ=0x0^M 2016-Aug-21+07:26:37.099 card 5-cpu0: [975102.612464] sd 0:0:2:0: [sde] Sense Key : Recovered Error [current] ^M
Verifique los registros de errores específicos
Y esto confirma la falla del supercondensador.
**** debug console card <Active MIO card> cpu 0 tail 10000 only ***** [sde] <> ASC=0x80 ASCQ=0x0ASC=0x80 ASCQ=0x0 ******** debug hdctrl history ******* Thursday March 31 02:29:03 EDT 2016 Primary HDCTRL: 2016-Mar-31+02:13:38.632 move fsm=hd15#6 src=DISK_CHECK dst=DISK_FAILED arg=disk I/O error # hdctrl/hdctrl_fsm_disk.c : 4354 @ disk_fsm_enter()
Below is the mapping of the disks to device name:
******** debug hdctrl mapping ******* Local card (5): Disk Device Number SCSI Size ---------- ------ ------ ------- ------ hd14a sdb 8:16 0:0:0:0 186 GB hd15a sdd 8:48 0:0:1:0 186 GB hd16a sde 8:64 0:0:2:0 186 GB hd17a sdg 8:96 0:0:3:0 186 GB hd14b sda 8:0 1:0:0:0 186 GB hd15b sdc 8:32 1:0:1:0 186 GB hd16b sdf 8:80 1:0:2:0 186 GB hd17b sdh 8:112 1:0:3:0 186 GB
Below highlated logs indicating a failure of the super capacitor on the solid state drive of FSC card disk 16a.
hd16a <==>sde
-------------
2016-Aug-21+07:21:37.112 card 5-cpu0: [974802.611712] sd 0:0:2:0: [sde] <<vendor>> ASC=0x80 ASCQ=0x0ASC=0x80 ASCQ=0x0^M
2016-Aug-21+07:26:37.099 card 5-cpu0: [975102.612464] sd 0:0:2:0: [sde] Sense Key : Recovered Error [current] ^M