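This document describes how to troubleshoot and isolate the faulty part or component of a Cisco 12000 Series Internet Router after you encounter various parity error messages.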
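Note: This document does not explain the causes of parity errors. For a more concise definition of parity errors (also known as single event upsets, or SEUs) and their possible causes, we recommend the documents linked from "Increasing Network Availability".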
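For more information on document conventions, refer to the Cisco Technical Tips Conventions.
Before you continue with this document, we recommend that you read these documents:
The information in this document is based on these software and hardware versions:
Cisco 12000 Series Internet Routers
All versions of Cisco IOS® Software
The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, make sure that you understand the potential impact of any command before you use it.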
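Most Cisco 12000 Series Internet Router route processors and line cards include error correction code (ECC) capability. However, some line cards already deployed in the field do not have ECC. ECC covers only the RAM or synchronous dynamic RAM (SDRAM) on the card; the rest of the card is not ECC-protected.
This is a comparison of the ECC capability of the line cards used with the Cisco 12000:
All Engine 2 and later cards have ECC.
Engine 1 cards changed to ECC after FCS (first customer shipment).
Engine 0 cards do not have ECC.
Some cards can be upgraded to a similar product with integrated ECC.
This table lists the products that have ECC capability: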
Non-ECC Product | ECC Product |
---|---|
GRP(=) | GRP-B(=) |
GE-SX/LH-SC(=) | GE-GBIC-SC-B(=) |
GE-GBIC-SC-A(=) | GE-GBIC-SC-B(=) |
8FE-FX-SC(=) | 8FE-FX-SC-B(=) |
8FE-TX-RJ45(=) | 8FE-TX-RJ45-B(=) |
6DS3-SMB(=) | 6DS3-SMB-B(=) |
12DS3-SMB(=) | 12DS3-SMB-B(=) |
OC12/SRP-IR-SC(=) | OC12/SRP-IR-SC-B(=) |
OC12/SRP-MM-SC(=) | OC12/SRP-MM-SC-B(=) |
OC12/SRP-LR-SC(=) | OC12/SRP-LR-SC-B(=) |
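Note: -B and ECC are independent. The -B suffix indicates that the product is the second major orderable revision of the board. In some cases, that revision is the ECC revision.
Cisco offers a Technology Migration Program (TMP) that lets you upgrade non-ECC boards to new ECC boards. When you purchase a new ECC board, you trade in the non-ECC board in exchange.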
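This flowchart helps you determine which component of the Cisco 12000 Series Internet Router is responsible for parity/error correction code (ECC) error messages on the Gigabit Route Processor (GRP).
Note: During a parity/ECC error event, capture and record the show tech-support output and the console logs, and collect all crashinfo files.
This flowchart helps you determine which component of a Cisco 12000 Series Internet Router line card is responsible for parity/error correction code (ECC) error messages:
Note: Whenever a parity/ECC error event occurs on a line card, collect as much information as possible (for details, refer to Troubleshooting Line Card Crashes on the Cisco 12000 Series Internet Router).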
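The Cisco 12000 Series Internet Router recovers from parity errors in other line card memory (SDRAM and SRAM) without crashing.
On the Cisco 12000 Series Internet Router, data with parity errors can be reported by several parity-checking devices for any read or write operation.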
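Parity is the simplest of these checks. The following minimal Python sketch assumes one even-parity bit per 8-bit word; the actual word widths and parity schemes of the Cisco 12000 devices are not described here. It shows three things: parity detects a single flipped bit, it can neither locate nor correct that bit, and it misses an even number of flips.

```python
def parity_bit(word: int, width: int = 8) -> int:
    """Even-parity bit for the low `width` bits of `word` (1 if the count of 1s is odd)."""
    return bin(word & ((1 << width) - 1)).count("1") % 2


def parity_ok(word: int, stored_parity: int, width: int = 8) -> bool:
    """True if the stored parity bit still matches the data, i.e. no error is detected."""
    return parity_bit(word, width) == stored_parity


data = 0b1011_0010          # four 1-bits, so the parity bit is 0
p = parity_bit(data)        # computed when the word is written

one_flip = data ^ (1 << 3)  # a single-event upset flips bit 3
two_flips = data ^ 0b11     # two bits flip at once

print(parity_ok(data, p))       # True:  clean read
print(parity_ok(one_flip, p))   # False: single-bit error detected (but not locatable)
print(parity_ok(two_flips, p))  # True:  an even number of flips goes unnoticed
```

This limitation is why ECC-capable boards store several check bits per word instead of a single parity bit.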
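The GRP-B and PRP use ECC with single-bit error correction and multi-bit error detection on shared memory (SDRAM). A single-bit error in SDRAM is corrected automatically, and the system continues to operate normally.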
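A textbook extended Hamming (SEC-DED) code can illustrate the correct-one-bit, detect-two-bits behavior. This minimal Python sketch protects only 4 data bits with 4 check bits. The actual code and word width used by the GRP-B and PRP memory controllers are not documented here, so treat it as an illustration of the principle, not of the hardware. The function names are hypothetical.

```python
from typing import List, Optional, Tuple


def encode(nibble: int) -> List[int]:
    """Encode 4 data bits into an extended Hamming(8,4) codeword.

    Index 0 holds the overall parity bit; indices 1, 2 and 4 are Hamming
    parity bits; indices 3, 5, 6 and 7 carry the data bits.
    """
    c = [0] * 8
    c[3], c[5], c[6], c[7] = [(nibble >> i) & 1 for i in range(4)]
    c[1] = c[3] ^ c[5] ^ c[7]   # covers positions with bit 0 set
    c[2] = c[3] ^ c[6] ^ c[7]   # covers positions with bit 1 set
    c[4] = c[5] ^ c[6] ^ c[7]   # covers positions with bit 2 set
    c[0] = sum(c[1:]) % 2       # overall parity makes the whole word even
    return c


def decode(codeword: List[int]) -> Tuple[str, Optional[int]]:
    """Return ("ok" | "corrected", data) or ("uncorrectable", None)."""
    c = list(codeword)
    syndrome = 0
    for pos in range(1, 8):
        if c[pos]:
            syndrome ^= pos         # XOR of positions holding a 1; 0 for a valid word
    overall_odd = sum(c) % 2 == 1
    if syndrome == 0 and not overall_odd:
        status = "ok"
    elif overall_odd:
        c[syndrome] ^= 1            # single-bit error: syndrome is its position (0 = parity bit)
        status = "corrected"
    else:
        return "uncorrectable", None  # syndrome set but parity even: two bits flipped
    data = c[3] | (c[5] << 1) | (c[6] << 2) | (c[7] << 3)
    return status, data


word = encode(0b1011)
word[6] ^= 1                       # single-bit upset
print(decode(word))                # ('corrected', 11)

word = encode(0b1011)
word[3] ^= 1
word[5] ^= 1                       # double-bit upset
print(decode(word))                # ('uncorrectable', None)
```

In this toy code, the syndrome (the XOR of the positions that hold a 1) points directly at the flipped bit. The overall parity bit distinguishes one flip (correctable) from two (detectable only).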
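The PRP and GRP-B have an enhanced dynamic RAM (DRAM) controller that supports ECC, so they can correct single-bit errors and report multi-bit errors. A corrected single-bit error is reported like this: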
%Tiger-3-SBE: Single bit error detected and corrected at <address>
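The SBE is corrected by the error correction circuitry and does not affect the operation of the GRP-B or PRP. Single-bit errors require no action unless they occur frequently; in that case, replace the processor board.
Multi-bit errors are reported through a bus error exception or a CPU cache parity error exception.
If the CPU detects a parity error when it accesses the processor's external cache (L3 on the GRP) through the SysAD bus, or the CPU internal cache memory (L1 or L2), it reports a processor memory parity error message. Table 1 lists a sample message printed for each type of cache parity error:
Table 1: Cache parity error locations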
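Location of the Parity Error | Error Message |
---|---|
L1 instruction cache | Error: primary instr cache, fields: data |
L1 data cache | Error: primary data cache, fields: data |
L2 instruction cache | Error: SysAD, instr cache, fields: data |
L2 data cache | Error: SysAD, data cache, fields: data |
L3 instruction cache | Error: SysAD, instr cache, fields: data, 1st dword |
L3 data cache | Error: SysAD, data cache, fields: data, 1st dword |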
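Example:
The first line of the error message indicates the location of the parity error, which can be any of the locations listed in Table 1. In this example, the location is the L3 instruction cache.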
Error: SysAD, instr cache, fields: data, 1st dword Physical addr(21:3) 0x000000, virtual addr 0x6040BF60, vAddr(14:12) 0x3000 virtual address corresponds to main:text, cache word 0 Low Data High Data Par Low Data High Data Par L1 Data: 0:0xAE620068 0x8C830000 0x00 1:0x50400001 0xAC600004 0x01 2:0xAC800000 0x00000000 0x02 3:0x1600000B 0x00000000 0x01 Low Data High Data Par Low Data High Data Par DRAM Data: 0:0xAE620068 0x8C830000 0x00 1:0x50400001 0xAC600004 0x01 2:0xAC800000 0x00000000 0x02 3:0x1600000B 0x00000000 0x01
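The hypothetical helper below applies Table 1 to the first line of such a message. This minimal Python sketch assumes three things, based on the table and example: L1 messages contain the word "primary", L2 and L3 messages contain "SysAD", and only L3 messages contain "1st dword". Other Cisco IOS releases may word the message differently.

```python
def cache_location(first_line: str) -> str:
    """Map the first line of a processor cache parity message to a Table 1 location."""
    text = first_line.lower()
    if "instr cache" in text:
        kind = "instruction"
    elif "data cache" in text:
        kind = "data"
    else:
        return "unknown"
    if "primary" in text:            # assumed L1 wording
        level = "L1"
    elif "sysad" in text:            # L2 and L3 both report through SysAD
        level = "L3" if "1st dword" in text else "L2"
    else:
        return "unknown"
    return f"{level} {kind} cache"


msg = "Error: SysAD, instr cache, fields: data, 1st dword Physical addr(21:3) 0x000000"
print(cache_location(msg))  # L3 instruction cache
```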
The show version output looks similar to this:
...System was restarted by processor memory parity error at PC 0x602310D0, address 0x0 at 03:18:21 GMT Sun Oct 27 2002 ...
The show context output shows that the system was restarted by a cache parity exception:
Router#show context slot 11 CRASH INFO: Slot 11, Index 1, Crash at 19:08:07 CST Thu Nov 14 2002 VERSION: GS Software (GSR-P-M), Version 12.0(22)S1, EARLY DEPLOYMENT RELEASE SOFTWARE (fc1) TAC Support: http://www.cisco.com/tac Compiled Mon 16-Sep-02 17:36 by nmasa Card Type: Route Processor, S/N LC uptime was 0 minutes. System exception: sig=20, code=0xE42F3E4B, context=0x52CF3D44 System restarted by a Cache Parity Exception STACK TRACE: -Traceback= 5020453C 500E5E24 5010E6DC 5015F89C 501E9F6C 501E9F58 ...
Replace the GRP or PRP after a second failure.
This message can appear in the console output:
SEC 7: %GRP-3-PARITYERR: Parity error detected in the fabric buffers. Data (8)
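This message indicates that the switch fabric interface hardware on the GRP detected a parity error. The hexadecimal number is the error interrupt vector. This usually indicates a hardware problem on the GRP that reported the error (slot 7 in this example). If the problem recurs, replace the faulty GRP.
This error message appears when the router receives data with parity errors.
Data with parity errors is reported by several parity-checking devices that check every read or write operation performed on the Cisco 12000 Series Internet Router.
The PRP uses ECC with single-bit error correction and multi-bit error detection on shared memory (SDRAM). A single-bit error in SDRAM is corrected automatically, and the system continues to operate normally.
A single-bit error (SBE) is corrected by the ECC circuitry and does not affect the operation of the PRP. Single-bit errors require no action unless they occur frequently.
If the errors occur frequently, replace the processor board.
SDRAM Single-Bit ECC Errors
A single-bit error is a single incorrect bit in a word read from memory. An SBE can be corrected without interrupting operation.
The single-bit error is detected and the corrected data is supplied. For example, a single-bit error on Engine 4/4+ is reported like this: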
SLOT 6:Jul 19 07:37:34: %TX192-3-SDRAM_SBE: Error=0x2 - DIMM1 Syndrome=0x7600 Addr=0xBEA09 Data bit80-Traceback= 401C8C9C 401C9508 401CDE08 401CDE40 4007F674 4009ED0C 4009ECF8
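The SBE is corrected by the error correction circuitry and does not affect the operation of the line card. Single-bit errors require no action unless they occur frequently; in that case, replace the line card.
SDRAM Multi-Bit ECC Errors
A multi-bit error is an error in more than one bit of the same word. With an MBE, the error is detected and the line card crashes. Both SBEs and MBEs are very rare.
This is an example of the messages printed to the console in response to a multi-bit ECC error in SDRAM: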
SLOT 5:Jul 25 16:58:51: %MCC192-3-SDRAM_SBE: Error=0x808 - DIMM0 Syndrome=0x31000000 Addr=0x81034 Data bit120 -Traceback= 401C8C9C 401C9508 40450018 400BF7D4 SLOT 5:Jul 25 16:58:51: %MCC192-3-SDRAM_MBE: Error=0x808 - DIMM0 Syndrome=0x18000000 Addr=0x80834 -Traceback= 401C8D88 401C9508 40450018 400BF7D4
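An MBE cannot be corrected by ECC and causes the line card to crash. The route processor then reloads the line card and returns it to normal operation.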
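To triage many log lines, a minimal Python sketch can separate corrected SBEs from crash-causing MBEs. The regular expression is an assumption: it is derived only from the sample %TX192 and %MCC192 SDRAM_SBE/SDRAM_MBE messages in this document, and other releases or cards may format the fields differently. The function name is hypothetical.

```python
import re
from typing import Dict, List

# Pattern derived only from the sample SDRAM_SBE / SDRAM_MBE messages shown here.
SDRAM_ECC_RE = re.compile(
    r"SLOT\s*(?P<slot>\d+):[^%]*%(?P<asic>\w+)-\d-SDRAM_(?P<kind>SBE|MBE):"
    r"[^-]*-\s*(?P<dimm>DIMM\d+)\s+Syndrome=(?P<syndrome>0x[0-9A-Fa-f]+)"
    r"\s+Addr=(?P<addr>0x[0-9A-Fa-f]+)"
)


def parse_sdram_ecc(log: str) -> List[Dict[str, object]]:
    """Return one dict per SDRAM ECC message found in `log`."""
    events = []
    for m in SDRAM_ECC_RE.finditer(log):
        event: Dict[str, object] = dict(m.groupdict())
        event["slot"] = int(m.group("slot"))
        # SBE: corrected by ECC, card keeps running. MBE: not correctable, card crashes.
        event["corrected"] = m.group("kind") == "SBE"
        events.append(event)
    return events


log = (
    "SLOT 5:Jul 25 16:58:51: %MCC192-3-SDRAM_SBE: Error=0x808 - DIMM0 Syndrome=0x31000000 "
    "Addr=0x81034 Data bit120 -Traceback= 401C8C9C 401C9508 40450018 400BF7D4 "
    "SLOT 5:Jul 25 16:58:51: %MCC192-3-SDRAM_MBE: Error=0x808 - DIMM0 Syndrome=0x18000000 "
    "Addr=0x80834 -Traceback= 401C8D88 401C9508 40450018 400BF7D4"
)
for e in parse_sdram_ecc(log):
    print(e["slot"], e["asic"], e["kind"], e["dimm"], e["addr"], e["corrected"])
# 5 MCC192 SBE DIMM0 0x81034 True
# 5 MCC192 MBE DIMM0 0x80834 False
```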
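Field diagnostics can be used to check line card memory for MBEs; they detect an MBE as a memory error. This is an example of a board with a multi-bit error in TX SDRAM that failed field diagnostics: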
FDIAG_STAT_IN_PROGRESS(5): test #12 TX SDRAM Marching Pattern FD 5> RIM: FD 5> TX Registers FD 5> INT_CAUSE_REG = 0x00000680 FD 5> Unexpected L3FE Interrupt occured. FD 5> ERROR: TX BMA Asic Interrupt Occured FD 5> *** 0-INT: External Interrupt *** FDIAG_STAT_DONE_FAIL(5) test_num 12, error_code 1 Field Diagnostic: ****TEST FAILURE**** slot 5: last test run 12, TX SDRAM Marching Pattern, error 1 Field Diag eeprom values: run 5 fail mode 1 (TEST FAILURE) slot 5 last test failed was 12, error code 1
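If you have a QOC48 or OC192 line card, refer to this field notice: QOC48/OC192 SBE/MBE. Otherwise, replace the line card after a second failure.
Check the value of the sig= field in the show context slot [slot#] output: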
Router#show context slot 4 CRASH INFO: Slot 4, Index 1, Crash at 04:28:56 EDT Tue Apr 20 1999 VERSION: GS Software (GLC1-LC-M), Version 11.2(15)GS1a, EARLY DEPLOYMENT RELEASE SOFTWARE (fc1) Compiled Mon 28-Dec-98 14:53 by tamb Card Type: 1 Port Packet Over SONET OC-12c/STM-4c, S/N CAB020500AL System exception: SIG=20, code=0xA414EF5A, context=0x40337424 System restarted by a Cache Parity Exception
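This minimal Python sketch pulls the slot, the sig= value, and the restart reason from show context output. It assumes the field layout of the two examples in this document. In those examples, the field appears as both "sig=" and "SIG=", and sig=20 accompanies "System restarted by a Cache Parity Exception" in both. It is not a Cisco tool, and the function name is hypothetical.

```python
import re
from typing import Dict, Optional


def crash_summary(show_context: str) -> Dict[str, object]:
    """Pull slot, sig= value and restart reason out of `show context slot N` output."""
    slot = re.search(r"CRASH INFO:\s*Slot\s+(\d+)", show_context)
    sig = re.search(r"\bsig=(\d+)", show_context, re.IGNORECASE)  # "sig=" and "SIG=" both occur
    reason = re.search(
        r"System restarted by (?:an? )?(.+?)(?=\s+STACK TRACE:|\s*$)",
        show_context,
        re.MULTILINE,
    )
    reason_text: Optional[str] = reason.group(1) if reason else None
    return {
        "slot": int(slot.group(1)) if slot else None,
        "sig": int(sig.group(1)) if sig else None,
        "reason": reason_text,
        "cache_parity": reason_text is not None
        and "cache parity exception" in reason_text.lower(),
    }


out = ("Router#show context slot 4 CRASH INFO: Slot 4, Index 1, Crash at 04:28:56 EDT Tue Apr 20 1999 "
       "System exception: SIG=20, code=0xA414EF5A, context=0x40337424 "
       "System restarted by a Cache Parity Exception")
print(crash_summary(out))
# {'slot': 4, 'sig': 20, 'reason': 'Cache Parity Exception', 'cache_parity': True}
```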
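Some cards based on the Engine 1 forwarding engine are susceptible to internal cache corruption when they operate under very specific voltage and temperature conditions.
The Cache Error Recovery Function (CERF) is a software feature for Engine 1 line cards that detects and corrects cache parity errors by flushing the erroneous data from the external CPU cache and refilling the cache line from DRAM. It adds intelligence to the CPU cache management algorithm so that the CPU can recover from cache memory parity errors, which prevents a line card crash without a performance penalty.
Note: CERF is on by default. Its software error correction activity can be monitored with the show controller cerf command. To turn the feature off, use the no service cerf global configuration command.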
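This is not Cisco's CERF implementation, which is not described here. It is a toy Python model of the idea in the paragraph above. On a cache read, the model checks parity. On a mismatch, it discards the line and refills it from DRAM instead of raising a fatal exception. The class and counter names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


def parity(value: int) -> int:
    return bin(value).count("1") % 2


@dataclass
class CacheLine:
    value: int
    parity: int


@dataclass
class ParityCheckedCache:
    dram: Dict[int, int]                                   # backing memory, assumed correct
    lines: Dict[int, CacheLine] = field(default_factory=dict)
    recoveries: int = 0                                    # parity errors recovered so far

    def read(self, addr: int) -> int:
        line = self.lines.get(addr)
        if line is None:
            line = self._refill(addr)                      # ordinary cache miss
        elif parity(line.value) != line.parity:
            self.recoveries += 1                           # parity error: recover instead of crashing
            line = self._refill(addr)
        return line.value

    def _refill(self, addr: int) -> CacheLine:
        value = self.dram[addr]
        self.lines[addr] = CacheLine(value, parity(value))
        return self.lines[addr]


cache = ParityCheckedCache(dram={0x100: 0xAE620068})
cache.read(0x100)                         # miss: line filled from DRAM
cache.lines[0x100].value ^= 1 << 7        # simulate an upset in the cached copy
print(hex(cache.read(0x100)), cache.recoveries)  # 0xae620068 1
```

Refilling is safe in this model only because the cached copy is clean, that is, identical to DRAM. A modified (dirty) line could not be recovered this way.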
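For more information, refer to Field Notice: Cache Parity Errors on GSR 1GE Cards.
To determine which forwarding engine a line card is based on, refer to How Can I Determine the Engine Card Running in My Chassis? in the Cisco 12000 Series Internet Router: Frequently Asked Questions document.
If the line card is based on Engine 1, the workaround is to upgrade the Cisco IOS Software to a release that includes the Cache Error Recovery Function (CERF). This feature was first available in Cisco IOS Software Release 12.0(21)S3. If the card still crashes with a cache parity exception after the upgrade, replace the line card.
If the line card is based on another engine type, replace the line card when a similar crash occurs a second time.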
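This document repeatedly applies one replacement rule: replace the part after a second failure of the same kind. That rule can be written as a small Python sketch. The helper name and the (slot, reason) input format are assumptions. The sketch does not encode the Engine 1 case, where the first step is an upgrade to a release with CERF, or the QOC48/OC192 field notice.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def cards_to_replace(crashes: Iterable[Tuple[int, str]], threshold: int = 2) -> List[int]:
    """Return slots that crashed `threshold` or more times for the same reason.

    `crashes` holds (slot, restart reason) pairs, for example gathered from
    `show context slot N`; threshold=2 encodes "replace after the second failure".
    """
    counts = Counter(crashes)
    return sorted({slot for (slot, _reason), n in counts.items() if n >= threshold})


history = [
    (4, "Cache Parity Exception"),
    (11, "Cache Parity Exception"),
    (4, "Cache Parity Exception"),
]
print(cards_to_replace(history))  # [4]
```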
This message can appear in the console log:
SLOT 2:Oct 23 17:07:45.531 EST: %LC-3-L3FEERRS: L3FE DRAM error 12 address 41E9B9A0 SLOT 2:Oct 23 17:07:45.531 EST: %LC-3-L3FEERR: L3FE error: rxbma 0 addr 0 txbma 0 addr 0 dram 12 addr 41E9B9A0 io 0 addr 0 SLOT 2:Oct 23 17:07:45.531 EST: %GSR-3-INTPROC: Process Traceback= 40080BAC -Traceback= 40357084 40495D30 40496EE0 400CCF98
This message reports a CPU DRAM write parity error. L3FE stands for Layer 3 Forwarding Engine. If the problem recurs, replace the line card.
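This minimal Python sketch lists which L3FE block reported a non-zero error code in an %LC-3-L3FEERR message. The field layout (rxbma, txbma, dram, and io, each followed by addr) is taken from the sample message above. Two points are assumptions: that a zero code means the block reported nothing, and that values can be parsed as hexadecimal. The function name is hypothetical.

```python
import re
from typing import Dict, Tuple

# Field layout taken from the sample %LC-3-L3FEERR message in this document.
L3FE_FIELD_RE = re.compile(r"\b(rxbma|txbma|dram|io)\s+([0-9A-Fa-f]+)\s+addr\s+([0-9A-Fa-f]+)")


def l3fe_error_sources(message: str) -> Dict[str, Tuple[str, str]]:
    """Return {block: (error code, address)} for every L3FE block with a non-zero code."""
    return {
        block: (code, addr)
        for block, code, addr in L3FE_FIELD_RE.findall(message)
        if int(code, 16) != 0  # zero is zero in any base, so the base assumption is harmless here
    }


msg = ("%LC-3-L3FEERR: L3FE error: rxbma 0 addr 0 txbma 0 addr 0 "
       "dram 12 addr 41E9B9A0 io 0 addr 0")
print(l3fe_error_sources(msg))  # {'dram': ('12', '41E9B9A0')}
```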
These are some of the error messages you might encounter:
In the log of a single-port Gigabit Ethernet line card:
SLOT 5: %LCGE-3-INTR: TX GigaTranslator external interface parity error
On newer boards, the TX GigaTranslator ASIC was replaced with a field-programmable gate array (FPGA) as a fix. If the problem recurs, replace the board.
In the console output:
SLOT 6: %LC-3-ECC: Salsa ECC: About to handle ECC single bit error, ECC status = 2 DRAM error status = = 21 SLOT 6: %LC-3-L3FEERR: L3FE error: rxbma 0 addr 0 txbma 0 addr 0 dram 21 addr 200020 io 0 addr 0 SLOT 6: %LC-3-ECC: Salsa ECC: Addresses: Salsa returned =429BFDE8 correcting on = 429BFDE8 SLOT 6: %MEM_ECC-3-SBE: Single bit error detected and corrected at 0x429BFDE8 SLOT 6: %MEM_ECC-3-SYNDROME_SBE: 8-bit Syndrome for the detected Single-bit error: 0x8A SLOT 4: %MEM_ECC-3-SBE_HARD: Single bit *hard* error detected at 0x6299FB60 SLOT 1:Jun 10 05:29:47.690 EDT: %LC-3-ECC: Salsa ECC: About to handle ECC single bit error,ECC status = 0 DRAM error status =12 SLOT 6:Sep 26 15:18:01: %LC-3-SWECC: L2 event cleared: EPC = 0x40631CCC, CERR = 0xE40BB933, SysAD Addr = 1, total = 1 SLOT 0:Dec 7 13:48:11.480: %LC-3-SWECC_DATA: L2 event cleared: EPC = 0x400A8040, CERR = 0xA01DCE58, l1v = 0x41E3C20441E3C1C5, dv =0x41E3C1C441E3C204, SysAD Addr = 0, total = 1
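These messages break down as follows:
%LC-3-ECC: Salsa ECC: an error in the L3FE ASIC of the line card.
%LC-3-L3FEERR: an error in the L3FE ASIC registers of the line card.
%MEM_ECC-3-SBE: a correctable single-bit error was detected on a read from DRAM. The show memory ecc command dumps the single-bit errors logged so far. This is the same as the %MEM_ECC-3-SBE_LIMIT error message.
%MEM_ECC-3-SYNDROME_SBE: the 8-bit syndrome for the detected single-bit error. This value does not give the exact position of the errored bit, but it can be used to approximate it. This is the same as the %MEM_ECC-3-SYNDROME_SBE_LIMIT error message.
In short, the line card reported a single-bit error and corrected it automatically. No action is needed unless this happens frequently; in that case, replace the line card.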
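The document leaves "frequently" undefined, so any threshold is a site decision. This minimal Python sketch counts %MEM_ECC-3-SBE messages per slot in a captured log. The regular expression, the function names, and the `limit` parameter are hypothetical. The count covers whatever time span the log covers.

```python
import re
from collections import Counter
from typing import List

# Matches only the plain "%MEM_ECC-3-SBE:" mnemonic (not SBE_HARD, SBE_LIMIT or SYNDROME_SBE).
SBE_RE = re.compile(r"SLOT\s*(\d+):[^%]*%MEM_ECC-3-SBE:")


def sbe_counts(log: str) -> Counter:
    """Count corrected single-bit errors per line card slot in a captured log."""
    return Counter(int(slot) for slot in SBE_RE.findall(log))


def frequent_sbe_slots(log: str, limit: int) -> List[int]:
    """Slots whose SBE count reaches `limit`, a site-chosen (hypothetical) threshold."""
    return sorted(slot for slot, n in sbe_counts(log).items() if n >= limit)


log = ("SLOT 6: %MEM_ECC-3-SBE: Single bit error detected and corrected at 0x429BFDE8 "
       "SLOT 6: %MEM_ECC-3-SYNDROME_SBE: 8-bit Syndrome for the detected Single-bit error: 0x8A "
       "SLOT 4: %MEM_ECC-3-SBE_HARD: Single bit *hard* error detected at 0x6299FB60")
print(sbe_counts(log))             # Counter({6: 1})
print(frequent_sbe_slots(log, 1))  # [6]
```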
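%LC-3-SWECC_DATA: indicates that a cache event was corrected by the software error correction code (SWECC) on the line card in slot 0.
Another message you might encounter is: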
SLOT 4: %MEM_ECC-3-SBE_HARD: Single bit *hard* error detected at 0x6299FB60
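This message indicates that a single-bit uncorrectable error (hard error) was detected on a CPU read from DRAM. The show memory ecc command dumps the single-bit errors logged so far and shows the address locations where hard errors were detected.
Monitor the system with the show memory ecc command, and replace the DRAM if too many errors occur.
You might see these errors in the console output: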
SLOT 6: %LC-6-PSAECC: An TLU SDRAM ECC correctable error occurred address 19C49FD SLOT 2:035610: Feb 26 13:09:13.628 UTC: %LC-6-PSAECC: An PLU SDRAM ECC correctable error occurred address 1956059
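This means that the ECC-protected SDRAM of the packet switching ASIC (PSA) detected a correctable single-bit error. No action is needed unless these messages occur frequently; in that case, replace the line card.
These errors can appear in the console output: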
SLOT 6:00:03:53: %PM622-3-SAR_SRAM_PARITY_ERR: (6/0): Parity error in Reassembly SAR SRAM address: 80000000.Resetting the port SLOT 3:00:00:53: %PM622-3- SAR_MULTIBIT_ECC_ERR: (3/0): Multi-bit ECC Uncorrectable error in SAR SDRAM address: 80000000. Resseting the port. SLOT 4:00:00:53: %PM622-3 SAR_SINGLE_BIT_ECC_ERR: (3/0): ECC corrected an error in SAR SDRAM address: 800000. SLOT 0:Jun 25 20:45:53 KST: %EE48-6-ALPHAECC: RX ALPHA: An PLU SDRAM ECC correctable error occured address 1000C254 SLOT 0:Jun 25 20:45:53 KST: %EE48-6-ALPHAECC2: RX ALPHA: An PLU SDRAM ECC multibit error occured at address 1000E254 SLOT 5:Nov 17 09:46:30.171: %EE48-6-ALPHA_PARITY: TX ALPHA: Transient SRAM64 parity corrected error 3E Data 0 100000 Parity bits 0 SLOT 10:Feb 21 16:55:36: %EE48-3-ALPHA_SRAM64_ERR: TX ALPHA: ALPHA_PST_RANGE_ERR error 11003F Data 0 0 Parity bits 0 SLOT 4:Jan 15 06:30:00.942 UTC: %EE48-2-GULF_TX_SRAM_ERROR: ASIC GULF: TX SRAM uncorrectable error detected. Details=0x0000 SLOT 0:Mar 16 19:50:22.464 cst: %EE48-4-QM_ZBT_PARITY: ToFab Address 0xB95E Data 0x1 SLOT 5:May 17 06:17:35.507: %EE48-4-QM_NON_ZBT_PARITY: ToFab Error 0x10000028 SLOT 5:May 17 06:17:53.883: %EE48-4-QM_ZBT_PARITY_TRANSIENT: FrFab Address 0x0 Data 0x7E SLOT 5:May 17 06:17:53.883: %EE48-4- GULF_RX_TB_PARITY_ERROR: ASIC GULF: RX telecom bus parity error on port 0 SLOT 1:Dec 13 00:27:42: %EE48-3-SRAM_PARITY: SRAM parity: Unable to find shadow 281B9EB4 SLOT 0:Aug 4 08:55:37: %EE48-3-QM_PARITY: FrFab Address 0x1859E Data 0x10 SLOT 0:Aug 4 08:55:37: %EE48-3-QM_ERROR: FrFab error register 0x80000.
On Engine 4/4+ based line cards, you might encounter these messages:
SLOT 4: %RX192-3-HINTR: status = 0x4000000, mask = 0x3FFFFFFF - Parity error on rx_pbc_mem. -Traceback= 401C37C0 403D8814 400BE1EC SLOT 4: %LC-3-ERR_INTR: Error interrupt occurred -Traceback= 400CE028 400C8DF0 40010A24
or
SLOT 3: %RX192-3-HINTR: status = 0x4000000, mask = 0x3FFFFFFF - Parity error on rx_pbc_mem. -Traceback= 406012E0 406972A0 400C555C %FIB-3-FIBDISABLE: Fatal error, slot 3: IPC failure
or
SLOT 13:Dec 5 07:30:15.272 cst: %HERA-6-PAM_ACL_SBE: PKT CNT MEM Syndrome=0x8 Addr=0x523C SLOT 2:00:03:41: %MCC192-6-RED_PARAM1_SBE: Parameter 1 - Single Bit Error detected and corrected Syndrome = 0x7, Address = 0x43, samebit No, diffbit No SLOT 2:00:03:41: %MCC192-6-RED_PARAM2_SBE: Parameter 1 - Single Bit Error detected and corrected Syndrome = 0x7, Address = 0x43, samebit No, diffbit No SLOT 5:Apr 26 11:56:08.160: %MCC192-3-SDRAM_MBE: Error=0x200 - DIMM1 Syndrome=0x3000 Addr=0x811C3 SLOT 10:Mar 6 05:05:26.965: %RX192-3-ADJ_MEM_MBE: phy addr 0x7905E648, offset 0xBCC9, old ecc 0x0, new ecc 0x0, bit -1, value 0x0 - MBE on Adjacency Memory.. SLOT 13:Dec 5 07:30:15.272 cst: %HERA-6-PAM_ACL_MBE: PKT CNT MEM Syndrome=0x8 Addr=0x523C SLOT 2:00:03:41: %MCC192-6-RED_PARAM1_MBE: Parameter 1 - Single Bit Error detected and corrected Syndrome = 0x7, Address = 0x43, samebit No, diffbit No SLOT 2:00:03:41: %MCC192-3-RED: Error=0x80000 - RED PARAM 1 ECC SBE Error. -Traceback= 405AF5E0 405B1CEC 406DFF7C 406E057C 400FC7E SLOT 2:00:03:41: %MCC192-6-RED_PARAM2_MBE: Parameter 1 - Single Bit Error detected and corrected Syndrome = 0x7, Address = 0x43, samebit No, diffbit No Sep 8 14:32:09 jst: %MEM_ECC-3-SYNDROME_SBE_LIMIT: 8-bit Syndrome for the detected Single-bit error: 0xD5
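Symptoms of this problem include:
Cisco Express Forwarding (CEF) is disabled on this line card
The associated ports remain up/up
The line card might reset automatically
If the line card does not reset, the workaround is to run the microcode reload <slot> command.
This message does not always indicate a hardware problem with the RX192 module; some Cisco IOS Software bugs can produce it as a side effect. If the message appears only once, continue to monitor the board. If the problem persists, the card resets automatically. If the message still persists after that, contact your Cisco technical support representative for assistance.
On E4/E4+ line cards, SBE events can be checked with the show controllers mcc192 ecc command: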
LC-Slot4#show controllers mcc192 ecc MCC192 SDRAM ECC Counters SBE = 0x0, MBE = 0x0 TX192 SDRAM ECC Counters SBE = 0x0, MBE = 0x0
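To see whether the SBE or MBE counters are growing, a minimal Python sketch can parse two snapshots of show controllers mcc192 ecc output taken some time apart and report the increments. The layout is taken from the sample output above. The second snapshot in the usage example is made-up data for illustration, and the function names are hypothetical.

```python
import re
from typing import Dict

Counters = Dict[str, Dict[str, int]]

# Layout taken from the sample `show controllers mcc192 ecc` output above.
ECC_COUNTER_RE = re.compile(
    r"(\w+) SDRAM ECC Counters\s+SBE\s*=\s*(0x[0-9A-Fa-f]+),\s*MBE\s*=\s*(0x[0-9A-Fa-f]+)"
)


def ecc_counters(output: str) -> Counters:
    """Return {"MCC192": {"SBE": n, "MBE": n}, "TX192": {...}} from the command output."""
    return {
        name: {"SBE": int(sbe, 16), "MBE": int(mbe, 16)}
        for name, sbe, mbe in ECC_COUNTER_RE.findall(output)
    }


def new_errors(before: Counters, after: Counters) -> Counters:
    """Counter increments between two snapshots."""
    return {
        mem: {kind: n - before.get(mem, {}).get(kind, 0) for kind, n in counts.items()}
        for mem, counts in after.items()
    }


t0 = ecc_counters("MCC192 SDRAM ECC Counters SBE = 0x0, MBE = 0x0 "
                  "TX192 SDRAM ECC Counters SBE = 0x0, MBE = 0x0")
t1 = ecc_counters("MCC192 SDRAM ECC Counters SBE = 0x3, MBE = 0x0 "   # made-up later snapshot
                  "TX192 SDRAM ECC Counters SBE = 0x0, MBE = 0x0")
print(new_errors(t0, t1))  # {'MCC192': {'SBE': 3, 'MBE': 0}, 'TX192': {'SBE': 0, 'MBE': 0}}
```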
This output reports both RX and TX memory.
These errors can appear in the console output:
SLOT 1:Jun 26 20:45:53 KST: %EE192-6-WAHOOECC: RX WAHOO: An PLU SDRAM ECC correctable error occured address 20000254 SLOT 9:Sep 2 21:27:49.680 GMT+8: %MCC192-3-PKTMEM_SBE: Single bit error detected and corrected SLOT 14:Jul 18 07:19:24.637: RX_XBMA: 1-bit CPUIM_ECCERR1 error 0x2 SLOT 15:Jan 4 16:53:16.591: TX_XBMA: (1) QSRAM qinfo SBE detected. info: 0x82605455 SLOT 12:Dec 12 22:34:15: %EE192-4-BM_ERRSSS: FrFab BM BADDR ECC ERR info single bit error(s) corrected, error 8250F63E count: 2 SLOT 1:Nov 22 13:40:02 JST: %EE192-3-QM_ERROR: RX_XBMA OQLLM error error register 0x1 -Traceback= 40AE71AC 406078C4 405F5EC0 SLOT 7:001113: Oct 24 10:50:28.520 BST: %EE192-3-WAHOOERRS: RX WAHOO: WAHOO_CSRAM_CNTRL_INT PIPE0 error 8 SLOT 6:Oct 4 16:48:00.487: %EE192-3-WAHOOERRSSS: RX WAHOO: WAHOO_FFCRAM_CNTRL_INT PIPE0 error 4 addr 3FBFAB8 agent 94 SLOT 7:001114: Oct 24 10:50:28.520 BST: %EE192-3-WAHOOERRSSSS: RX WAHOO: WAHOO_PPC_INT PIPE1 error pl_ctl 4000226 pl_aa_avl F9F7B pl_aa_end 7FF9 pl_aa_fatal 4800000 SLOT 6:Oct 4 16:48:00.487: %EE192-3-WAHOOERRS: RX WAHOO WAHOO_NFC_SRAM_MULTI_ECC_ERR multi-bit CSSRAM error SLOT 6:Oct 4 16:48:00.487: %EE192-3-WAHOOERRS: WAHOO_CTCAM_CNTRL_INT multi-bit CSRAM error SLOT 6:Oct 4 16:48:00.487: %EE192-3-WAHOOERRS: WAHOO_FFCRAM_CNTRL_INT MBE SLOT 6:Oct 4 16:48:00.487: %EE192-3-WAHOOERRS: FSRAM not OK WAHOO_FSRAM_CNTRL_INT ECC_1_BIT_EE | ECC_UNCORR_EE SLOT 6:Oct 4 16:48:00.487: %EE192-3-WAHOOERRS: WAHOO_CTCAM_CNTRL_INT multi-bit CSRAM error SLOT 1:00:01:14: WEEKLY_THROTTLE_SOCKEYE_SBE: SOCKEYE SBE: addr: 0xC2A007C0, synd: 0xC4 SLOT 1:00:01:14: WEEKLY_THROTTLE_CBSRAM_SBE_TX+i: CBSRAM SBE TX: 1-bit CBSRAM error. SLOT 1:00:01:14: WEEKLY_THROTTLE_CBSRAM_SBE_RX+i: CBSRAM SBE RX: 1-bit CBSRAM error. SLOT 1:00:01:14: WEEKLY_THROTTLE_CSSRAM_SBE_TX+i: CSSRAM SBE TX: 1-bit CSSRAM error. SLOT 1:00:01:14: WEEKLY_THROTTLE_CSSRAM_SBE_RX+i: CSSRAM SBE RX: 1-bit CSSRAM error. SLOT 1:00:01:14: WEEKLY_THROTTLE_CSRAM_SBE_TX+i: CSRAM SBE TX: 1-bit CSRAM error. 
SLOT 1:00:01:14: WEEKLY_THROTTLE_CSRAM_SBE_RX+i: CSRAM SBE RX: 1-bit CSRAM error. SLOT 1:00:01:14: WEEKLY_THROTTLE_W_FW_TCAM_PRTY_TX+throttle_i: TX FTCAM PRTY error, status = 0x2 SLOT 1:00:01:14: WEEKLY_THROTTLE_W_FW_TCAM_PRTY_RX+throttle_i: RX FTCAM PRTY error, status = 0x2 SLOT 1:00:01:14: WEEKLY_THROTTLE_W_CL_TCAM_PRTY_TX+throttle_i: TX CLTCAM PRTY error, status = 0x2 SLOT 1:00:01:14: WEEKLY_THROTTLE_W_CL_TCAM_PRTY_RX+throttle_i: RX CLTCAM PRTY error, status = 0x2 SLOT 1:00:01:14: WEEKLY_THROTTLE_W_NF_TCAM_PRTY_TX+throttle_i: TX NFTCAM PRTY error, status = 0x2 SLOT 1:00:01:14: WEEKLY_THROTTLE_W_NF_TCAM_PRTY_RX+throttle_i: RX NFTCAM PRTY error, status = 0x2 SLOT 1:00:01:14: WEEKLY_THROTTLE_W_TCAM_PRTY_VMR: TCAM PRTY VMR error, status = 0x2 SLOT 1:00:01:14: WEEKLY_THROTTLE_W_TCAM_PRTY_NO-VMR: TCAM PRTY NO-VMR error, status = 0x3 SLOT 1:00:01:14: WEEKLY_THROTTLE_W_FCRAM_SBE_TX: FCRAM SBE TX error, status = 0x2 SLOT 1:00:01:14: WEEKLY_THROTTLE_W_FCRAM_SBE_RX: FCRAM SBE TX error, status = 0x3 SLOT 1:00:01:14: WEEKLY_THROTTLE_W_FCRAM_PER_CHIP_SBE_TX: FCRAM CHIP SBE error, status = 0x2 SLOT 1:00:01:14: WEEKLY_THROTTLE_W_ FCRAM_PER_CHIP_SBE_RX: FCRAM CHIP SBE error, status = 0x3 SLOT 1:00:01:14: WEEKLY_THROTTLE_W_FSRAM_SBE_TX: FSRAM SBE TX error, status = 0x2 SLOT 1:00:01:14: WEEKLY_THROTTLE_W_FSRAM_SBE_RX: FSRAM SBE RX error, status = 0x3 SLOT 1:00:01:14: WEEKLY_THROTTLE_W_ FSRAM_MBE_TX: FSRAM MBE RX error, status = 0x2 SLOT 1:00:01:14: WEEKLY_THROTTLE_W_ FSRAM_MBE_RX: FSRAM MBE RX error, status = 0x3 SLOT 1:00:01:14: WEEKLY_THROTTLE_BM_ISERR_TX: ISERR TX error, status = 0x2 SLOT 1:00:01:14: WEEKLY_THROTTLE_BM_ISERR_RX: ISERR RX error, status = 0x3 SLOT 1:00:01:14: WEEKLY_THROTTLE_BM_FCRAM_SBE_TX: FCRAM SBE TX error, status = 0x2 SLOT 1:00:01:14: WEEKLY_THROTTLE_BM_FCRAM_SBE_RX: FCRAM SBE RX error, status = 0x3 SLOT 1:00:01:14: WEEKLY_THROTTLE_QM_QSRAM_LINK_SBE_TX: QSRAM LINK SBE TX error, status = 0x2 SLOT 1:00:01:14: WEEKLY_THROTTLE_QM_QSRAM_LINK_SBE_RX: QSRAM LINK 
SBE RX error, status = 0x3 SLOT 1:00:01:14: WEEKLY_THROTTLE_QM_QSRAM_QEINFO_SBE_TX: QSRAM queue info sbe tx error, status = 0x2 SLOT 1:00:01:14: WEEKLY_THROTTLE_QM_QSRAM_QEINFO_SBE_TX: QSRAM queue info sbe rx error, status = 0x3 SLOT 1:00:01:14: WEEKLY_THROTTLE_QM_QSRAM_BADDR_SBE_TX: qsram bad addr sbe tx error, status = 0x2 SLOT 1:00:01:14: WEEKLY_THROTTLE_ QM_QSRAM_BADDR_SBE_RX: qsram bad addr sbe rx error, status = 0x3 SLOT 1:00:01:14: WEEKLY_THROTTLE_QM_OQLLM_SBE_TX: oqllm sbe tx error, status = 0x2 SLOT 1:00:01:14: WEEKLY_THROTTLE_QM_OQLLM_SBE_RX: oqllm sbe rx error status = 0x3
These errors can appear in the console output:
SLOT 0:Jan 14 08:53:44.581 GMT: %FIA-3-RAMECCERR: To Fabric ECC error was detected Single Bit Error RAM2 status = 0x8000 Syndrome = 0x0 addr = 0x0 SLOT 6:Apr 29 09:36:12: %E6LC-4-ECC_THRESHOLD: HERMES VID SBE exceeded threshold, possible memory failure SLOT 4:*Mar 13 23:38:19.295: %E6_RX192-3-MTRIE_SBE: Head1 Syndrome=0x94 Addr=0xFFF2B -Traceback= 40544830 40546A90 40688C94 400EDC18 SLOT 7:*Mar 4 1234:19.295: %E6_RX192-3-ADJ_SBE: Syndrome=0x59 Addr=0xFFF2B -Traceback= 40000830 40036A90 40555D44 400ddd23 SLOT 14:Dec 9 20:02:29: %E6_RX192-6-PBC_SBE: Single bit error detected and corrected RLDRAM Syndrome=0x61 Addr=0xF855 Dec 9 20:02:33: %GRP-4-RSTSLOT: Resetting the card in the slot: 14,Event: linecard error report SLOT 4:06:21:43: %E6_RX192-3-ACL_SBE: ACTION MEM Syndrome=0x7 Addr=0x0 -Traceback= 40549740 4054A7E0 4068D814 400EE018 SLOT 6:Mar 28 03:30:19: %RX192-3-HINTR: status = 0x1000000000000, mask = 0x7FFFFF0FA320F - L3X SBE error. -Traceback= 405816DC 406A1010 406A1650 400F70E8 SLOT 6:Mar 28 03:30:19: %E6_RX192-6-VID_SBE: Single bit error detected and corrected VID memory Syndrome=0x19 Addr=0xE51B SLOT 6:Nov 27 23:32:36: %HERA-3-PKTMEM_SBE: Single bit error detected and corrected Error=0x80 – Syndrome=0x5100000000000000 Addr=0x894620 Data bit116 SLOT 7:Oct 2 23:32:36: %HERA-6- MCD_SBE: Single bit error detected and corrected Error=0x50 – Syndrome=0x3100000000000000 Addr=0x331110 Data bit216 SLOT 1:Jun 22 03:32:36: %HERA-6- MRW_SBE: Single bit error detected and corrected Error=0x50 – Syndrome=0x3100000000000000 Addr=0x331110 Data bit216 SLOT 12:May 24 03:03:36: %HERA-6- UPF_SBE: Single bit error detected and corrected Error=0x60 – Syndrome=0x4100000000000000 Addr=0x451140 Data bit216 SLOT 13:Dec 5 07:30:15.272 cst: %HERA-6-PAM_ACL_SBE: PKT CNT MEM Syndrome=0x8 Addr=0x523C SLOT 9:May 5 18:52:14: %HERA-6-QM_FBF_SBE: Free Block FIFO - Single Bit Error detected and corrected Syndrom = 0x10, Addr = 0x778, samebit Yes, diffbit No SLOT 9:May 5 18:52:14: %HERA-3-QM: 
Error=0x40 - FBF RAM ECC SBE. -Traceback= 405AD4CC 405AF5D0 405F2E80 406DCDB8 406DD434 400FC500 SLOT 3:Aug 16 00:45:14: %MCC192-6-RED_AQD_SBE: Average Queue Depth - Single Bit Error detected and corrected Syndrome = 0x7, Address = 0x89, samebit No, diffbit No SLOT 2:Jan 23 06:29:56 KST: %MCC192-6-RED_STAT_SBE: Statistics - Single Bit Error detected and corrected Syndrome = 0x38, Address = 0xFF, samebit No, diffbit No SLOT 4:*Mar 13 23:38:19.295: %E6_RX192-3-MTRIE_MBE: Single bit error detected and corrected Head1 Syndrome=0x94 Addr=0xFFF2B SLOT 7:*Mar 4 1234:19.295: %E6_RX192-3-ADJ_MBE: Syndrome=0x59 Addr=0xFFF2B -Traceback= 40000830 40036A90 40555D44 400ddd23 00:00:18: %E6_RX192-3-PBC_MBE: ADJ OBANK LO Syndrome=0xE5 Addr=0x142 -Traceback= 405BF8B0 405C0F08 406E8D78 406E93B8 400FCCE0 SLOT 6:Mar 28 03:30:19: %E6_RX192-6-VID_MBE: Single bit error detected and corrected VID memory Syndrome=0x19 Addr=0xE51B SLOT 0:Apr 18 06:44:53.751 GMT: %HERA-3-PKTMEM_MBE: Error=0x1010 - Syndrome=0x9900000000 SLOT 7:Oct 2 23:32:36: %HERA-6- MCD_MBE: Single bit error detected and corrected Error=0x50 – Syndrome=0x3100000000000000 Addr=0x331110 Data bit216 SLOT 1:Jun 22 03:32:36: %HERA-6- MRW_MBE: Single bit error detected and corrected Error=0x50 - Syndrome=0x3100000000000000 Addr=0x331110 Data bit216 SLOT 13:Dec 5 07:30:15.272 cst: %HERA-6-PAM_ACL_MBE: PKT CNT MEM Syndrome=0x8 Addr=0x523C SLOT 9:May 5 18:52:14: %HERA-6-QM_FBF_MBE: Free Block FIFO - Single Bit Error detected and corrected Syndrome = 0x10, Addr = 0x778, samebit Yes, diffbit No SLOT 3:Aug 16 00:45:14: %MCC192-6-RED_AQD_MBE: Average Queue Depth - Single Bit Error detected and corrected Syndrome = 0x7, Address = 0x89, samebit No, diffbit No SLOT 2:Jan 23 06:29:56 KST: %MCC192-6-RED_STAT_MBE: Statistics - Single Bit Error detected and corrected Syndrome = 0x38, Address = 0xFF, samebit No, diffbit No
These errors can appear in the console output:
SLOT 7:Jan 4 02:04:00.487: %SPA_CHOC_DSX-3-UNCOR_PARITY_ERR: SPA4/0: CHOC SPA parity error(s) encountered SLOT 7:Jan 4 02:04:00.487: %MCT1E1-3-UNCOR_PARITY_ERR: SPA5/0: T1E1 SPA parity error(s) encountered SLOT 3: 00:33:48: %MCT1E1-3-UNCOR_MEM_ERR: SPA3/0: 1 uncorrectable HDLC SRAM memory error(s) encountered. SLOT 1:Oct 3 14:42:45.727: %SPA_PLIM-4-SBE_ECC: SPA-4XT3/E3[1/2] reports 2 SBE occurrence at 1 addresses SLOT 1: Jul 22 05:26:29.613 UTC: %SPA_DATABUS-3-SPI4_SINGLE_DIP4_PARITY: SIP Sbslt 0 Ingress Sink - A single DIP4 parity error has occurred on the data bus. SLOT 4: Dec 2 22:44:05: %SPA_DATABUS-3-SPI4_SINGLE_DIP2_PARITY: SIP Sbslt 0 Egress Source - A single DIP 2 parity error on the FIFO status bus has occurred. SLOT 1:Oct 3 14:42:45.727: %SPA_PLIM-4-SBE_OVERFLOW: SPA-4XT3/E3[1/2] reports SBE table (2 elements) overflows SLOT 1:Oct 3 14:42:45.727: % SPA_PLUGIN-3-SPI4_SETCB: SPA-4XT3/E3[1/2] : IPC SPI4 set callback failed(status 2).
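All parity error messages for the switch fabric cards are covered in detail in Hardware Troubleshooting for the Cisco 12000 Series Internet Router. These messages include (non-exhaustive list):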
%FABRIC-3-PARITYERR: To Fabric parity error was detected. Grant parity error Data = 0x2. SLOT 1:%FABRIC-3-PARITYERR: To Fabric parity error was detected. Grant parity error Data = 0x1