Introduction
This document describes how to troubleshoot a Virtualized Packet Core (VPC) card in the Cisco Ultra Services Platform (UltraM) that is stuck in the booting state with the "Failed to find VDU" error, as seen in the output of show logs.
Background Information
Sample:
2017-Sep-26+08:05:05.839 [emctrl 218804 error] [2/0/16829 <emctrl:0> emctrl_vnf.c:828] [software internal system syslog] Failed to find VDU, of card number <1>
If you check the logs further, you see a more specific error which indicates that the card type does not match the Element Manager (EM) information:
2017-Sep-26+08:03:32.126 [emctrl 218802 info] [2/0/16829 <emctrl:0> emctrl_util.c:381] [software internal system critical-info syslog] siti msg for standby CF, card type doesn't match EM, reboot it
2017-Sep-26+08:03:32.126 [emctrl 218802 info] [2/0/16829 <emctrl:0> emctrl_util.c:376] [software internal system critical-info syslog] siti card 1 card type drvctrl 40010100, siti 0
2017-Sep-26+08:03:32.126 [emctrl 218802 info] [2/0/16829 <emctrl:0> emctrl_util.c:329] [software internal system critical-info syslog] siti sync msg received for card 1 with cardtype 40010100, uuid 9F1F2B1E-35FC-4AF9-807A-E856336702D6
2017-Sep-26+08:03:32.105 [system 1004 info] [2/0/9741 <evlogd:0> evlgd_syslogd.c:279] [software internal system syslog] CPU[2/0]: sitiserv[9533]: SITI_PRESENT: invoking notify card present cmd notify_card_present 1 0 0x40010100 9F1F2B1E-35FC-4AF9-807A-E856336702D6
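When several cards log at the same time, it can help to extract the card number, card type, and UUID from the siti sync messages with a script. This is a minimal Python sketch and not a Cisco tool: the function name parse_siti_sync and the regular expression are assumptions derived only from the sample log line shown above, and other StarOS releases may format the line differently.

```python
import re

# Pattern inferred from the sample log line above (an assumption, not a
# documented log format).
SITI_SYNC_RE = re.compile(
    r"siti sync msg received for card (?P<card>\d+) "
    r"with cardtype (?P<cardtype>[0-9A-Fa-f]+), "
    r"uuid (?P<uuid>[0-9A-Fa-f-]{36})"
)


def parse_siti_sync(line):
    """Return (card, cardtype, uuid) from a siti sync log line, or None."""
    match = SITI_SYNC_RE.search(line)
    if match is None:
        return None
    return (
        int(match.group("card")),
        match.group("cardtype"),
        match.group("uuid").upper(),
    )


sample = (
    "2017-Sep-26+08:03:32.126 [emctrl 218802 info] "
    "[2/0/16829 <emctrl:0> emctrl_util.c:329] "
    "[software internal system critical-info syslog] "
    "siti sync msg received for card 1 with cardtype 40010100, "
    "uuid 9F1F2B1E-35FC-4AF9-807A-E856336702D6"
)
print(parse_siti_sync(sample))
```

The UUID is upper-cased so that it can be compared directly with the values that EMCtrl prints.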
Commands to Check
The log includes the Universally Unique Identifier (UUID) of the affected card. In this sample, the UUID is 9F1F2B1E-35FC-4AF9-807A-E856336702D6.
Ideally, this UUID matches an entry in the output of the show emctrl vdu detail command.
Note that show emctrl vdu detail is a hidden command.
[local]UltraM-QVPC-DI# show emctrl vdu detail
Showing emctrl vdu
card[01]: name[CFC_01 ] uuid[1FE70E43-0F33-4E17-8BFA-439169CD52BA]
card[02]: name[CFC_02 ] uuid[3AFC540B-546E-4F35-A645-A23E62C32C59]
card[03]: name[SFC_03 ] uuid[93359FA0-09C2-4F7C-93F6-17BE0A2AF49F]
card[04]: name[SFC_04 ] uuid[E02C8AAA-7E8A-4881-8018-6EC59963C8F6]
card[05]: name[SFC_05 ] uuid[6F297BF6-4AFC-43AB-A36D-FCD0FAE39DA3]
If this output is empty, the EMCtrl process might be corrupted.
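To compare these entries with other sources, the output can be turned into a lookup table. The sketch below assumes that the output was copied from the CLI into a string; the function name parse_emctrl_vdu and the line pattern are hypothetical and based only on the sample output above.

```python
import re

# Line shape inferred from the sample output: card[NN]: name[...] uuid[...]
CARD_RE = re.compile(r"card\[(\d+)\]:\s*name\[([^\]]*)\]\s*uuid\[([^\]]*)\]")


def parse_emctrl_vdu(output):
    """Map card number -> (name, uuid) from pasted 'show emctrl vdu' output."""
    cards = {}
    for match in CARD_RE.finditer(output):
        number, name, uuid = match.groups()
        # Names are padded with a trailing space in the CLI output.
        cards[int(number)] = (name.strip(), uuid.strip().upper())
    return cards


output = """\
Showing emctrl vdu
card[01]: name[CFC_01 ] uuid[1FE70E43-0F33-4E17-8BFA-439169CD52BA]
card[02]: name[CFC_02 ] uuid[3AFC540B-546E-4F35-A645-A23E62C32C59]
"""
cards = parse_emctrl_vdu(output)
print(cards)
```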
This UUID should be the same as the vim-id seen on the EM:
admin@scm# show vdus vdu card-type session-function
vdus vdu session-function
card-type session-function
vnfci BOOT_generic_di-chassis_SF1_1
constituent-element-group di-chassis
is-infra true
initialized false
vim-id 93359fa0-09c2-4f7c-93f6-17be0a2af49f
vnfci BOOT_generic_di-chassis_SF2_1
constituent-element-group di-chassis
is-infra true
initialized false
vim-id e02c8aaa-7e8a-4881-8018-6ec59963c8f6
vnfci BOOT_generic_di-chassis_SF3_1
constituent-element-group di-chassis
is-infra true
initialized false
vim-id 54e9a5d6-f4dd-4636-95d3-b29443ebfa14
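The EM prints each vim-id in lower case, while EMCtrl prints UUIDs in upper case, so any comparison should ignore case. As a sketch under that observation, the following hypothetical helper (parse_em_vim_ids is not an EM or StarOS function) reads pasted show vdus output and maps each vnfci name to its vim-id in upper case.

```python
def parse_em_vim_ids(output):
    """Map vnfci name -> vim-id (upper-cased) from pasted EM 'show vdus' output."""
    vim_ids = {}
    current = None  # vnfci whose attributes are currently being read
    for raw in output.splitlines():
        fields = raw.split()
        if len(fields) != 2:
            continue
        key, value = fields
        if key == "vnfci":
            current = value
        elif key == "vim-id" and current is not None:
            vim_ids[current] = value.upper()
    return vim_ids


output = """\
vnfci BOOT_generic_di-chassis_SF1_1
constituent-element-group di-chassis
is-infra true
initialized false
vim-id 93359fa0-09c2-4f7c-93f6-17be0a2af49f
vnfci BOOT_generic_di-chassis_SF2_1
constituent-element-group di-chassis
is-infra true
initialized false
vim-id e02c8aaa-7e8a-4881-8018-6ec59963c8f6
"""
vim_ids = parse_em_vim_ids(output)
print(vim_ids)
```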
More information about this instance on the StarOS side can be found with this command:
[local]UltraM-QVPC-DI# show vdu detail type session-function instance BOOT_generic_di-chassis_SF1_1
vdu-id: session-function, vdu-instance: BOOT_generic_di-chassis_SF1_1, state: from:Invalid to:Alive
card_number: 3, card_type: 0x42030100, uuid:93359fa0-09c2-4f7c-93f6-17be0a2af49f
networks:
cp-id: di_intf1, state: Alive, type: unknown
vl: vl-di-internal1 vnfc: sf-vnfc-di-chassis
mac: fa:16:3e:87:ac:e4, ip: 192.168.1.12
cp-id: di_intf2, state: Alive, type: unknown
vl: vl-di-internal2 vnfc: sf-vnfc-di-chassis
mac: fa:16:3e:92:ea:26, ip: 192.168.2.11
cp-id: orch, state: Alive, type: unknown
vl: vl-orchestration vnfc: sf-vnfc-di-chassis
mac: fa:16:3e:1e:f5:b5, ip: 172.16.180.21
cp-id: svc_intf1, state: Alive, type: unknown
vl: vl-service-network1 vnfc: sf-vnfc-di-chassis
mac: fa:16:3e:bf:c8:6f, ip: 10.10.10.2
cp-id: svc_intf2, state: Alive, type: unknown
vl: vl-service-network2 vnfc: sf-vnfc-di-chassis
mac: fa:16:3e:15:a9:22, ip: 20.20.20.7
cp-id: svc_intf3, state: Alive, type: unknown
vl: vl-service-network1 vnfc: sf-vnfc-di-chassis
mac: fa:16:3e:1f:fa:0c, ip: 10.10.10.6
cp-id: svc_intf4, state: Alive, type: unknown
vl: vl-service-network2 vnfc: sf-vnfc-di-chassis
mac: fa:16:3e:2f:6b:00, ip: 20.20.20.10
Inconsistency Scenario 1: Different ID as Seen on EMCtrl vs EM VDU Instance
If you pay attention to the ID of Card 5, you see that it is 6F297BF6-4AFC-43AB-A36D-FCD0FAE39DA3.
[local]UltraM-QVPC-DI# show emctrl vdu detail
Showing emctrl vdu
card[01]: name[CFC_01 ] uuid[1FE70E43-0F33-4E17-8BFA-439169CD52BA]
card[02]: name[CFC_02 ] uuid[3AFC540B-546E-4F35-A645-A23E62C32C59]
card[03]: name[SFC_03 ] uuid[93359FA0-09C2-4F7C-93F6-17BE0A2AF49F]
card[04]: name[SFC_04 ] uuid[E02C8AAA-7E8A-4881-8018-6EC59963C8F6]
card[05]: name[SFC_05 ] uuid[6F297BF6-4AFC-43AB-A36D-FCD0FAE39DA3]
Yet if you check for the same ID on the EM, you do not find it:
admin@scm# show vdus | include vim
vim-id 1fe70e43-0f33-4e17-8bfa-439169cd52ba ---> CF 1
vim-id 3afc540b-546e-4f35-a645-a23e62c32c59 ---> CF 2
vim-id 93359fa0-09c2-4f7c-93f6-17be0a2af49f ---> SF 3
vim-id e02c8aaa-7e8a-4881-8018-6ec59963c8f6 ---> SF 4
vim-id 54e9a5d6-f4dd-4636-95d3-b29443ebfa14 ---> ?
So for the card in slot 5 there is an inconsistency: EMCtrl reports a UUID that the EM does not know.
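The manual comparison above can be sketched as a case-insensitive set difference. The values are copied from the two outputs in this scenario; the function name find_mismatches is hypothetical, and the script is an illustration rather than a supported tool.

```python
def find_mismatches(emctrl_uuids, em_vim_ids):
    """Compare EMCtrl UUIDs (card -> uuid) with EM vim-ids, ignoring case.

    Returns (cards whose UUID is unknown to the EM,
             EM vim-ids that no EMCtrl card reports).
    """
    known_to_em = {vim_id.lower() for vim_id in em_vim_ids}
    seen_by_emctrl = {uuid.lower() for uuid in emctrl_uuids.values()}
    stale_cards = {
        card: uuid
        for card, uuid in emctrl_uuids.items()
        if uuid.lower() not in known_to_em
    }
    unmatched_em = sorted(known_to_em - seen_by_emctrl)
    return stale_cards, unmatched_em


# From 'show emctrl vdu detail' in this scenario.
emctrl_uuids = {
    1: "1FE70E43-0F33-4E17-8BFA-439169CD52BA",
    2: "3AFC540B-546E-4F35-A645-A23E62C32C59",
    3: "93359FA0-09C2-4F7C-93F6-17BE0A2AF49F",
    4: "E02C8AAA-7E8A-4881-8018-6EC59963C8F6",
    5: "6F297BF6-4AFC-43AB-A36D-FCD0FAE39DA3",
}
# From 'show vdus | include vim' on the EM in this scenario.
em_vim_ids = [
    "1fe70e43-0f33-4e17-8bfa-439169cd52ba",
    "3afc540b-546e-4f35-a645-a23e62c32c59",
    "93359fa0-09c2-4f7c-93f6-17be0a2af49f",
    "e02c8aaa-7e8a-4881-8018-6ec59963c8f6",
    "54e9a5d6-f4dd-4636-95d3-b29443ebfa14",
]

stale_cards, unmatched_em = find_mismatches(emctrl_uuids, em_vim_ids)
print(stale_cards)
print(unmatched_em)
```

With this data, the check flags card 5 on the EMCtrl side and the vim-id 54e9a5d6-f4dd-4636-95d3-b29443ebfa14 on the EM side.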
When you check this instance in more detail on the StarOS side with the show vdu detail command, the UUID is actually the same as the one seen on the EM side:
[local]UltraM-QVPC-DI# show vdu detail type session-function instance BOOT_generic_di-chassis_SF3_1
vdu-id: session-function, vdu-instance: BOOT_generic_di-chassis_SF3_1, state: from:Invalid to:Alive
card_number: 5, card_type: 0x42030100, uuid:54e9a5d6-f4dd-4636-95d3-b29443ebfa14
This confirms that the EMCtrl process does not have the correct information.
If you check the log, you see this warning:
2017-Sep-26+08:36:31.317 UltraM-QVPC-DI [emctrl 218802 info] [2/0/20871 <emctrl:0> emctrl_util.c:579] [software internal system critical-info syslog] drvctrl uuid mismatch /6F297BF6-4AFC-43AB-A36D-FCD0FAE39DA3 with em uuid 54e9a5d6-f4dd-4636-95d3-b29443ebfa14, use drvctrl uuid
1. Killing the EMCtrl task does not help.
2. Restarting the card does not help either.
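To find this condition in a large log capture, the drvctrl uuid mismatch message shown in this scenario can be matched with a regular expression. This is a sketch only: the function name parse_uuid_mismatch is hypothetical, and the pattern (which includes the optional slash before the first UUID) is inferred from the single sample line.

```python
import re

# Pattern inferred from the sample warning; the "/" before the drvctrl UUID
# appears in the sample, so it is treated as optional here.
MISMATCH_RE = re.compile(
    r"drvctrl uuid mismatch\s+/?(?P<drvctrl>[0-9A-Fa-f-]{36}) "
    r"with em uuid (?P<em>[0-9A-Fa-f-]{36})"
)


def parse_uuid_mismatch(line):
    """Return the drvctrl and EM UUIDs (lower-cased) from a mismatch line, or None."""
    match = MISMATCH_RE.search(line)
    if match is None:
        return None
    return {
        "drvctrl": match.group("drvctrl").lower(),
        "em": match.group("em").lower(),
    }


sample = (
    "2017-Sep-26+08:36:31.317 UltraM-QVPC-DI [emctrl 218802 info] "
    "[2/0/20871 <emctrl:0> emctrl_util.c:579] "
    "[software internal system critical-info syslog] "
    "drvctrl uuid mismatch /6F297BF6-4AFC-43AB-A36D-FCD0FAE39DA3 "
    "with em uuid 54e9a5d6-f4dd-4636-95d3-b29443ebfa14, use drvctrl uuid"
)
result = parse_uuid_mismatch(sample)
print(result)
```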
Inconsistency Scenario 2: Show EMCtrl VDU Detail Empty
This is most likely due to a corrupted EMCtrl table which, as far as is currently known, is the consequence of a software bug.
The output of the show emctrl vdu list command is completely empty:
Showing emctrl vdu
card[01]: name[ ] uuid[ ]
card[02]: name[ ] uuid[ ]
To check the actual state of the card from the VNFM Proxy side:
#show vdu detail type control-function instance BOOT_generic_di-chasis_CF1_1
vdu-id: control-function, vdu-instance: BOOT_generic_di-chasis_CF1_1, state: from:Invalid to:Alive
Known bug: CSCvf32599
Workaround: Restart the EMCtrl task:
task kill facility emctrl all
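The empty-table condition in this scenario can also be detected from a saved copy of the command output. The sketch below assumes the output format shown in this document; the function name emctrl_table_is_empty is hypothetical.

```python
import re

# Line shape inferred from the sample output: card[NN]: name[...] uuid[...]
CARD_RE = re.compile(r"card\[(\d+)\]:\s*name\[([^\]]*)\]\s*uuid\[([^\]]*)\]")


def emctrl_table_is_empty(output):
    """True when no card line in the pasted output carries a UUID."""
    uuids = [match.group(3).strip() for match in CARD_RE.finditer(output)]
    # Covers both cases: no card lines at all, or only blank uuid[] fields.
    return not any(uuids)


empty_output = """\
Showing emctrl vdu
card[01]: name[ ] uuid[ ]
card[02]: name[ ] uuid[ ]
"""
healthy_output = """\
Showing emctrl vdu
card[01]: name[CFC_01 ] uuid[1FE70E43-0F33-4E17-8BFA-439169CD52BA]
"""
print(emctrl_table_is_empty(empty_output))
print(emctrl_table_is_empty(healthy_output))
```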
Inconsistency Scenario 3: CF Missing from Card Table, Does Not Exist in EM
Sometimes, an SF or a CF is missing from the card table.
In this output, the StarOS sees just one CF card:
[local]AUPGW101# show card tabl
Wednesday September 27 09:26:46 UTC 2017
Slot        Card Type                              Oper State    SPOF Attach
----------- -------------------------------------- ------------- ---- ------
 1: CFC     Control Function Virtual Card          Active        Yes
 3: FC      4-Port Service Function Virtual Card   Active        No
 4: FC      4-Port Service Function Virtual Card   Active        No
 5: FC      4-Port Service Function Virtual Card   Active        No
 6: FC      4-Port Service Function Virtual Card   Active        No
 7: FC      4-Port Service Function Virtual Card   Active        No
 8: FC      4-Port Service Function Virtual Card   Active        No
 9: FC      4-Port Service Function Virtual Card   Active        No
10: FC      4-Port Service Function Virtual Card   Standby       -
Yet, if you check the debug console of card 1, you see that card 2 attempts to come online:
[local]AUPGW101# debug consol card 1 cpu 0
Wednesday September 27 09:26:58 UTC 2017
[local]AUPGW101# 2017-Sep-27+09:23:18.370 card 1-cpu0: collect persistdump for card <2> success
2017-Sep-27+09:24:22.112 card 1-cpu0: Hatsystem rcvd card 2/0 fail req from card (1) emctrl/0 - 32:150:3
2017-Sep-27+09:24:22.115 card 1-cpu0: The Control Function Virtual Card with serial number in slot 2 has failed and will be brought down and brought back online. (Device=CARD, Reason=EMCTRL_CARDTYPE_MISMATCH, Status=0)
This happens because, as the show logs output indicates, EMCtrl believes that the CF does not exist in the EM:
2017-Sep-27+09:27:13.964 [emctrl 218802 info] [1/0/7805 <emctrl:0> emctrl_util.c:357] [software internal system critical-info syslog] siti msg for standby CF, but doesn't exist in EM, reboot it
2017-Sep-27+09:27:13.964 [emctrl 218802 info] [1/0/7805 <emctrl:0> emctrl_util.c:329] [software internal system critical-info syslog] siti sync msg received for card 2 with cardtype 40010100, uuid C6217904-8F65-4C48-B607-4F13EAE6745D
2017-Sep-27+09:27:13.939 [system 1004 info] [1/0/7684 <evlogd:0> evlgd_syslogd.c:279] [software internal system syslog] CPU[1/0]: sitiserv[3063]: SITI_PRESENT: invoking notify card present cmd notify_card_present 2 0 0x40010100 C6217904-8F65-4C48-B607-4F13EAE6745D
You can confirm that card 2 is missing from the EMCtrl VDU list:
[local]AUPGW101# show emctrl vdu list
Wednesday September 27 09:30:21 UTC 2017
Showing emctrl vdu
card[01]: name[CFC_01 ] uuid[42913D9A-91A9-4E5E-8473-AEADD73BEC08]
card[03]: name[SFC_03 ] uuid[CB2C4429-0965-4394-8200-ABB4071BB067]
card[04]: name[SFC_04 ] uuid[17997C02-DF9F-40BC-8A41-D2B9D448D47C]
card[05]: name[SFC_05 ] uuid[159F91EE-B6A4-4DE6-A8C9-F900CD087093]
card[06]: name[SFC_06 ] uuid[7EE371A9-4E64-477F-AA09-42B6ED70B92B]
card[07]: name[SFC_07 ] uuid[DF2D38F2-01FD-4E95-97EC-4B1EB75683FD]
card[08]: name[SFC_08 ] uuid[E7D7F817-09C6-4EBA-9537-A66A686713A1]
card[09]: name[SFC_09 ] uuid[B24BE6CC-EB7B-483D-A859-284EF638647C]
card[10]: name[SFC_10 ] uuid[2AAD074F-C65C-4708-AAA9-A76588BD434D]
Workaround: Restart the EMCtrl task.
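To spot a missing card in a larger chassis, the slots present in the EMCtrl VDU list can be compared with the slots that are expected to be populated. This is an illustrative sketch: the function name missing_cards and the expected slot range are assumptions, and the sample output is a trimmed copy of the list shown in this scenario.

```python
import re

# Only the card number is needed here; pattern inferred from the sample output.
CARD_NUMBER_RE = re.compile(r"card\[(\d+)\]:")


def missing_cards(output, expected_slots):
    """Return the expected slots that do not appear in the pasted EMCtrl list."""
    present = {int(number) for number in CARD_NUMBER_RE.findall(output)}
    return sorted(set(expected_slots) - present)


# Trimmed copy of the 'show emctrl vdu list' output from this scenario.
output = """\
Showing emctrl vdu
card[01]: name[CFC_01 ] uuid[42913D9A-91A9-4E5E-8473-AEADD73BEC08]
card[03]: name[SFC_03 ] uuid[CB2C4429-0965-4394-8200-ABB4071BB067]
card[04]: name[SFC_04 ] uuid[17997C02-DF9F-40BC-8A41-D2B9D448D47C]
"""
# Assumption for this trimmed example: slots 1 through 4 should be populated.
missing = missing_cards(output, range(1, 5))
print(missing)
```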