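Introduction
This document describes the steps to identify and troubleshoot an IP SLA tracked device on a remote POD in an ACI PBR Multipod environment.
Prerequisites
Requirements
Cisco recommends that you have knowledge of these topics:
Components Used
The information in this document is based on these software and hardware versions:
- Cisco ACI version 4.2(7l)
- Cisco leaf switch N9K-C93180YC-EX
- Cisco spine switch N9K-C9336PQ
- Nexus 7000 version 8.2(2)
The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, ensure that you understand the potential impact of any command.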
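Network Diagram
Topology
Background Information
Using a service graph, Cisco ACI can redirect traffic between security zones to a firewall or a load balancer, without the need for the firewall or load balancer to be the default gateway for the servers.
The IP SLA feature in a PBR setup allows the ACI fabric to monitor that service node (L4-L7 device) in your environment, and prevents the fabric from redirecting traffic between the source and destination to a service node that is unreachable.
Note: ACI IP SLA relies on the fabric system GIPO (multicast address 239.255.255.240/28) to send the probes and distribute the tracking status.
Scenario
In this example, east-west connectivity cannot be completed between source endpoint 192.168.150.1 on POD-1 and destination server 192.168.151.1 on POD-2. The traffic is redirected to PBR node 172.16.1.1 from service leaf 103 on POD-1. PBR uses IP SLA monitoring and redirect health group policies.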
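Troubleshooting Steps
Step 1. Identify the IP SLA Status
- On the APIC UI, navigate to Tenants > Your_Tenant > Faults.
- Look for faults F2911, F2833, and F2992.
IP SLA Faults
Step 2. Identify the Node ID with the Health Group in Down Status
- On the APIC CLI, run the moquery command for any of the faults F2911, F2833, or F2992.
- You can see that the health group lb1::lb-healthGrp is down for leaf 202 in POD-2.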
MXS2-AP002# moquery -c faultInst -f 'fault.Inst.code == "F2911"'
# fault.Inst
code : F2911
ack : no
alert : no
annotation :
cause : svcredir-healthgrp-down
changeSet : operSt (New: disabled), operStQual (New: healthgrp-service-down)
childAction :
created : 2024-01-31T19:07:31.505-06:00
delegated : yes
descr : PBR service health grp lb1::lb-healthGrp on nodeid 202 fabric hostname MXS2-LF202 is in failed state, reason Health grp service is down.
dn : topology/pod-2/node-202/sys/svcredir/inst/healthgrp-lb1::lb-healthGrp/fault-F2911 <<<
domain : infra
extMngdBy : undefined
highestSeverity : major
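The dn field of the fault already encodes the POD, the node ID, and the health group name. As a minimal sketch, assuming the dn always follows the layout shown in the output above, a small Python helper (parse_fault_dn is a hypothetical name, not part of any Cisco tool) can pull those values out when many faults need to be reviewed:

```python
import re

# Assumed DN layout, taken from the moquery output in this step:
# topology/pod-<pod>/node-<node>/sys/svcredir/inst/healthgrp-<group>/fault-<code>
DN_PATTERN = re.compile(
    r"topology/pod-(?P<pod>\d+)/node-(?P<node>\d+)/sys/svcredir/inst/"
    r"healthgrp-(?P<group>[^/]+)/fault-(?P<code>F\d+)"
)


def parse_fault_dn(dn):
    """Return pod, node, health group and fault code found in a fault DN."""
    match = DN_PATTERN.search(dn)
    if match is None:
        raise ValueError(f"unrecognized fault DN: {dn!r}")
    return {
        "pod": int(match["pod"]),
        "node": int(match["node"]),
        "group": match["group"],
        "code": match["code"],
    }


if __name__ == "__main__":
    sample = ("topology/pod-2/node-202/sys/svcredir/inst/"
              "healthgrp-lb1::lb-healthGrp/fault-F2911")
    print(parse_fault_dn(sample))
```

The pattern uses search rather than a full match, so trailing annotations on the line (such as the <<< marker) do not interfere.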
Step 3. Verify That the PBR Device Is Learned as an Endpoint and Is Reachable from the Service Leaf
MXS2-LF103# show system internal epm endpoint ip 172.16.1.1
MAC : 40ce.2490.5743 ::: Num IPs : 1
IP# 0 : 172.16.1.1 ::: IP# 0 flags : ::: l3-sw-hit: No
Vlan id : 22 ::: Vlan vnid : 13192 ::: VRF name : lb1:vrf1
BD vnid : 15958043 ::: VRF vnid : 2162693
Phy If : 0x1a00b000 ::: Tunnel If : 0
Interface : Ethernet1/12
Flags : 0x80004c04 ::: sclass : 16391 ::: Ref count : 5
EP Create Timestamp : 02/01/2024 00:36:23.229262
EP Update Timestamp : 02/02/2024 01:43:38.767306
EP Flags : local|IP|MAC|sclass|timer|
MXS2-LF103# iping 172.16.1.1 -V lb1:vrf1
PING 172.16.1.1 (172.16.1.1) from 172.16.1.254: 56 data bytes
64 bytes from 172.16.1.1: icmp_seq=0 ttl=255 time=1.046 ms
64 bytes from 172.16.1.1: icmp_seq=1 ttl=255 time=1.074 ms
64 bytes from 172.16.1.1: icmp_seq=2 ttl=255 time=1.024 ms
64 bytes from 172.16.1.1: icmp_seq=3 ttl=255 time=0.842 ms
64 bytes from 172.16.1.1: icmp_seq=4 ttl=255 time=1.189 ms
--- 172.16.1.1 ping statistics ---
5 packets transmitted, 5 packets received, 0.00% packet loss
round-trip min/avg/max = 0.842/1.034/1.189 ms
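Step 4. Check the PBR Health Group in the Local POD and the Remote POD
Leaf 103 is the service leaf on POD-1. Therefore, POD-1 is considered the local POD and POD-2 the remote POD.
The health group is programmed only on leaf switches where the contract between the source and destination EPGs requires its deployment.
1. The source EPG is located on leaf node 102 in POD-1. You can see that the PBR device is tracked as UP from service leaf 103 in POD-1.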
MXS2-LF102# show service redir info health-group lb1::lb-healthGrp
=======================================================================================================================================
LEGEND
TL: Threshold(Low) | TH: Threshold(High) | HP: HashProfile | HG: HealthGrp | BAC: Backup-Dest | TRA: Tracking | RES: Resiliency
=======================================================================================================================================
HG-Name HG-OperSt HG-Dest HG-Dest-OperSt
======= ========= ======= ==============
lb1::lb-healthGrp enabled dest-[172.16.1.1]-[vxlan-2162693]] up
2. The destination EPG is located on leaf node 202 in POD-2. You can see that the PBR device is tracked as DOWN from service leaf 103 in POD-1.
MXS2-LF202# show service redir info health-group lb1::lb-healthGrp
=======================================================================================================================================
LEGEND
TL: Threshold(Low) | TH: Threshold(High) | HP: HashProfile | HG: HealthGrp | BAC: Backup-Dest | TRA: Tracking | RES: Resiliency
=======================================================================================================================================
HG-Name HG-OperSt HG-Dest HG-Dest-OperSt
======= ========= ======= ==============
lb1::lb-healthGrp disabled dest-[172.16.1.1]-[vxlan-2162693]] down <<<<< Health Group is down.
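When the same health group has to be compared across several leaf switches, the table can be checked programmatically. The following is a minimal sketch, assuming the output was collected as text and that data rows keep the four-column layout shown in this step; parse_health_groups and down_groups are hypothetical helper names:

```python
def parse_health_groups(output):
    """Extract health group rows from 'show service redir info health-group' text."""
    rows = []
    for line in output.splitlines():
        fields = line.split()
        # Data rows have at least four columns and a destination that starts with "dest-";
        # legend and header lines do not match this shape.
        if len(fields) >= 4 and fields[2].startswith("dest-"):
            rows.append({
                "name": fields[0],
                "oper_st": fields[1],
                "dest": fields[2],
                "dest_oper_st": fields[3],
            })
    return rows


def down_groups(output):
    """Return rows where the group is not enabled or the destination is not up."""
    return [
        row for row in parse_health_groups(output)
        if row["oper_st"] != "enabled" or row["dest_oper_st"] != "up"
    ]


if __name__ == "__main__":
    leaf202 = """HG-Name HG-OperSt HG-Dest HG-Dest-OperSt
======= ========= ======= ==============
lb1::lb-healthGrp disabled dest-[172.16.1.1]-[vxlan-2162693]] down"""
    for row in down_groups(leaf202):
        print(row["name"], row["oper_st"], row["dest_oper_st"])
```

Running it against the leaf 102 output returns an empty list, while the leaf 202 output returns the lb1::lb-healthGrp row, which matches the mismatch between the local and remote POD described in this step.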
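Step 5. Capture the IP SLA Probes with the ELAM Tool
Note: You can use the built-in capture tool Embedded Logic Analyzer Module (ELAM) to capture the incoming packet. The ELAM syntax depends on the hardware type. Another approach is to use the ELAM Assistant app.
To capture the IP SLA probes, use these values in the ELAM syntax to determine where the packet arrives or is dropped.
ELAM Inner L2 Header
Source MAC = 00-00-00-00-00-01
Destination MAC = 01-00-00-00-00-00
Note: The source and destination MAC addresses shown above are fixed values on the inner header of IP SLA packets.
ELAM Outer L3 Header
Source IP = TEP of the service leaf (TEP of leaf 103 in this lab = 172.30.200.64)
Destination IP = 239.255.255.240 (the fabric system GIPO, which must always be the same)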
trigger reset
trigger init in-select 14 out-select 0
set inner l2 dst_mac 01-00-00-00-00-00 src_mac 00-00-00-00-00-01
set outer ipv4 src_ip 172.30.200.64 dst_ip 239.255.255.240
start
stat
ereport
...
------------------------------------------------------------------------------------------------------------------------------------------------------
Inner L2 Header
------------------------------------------------------------------------------------------------------------------------------------------------------
Inner Destination MAC : 0100.0000.0000
Source MAC : 0000.0000.0001
802.1Q tag is valid : no
CoS : 0
Access Encap VLAN : 0
------------------------------------------------------------------------------------------------------------------------------------------------------
Outer L3 Header
------------------------------------------------------------------------------------------------------------------------------------------------------
L3 Type : IPv4
DSCP : 0
Don't Fragment Bit : 0x0
TTL : 27
IP Protocol Number : UDP
Destination IP : 239.255.255.240
Source IP : 172.30.200.64
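Only the service leaf TEP and the system GIPO change between environments; the inner MAC addresses are fixed. As a rough sketch, the trigger lines used in this step can be generated from those two inputs. The build_elam_commands helper is hypothetical, and the in-select and out-select values are copied from this lab, so they are an assumption for any other hardware type:

```python
import ipaddress

# Fixed inner MAC addresses of IP SLA probes, as listed in this step
INNER_SRC_MAC = "00-00-00-00-00-01"
INNER_DST_MAC = "01-00-00-00-00-00"


def build_elam_commands(leaf_tep, system_gipo):
    """Build the ELAM trigger lines for IP SLA probes (syntax as used in this lab)."""
    ipaddress.IPv4Address(leaf_tep)  # raises ValueError on a malformed address
    if not ipaddress.IPv4Address(system_gipo).is_multicast:
        raise ValueError(f"{system_gipo} is not a multicast address")
    return [
        "trigger reset",
        # in-select 14 / out-select 0 are the values from this lab; other ASICs can differ
        "trigger init in-select 14 out-select 0",
        f"set inner l2 dst_mac {INNER_DST_MAC} src_mac {INNER_SRC_MAC}",
        f"set outer ipv4 src_ip {leaf_tep} dst_ip {system_gipo}",
        "start",
        "stat",
        "ereport",
    ]


if __name__ == "__main__":
    print("\n".join(build_elam_commands("172.30.200.64", "239.255.255.240")))
```

The address validation is only a guard against typos before the lines are pasted into the ELAM prompt; it does not verify that the TEP belongs to the service leaf.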
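Step 6. Check That the Fabric System GIPO (239.255.255.240) Is Programmed on the Local and Remote Spines
Note: For each GIPO, only one spine node per POD is elected as the authoritative device to forward multicast frames and send IGMP joins toward the IPN.
1. Spine 1001 in POD-1 is the authoritative switch that forwards multicast frames and sends IGMP joins toward the IPN.
Interface Eth1/3 faces the N7K IPN.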
MXS2-SP1001# show isis internal mcast routes gipo | more
IS-IS process: isis_infra
VRF : default
GIPo Routes
====================================
System GIPo - Configured: 0.0.0.0
Operational: 239.255.255.240
====================================
<OUTPUT CUT> ...
GIPo: 239.255.255.240 [LOCAL]
OIF List:
Ethernet1/35.36
Ethernet1/3.3(External) <<< Interface must point out to IPN on elected Spine
Ethernet1/16.40
Ethernet1/17.45
Ethernet1/2.37
Ethernet1/36.42
Ethernet1/1.43
MXS2-SP1001# show ip igmp gipo joins | grep 239.255.255.240
239.255.255.240 0.0.0.0 Join Eth1/3.3 43 Enabled
2. Spine 2001 in POD-2 is the authoritative switch that forwards multicast frames and sends IGMP joins toward the IPN.
Interface Eth1/36 faces the N7K IPN.
MXS2-SP2001# show isis internal mcast routes gipo | more
IS-IS process: isis_infra
VRF : default
GIPo Routes
====================================
System GIPo - Configured: 0.0.0.0
Operational: 239.255.255.240
====================================
<OUTPUT CUT> ...
GIPo: 239.255.255.240 [LOCAL]
OIF List:
Ethernet1/2.40
Ethernet1/1.44
Ethernet1/36.36(External) <<< Interface must point out to IPN on elected Spine
MXS2-SP2001# show ip igmp gipo joins | grep 239.255.255.240
239.255.255.240 0.0.0.0 Join Eth1/36.36 76 Enabled
3. Ensure that the outgoing-interface-list gipo is not empty in VSH on both spines.
MXS2-SP1001# vsh
MXS2-SP1001# show forwarding distribution multicast outgoing-interface-list gipo | more
....
Outgoing Interface List Index: 1
Reference Count: 1
Number of Outgoing Interfaces: 5
Ethernet1/35.36
Ethernet1/3.3
Ethernet1/2.37
Ethernet1/36.42
Ethernet1/1.43
External GIPO OIFList
Ext OIFL: 8001
Ref Count: 393
No OIFs: 1
Ethernet1/3.3
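The key check on the elected spine is that the OIF list of the system GIPO contains an interface marked (External). A minimal sketch of that check, assuming the output of show isis internal mcast routes gipo was saved as text in the layout shown in this step (external_oifs is a hypothetical helper name):

```python
import re


def external_oifs(output, gipo):
    """Return the OIFs tagged (External) under the entry for the given GIPo."""
    interfaces = []
    in_entry = False
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("GIPo:"):
            # A new GIPo entry starts; collect interfaces only for the requested group
            fields = line.split()
            in_entry = len(fields) > 1 and fields[1] == gipo
            continue
        if in_entry:
            match = re.match(r"(Ethernet\S+?)\(External\)", line)
            if match:
                interfaces.append(match.group(1))
    return interfaces


if __name__ == "__main__":
    spine1001 = """GIPo: 239.255.255.240 [LOCAL]
OIF List:
Ethernet1/35.36
Ethernet1/3.3(External)
Ethernet1/16.40"""
    print(external_oifs(spine1001, "239.255.255.240"))
```

An empty result on the spine that is expected to be authoritative for the GIPO suggests that the group is not being forwarded toward the IPN from that POD.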
Step 7. Verify That the GIPO (239.255.255.240) Is Configured on the IPN
1. The GIPO 239.255.255.240 is missing from the IPN configuration.
N7K-ACI_ADMIN-VDC-ACI-IPN-MPOD# show run pim
...
ip pim rp-address 192.168.100.2 group-list 225.0.0.0/15 bidir
ip pim ssm range 232.0.0.0/8
N7K-ACI_ADMIN-VDC-ACI-IPN-MPOD# show ip mroute 239.255.255.240
IP Multicast Routing Table for VRF "default"
(*, 239.255.255.240/32), uptime: 1d01h, igmp ip pim
Incoming interface: Null, RPF nbr: 0.0.0.0 <<< Incoming interface and RPF are MISSING
Outgoing interface list: (count: 2)
Ethernet3/3.4, uptime: 1d01h, igmp
Ethernet3/1.4, uptime: 1d01h, igmp
2. The GIPO 239.255.255.240 is now configured on the IPN.
N7K-ACI_ADMIN-VDC-ACI-IPN-MPOD# show run pim
...
ip pim rp-address 192.168.100.2 group-list 225.0.0.0/15 bidir
ip pim rp-address 192.168.100.2 group-list 239.255.255.240/28 bidir <<< GIPO is configured
ip pim ssm range 232.0.0.0/8
N7K-ACI_ADMIN-VDC-ACI-IPN-MPOD# show ip mroute 225.0.42.16
IP Multicast Routing Table for VRF "default"
(*, 225.0.42.16/32), bidir, uptime: 1w6d, ip pim igmp
Incoming interface: loopback1, RPF nbr: 192.168.100.2
Outgoing interface list: (count: 2)
Ethernet3/1.4, uptime: 1d02h, igmp
loopback1, uptime: 1d03h, pim, (RPF)
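The first configuration fails because the only bidir group-list, 225.0.0.0/15, does not contain 239.255.255.240, so the IPN has no RP for the system GIPO. That coverage check can be sketched in Python with the standard ipaddress module, assuming the show run pim output was saved as text; rp_for_group is a hypothetical helper name:

```python
import ipaddress
import re

# Matches lines such as:
# ip pim rp-address 192.168.100.2 group-list 225.0.0.0/15 bidir
RP_LINE = re.compile(r"ip pim rp-address (\S+) group-list (\S+)")


def rp_for_group(pim_config, group):
    """Return the RP whose group-list covers the multicast group, or None."""
    address = ipaddress.ip_address(group)
    for line in pim_config.splitlines():
        match = RP_LINE.search(line)
        if match and address in ipaddress.ip_network(match.group(2), strict=False):
            return match.group(1)
    return None


if __name__ == "__main__":
    before = """ip pim rp-address 192.168.100.2 group-list 225.0.0.0/15 bidir
ip pim ssm range 232.0.0.0/8"""
    after = before + (
        "\nip pim rp-address 192.168.100.2 group-list 239.255.255.240/28 bidir"
    )
    print(rp_for_group(before, "239.255.255.240"))
    print(rp_for_group(after, "239.255.255.240"))
```

The sketch only inspects rp-address lines with a group-list; it does not evaluate the ssm range or confirm that the RP itself is reachable.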
Step 8. Confirm That IP SLA Tracking Is UP on the Remote POD
MXS2-LF202# show service redir info health-group lb1::lb-healthGrp
=======================================================================================================================================
LEGEND
TL: Threshold(Low) | TH: Threshold(High) | HP: HashProfile | HG: HealthGrp | BAC: Backup-Dest | TRA: Tracking | RES: Resiliency
=======================================================================================================================================
HG-Name HG-OperSt HG-Dest HG-Dest-OperSt
======= ========= ======= ==============
lb1::lb-healthGrp enabled dest-[172.16.1.1]-[vxlan-2162693]] up
Related Information
Cisco Bug ID | Bug Title | Fix Version
Cisco bug ID CSCwi75331 | | No fixed version. Use workaround.