Information about Implementing BGP
To implement BGP, you need to understand the following concepts:
BGP Router Identifier
For BGP sessions between neighbors to be established, BGP must be assigned a router ID. The router ID is sent to BGP peers in the OPEN message when a BGP session is established.
BGP attempts to obtain a router ID in the following ways (in order of preference):
-
By means of the address configured using the bgp router-id command in router configuration mode.
-
By using the highest IPv4 address on a loopback interface in the system if the router is booted with saved loopback address configuration.
-
By using the primary IPv4 address of the first loopback address that gets configured if there are not any in the saved configuration.
If none of these methods for obtaining a router ID succeeds, BGP does not have a router ID and cannot establish any peering sessions with BGP neighbors. In such an instance, an error message is entered in the system log, and the show bgp summary command displays a router ID of 0.0.0.0. After BGP has obtained a router ID, it continues to use it even if a better router ID becomes available. This usage avoids unnecessary flapping for all BGP sessions. However, if the router ID currently in use becomes invalid (because the interface goes down or its configuration is changed), BGP selects a new router ID (using the rules described) and all established peering sessions are reset.
Note |
We strongly recommend that the bgp router-id command is configured to prevent unnecessary changes to the router ID (and consequent flapping of BGP sessions). |
BGP Default Limits
BGP imposes maximum limits on the number of neighbors that can be configured on the router and on the maximum number of prefixes that are accepted from a peer for a given address family. This limitation safeguards the router from resource depletion caused by misconfiguration, either locally or on the remote neighbor. The following limits apply to BGP configurations:
-
The default maximum number of peers that can be configured is 4000. The default can be changed using the bgp maximum neighbor command. The limit range is 1 to 15000. Any attempt to configure additional peers beyond the maximum limit or set the maximum limit to a number that is less than the number of peers currently configured will fail.
- To prevent a peer from
flooding BGP with advertisements, a limit is placed on the number of prefixes
that are accepted from a peer for each supported address family. The default
limits can be overridden through configuration of the maximum-prefix
limit command
for the peer for the appropriate address family. The following default limits
are used if the user does not configure the maximum number of prefixes for the
address family:
-
512K (524,288) prefixes for IPv4 unicast
-
128K (131,072) prefixes for IPv6 unicast
-
512K (524,288) prefixes for VPNv4 unicast
A cease notification message is sent to the neighbor and the peering with the neighbor is terminated when the number of prefixes received from the peer for a given address family exceeds the maximum limit (either set by default or configured by the user) for that address family.
It is possible that the maximum number of prefixes for a neighbor for a given address family has been configured after the peering with the neighbor has been established and a certain number of prefixes have already been received from the neighbor for that address family. A cease notification message is sent to the neighbor and peering with the neighbor is terminated immediately after the configuration if the configured maximum number of prefixes is fewer than the number of prefixes that have already been received from the neighbor for the address family.
-
BGP Attributes and Operators
This table summarizes the BGP attributes and operators per attach points.
Attach Point |
Attribute |
Match |
Set |
---|---|---|---|
aggregation |
as-path |
in is-local length neighbor-is originates-from passes-through unique-length |
— |
as-path-length |
is, ge, le, eq |
— | |
as-path-unique-length |
is, ge, le, eq |
— | |
community |
is-empty matches-any matches-every |
set set additive delete in delete not in delete all |
|
destination |
in |
— | |
extcommunity cost |
— |
set set additive |
|
local-preference |
is, ge, le, eq |
set |
|
med |
is, eg, ge, le |
setset +set - |
|
next-hop |
in |
set |
|
origin |
is |
set |
|
source |
in |
— | |
suppress-route |
— |
suppress-route |
|
weight |
— |
set |
|
allocate-label |
as-path |
in is-local length neighbor-is originates-from passes-through unique-length |
— |
as-path-length |
is, ge, le, eq |
— | |
as-path-unique-length |
is, ge, le, eq |
— | |
community |
is-empty matches-any matches-every |
— | |
destination |
in |
— | |
label |
— |
set |
|
local-preference |
is, ge, le, eq |
— | |
med |
is, eg, ge, le |
— | |
next-hop |
in |
— | |
origin |
is |
— | |
source |
in |
— | |
clear-policy |
as-path |
in is-local length neighbor-is originates-from passes-through unique-length |
— |
as-path-length |
is, ge, le, eq |
— | |
as-path-unique-length |
is, ge, le, eq |
— | |
dampening |
as-path |
in is-local length neighbor-is originates-from passes-through unique-length |
— |
as-path-length |
is, ge, le, eq |
— | |
as-path-unique-length |
is, ge, le, eq |
— | |
community |
is-empty matches-any matches-every |
— | |
dampening |
—/ |
set dampening |
|
destination |
in |
— | |
local-preference |
is, ge, le, eq |
— | |
med |
is, eg, ge, le |
— | |
next-hop |
in |
— | |
origin |
is |
— | |
source |
in |
— | |
debug |
destination |
in |
— |
default originate |
med |
— |
set set + set - |
rib-has-route |
in |
— | |
neighbor-in |
as-path |
in is-local length NA neighbor-is originates-from passes-through unique-length |
prepend prepend most-recent remove as-path private-as replace |
as-path-length |
is, ge, le, eq |
— | |
as-path-unique-length |
is, ge, le, eq |
— | |
communitycommunity with ‘peeras’ |
is-empty matches-any matches-every |
set set additive delete-in delete-not-in delete-all |
|
destination |
in |
— | |
extcommunity cost |
— |
set set additive |
|
extcommunity rt |
is-empty matches-any matches-every matches-within |
set additive delete-in delete-not-in delete-all |
|
extcommunity soo |
is-empty matches-any matches-every matches-within |
— | |
local-preference |
is, ge, le, eq |
set |
|
med |
is, eg, ge, le |
set set + set - |
|
next-hop |
in |
set set peer address |
|
origin |
is |
set |
|
route-aggregated |
route-aggregated |
NA |
|
source |
in |
— | |
weight |
— |
set |
|
neighbor-out |
as-path |
in is-local length — neighbor-is originates-from passes-through unique-length |
prepend prepend most-recent remove as-path private-as replace |
as-path-length |
is, ge, le, eq |
— | |
as-path-unique-length |
is, ge, le, eq |
— | |
communitycommunity with ‘peeras’ |
is-empty matches-any matches-every |
set set additive delete-in delete-not-in delete-all |
|
destination |
in |
— | |
extcommunity cost |
— |
set set additive |
|
extcommunity rt |
is-empty matches-any matches-every matches-within |
set additive delete-in delete-not-in delete-all |
|
extcommunity soo |
is-empty matches-any matches-every matches-within |
— | |
local-preference |
is, ge, le, eq |
set |
|
med |
is, eg, ge, le |
set set + set - set max-unreachable set igp-cost |
|
next-hop |
in |
set set self |
|
origin |
is |
set |
|
path-type |
is |
— | |
rd |
in |
— | |
route-aggregated |
route-aggregated |
— |
|
source |
in |
— | |
unsuppress-route |
— |
unsuppress-route |
|
vpn-distinguisher |
— |
set |
|
neighbor-orf |
orf-prefix |
in |
n/a |
network |
as-path |
— |
prepend |
community |
— |
set set additive delete-in delete-not-in delete-all |
|
destination |
in |
— | |
extcommunity cost |
— |
set set additive |
|
mpls-label |
route-has-label |
— | |
local-preference |
— |
set |
|
med |
— |
set set+ set- |
|
next-hop |
in |
set |
|
origin |
— |
set |
|
route-type |
is |
— | |
tag |
is, ge, le, eq |
— | |
weight |
— |
set |
|
next-hop |
destination |
in |
— |
protocol |
is,in |
— | |
source |
in |
— | |
redistribute |
as-path |
— |
prepend |
community |
— |
set set additive delete in delete not in delete all |
|
destination |
in |
— | |
extcommunity cost |
— |
setset additive |
|
local-preference |
— |
set |
|
med |
— |
set set+ set- |
|
next-hop |
in |
set |
|
origin |
— |
set |
|
mpls-label |
route-has-label |
— | |
route-type |
is |
— | |
tag |
is, eq, ge, le |
— | |
weight |
— |
set |
|
retain-rt |
extcommunity rt |
is-empty matches-any matches-every matches-within |
— |
show |
as-path |
in is-local length neighbor-is originates-from passes-through unique-length |
— |
as-path-length |
is, ge, le, eq |
— | |
as-path-unique-length |
is, ge, le, eq |
— | |
community |
is-empty matches-any matches-every |
— | |
destination |
in |
— | |
extcommunity rt |
is-empty matches-any matches-every matches-within |
— | |
extcommunity soo |
is-empty matches-any matches-every matches-within |
— | |
med |
is, eg, ge, le |
— | |
next-hop |
in |
— | |
origin |
is |
— | |
source |
in |
— |
This table summarizes which operations are valid and where they are valid.
Command |
import |
export |
aggregation |
redistribution |
---|---|---|---|---|
prepend as-path most-recent |
eBGP only |
eBGP only |
n/a |
n/a |
replace as-path |
eBGP only |
eBGP only |
n/a |
n/a |
set med igp-cost |
forbidden |
eBGP only |
forbidden |
forbidden |
set weight |
n/a |
forbidden |
n/a |
n/a |
suppress |
forbidden |
forbidden |
n/a |
forbidden |
BGP Best Path Algorithm
BGP routers typically receive multiple paths to the same destination. The BGP best-path algorithm determines the best path to install in the IP routing table and to use for forwarding traffic. This section describes the Cisco IOS XR software implementation of BGP best-path algorithm, as specified in Section 9.1 of the Internet Engineering Task Force (IETF) Network Working Group draft-ietf-idr-bgp4-24.txt document.
The BGP best-path algorithm implementation is in three parts:
-
Part 1—Compares two paths to determine which is better.
-
Part 2—Iterates over all paths and determines which order to compare the paths to select the overall best path.
-
Part 3—Determines whether the old and new best paths differ enough so that the new best path should be used.
Note |
The order of comparison determined by Part 2 is important because the comparison operation is not transitive; that is, if three paths, A, B, and C exist, such that when A and B are compared, A is better, and when B and C are compared, B is better, it is not necessarily the case that when A and C are compared, A is better. This nontransitivity arises because the multi exit discriminator (MED) is compared only among paths from the same neighboring autonomous system (AS) and not among all paths. |
Comparing Pairs of Paths
Perform the following steps to compare two paths and determine the better path:
-
If either path is invalid (for example, a path has the maximum possible MED value or it has an unreachable next hop), then the other path is chosen (provided that the path is valid).
-
If the paths have unequal pre-bestpath cost communities, the path with the lower pre-bestpath cost community is selected as the best path.
-
If the paths have unequal weights, the path with the highest weight is chosen.
Note
The weight is entirely local to the router, and can be set with the weight command or using a routing policy.
-
If the paths have unequal local preferences, the path with the higher local preference is chosen.
Note
If a local preference attribute was received with the path or was set by a routing policy, then that value is used in this comparison. Otherwise, the default local preference value of 100 is used. The default value can be changed using the bgp default local-preference command.
-
If one of the paths is a redistributed path, which results from a redistribute or network command, then it is chosen. Otherwise, if one of the paths is a locally generated aggregate, which results from an aggregate-address command, it is chosen.
Note
Step 1 through Step 4 implement the “Path Selection with BGP”of RFC 1268.
-
If the paths have unequal AS path lengths, the path with the shorter AS path is chosen. This step is skipped if bgp bestpath as-path ignore command is configured.
Note
When calculating the length of the AS path, confederation segments are ignored, and AS sets count as 1.
Note
eiBGP specifies internal and external BGP multipath peers. eiBGP allows simultaneous use of internal and external paths.
-
If the paths have different origins, the path with the lower origin is selected. Interior Gateway Protocol (IGP) is considered lower than EGP, which is considered lower than INCOMPLETE.
-
If appropriate, the MED of the paths is compared. If they are unequal, the path with the lower MED is chosen.
A number of configuration options exist that affect whether or not this step is performed. In general, the MED is compared if both paths were received from neighbors in the same AS; otherwise the MED comparison is skipped. However, this behavior is modified by certain configuration options, and there are also some corner cases to consider.
If the bgp bestpath med always command is configured, then the MED comparison is always performed, regardless of neighbor AS in the paths. Otherwise, MED comparison depends on the AS paths of the two paths being compared, as follows:
-
If a path has no AS path or the AS path starts with an AS_SET, then the path is considered to be internal, and the MED is compared with other internal paths.
-
If the AS path starts with an AS_SEQUENCE, then the neighbor AS is the first AS number in the sequence, and the MED is compared with other paths that have the same neighbor AS.
-
If the AS path contains only confederation segments or starts with confederation segments followed by an AS_SET, then the MED is not compared with any other path unless the bgp bestpath med confed command is configured. In that case, the path is considered internal and the MED is compared with other internal paths.
-
If the AS path starts with confederation segments followed by an AS_SEQUENCE, then the neighbor AS is the first AS number in the AS_SEQUENCE, and the MED is compared with other paths that have the same neighbor AS.
Note
If no MED attribute was received with the path, then the MED is considered to be 0 unless the bgp bestpath med missing-as-worst command is configured. In that case, if no MED attribute was received, the MED is considered to be the highest possible value.
-
-
If one path is received from an external peer and the other is received from an internal (or confederation) peer, the path from the external peer is chosen.
-
If the paths have different IGP metrics to their next hops, the path with the lower IGP metric is chosen.
-
If the paths have unequal IP cost communities, the path with the lower IP cost community is selected as the best path.
-
If all path parameters in Step 1 through Step 10 are the same, then the router IDs are compared. If the path was received with an originator attribute, then that is used as the router ID to compare; otherwise, the router ID of the neighbor from which the path was received is used. If the paths have different router IDs, the path with the lower router ID is chosen.
Note
Where the originator is used as the router ID, it is possible to have two paths with the same router ID. It is also possible to have two BGP sessions with the same peer router, and therefore receive two paths with the same router ID.
-
If the paths have different cluster lengths, the path with the shorter cluster length is selected. If a path was not received with a cluster list attribute, it is considered to have a cluster length of 0.
-
Finally, the path received from the neighbor with the lower IP address is chosen. Locally generated paths (for example, redistributed paths) are considered to have a neighbor IP address of 0.
Order of Comparisons
The second part of the BGP best-path algorithm implementation determines the order in which the paths should be compared. The order of comparison is determined as follows:
-
The paths are partitioned into groups such that within each group the MED can be compared among all paths. The same rules as in are used to determine whether MED can be compared between any two paths. Normally, this comparison results in one group for each neighbor AS. If the bgp bestpath med always command is configured, then there is just one group containing all the paths.
-
The best path in each group is determined. Determining the best path is achieved by iterating through all paths in the group and keeping track of the best one seen so far. Each path is compared with the best-so-far, and if it is better, it becomes the new best-so-far and is compared with the next path in the group.
-
A set of paths is formed containing the best path selected from each group in Step 2. The overall best path is selected from this set of paths, by iterating through them as in Step 2.
Best Path Change Suppression
The third part of the implementation is to determine whether the best-path change can be suppressed or not—whether the new best path should be used, or continue using the existing best path. The existing best path can continue to be used if the new one is identical to the point at which the best-path selection algorithm becomes arbitrary (if the router-id is the same). Continuing to use the existing best path can avoid churn in the network.
Note |
This suppression behavior does not comply with the IETF Networking Working Group draft-ietf-idr-bgp4-24.txt document, but is specified in the IETF Networking Working Group draft-ietf-idr-avoid-transition-00.txt document. |
The suppression behavior can be turned off by configuring the bgp bestpath compare-routerid command. If this command is configured, the new best path is always preferred to the existing one.
Otherwise, the following steps are used to determine whether the best-path change can be suppressed:
-
If the existing best path is no longer valid, the change cannot be suppressed.
-
If either the existing or new best paths were received from internal (or confederation) peers or were locally generated (for example, by redistribution), then the change cannot be suppressed. That is, suppression is possible only if both paths were received from external peers.
-
If the paths were received from the same peer (the paths would have the same router-id), the change cannot be suppressed. The router ID is calculated using rules in .
-
If the paths have different weights, local preferences, origins, or IGP metrics to their next hops, then the change cannot be suppressed. Note that all these values are calculated using the rules in .
-
If the paths have different-length AS paths and the bgp bestpath as-path ignore command is not configured, then the change cannot be suppressed. Again, the AS path length is calculated using the rules in .
-
If the MED of the paths can be compared and the MEDs are different, then the change cannot be suppressed. The decision as to whether the MEDs can be compared is exactly the same as the rules in , as is the calculation of the MED value.
-
If all path parameters in Step 1 through Step 6 do not apply, the change can be suppressed.
BGP Update Generation and Update Groups
The BGP Update Groups feature separates BGP update generation from neighbor configuration. The BGP Update Groups feature introduces an algorithm that dynamically calculates BGP update group membership based on outbound routing policies. This feature does not require any configuration by the network operator. Update group-based message generation occurs automatically and independently.
BGP Update Group
When a change to the configuration occurs, the router automatically recalculates update group memberships and applies the changes.
For the best optimization of BGP update group generation, we recommend that the network operator keeps outbound routing policy the same for neighbors that have similar outbound policies. This feature contains commands for monitoring BGP update groups.
BGP Cost Community Reference
The cost community attribute is applied to internal routes by configuring the set extcommunity cost command in a route policy. The cost community set clause is configured with a cost community ID number (0–255) and cost community number (0–4294967295). The cost community number determines the preference for the path. The path with the lowest cost community number is preferred. Paths that are not specifically configured with the cost community number are assigned a default cost community number of 2147483647 (the midpoint between 0 and 4294967295) and evaluated by the best-path selection process accordingly. When two paths have been configured with the same cost community number, the path selection process prefers the path with the lowest cost community ID. The cost-extended community attribute is propagated to iBGP peers when extended community exchange is enabled.
The following commands include the route-policy keyword, which you can use to apply a route policy that is configured with the cost community set clause:
-
aggregate-address
-
redistribute
-
network
BGP Next Hop Reference
-
Next hop becomes unreachable
-
Next hop becomes reachable
-
Fully recursed IGP metric to the next hop changes
-
First hop IP address or first hop interface change
-
Next hop becomes connected
-
Next hop becomes unconnected
-
Next hop becomes a local address
-
Next hop becomes a nonlocal address
Note |
Reachability and recursed metric events trigger a best-path recalculation. |
-
Critical events are related to the reachability (reachable and unreachable), connectivity (connected and unconnected), and locality (local and nonlocal) of the next hops. Notifications for these events are not delayed.
-
Noncritical events include only the IGP metric changes. These events are sent at an interval of 3 seconds. A metric change event is batched and sent 3 seconds after the last one was sent.
BGP is notified when any of the following events occurs:
-
Next hop becomes unreachable
-
Next hop becomes reachable
-
Fully recursed IGP metric to the next hop changes
-
First hop IP address or first hop interface change
-
Next hop becomes connected
-
Next hop becomes unconnected
-
Next hop becomes a local address
-
Next hop becomes a nonlocal address
Note |
Reachability and recursed metric events trigger a best-path recalculation. |
The next-hop trigger delay for critical and noncritical events can be configured to specify a minimum batching interval for critical and noncritical events using the nexthop trigger-delay command. The trigger delay is address family dependent.
The BGP next-hop tracking feature allows you to specify that BGP routes are resolved using only next hops whose routes have the following characteristics:
-
To avoid the aggregate routes, the prefix length must be greater than a specified value.
-
The source protocol must be from a selected list, ensuring that BGP routes are not used to resolve next hops that could lead to oscillation.
This route policy filtering is possible because RIB identifies the source protocol of route that resolved a next hop as well as the mask length associated with the route. The nexthop route-policy command is used to specify the route-policy.
Next Hop as the IPv6 Address of Peering Interface
BGP can carry IPv6 prefixes over an IPv4 session. The next hop for the IPv6 prefixes can be set through a nexthop policy. In the event that the policy is not configured, the nexthops are set as the IPv6 address of the peering interface (IPv6 neighbor interface or IPv6 update source interface, if any one of the interfaces is configured).
If the nexthop policy is not configured and neither the IPv6 neighbor interface nor the IPv6 update source interface is configured, the next hop is the IPv4 mapped IPv6 address.
Scoped IPv4/VPNv4 Table Walk
To determine which address family to process, a next-hop notification is received by first de-referencing the gateway context associated with the next hop, then looking into the gateway context to determine which address families are using the gateway context. The IPv4 unicast and VPNv4 unicast address families share the same gateway context, because they are registered with the IPv4 unicast table in the RIB. As a result, both the global IPv4 unicast table and the VPNv4 table are is processed when an IPv4 unicast next-hop notification is received from the RIB. A mask is maintained in the next hop, indicating if whether the next hop belongs to IPv4 unicast or VPNv4 unicast, or both. This scoped table walk localizes the processing in the appropriate address family table.
Reordered Address Family Processing
The software walks address family tables based on the numeric value of the address family. When a next-hop notification batch is received, the order of address family processing is reordered to the following order:
-
IPv4 tunnel
-
VPNv4 unicast
-
VPNv6 unicast
-
IPv4 labeled unicast
-
IPv4 unicast
-
IPv4 MDT
-
IPv6 unicast
-
IPv6 labeled unicast
-
IPv4 tunnel
-
VPNv4 unicast
-
IPv4 unicast
-
IPv6 unicast
New Thread for Next-Hop Processing
The critical-event thread in the spkr process handles only next-hop, Bidirectional Forwarding Detection (BFD), and fast-external-failover (FEF) notifications. This critical-event thread ensures that BGP convergence is not adversely impacted by other events that may take a significant amount of time.
show, clear, and debug Commands
The show bgp nexthops command provides statistical information about next-hop notifications, the amount of time spent in processing those notifications, and details about each next hop registered with the RIB. The clear bgp nexthop performance-statistics command ensures that the cumulative statistics associated with the processing part of the next-hop show command can be cleared to help in monitoring. The clear bgp nexthop registration command performs an asynchronous registration of the next hop with the RIB.
The debug bgp nexthop command displays information on next-hop processing. The out keyword provides debug information only about BGP registration of next hops with RIB. The in keyword displays debug information about next-hop notifications received from RIB. The out keyword displays debug information about next-hop notifications sent to the RIB.
BGP Nonstop Routing Reference
BGP NSR provides nonstop routing during the following events:
-
Route processor switchover
-
Process crash or process failure of BGP or TCP
Note
BGP NSR is enabled by default. Use the nsr disable command to turn off BGP NSR. The no nsr disable command can also be used to turn BGP NSR back on if it has been disabled.
In case of process crash or process failure, NSR will be maintained only if nsr process-failures switchover command is configured. In the event of process failures of active instances, the nsr process-failures switchover configures failover as a recovery action and switches over to a standby route processor (RP) or a standby distributed route processor (DRP) thereby maintaining NSR. An example of the configuration command is RP/0/RSP0/CPU0:router(config) # nsr process-failures switchover
The nsr process-failures switchover command maintains both the NSR and BGP sessions in the event of a BGP or TCP process crash. Without this configuration, BGP neighbor sessions flap in case of a BGP or TCP process crash. This configuration does not help if the BGP or TCP process is restarted in which case the BGP neighbors are expected to flap.
During route processor switchover and In-Service System Upgrade (ISSU), NSR is achieved by stateful switchover (SSO) of both TCP and BGP.
NSR does not force any software upgrades on other routers in the network, and peer routers are not required to support NSR.
When a route processor switchover occurs due to a fault, the TCP connections and the BGP sessions are migrated transparently to the standby route processor, and the standby route processor becomes active. The existing protocol state is maintained on the standby route processor when it becomes active, and the protocol state does not need to be refreshed by peers.
Events such as soft reconfiguration and policy modifications can trigger the BGP internal state to change. To ensure state consistency between active and standby BGP processes during such events, the concept of post-it is introduced that act as synchronization points.
BGP NSR provides the following features:
-
NSR-related alarms and notifications
-
Configured and operational NSR states are tracked separately
-
NSR statistics collection
-
NSR statistics display using show commands
-
XML schema support
-
Auditing mechanisms to verify state synchronization between active and standby instances
-
CLI commands to enable and disable NSR
BGP Route Reflectors Reference
Figure 1 illustrates a simple iBGP configuration with three iBGP speakers (routers A, B, and C). Without route reflectors, when Router A receives a route from an external neighbor, it must advertise it to both routers B and C. Routers B and C do not readvertise the iBGP learned route to other iBGP speakers because the routers do not pass on routes learned from internal neighbors to other internal neighbors, thus preventing a routing information loop.
With route reflectors, all iBGP speakers need not be fully meshed because there is a method to pass learned routes to neighbors. In this model, an iBGP peer is configured to be a route reflector responsible for passing iBGP learned routes to a set of iBGP neighbors. In Figure 2 , Router B is configured as a route reflector. When the route reflector receives routes advertised from Router A, it advertises them to Router C, and vice versa. This scheme eliminates the need for the iBGP session between routers A and C.
The internal peers of the route reflector are divided into two groups: client peers and all other routers in the autonomous system (nonclient peers). A route reflector reflects routes between these two groups. The route reflector and its client peers form a cluster. The nonclient peers must be fully meshed with each other, but the client peers need not be fully meshed. The clients in the cluster do not communicate with iBGP speakers outside their cluster.
Figure 3 illustrates a more complex route reflector scheme. Router A is the route reflector in a cluster with routers B, C, and D. Routers E, F, and G are fully meshed, nonclient routers.
When the route reflector receives an advertised route, depending on the neighbor, it takes the following actions:
-
A route from an external BGP speaker is advertised to all clients and nonclient peers.
-
A route from a nonclient peer is advertised to all clients.
-
A route from a client is advertised to all clients and nonclient peers. Hence, the clients need not be fully meshed.
Along with route reflector-aware BGP speakers, it is possible to have BGP speakers that do not understand the concept of route reflectors. They can be members of either client or nonclient groups, allowing an easy and gradual migration from the old BGP model to the route reflector model. Initially, you could create a single cluster with a route reflector and a few clients. All other iBGP speakers could be nonclient peers to the route reflector and then more clusters could be created gradually.
An autonomous system can have multiple route reflectors. A route reflector treats other route reflectors just like other iBGP speakers. A route reflector can be configured to have other route reflectors in a client group or nonclient group. In a simple configuration, the backbone could be divided into many clusters. Each route reflector would be configured with other route reflectors as nonclient peers (thus, all route reflectors are fully meshed). The clients are configured to maintain iBGP sessions with only the route reflector in their cluster.
Usually, a cluster of clients has a single route reflector. In that case, the cluster is identified by the router ID of the route reflector. To increase redundancy and avoid a single point of failure, a cluster might have more than one route reflector. In this case, all route reflectors in the cluster must be configured with the cluster ID so that a route reflector can recognize updates from route reflectors in the same cluster. All route reflectors serving a cluster should be fully meshed and all of them should have identical sets of client and nonclient peers.
By default, the clients of a route reflector are not required to be fully meshed and the routes from a client are reflected to other clients. However, if the clients are fully meshed, the route reflector need not reflect routes to clients.
As the iBGP learned routes are reflected, routing information may loop. The route reflector model has the following mechanisms to avoid routing loops:
-
Originator ID is an optional, nontransitive BGP attribute. It is a 4-byte attributed created by a route reflector. The attribute carries the router ID of the originator of the route in the local autonomous system. Therefore, if a misconfiguration causes routing information to come back to the originator, the information is ignored.
-
Cluster-list is an optional, nontransitive BGP attribute. It is a sequence of cluster IDs that the route has passed. When a route reflector reflects a route from its clients to nonclient peers, and vice versa, it appends the local cluster ID to the cluster-list. If the cluster-list is empty, a new cluster-list is created. Using this attribute, a route reflector can identify if routing information is looped back to the same cluster due to misconfiguration. If the local cluster ID is found in the cluster-list, the advertisement is ignored.
iBGP Multipath Load Sharing Reference
When there are multiple border BGP routers having reachability information heard over eBGP, if no local policy is applied, the border routers will choose their eBGP paths as best. They advertise that bestpath inside the ISP network. For a core router, there can be multiple paths to the same destination, but it will select only one path as best and use that path for forwarding. iBGP multipath load sharing adds the ability to enable load sharing among multiple equi-distant paths. Configuring multiple iBGP best paths enables a router to evenly share the traffic destined for a particular site. The iBGP Multipath Load Sharing feature functions similarly in a Multiprotocol Label Switching (MPLS) Virtual Private Network (VPN) with a service provider backbone.
For multiple paths to the same destination to be considered as multipaths, the following criteria must be met:
-
All attributes must be the same. The attributes include weight, local preference, autonomous system path (entire attribute and not just length), origin code, Multi Exit Discriminator (MED), and Interior Gateway Protocol (iGP) distance.
-
The next hop router for each multipath must be different.
Note |
|
L3VPN iBGP PE-CE Reference
When BGP is used as the provider edge (PE) or the customer edge (CE) routing protocol, the peering sessions are configured as external peering between the VPN provider autonomous system (AS) and the customer network autonomous system. The L3VPN iBGP PE-CE feature enables the PE and CE devices to exchange Border Gateway Protocol (BGP) routing information by peering as internal Border Gateway Protocol (iBGP) instead of the widely-used external BGP peering between the PE and the CE. This mechanism applies at each PE device where a VRF-based CE is configured as iBGP. This eliminates the need for service providers (SPs) to configure autonomous system override for the CE. With this feature enabled, there is no need to configure the virtual private network (VPN) sites using different autonomous systems.
The neighbor internal-vpn-client command enables PE devices to make an entire VPN cloud act as an internal VPN client to the CE devices. These CE devices are connected internally to the VPN cloud through the iBGP PE-CE connection inside the VRF. After this connection is established, the PE device encapsulates the CE-learned path into an attribute called ATTR_SET and carries it in the iBGP-sourced path throughout the VPN core to the remote PE device. At the remote PE device, this attribute is assigned with individual attributes and the source CE path is extracted and sent to the remote CE devices.
+------------------------------+
| Attr Flags (O|T) Code = 128 |
+------------------------------+
| Attr. Length (1 or 2 octets) |
+------------------------------+
| Origin AS (4 octets) |
+------------------------------+
| Path attributes (variable) |
+------------------------------+
Origin AS is the AS of the VPN customer for which the ATTR_SET is generated. The minimum length of ATTR_SET is four bytes and the maximum is the maximum supported for a path attribute after taking into consideration the mandatory fields and attributes in the BGP update message. It is recommended that the maximum length is limited to 3500 bytes. ATTR_SET must not contain the following attributes: MP_REACH, MP_UNREACH, NEW_AS_PATH, NEW_AGGR, NEXT_HOP and ATTR_SET itself (ATTR_SET inside ATTR_SET). If these attributes are found inside the ATTR_SET, the ATTR_SET is considered invalid and the corresponding error handling mechanism is invoked.
MPLS VPN Carrier Supporting Carrier
Carrier supporting carrier (CSC) is a term used to describe a situation in which one service provider allows another service provider to use a segment of its backbone network. The service provider that provides the segment of the backbone network to the other provider is called the backbone carrier. The service provider that uses the segment of the backbone network is called the customer carrier.
A backbone carrier offers Border Gateway Protocol and Multiprotocol Label Switching (BGP/MPLS) VPN services. The customer carrier can be either:
-
An Internet service provider (ISP) (By definition, an ISP does not provide VPN service.)
-
A BGP/MPLS VPN service provider
You can configure a CSC network to enable BGP to transport routes and MPLS labels between the backbone carrier provider edge (PE) routers and the customer carrier customer edge (CE) routers using multiple paths. The benefits of using BGP to distribute IPv4 routes and MPLS label routes are:
-
BGP takes the place of an Interior Gateway Protocol (IGP) and Label Distribution Protocol (LDP) in a VPN routing and forwarding (VRF) table. You can use BGP to distribute routes and MPLS labels. Using a single protocol instead of two simplifies the configuration and troubleshooting.
-
BGP is the preferred routing protocol for connecting two ISPs, mainly because of its routing policies and ability to scale. ISPs commonly use BGP between two providers. This feature enables those ISPs to use BGP.
For detailed information on configuring MPLS VPN CSC with BGP, see the Implementing MPLS Layer 3 VPNs on module of the MPLS Configuration Guide.
Per VRF and Per CE Label for IPv6 Provider Edge
The per VRF and per CE label for IPv6 feature makes it possible to save label space by allocating labels per default VRF or per CE nexthop.
All IPv6 Provider Edge (6PE) labels are allocated per prefix by default. Each prefix that belongs to a VRF instance is advertised with a single label, causing an additional lookup to be performed in the VRF forwarding table to determine the customer edge (CE) next hop for the packet.
However, use the label-allocation-mode command with the per-ce keyword or the per-vrf keyword to avoid the additional lookup on the PE router and conserve label space.
Use per-ce keyword to specify that the same label be used for all the routes advertised from a unique customer edge (CE) peer router. Use the per-vrf keyword to specify that the same label be used for all the routes advertised from a unique VRF.
IPv6 Unicast Routing
Cisco provides complete Internet Protocol Version 6 (IPv6) unicast capability.
An IPv6 unicast address is an identifier for a single interface, on a single node. A packet that is sent to a unicast address is delivered to the interface identified by that address. Cisco IOS XR software supports the following IPv6 unicast address types:
-
Global aggregatable address
-
Site-local address
-
Link-local address
-
IPv4-compatible IPv6 address
For more information on IPv6 unicast addressing, refer the IP Addresses and Services Configuration Guide.
Remove and Replace Private AS Numbers from AS Path in BGP
Private autonomous system numbers (ASNs) are used by Internet Service Providers (ISPs) and customer networks to conserve globally unique AS numbers. Private AS numbers cannot be used to access the global Internet because they are not unique. AS numbers appear in eBGP AS paths in routing updates. Removing private ASNs from the AS path is necessary if you have been using private ASNs and you want to access the global Internet.
Public AS numbers are assigned by InterNIC and are globally unique. They range from 1 to 64511. Private AS numbers are used to conserve globally unique AS numbers, and they range from 64512 to 65535. Private AS numbers cannot be leaked to a global BGP routing table because they are not unique, and BGP best path calculations require unique AS numbers. Therefore, it might be necessary to remove private AS numbers from an AS path before the routes are propagated to a BGP peer.
External BGP (eBGP) requires that globally unique AS numbers be used when routing to the global Internet. Using private AS numbers (which are not unique) would prevent access to the global Internet. The remove and replace private AS Numbers from AS Path in BGP feature allows routers that belong to a private AS to access the global Internet. A network administrator configures the routers to remove private AS numbers from the AS path contained in outgoing update messages and optionally, to replace those numbers with the ASN of the local router, so that the AS Path length remains unchanged.
The ability to remove and replace private AS numbers from the AS Path is implemented in the following ways:
-
The remove-private-as command removes private AS numbers from the AS path even if the path contains both public and private ASNs.
-
The remove-private-as command removes private AS numbers even if the AS path contains only private AS numbers. There is no likelihood of a 0-length AS path because this command can be applied to eBGP peers only, in which case the AS number of the local router is appended to the AS path.
-
The remove-private-as command removes private AS numbers even if the private ASNs appear before the confederation segments in the AS path.
-
The replace-as command replaces the private AS numbers being removed from the path with the local AS number, thereby retaining the same AS path length.
The feature can be applied to neighbors per address family (address family configuration mode). Therefore, you can apply the feature for a neighbor in one address family and not on another, affecting update messages on the outbound side for only the address family for which the feature is configured.
Use show bgp neighbors and show bgp update-group commands to verify that the that private AS numbers were removed or replaced.
BGP Update Message Error Handling
The BGP UPDATE message error handling changes BGP behavior in handling error UPDATE messages to avoid session reset. Based on the approach described in IETF IDR I-D:draft-ietf-idr-error-handling, the Cisco IOS XR BGP UPDATE Message Error handling implementation classifies BGP update errors into various categories based on factors such as, severity, likelihood of occurrence of UPDATE errors, or type of attributes. Errors encountered in each category are handled according to the draft. Session reset will be avoided as much as possible during the error handling process. Error handling for some of the categories are controlled by configuration commands to enable or disable the default behavior.
According to the base BGP specification, a BGP speaker that receives an UPDATE message containing a malformed attribute is required to reset the session over which the offending attribute was received. This behavior is undesirable as a session reset would impact not only routes with the offending attribute, but also other valid routes exchanged over the session.
BGP Error Handling and Attribute Filtering Syslog Messages
When a router receives a malformed update packet, an ios_msg of type ROUTING-BGP-3-MALFORM_UPDATE is printed on the console. This is rate limited to 1 message per minute across all neighbors. For malformed packets that result in actions "Discard Attribute" (A5) or "Local Repair" (A6), the ios_msg is printed only once per neighbor per action. This is irrespective of the number of malformed updates received since the neighbor last reached an "Established" state.
%ROUTING-BGP-3-MALFORM_UPDATE : Malformed UPDATE message received from neighbor 13.0.3.50 - message length 90 bytes,
error flags 0x00000840, action taken "TreatAsWithdraw".
Error details: "Error 0x00000800, Field "Attr-missing", Attribute 1 (Flags 0x00, Length 0), Data []"
[4843.46]RP/0/0/CPU0:Aug 21 17:06:17.919 : bgp[1037]: %ROUTING-BGP-5-UPDATE_FILTERED :
One or more attributes were filtered from UPDATE message received from neighbor 40.0.101.1 - message length 173 bytes,
action taken "DiscardAttr".
Filtering details: "Attribute 16 (Flags 0xc0): Action "DiscardAttr"". NLRIs: [IPv4 Unicast] 88.2.0.0/17
[391.01]RP/0/0/CPU0:Aug 20 19:41:29.243 : bgp[1037]: %ROUTING-BGP-5-UPDATE_FILTERED :
One or more attributes were filtered from UPDATE message received from neighbor 40.0.101.1 - message length 166 bytes,
action taken "TreatAsWdr".
Filtering details: "Attribute 4 (Flags 0xc0): Action "TreatAsWdr"". NLRIs: [IPv4 Unicast] 88.2.0.0/17
BGP-RIB Feedback Mechanism for Update Generation
The Border Gateway Protocol-Routing Information Base (BGP-RIB) feedback mechanism for update generation feature avoids premature route advertisements and subsequent packet loss in a network. This mechanism ensures that routes are installed locally, before they are advertised to a neighbor.
BGP waits for feedback from RIB indicating that the routes that BGP installed in RIB are installed in forwarding information base (FIB) before BGP sends out updates to the neighbors. RIB uses the the BCDL feedback mechanism to determine which version of the routes have been consumed by FIB, and updates the BGP with that version. BGP will send out updates of only those routes that have versions up to the version that FIB has installed. This selective update ensures that BGP does not send out premature updates resulting in attracting traffic even before the data plane is programmed after router reload, LC OIR, or flap of a link where an alternate path is made available.
To configure BGP to wait for feedback from RIB indicating that the routes that BGP installed in RIB are installed in FIB, before BGP sends out updates to neighbors, use the update wait-install command in router address-family IPv4 or router address-family VPNv4 configuration mode. The show bgp , show bgp neighbors , and show bgp process performance-statistics commands display the information from update wait-install configuration.
Use-defined Martian Check
The solution allows disabling the Martian check for these IP address prefixes:
-
IPv4 address prefixes
-
0.0.0.0/8
-
127.0.0.0/8
-
224.0.0.0/4
-
-
IPv6 address prefixes
-
::
-
::0002 - ::ffff
-
::ffff:a.b.c.d
-
fe80:xxxx
-
ffxx:xxxx
-