One or both failover
partners could potentially move into COMMUNICATIONS-INTERRUPTED state. They
cannot issue duplicate addresses while in this state. However, having a server
in this state over longer periods is not a good idea, because there are
restrictions on what a server can do. The main server cannot reallocate expired
leases and the backup server can run out of addresses from its pool.
COMMUNICATIONS-INTERRUPTED state was designed for servers to easily survive
transient communication failures of a few minutes to a few days. A server might
function effectively in this state for only a short time, depending on the
client arrival and departure rate. After that, it would be better to move a
server into PARTNER-DOWN state so it can completely take over the lease
functions until the servers resynchronize.
There are two ways a server can move into PARTNER-DOWN state:
- User action: An administrator sets a server into PARTNER-DOWN state based on an accurate assessment of reality. The failover protocol handles this correctly. Never set both partners to PARTNER-DOWN.
- Failover safe period expires: When the servers run unattended for longer periods, they need an automatic way to enter PARTNER-DOWN state.
Network operators might not sense in time that a server is down or
uncommunicative. The failover safe period therefore gives network operators
some time to react to a server moving into COMMUNICATIONS-INTERRUPTED state.
During the safe period, the only requirement is that the operators determine
whether both servers are still running and, if so, fix the network
communications failure or take one of the servers down before the safe period
expires.
The length of the
safe period is installation-specific, and depends on the number of unallocated
addresses in the pool and the expected arrival rate of previously unknown
clients requiring addresses.
In Cisco Prime Network Registrar, the use-safe-period attribute is enabled by default for a failover pair and the default safe period is 4 hours. This means that a failover
partner that remains in COMMUNICATIONS-INTERRUPTED state for 4 hours enters PARTNER-DOWN state automatically when the safe
period elapses. Review whether this setting is appropriate for your network and adjust the safe-period attribute based on
your network requirements.
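The automatic transition described above can be pictured as a simple timer check. The following is a minimal sketch, not the server's actual implementation: the function name state_after and its parameters are hypothetical, and only the 4-hour default and the two state names come from the text.

```python
from datetime import datetime, timedelta

# Default safe period stated in the text (4 hours).
DEFAULT_SAFE_PERIOD = timedelta(hours=4)


def state_after(interrupted_since, now, use_safe_period=True,
                safe_period=DEFAULT_SAFE_PERIOD):
    """Return the state expected at `now` for a server that entered
    COMMUNICATIONS-INTERRUPTED at `interrupted_since` and has not regained
    contact with its partner. Hypothetical helper for illustration only."""
    if use_safe_period and now - interrupted_since >= safe_period:
        # The safe period has elapsed with no operator intervention.
        return "PARTNER-DOWN"
    return "COMMUNICATIONS-INTERRUPTED"
```

With use_safe_period set to False, the sketch never leaves COMMUNICATIONS-INTERRUPTED on its own, which mirrors the case where only an explicit operator command moves a server to PARTNER-DOWN.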
In addition, during this safe period, either server allows renewals from any existing client, but there is a significant risk of
issuing duplicate addresses, because one server can suddenly enter PARTNER-DOWN state while the other is
still operating. To prevent this problem, change the default settings for use-safe-period or put operational procedures in place to alert operations personnel when the failover partners lose contact with each other.
In particular, in the event of a network communications failure, operator intervention is required before the safe period elapses:
either one failover server must be taken offline or the use-safe-period attribute must be disabled on both servers.
Note |
In Cisco Prime Network Registrar, use-safe-period is enabled by default. Review whether this is appropriate for your network; you may want to disable use-safe-period or adjust the safe-period attribute based on your network requirements and monitoring.
|
The number of extra
addresses required for the safe period should be the same as the expected total
of new clients a server encounters. This depends on the arrival rate of new
clients, not the total outstanding leases. Even if you can only afford a short
safe period, because of a shortage of addresses or a high arrival rate of new
clients, you can benefit substantially by allowing DHCP to ride through minor
problems that are fixable in an hour. There is minimal chance of duplicate
address allocation, and reintegration after the failure is resolved is
automatic and requires no operator intervention.
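As a rough illustration of the sizing rule above, the number of extra addresses can be estimated as the arrival rate of new clients multiplied by the safe period length. This is a back-of-the-envelope sketch under the assumption of a steady arrival rate; the function names and the sample figures are hypothetical.

```python
import math


def extra_addresses_needed(new_clients_per_hour, safe_period_hours):
    """Estimate the spare addresses a server needs to ride out the safe
    period, assuming new clients arrive at a steady rate (an assumption)."""
    return math.ceil(new_clients_per_hour * safe_period_hours)


def longest_affordable_safe_period(spare_addresses, new_clients_per_hour):
    """Inverse view: hours of safe period that the spare addresses cover."""
    return spare_addresses / new_clients_per_hour


# Hypothetical site: 25 previously unknown clients per hour, 4-hour safe period.
print(extra_addresses_needed(25, 4))
# Hypothetical site: 50 spare addresses at the same arrival rate.
print(longest_affordable_safe_period(50, 25))
```

Note that the total number of outstanding leases does not appear in either calculation; only the arrival rate of new clients matters.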
If the failover safe period is longer than the MCLT and the failover server enters PARTNER-DOWN state
because the safe period expired, the server can start allocating its partner's other-available leases to DHCP clients immediately.
The advantage is that the server has additional leases to allocate. The disadvantage is that, in case of a network communications
failure, operator intervention is required within the safe period: either one failover server must be taken
offline or the use-safe-period attribute must be disabled on both servers before the safe period elapses. Without operator intervention, both failover
servers transition to PARTNER-DOWN state and each starts allocating its partner's addresses to new DHCP clients.
Here are some
guidelines to follow, to help you decide whether to use manual intervention or
the safe period for transitioning to PARTNER-DOWN state:
- If your corporate policy is to have minimal manual intervention, set the safe period. Enable the failover pair attribute use-safe-period to enable the safe period. Then, set the DHCP attribute safe-period to set the duration (4 hours by default). Set this duration long enough so that operations personnel can explore the cause of the communication failure and confirm that the partner is truly down.
- If your corporate policy is to avoid conflict under any circumstances, then never let either server go into PARTNER-DOWN state unless by explicit command. Allocate sufficient addresses to the backup server so that it can handle new client arrivals during periods when there is no administrative coverage.
You can set PARTNER-DOWN on the Manage Failover Servers tab of the web UI. If the partner is in the COMMUNICATIONS-INTERRUPTED failover state, you can click Set Partner Down, which is associated with an input field for the PARTNER-DOWN date setting. This setting is initialized to the value of the start-of-communications-interrupted attribute. (In Normal web UI mode, you cannot set this date to a value earlier than the initialized date. In Expert web UI mode, you can set this value to any date.)
Use failover-pair name setPartnerDown date in the CLI, specifying the name of the partner server. This moves all the scopes running failover with the partner into
PARTNER-DOWN state immediately, unless you specify a date and time with the command. This date and time should be when the
partner was last known to be operational.
If you use setPartnerDown in the CLI and specify the date and time when the partner was last known to be operational, the failover server calculates
the MCLT from the time specified in the setPartnerDown command. If you do not specify a date and time,
the failover server calculates the MCLT from the time it moved to the COMMUNICATIONS-INTERRUPTED
state. In case of a network communications failure, it is important that you specify the actual time the partner was last known
to be operational in the setPartnerDown command. Otherwise, duplicate IP addresses can result.
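The choice of reference time described above can be sketched as follows. This is a hedged illustration only: it assumes that "calculates the MCLT from" means an MCLT-long interval measured from the reference time, and the function name, parameter names, and the one-hour MCLT value are hypothetical.

```python
from datetime import datetime, timedelta


def mclt_interval_end(comm_interrupted_start, mclt, partner_last_seen=None):
    """Return the end of the MCLT interval.

    The reference time is the date given to setPartnerDown when one is
    supplied; otherwise it is the time the server entered
    COMMUNICATIONS-INTERRUPTED state (both cases as described in the text).
    """
    if partner_last_seen is not None:
        reference = partner_last_seen
    else:
        reference = comm_interrupted_start
    return reference + mclt


# Illustrative values only.
interrupted = datetime(2001, 10, 31, 2, 0)
print(mclt_interval_end(interrupted, timedelta(hours=1)))
print(mclt_interval_end(interrupted, timedelta(hours=1),
                        partner_last_seen=datetime(2001, 10, 31, 0, 0)))
```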
There are two conventions for specifying the date:
- -num unit (a time in the past), where num is a decimal number and unit is s, m, h, d, or w for seconds, minutes, hours, days, or weeks respectively. For example, specify -3d for three days.
- Month (name or its first three letters), day, hour (24-hour convention), year (fully specified year or last two digits).
These examples notify the backup server that its main server went down three days ago and at 12 o'clock midnight on October 31, 2001, respectively:
nrcmd> failover-pair dhcp2.example.com. setPartnerDown -3d
nrcmd> failover-pair dhcp2.example.com. setPartnerDown Oct 31 00:00:00 2001
Note |
Wherever you specify a date and time in the CLI, enter the time that is local to the nrcmd process. If the server is running in a different time zone than this process, disregard the time zone where the server is
running and use local time instead.
|
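To check which moment a relative date such as -3d refers to before running setPartnerDown, the -num unit convention can be sketched in a few lines. This is an illustrative parser, not the one nrcmd uses: the function name is hypothetical, and it assumes num is a whole number written with a plain hyphen.

```python
import re
from datetime import datetime, timedelta

# Unit letters listed in the text, mapped to timedelta keyword names.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days", "w": "weeks"}


def parse_relative_date(spec):
    """Convert a spec such as '-3d' into a timedelta (how far in the past)."""
    match = re.fullmatch(r"-(\d+)([smhdw])", spec)
    if match is None:
        raise ValueError(f"not a relative date: {spec!r}")
    num, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(num)})


# Time local to the process issuing the command, as the note above requires.
now = datetime.now()
print("-3d refers to", now - parse_relative_date("-3d"))
```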