Introduction
This document describes a step by step procedure to resolve database replication or synchronization issue in Prime Network when standby database from Primary database is rebuilded.
Prerequisites
Requirements
Cisco recommends that you have knowledge of these topics:
- Use this procedure to rebuild secondary database only if SWITCHOVER_STATUS of primary database is into UNRESOLVABLE GAP.
- Ensure that the primary database is in READ WRITE and secondary database is in READ ONLY or READ ONLY WITH APPLY modes.
Components Used
The information in this document is based on these software and hardware versions:
- Prime Network release 3.9 and above
- Oracle Database 11G release
The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, make sure that you understand the potential impact of any command.
Background Information
1. Use this command to know switchover_status of Primary database:
SQL> select switchover_status from v$database;
SWITCHOVER_STATUS
--------------------
UNRESOLVABLE GAP
Note: Prime Central Geo HA switchover fails abruptly leaving the Prime Central GEO HA system and/or database role status corrupt (both primay or both standby) and then you need to rebuild either primary or secondary depending upon last active/standby status.
Note: For all other cases, open SR with Cisco TAC to resolve database replication issue.
2. Use this command to know the current mode of primary and secondary database:
SQL> select open_mode from v$database;
Problem
Prime Network Database Replication failure.
Prime Network application creates system events that notify such failures, which are availalbe in Event Vision GUI client.
Troubleshoot
Before the solution, perform basic trobleshooting steps like:
1. Check the network connectivity and/or latency related issues between Primary and Secondary Prime Network Gateway.
2. Check these database logs on Primary to find any database related ORA errors:
<database_home_directory>/diag/rdbms/anadb/anadb/trace/alert_anadb.log
3. Check open_mode, current_scn and switchover status on Primary and Secondary database.
SQL> select open_mode from v$database;
SQL> select current_scn from v$database;
SQL> select switchover_status from v$database;
4. Root cause for database replication can be due to network commuication issue between Primary and Secondary Prime Network Gateway, corrupt database or similar database related errors.
Solution
Execute Database Restore Procedure on Primary and Secondary Prime Network Gateway:
Step 1. The current scheduled backup jobs keep several days of archive log files in the file system. To avoid from the archive log files to be removed, this line in backup_daily.sh, backup_high_daily.sh, backup_weekly.sh and backup_high_weekly.sh is commented out:
Change delete noprompt archivelog until time ... to #delete noprompt archivelog until time ...
Note: These '.sh' scripts are owned by oracle user and can be found in $ORACLE_HOME/ana_scripts directory.
Step 2. On Primary database, log in as sysdba and find the number of redo files on the system by and execute this command:
SQL> select member from v$logfile;
For each log fine execute this command. So if the previous command returned 6 rows, then execute the next command 6 times.
SQL>alter system switch logfile;
Step 3. On Standby database, log in as sysdba and create pfile from spfile:
SQL>create pfile='$ORACLE_HOME/dbs/ana_sb_init.ora' from spfile;
Step 4. On Standby database log in as sysdba and find the directory path to the datafile, backup pieces, redo logs and archive log files. This can be done with these commands:
To find the datafiles:
SQL> select name from v$datafile;
To find the backup files:
rman target /
RMAN> list backup;
To find the redo log files:
SQL> select member from v$logfile;
To find the archiveLog:
SQL> show parameter log_archive_dest_1;
Shutdown the database:
sqlplus / as sysdba
SQL> shutdown immediate;
Step 5. Delete all datafile files, backup pieces, redo log files and archivelog files from the corresponding directories (path was found in Step 4.).
Then restart nomount with the pfile created in Step 3:
sqlplus / as sysdba
SQL>startup nomount pfile='$ORACLE_HOME/dbs/ana_sb_init.ora;
Step 6. On primary database make a copy of all original backup pieces in your backup folder and store them in other location.
Step 7. On primary database, connect to RMAN and use delete backup to remove all physical backup pieces from the file system.
#rman target /
RMAN>delete backup;
Step 8. On Primary database connect to RMAN and take a full backup of the database, Standby control file and archivelog in this order. Execute these commands:
#rman target /
RMAN>backup database;
RMAN>backup format '$BACKUP_DIR/Control%U' current controlfile for Standby;
RMAN>backup archivelog all;
Note: The $BACKUP_DIR is the current backup folder found with list backup previously and the file will be called Control%U in the future. It is not a variable.
Step 9. On Primary database connect to RMAN and use list backup to find out the ckp scn for the Standby control field that was created in Step 8. Look for the file with the name format of $BACKUP_DIR/Control%U.
BS Key Type LV Size Device Type Elapsed Time Completion Time
------- ---- -- ---------- ----------- ------------ ---------------
2358 Full 1.09M DISK 00:00:04 21-JAN-14
BP Key: 2358 Status: AVAILABLE Compressed: YES Tag: TAG20140121T162311
Piece Name: /export/home/oracle/backup/Control9nouks3f_1_1
Standby Control File Included: Ckp SCN: 164541747 Ckp time: 21-JAN-14
Note: In this example, the Standby control file backup is /export/home/oracle/backup/Control9nouks3f_1_1. In the line below this file name, you see "Ckp SCN: 164541747". We will use the number "164541747" in the duplication run block in Step 13.
Step 10. On Primary database tar up all the backup pieces what are created in Step 8. As root user SCP the tar file to the backup folder at Standby database.
Step 11. On Standby database login as root user and use chown to change the file ownership of the tar file to oracle:dba. Then switch user back to oracle(su - oracle) and untar the tar file.
Step 12. On Primary gateway login as the prime network user and cd to ~/Main directory and run this command to get the sys password:
./runRegTool.sh –gs 127.0.0.1 get 127.0.0.1 persistency/general/EmbeddedDBSystemPass
Note: The returned sys password is used in the next step to connect to Standby database from the Primary database.
Step 13. On Primary database connect to target database (Primary) and then to auxiliary database (Standby). Then run duplicate run block to create the Standby database:
#rman target /
RMAN>connect auxiliary sys/$sys_pwd@ANADB_SB
RMAN>run {
set until scn $SCN_NUMBER;
duplicate target database
for Standby
dorecover
spfile
set "db_unique_name"="anadb_sb"
set LOG_ARCHIVE_DEST_2="Service=anadb ASYNC LGWR VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE)
database_unique_name=anadb"
set FAL_CLIENT="anadb_sb"
set FAL_SERVER="anadb"
set log_file_name_convert="$REDO","$REDO"
nofilenamecheck;
}
Note: $sys_pwd is the sys password you obtained in Step 12. The $SCN_NUMBER in the run block is obtained in Step 9. as an example. $REDO is the redo log location followed by /.
Step 14. Once the run block in Step 13. completes, then on Standby database login as sysdba and run these commands to bring up the Standby database in read-only mode followed by read-only with apply mode:
sqlplus / as sysdba
SQL>shutdown immediate;
SQL>startup nomount;
SQL>alter database mount Standby database;
SQL>recover managed Standby database using current logfile disconnect from session;
SQL>recover managed Standby database cancel;
SQL>alter database open read only;
SQL>recover managed Standby database using current logfile disconnect from session;
Step 15. After Verification on Primary database, uncomment this line in backup_daily.sh,backup_high_daily.sh, backup_weekly.sh and backup_high_weekly.sh:
Change #delete noprompt archivelog until time ... to delete noprompt archivelog until time ...
Verify
Database verification on Primary and Secondary Prime Network gateway:
1. Verify that the number and the names of redo log files are same on Primary and Standby database.
2. Verify that the number and the size of the datafiles on Primary and Standby database are same.
3. Use this command on both Primbary and Standby database to show that the current SCN on Standby database can catch up with the SCN on Primary database:
sqlplus / as sysdba
SQL>select current_scn from v$database;
4. Verify that the open_mode of Primary database is READ WRITE and READ ONLY WITH APPLY on Standby database.
sqlplus / as sysdba
SQL>select open_mode from v$database;
5. Verify that the switchover_status of primary is TO STANDBY and NOT ALLOWED on Standby database:
sqlplus / as sysdba
SQL>select switchover_status from v$database;
6. Validate that the archive logs are being transfered
On Primarydatabase:
SQL> alter system switch logfile;
On Secondary database:
Check to ensure that a new file is created in ~/arch.
7. Verify that you will not see Database replication failure in Event Vision GUI from now on (in the next 20 minutes).