Troubleshooting Backup - Oracle RAC iDataAgent


Completed with one or more errors

Backup jobs from the Oracle RAC iDataAgent are displayed as "Completed w/ one or more errors" in the Job History in the following cases:

  • When the RMAN script for the backup job completes with warnings.
  • When the job is killed after some data has been backed up.
  • During offline backups, when the database cannot be opened after the backup.

RAC0012: Instance Creation Error When You Use Policy Managed Oracle RAC Creation

Symptom

The following error is generated when you add an Oracle RAC instance for a policy-managed Oracle RAC database:

RAC Instance add operation successful. Unable to establish connectivity with the instance properties for []. [Invalid Oracle RAC instance name. Please enter the correct RAC instance name.].

Resolution

There is no action required. You can continue to perform backups.

Oracle Errors

If you receive an Oracle error during an Oracle backup operation, we recommend that you follow the procedures published by Oracle Corporation for resolving that specific error. We also advise you to consult your on-site Oracle database administrator as needed.

RAC0001: Defining Oracle RAC client connections

Symptom

Oracle RAC iDataAgent backups fail due to client connection mismatch.

Resolution

When configuring the Oracle RAC iDataAgent, use a dedicated TNS or SCAN connection for each node.

For example, consider a three-node RAC in which the database is called RACDB and the instances on nodes 1 through 3 are RACDB1, RACDB2, and RACDB3, respectively. Each instance should have its own dedicated TNS service, which guarantees a consistent connection to the instance on that node.

The following entries should be available in the tnsnames.ora configuration file on all the nodes.

RACDB =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = lx64rac-cluster)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = racdb)
    )
  )
RACDB1 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = lx64rac-1-vip)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = racdb)
      (INSTANCE_NAME = racdb1)
    )
  )
RACDB2 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = lx64rac-2-vip)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = racdb)
      (INSTANCE_NAME = racdb2)
    )
  )
RACDB3 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = lx64rac-3-vip)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = racdb)
      (INSTANCE_NAME = racdb3)
    )
  )

In the entries above, the RAC database is called RACDB, and the instances are on Linux servers where node1 = lx64rac-1, node2 = lx64rac-2, and node3 = lx64rac-3. The RAC instances per node are RACDB1 through RACDB3 respectively.
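For clusters with more nodes, the dedicated entries follow a fixed pattern and can be generated rather than typed by hand. The following Python sketch is a hypothetical helper (the function name `tns_entry` and the node map are illustrative, not part of the product) that renders entries in the same shape as RACDB through RACDB3 above.

```python
from typing import Optional


def tns_entry(alias: str, host: str, service: str,
              instance: Optional[str] = None, port: int = 1521) -> str:
    """Render one tnsnames.ora entry in the same layout as the RACDB entries.

    Passing `instance` adds INSTANCE_NAME, which pins the alias to a single
    RAC instance instead of letting the listener choose one.
    """
    connect_data = ["(SERVER = DEDICATED)", f"(SERVICE_NAME = {service})"]
    if instance is not None:
        connect_data.append(f"(INSTANCE_NAME = {instance})")
    lines = [
        f"{alias} =",
        "  (DESCRIPTION =",
        f"    (ADDRESS = (PROTOCOL = TCP)(HOST = {host})(PORT = {port}))",
        "    (CONNECT_DATA =",
        *(f"      {item}" for item in connect_data),
        "    )",
        "  )",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical node map for the three-node example: alias -> (VIP host, instance)
    nodes = {
        "RACDB1": ("lx64rac-1-vip", "racdb1"),
        "RACDB2": ("lx64rac-2-vip", "racdb2"),
        "RACDB3": ("lx64rac-3-vip", "racdb3"),
    }
    print(tns_entry("RACDB", "lx64rac-cluster", "racdb"))
    for alias, (vip, inst) in nodes.items():
        print(tns_entry(alias, vip, "racdb", inst))
```

Writing the same generated file to every node keeps the aliases identical across the cluster.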

Based on this tnsnames.ora file, the following connect strings are used when configuring the nodes from the CommCell Console:

sys/oracle@racdb1 for connecting to the lx64rac-1 node
sys/oracle@racdb2 for connecting to the lx64rac-2 node
sys/oracle@racdb3 for connecting to the lx64rac-3 node

Syntax for TNS connect

For "sys/oracle@racdb2"

sys = Oracle user account

oracle = password for the sys account

racdb2 = the TNS alias defined in tnsnames.ora that connects to instance racdb2 on node lx64rac-2

Oracle 11gR2 and later also provide SCAN connections, which the Oracle RAC iDataAgent supports as well. When SCAN connections are used, the connect strings appear as follows:

sys/oracle@lx64rac-1:1521/racdb for connecting to the lx64rac-1 node
sys/oracle@lx64rac-2:1521/racdb for connecting to the lx64rac-2 node
sys/oracle@lx64rac-3:1521/racdb for connecting to the lx64rac-3 node

Syntax for SCAN connect

For "sys/oracle@lx64rac-2:1521/racdb"

sys = Oracle user account

oracle = password for the sys account

lx64rac-2:1521 = host name of the node and port of the listener on that node

racdb = RAC database service name
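Both connect string forms above can be split mechanically, which is handy when checking node configurations in bulk. This Python sketch is an illustrative parser, not product code; it assumes the password contains no "@" character and, like Oracle Easy Connect, treats a missing port as 1521.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConnectString:
    user: str
    password: str
    alias: Optional[str] = None    # TNS form: user/password@alias
    host: Optional[str] = None     # SCAN form: user/password@host:port/service
    port: Optional[int] = None
    service: Optional[str] = None


def parse_connect(text: str) -> ConnectString:
    """Split a node connect string into its parts."""
    creds, at, target = text.rpartition("@")  # last '@' separates credentials from target
    if not at or not target:
        raise ValueError("connect string needs '@<alias>' or '@<host>:<port>/<service>'")
    user, slash, password = creds.partition("/")
    if not slash or not user:
        raise ValueError("expected '<user>/<password>' before '@'")
    if "/" in target:  # host[:port]/service form
        host_port, _, service = target.partition("/")
        host, _, port = host_port.partition(":")
        # Port defaults to 1521 when omitted, as in Oracle Easy Connect
        return ConnectString(user, password, host=host,
                             port=int(port) if port else 1521, service=service)
    return ConnectString(user, password, alias=target)


if __name__ == "__main__":
    print(parse_connect("sys/oracle@racdb2"))
    print(parse_connect("sys/oracle@lx64rac-2:1521/racdb"))
```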

These connection settings are crucial for the backup to work correctly.

During the discovery phase of the backup, whenever the iDataAgent connects to an instance, it runs a query to confirm that the connection reached the intended instance on the correct node.

If the shared racdb service name is used instead of a dedicated service name, the listener may route the connection to any instance or node in the RAC. This is a problem because the MediaAgent expects a pipeline connection from a predetermined node (say, node 1), but the listener, using round-robin selection, may connect to a different node (node 2 or node 3).

When this listener-to-pipeline mismatch occurs, the data pipe fails to connect, and the backup fails.
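The check and the failure mode described above can be modeled in a few lines. This document does not say which query the iDataAgent runs; the sketch assumes the standard v$instance view, whose INSTANCE_NAME and HOST_NAME columns identify the instance and node a session landed on. The round-robin listener is a toy model of the behavior described in the text, not Oracle's actual load-balancing algorithm, and the function names are hypothetical.

```python
import itertools
from typing import Iterator, List, Tuple

# Result of: SELECT instance_name, host_name FROM v$instance (run on the new connection)
Row = Tuple[str, str]


def on_expected_node(row: Row, expected_instance: str, expected_host: str) -> bool:
    """True if the session landed on the intended instance and node."""
    instance, host = row
    short_host = host.split(".")[0]  # HOST_NAME may be fully qualified
    return (instance.lower() == expected_instance.lower()
            and short_host.lower() == expected_host.lower())


def shared_service_listener(rows: List[Row]) -> Iterator[Row]:
    """Toy listener for the shared 'racdb' service: hands out instances in turn."""
    return itertools.cycle(rows)


if __name__ == "__main__":
    cluster = [("racdb1", "lx64rac-1"), ("racdb2", "lx64rac-2"), ("racdb3", "lx64rac-3")]
    listener = shared_service_listener(cluster)
    # Two channels that the MediaAgent expects from node 1, both using the shared service.
    for channel in ("ch1", "ch2"):
        row = next(listener)
        ok = on_expected_node(row, "racdb1", "lx64rac-1")
        print(channel, row, "ok" if ok else "MISMATCH: pipeline will not connect")
```

With a dedicated alias such as RACDB1, every connection resolves to racdb1 on lx64rac-1, so the check passes each time.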

Here is an example of an RMAN session generated by an Oracle RAC iDataAgent backup (using 2 nodes):

Rman Script:
[CONFIGURE CONTROLFILE AUTOBACKUP ON;
run {
allocate channel ch1 type 'sbt_tape' connect sys/******@lx64rac-1:1521/racdb
PARMS="SBT_LIBRARY=/opt/simpana/Base/libobk.so,BLKSIZE=1048576,ENV=(CV_mmsApiVsn=2,CV_channelPar=ch1,ThreadCommandLine=BACKUP -jm 45 -a 2:299 -cl 61 -ins 78 -at 80 -j 21200 -jt 21200:4:1 -bal 2 -rcp 0 -ms 2 -data -ma 16 -chg 1:1 -rac 1 -cn lx64rac-1 -vm Instance001)"
TRACE 0;
allocate channel ch2 type 'sbt_tape' connect sys/******@lx64rac-2:1521/racdb
PARMS="SBT_LIBRARY=/opt/simpana/Base/libobk.so,BLKSIZE=1048576,ENV=(CV_mmsApiVsn=2,CV_channelPar=ch2,ThreadCommandLine=BACKUP -jm 45 -a 2:299 -cl 61 -ins 78 -at 80 -j 21200 -jt 21200:4:1 -bal 2 -rcp 0 -ms 2 -data -ma 16 -hn lx64rac-1.unixdb.lab -chg 2:1 -rac 2 -cn lx64rac-2 -vm Instance001)"
TRACE 0;
setlimit channel ch1 maxopenfiles 8;
setlimit channel ch2 maxopenfiles 8;
backup
incremental level = 1
filesperset = 4
database
include current controlfile spfile ;
}
exit;
]

Each allocate channel command includes a "connect" clause that directs RMAN to connect to a specific node in the RAC.

The example above uses SCAN connect strings. The Oracle RAC iDataAgent starts an RMAN session on the first node listed on the Storage tab of the subclient and then sends it an RMAN script similar to the one shown above.

Note also that each allocate command names its node in the parameters passed in by the backup process (for example, -cn lx64rac-1 and -cn lx64rac-2). This tells the MediaAgent which node to expect the pipeline connection from. If the connect clause attaches to the wrong node, the SBT layer fails to connect to the data pipe, because the MediaAgent expects the connection from the specific node named in the ENV (environment) of the allocate command.
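The channel-to-node pairing can be checked directly in a generated script. This Python sketch pulls, for each allocate channel command, the host from the connect clause and the node from the -cn value in ThreadCommandLine, both visible in the example above. It only works for host-based connect strings like the SCAN example (a TNS alias such as racdb1 does not name a host), and the function names are illustrative.

```python
import re
from typing import List, Tuple

CONNECT_HOST = re.compile(r"connect \S+@([^\s:/]+)")  # host part of user/pw@host:port/svc
CLIENT_NODE = re.compile(r"-cn (\S+)")                  # node named in ThreadCommandLine


def channel_nodes(script: str) -> List[Tuple[str, str, str]]:
    """Return (channel, connect_host, expected_node) for each allocate channel command."""
    result = []
    for chunk in script.split("allocate channel ")[1:]:
        channel = chunk.split()[0]
        host = CONNECT_HOST.search(chunk)
        node = CLIENT_NODE.search(chunk)
        result.append((channel,
                       host.group(1) if host else "",
                       node.group(1) if node else ""))
    return result


def mismatches(script: str) -> List[Tuple[str, str, str]]:
    """Channels whose connect host differs from the node the MediaAgent expects."""
    return [entry for entry in channel_nodes(script) if entry[1] != entry[2]]
```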

For the same reason, "/" cannot be used as the connect string when configuring nodes in the Oracle RAC iDataAgent. These connections must be network-based, and a "/" connect is local only.
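A simple validation sketch for node connect strings follows. The helper name is hypothetical, and it assumes passwords contain no spaces; it accepts only the user/password@target form and rejects "/" (with or without a role suffix such as "as sysdba").

```python
def is_network_connect(connect: str) -> bool:
    """True if `connect` has the user/password@target form needed for RAC nodes.

    "/" (and "/ as sysdba") carries no network target, so the session is local
    to whichever host runs it and cannot be steered to another node.
    """
    tokens = connect.split()
    if not tokens:
        return False
    creds, at, target = tokens[0].rpartition("@")
    user, slash, _password = creds.partition("/")
    return bool(at and target and user and slash)
```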

Backup Failures

RAC0002: Job fails due to sbtio.log size

Issue

Jobs sometimes fail because the sbtio.log file in the $UDUMP directory grows too large.

Resolution

To resolve this, set a size limit for the sbtio.log file using the sMAXORASBTIOLOGFILESIZE registry key. Once the specified limit is reached, the sbtio.log file is pruned automatically.
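To see how large the file has grown before choosing a limit, a quick check like the following can help. This Python sketch lists sbtio.log files in a directory that exceed a limit; the unit of sMAXORASBTIOLOGFILESIZE is not stated here, so the megabyte limit is an assumption, and the sketch does not read any registry keys.

```python
from pathlib import Path
from typing import List, Tuple

MB = 1024 * 1024


def oversized_sbtio_logs(log_dir: str, limit_mb: float) -> List[Tuple[Path, float]]:
    """Return (path, size in MB) for sbtio.log files in log_dir larger than limit_mb."""
    hits = []
    for path in sorted(Path(log_dir).glob("sbtio.log*")):
        if path.is_file():
            size_mb = path.stat().st_size / MB
            if size_mb > limit_mb:
                hits.append((path, size_mb))
    return hits


if __name__ == "__main__":
    import os
    udump = os.environ.get("UDUMP", ".")  # assumes $UDUMP is exported in this shell
    for path, size in oversized_sbtio_logs(udump, limit_mb=100):
        print(f"{path}: {size:.1f} MB")
```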

RAC0003: Command line backup fails

Issue

Command Line backups fail.

Resolution

  • Make sure that the required media resources are available, and then run the backup again.
  • For on-demand backups, you can run more than one script for an instance. However, backup jobs fail if the argument file contains more than one instance.
  • For Oracle on Windows, do not put a space after a comma in the argument file; a backup job fails if the argument file contains a space after a comma.
  • An RMAN command line backup fails with the following error:

    "Unable to open lock file /opt/snapprotect/Base/Temp/locks/.dir_lock: Permission denied"

    This may occur if the umask parameter is set to 022 in the .profile file for the Oracle instance. As a workaround, change the umask to 000 or 002 and run the backup again.
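The arithmetic behind the umask workaround: a new file or directory gets the requested mode with the umask bits cleared. With 022 the group write bit is removed, so another account in the same group cannot write to a lock file or directory the first account created; with 002 or 000 it can. Which accounts share the Temp/locks directory is an assumption here; the sketch only shows the permission bits.

```python
def resulting_mode(requested: int, umask: int) -> int:
    """Permission bits of a newly created file/dir: requested mode minus umask bits."""
    return requested & ~umask & 0o777


if __name__ == "__main__":
    for mask in (0o022, 0o002, 0o000):
        directory = resulting_mode(0o777, mask)  # mkdir typically requests 0777
        regular = resulting_mode(0o666, mask)    # new files typically request 0666
        group_can_write = bool(regular & 0o020)
        print(f"umask {mask:03o}: dir {directory:03o}, file {regular:03o}, "
              f"group write: {group_can_write}")
```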

RAC0004: Command line backup fails for large backups

Issue

Third-party command line jobs may hang during large backups and restores.

Resolution

This happens because ClDBControlAgent updates the job manager after every 100 MB of data transferred, which causes a thread failure during large backups and restores after some of the data has been transferred.

The following exception appears in the ClDBControlAgent.log file:

5710030 304 02/22 03:47:23 608119 OraAgentBase::NotifyCommServeJobContinue() - m_jobObject->setUnCompBytesToAdd(105119744) ...
5710030 304 02/22 03:47:24 608119 CvThread::start_func() - Unhandled exception.
5710030 405 02/22 03:47:37 608119 ClOraControlAgent::OnClientTimeout() - Got timed out while waiting for msg from client 0

To change this interval, set the sBYTESDIFFMBS registry key to <value> (in MB) in OracleAgent/.properties.

The job manager is then updated after every <value> MB of data transferred.
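To pick a value, it helps to estimate how many updates a backup generates. This sketch assumes that MB means 1,048,576 bytes (consistent with the 105119744-byte update, about 100.25 MB, in the log excerpt) and that exactly one update is sent per interval; the 2 TB backup size is hypothetical.

```python
MB = 1024 * 1024  # assumes the key's "MB" means 1,048,576 bytes


def job_manager_updates(total_bytes: int, interval_mb: int = 100) -> int:
    """Number of job-manager updates if one is sent per interval_mb MB transferred."""
    return total_bytes // (interval_mb * MB)


if __name__ == "__main__":
    two_tb = 2 * 1024 ** 4
    print(job_manager_updates(two_tb))        # 20971 with the default 100 MB interval
    print(job_manager_updates(two_tb, 1024))  # 2048 with sBYTESDIFFMBS set to 1024
    print(105119744 / MB)                     # 100.25, the update size in the log excerpt
```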

RAC0005: Offline backup with lights out script fails

Issue

Offline backup using lights out script fails with the following error:

RMAN error "ORA-12528: TNS:listener: all appropriate instances are blocking new connections"

Resolution

As a workaround, add a reference to the database in the listener.ora file as shown in the example below:

SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (SID_NAME = PLSExtProc)
      (ORACLE_HOME = C:\oracle\product\10.1.0\db_1)
      (PROGRAM = extproc)
    )
    (SID_DESC =
      (SID_NAME = rman10g)
      (ORACLE_HOME = C:\oracle\product\10.1.0\db_1)
      (SID = rman10g)
    )
  )
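If several databases need the same workaround, the static entries can be generated. This Python sketch emits only SID_NAME and ORACLE_HOME for each SID_DESC (it does not reproduce the PROGRAM or SID lines in the example), and the function name is hypothetical. After editing listener.ora, the listener has to reread it, for example with lsnrctl reload.

```python
from typing import List, Tuple


def sid_list(listener: str, databases: List[Tuple[str, str]]) -> str:
    """Render a SID_LIST_<listener> block with one static SID_DESC per (sid, oracle_home)."""
    lines = [f"SID_LIST_{listener} =", "  (SID_LIST ="]
    for sid, oracle_home in databases:
        lines += [
            "    (SID_DESC =",
            f"      (SID_NAME = {sid})",
            f"      (ORACLE_HOME = {oracle_home})",
            "    )",
        ]
    lines.append("  )")
    return "\n".join(lines)


if __name__ == "__main__":
    print(sid_list("LISTENER", [("rman10g", r"C:\oracle\product\10.1.0\db_1")]))
```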

Offline backups with the lights out option also fail when the subclient uses the default number of retry attempts. As a workaround, increase the retry attempts by setting the Tries value to 5 or greater. See step 3d in Configuring an Offline Subclient for more details.

RAC0006: Backup timeout failure

Issue

The backup fails because of a timeout.

Resolution

The default time to wait for resources when allocating streams during RMAN command line backups is 86400 seconds (24 hours). If a backup fails because this timeout is reached, configure the sALLOCATESTREAMSECS registry key to increase the wait time.
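Since the key is expressed in seconds, a small conversion avoids arithmetic slips. This sketch (hypothetical helper name) converts a wait in hours to seconds and refuses values that would not raise the 86400-second default.

```python
DEFAULT_ALLOCATE_STREAM_SECS = 86400  # default from the text (24 hours)


def allocate_stream_secs(hours: float) -> int:
    """Seconds to configure in sALLOCATESTREAMSECS for a wait of `hours` hours."""
    seconds = round(hours * 3600)
    if seconds <= DEFAULT_ALLOCATE_STREAM_SECS:
        raise ValueError("value does not increase the 86400-second default")
    return seconds


if __name__ == "__main__":
    print(allocate_stream_secs(36))  # 129600
    print(allocate_stream_secs(48))  # 172800
```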

RAC0007: Backup fails intermittently on Linux clients

Issue

On Linux clients, if the libobk.so library fails to load, the backups may fail.

Resolution

As a workaround, perform the following steps:

  1. Log in to the Oracle client computer as root.
  2. From the system prompt, enter the following command:

    ldconfig /<Base_directory_name>

    For example: # ldconfig <software installation path>/Base

This ensures that the libobk.so library can be loaded, so that backups for Oracle on Linux run successfully.
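To confirm that the library is registered with the dynamic loader, the cache listing printed by ldconfig -p can be searched. This Python sketch parses that listing (lines of the form "name (...) => path"); the helper name is hypothetical. Note that ldconfig rebuilds the whole cache on every run, so an entry added only from the command line can disappear after a later run that omits the Base directory.

```python
import shutil
import subprocess
from typing import List


def cache_entries(ldconfig_p_output: str, soname: str) -> List[str]:
    """Paths listed for `soname` in `ldconfig -p` output."""
    paths = []
    for line in ldconfig_p_output.splitlines():
        name, arrow, path = line.strip().partition(" => ")
        if arrow and name.split(" ")[0] == soname:
            paths.append(path)
    return paths


if __name__ == "__main__":
    ldconfig = shutil.which("ldconfig") or "/sbin/ldconfig"
    try:
        listing = subprocess.run([ldconfig, "-p"], capture_output=True,
                                 text=True, check=True).stdout
    except (OSError, subprocess.CalledProcessError) as exc:
        print(f"could not run {ldconfig} -p: {exc}")
    else:
        print(cache_entries(listing, "libobk.so") or "libobk.so is not in the loader cache")
```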

RAC0008: Backup fails on Windows clients

Issue

The backup fails on Windows clients.

Resolution

Make sure that the Oracle user is a member of the Administrators group. If the user is not a member of the Administrators group, assign permissions to the user as follows:
  1. From Windows Explorer, right-click the SnapProtect folder and then select Properties.
  2. Click the Security tab.
  3. Select the user and click Edit.
  4. Select the Allow check box for the Full Control permission, and then click OK.
  5. From the Registry Editor, navigate to HKEY_LOCAL_MACHINE | SOFTWARE.
  6. Right-click CommVault Systems and select Permissions...
  7. Select the user, select the Allow check box for the Full Control permission, and then click OK.
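Steps 1 through 4 can also be scripted with the Windows icacls tool. This Python sketch only builds the command line; the folder path and account name are hypothetical placeholders, and the registry permissions in steps 5 through 7 are not covered. Once the values are correct, the list can be passed to subprocess.run on the Windows client.

```python
from typing import List


def grant_full_control_cmd(folder: str, account: str) -> List[str]:
    """icacls arguments granting `account` Full Control on `folder`.

    (OI)(CI) makes the grant inherit to files and subfolders; /T also applies
    it to existing children.
    """
    return ["icacls", folder, "/grant", f"{account}:(OI)(CI)F", "/T"]


if __name__ == "__main__":
    # Hypothetical values: replace with the real SnapProtect folder and Oracle user.
    cmd = grant_full_control_cmd(r"C:\Program Files\SnapProtect", r"MYDOMAIN\oracle")
    print(" ".join(cmd))
```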

RAC0009: Log backup fails

Issue

If the Oracle database is configured to save archive logs in the flash recovery area, and an Oracle subclient has both Protect backup recovery area and Archive Delete enabled at the same time, the backup fails.

Resolution

To resolve this, use two separate subclients: one with Protect backup recovery area enabled and the other with Archive Delete enabled.

Log backups also fail if you select the default USE_DB_RECOVERY_FILE_DEST entry as the log destination for the backup.

To resolve this, make sure that the log destinations are included in the PFile (init<SID>.ora) or SPFile (spfile.ora), and that the correct log destination is selected for the backup.
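Both conditions can be checked before a backup runs. In this Python sketch, SubclientOptions is a hypothetical model of the relevant subclient settings (not a product API), and archive_destinations reads LOCATION= values from log_archive_dest_N lines in PFile text; it only detects the explicit LOCATION=USE_DB_RECOVERY_FILE_DEST form.

```python
import re
from dataclasses import dataclass
from typing import List

ARCHIVE_DEST = re.compile(
    r"^\s*(?:[\w*]+\.)?log_archive_dest_\d+\s*=\s*'?location=([^'\s,]+)",
    re.IGNORECASE,
)


def archive_destinations(pfile_text: str) -> List[str]:
    """LOCATION= values of log_archive_dest_N lines in init<SID>.ora text."""
    dests = []
    for line in pfile_text.splitlines():
        match = ARCHIVE_DEST.match(line)
        if match:
            dests.append(match.group(1))
    return dests


@dataclass
class SubclientOptions:  # hypothetical model of the relevant subclient settings
    protect_backup_recovery_area: bool
    archive_delete: bool
    log_destination: str


def log_backup_problems(opts: SubclientOptions, archive_logs_in_fra: bool) -> List[str]:
    """List the two misconfigurations described above, if present."""
    problems = []
    if archive_logs_in_fra and opts.protect_backup_recovery_area and opts.archive_delete:
        problems.append("use separate subclients for Protect backup recovery area and Archive Delete")
    if opts.log_destination.strip().upper() == "USE_DB_RECOVERY_FILE_DEST":
        problems.append("select an explicit log destination listed in the PFile/SPFile")
    return problems
```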

RAC0010: Database block corruption

Issue

The backup fails with the following error:

LISTING 2: r_20030520213618.log
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on d1 channel at 05/20/2003 21:36:26
ORA-19566: exceeded limit of 0 corrupt blocks for file
/u01/app/Oracle/oradata/MRP/sales_data_01.dbf

Resolution

Make sure that a maximum number of allowed corrupt blocks is set for the backup; the error above shows a limit of 0. It is recommended that you set this value to match the number of corrupt blocks that RMAN identified in the datafile being backed up.
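In plain RMAN, this limit corresponds to the SET MAXCORRUPT FOR DATAFILE command inside a RUN block, and RMAN records the corrupt blocks it finds in the V$DATABASE_BLOCK_CORRUPTION view. How the iDataAgent exposes the setting is not shown here; this Python sketch only extracts the limit and datafile from an ORA-19566 message and builds such a RUN block for illustration. The function names and the block count of 3 are hypothetical.

```python
import re
from typing import Optional, Tuple

ORA_19566 = re.compile(r"ORA-19566: exceeded limit of (\d+) corrupt blocks for file\s+(\S+)")


def parse_ora_19566(log_text: str) -> Optional[Tuple[int, str]]:
    """(current limit, datafile) from an ORA-19566 message, or None if absent."""
    m = ORA_19566.search(log_text)
    return (int(m.group(1)), m.group(2)) if m else None


def maxcorrupt_run_block(datafile: str, max_corrupt: int) -> str:
    """RMAN RUN block allowing up to max_corrupt corrupt blocks in one datafile."""
    return "\n".join([
        "run {",
        f"  set maxcorrupt for datafile '{datafile}' to {max_corrupt};",
        "  backup database;",
        "}",
    ])


if __name__ == "__main__":
    log = ("ORA-19566: exceeded limit of 0 corrupt blocks for file\n"
           "/u01/app/Oracle/oradata/MRP/sales_data_01.dbf")
    limit, datafile = parse_ora_19566(log)
    print(f"current limit: {limit}")
    # Hypothetical count; use the number RMAN actually reported for this file.
    print(maxcorrupt_run_block(datafile, max_corrupt=3))
```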

RAC0011: Backup fails because of $ORACLE_HOME/sqlplus/admin/glogin.sql

Issue

If the following line is present in the $ORACLE_HOME/sqlplus/admin/glogin.sql file (the SQL*Plus site profile), it may cause the SrvOraAgent server process on the CommServe to fail when browsing database contents or running a backup.

set linesize 80

Resolution

To avoid such failures, comment out that line in the file and retry the browse or backup operation.
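When many clients need the same edit, it can be scripted. This Python sketch comments out the line with the SQL*Plus "--" comment marker and keeps a .bak copy of the original; the function name is hypothetical, and the path to the profile file under $ORACLE_HOME/sqlplus/admin must be supplied.

```python
import re
from pathlib import Path

LINESIZE_80 = re.compile(r"^\s*set\s+linesize\s+80\s*;?\s*$", re.IGNORECASE)


def comment_out_linesize(profile: Path) -> int:
    """Prefix 'set linesize 80' lines with '-- ', keep a .bak copy, return lines changed."""
    original = profile.read_text()
    changed = 0
    out = []
    for line in original.splitlines(keepends=True):
        if LINESIZE_80.match(line.rstrip("\r\n")):
            out.append("-- " + line)
            changed += 1
        else:
            out.append(line)
    if changed:
        profile.with_name(profile.name + ".bak").write_text(original)
        profile.write_text("".join(out))
    return changed


if __name__ == "__main__":
    import sys
    if len(sys.argv) == 2:
        print(comment_out_linesize(Path(sys.argv[1])), "line(s) commented out")
    else:
        print("usage: python comment_linesize.py <path to the SQL*Plus profile file>")
```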

  • Backup fails with the following error:

    Character conversion not supported

    By default, the iDataAgent sets the NLS_LANG environment variable to American_America.US7ASCII. However, if the Oracle database on the client uses a different character set (for example, WE8MSWIN1252), the iDataAgent's backup operations may fail.

    In such cases, use the <oracle_SID>_NLS_LANG additional setting to set the NLS_LANG environment variable to American_America.<database_character_set> on the client computer.
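The database character set can be read with the query shown in the sketch below (NLS_DATABASE_PARAMETERS is a standard Oracle dictionary view), and the setting name and value then follow mechanically. The helper names are hypothetical; the sketch only formats the name/value pair and does not apply it.

```python
import re
from typing import Tuple

# Run on the database to find its character set:
CHARSET_QUERY = "SELECT value FROM nls_database_parameters WHERE parameter = 'NLS_CHARACTERSET'"


def nls_lang(charset: str, language: str = "American", territory: str = "America") -> str:
    """NLS_LANG value in the <language>_<territory>.<charset> form."""
    if not re.fullmatch(r"[A-Za-z0-9]+", charset):
        raise ValueError(f"unexpected character set name: {charset!r}")
    return f"{language}_{territory}.{charset.upper()}"


def nls_additional_setting(oracle_sid: str, charset: str) -> Tuple[str, str]:
    """(name, value) pair for the <oracle_SID>_NLS_LANG additional setting."""
    return f"{oracle_sid}_NLS_LANG", nls_lang(charset)


if __name__ == "__main__":
    print(nls_additional_setting("racdb1", "WE8MSWIN1252"))
    # ('racdb1_NLS_LANG', 'American_America.WE8MSWIN1252')
```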