Troubleshooting Backup - Oracle Agent

Troubleshooting Performance Issues

If you experience performance issues during backup, you can troubleshoot them by enabling logging of performance details in the log files. These performance counters contain information that helps resolve performance-related issues during backups.

  1. Perform a client backup to determine the performance statistics. See Backing Up an Oracle Subclient.

    Track the progress of the job from the Job Controller window of the CommCell Console.

    • Right-click the backup job, click Details, and then verify the Data Transferred on Network value.

      Tip: If a backup job uses 10 streams, back up at least 200 GB of data. If you are performing backups with 5 streams, back up at least 100 GB of data.

    • If the backup transfer rate is slow, kill the job: right-click the backup job and then click Kill.
  2. View the log files of a backup job to verify performance counters. For more information, see View the Log Files of a Job History.

    Verify the following performance counters in the log files:

    Total Oracle I/O Time

    Time spent per SBT thread for reading the data from disk.

    Total MA I/O Time

    Time spent during the data transfer to MediaAgent (the data read from the network buffer and written to the disk).
  3. In the log file, compare the performance counters.

    If the Total Oracle I/O Time value is greater than the Total MA I/O Time value, perform the following:

    • Verify the Oracle application compression. If it is on, turn it off.
    • Verify NetApp compression. If it is on, turn it off at the instance and storage policy copy level. For more information, see Setting Up Data Compression.
    • Depending on your environment, modify Data Files per BFS (set the value to 4 or 8) and Max Open Files. For more information, see Performance Tuning.

    If the Total Oracle I/O Time value is less than the Total MA I/O Time value, perform the following:

    • If the write throughput of the disk is slow, run the CvDiskPerf tool to measure the throughput for the disk. See Disk Performance Tool for more information.
    • If the data transfer on the network is slow, or you have a low bandwidth network environment, then verify the network throughput by running the CvNetworkTestTool tool. If the network throughput is low then set the nNumPipelineBuffers additional setting to increase the data transfer throughput from the client. For more information, see Increasing Data Transfer Throughput From Client.
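The comparison in step 3 can be sketched as a small Python helper. This is an illustration only, not part of the product: the function name is hypothetical, and it assumes the two counter values have been read from the log file by hand and are expressed in the same unit. The procedure does not say what to do when the two values are equal, so the sketch returns no suggestions in that case.

```python
from typing import List


def suggest_checks(total_oracle_io_time: float, total_ma_io_time: float) -> List[str]:
    """Map the two performance counters to the checks listed in step 3.

    Hypothetical helper: both arguments are counter values copied from
    the log file, expressed in the same unit.
    """
    if total_oracle_io_time > total_ma_io_time:
        # Reading from Oracle dominates, so look at the client side.
        return [
            "Turn off Oracle application compression if it is on",
            "Turn off compression at the instance and storage policy copy level",
            "Adjust Data Files per BFS (4 or 8) and Max Open Files",
        ]
    if total_oracle_io_time < total_ma_io_time:
        # Writing to the MediaAgent dominates, so look at disk and network.
        return [
            "Measure disk write throughput with CvDiskPerf",
            "Measure network throughput with CvNetworkTestTool",
            "If network throughput is low, set nNumPipelineBuffers",
        ]
    # Equal values: the procedure gives no guidance for this case.
    return []


# Example with made-up counter values.
for check in suggest_checks(total_oracle_io_time=5400.0, total_ma_io_time=1800.0):
    print(check)
```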

Completed with one or more errors

Backup jobs from the Oracle Agent will be displayed as "Completed w/ one or more errors" in the Job History in the following cases:

  • When RMAN Script execution for the backup job completes with warnings.
  • When the job is killed after backing up some data.
  • When the database cannot be opened after an offline backup.

Oracle Errors

If you receive an Oracle error during an Oracle backup operation, follow procedures published by Oracle Corporation on resolving the specific error and consult with your on-site Oracle database administrator, as needed.

ORCL0001: Error 18:18 Oracle database is not available. The database may be down or in an unknown state

Issue

Offline backups fail on Windows servers with the following error message:

ORACLE database is not available. The database may be down or in an unknown state

Resolution

This problem can happen if the TNS configuration for this database does not have a "static" listener. Most Oracle databases use dynamic listeners, and there could be a delay in the listener between the lights-out script that starts the database, and when the database finishes registering in the dynamic listener. If the database does not register quickly enough, the CommServe will fail to connect to the database with the TNS connection, which results in an unknown status and this error.

To resolve this issue, scan the ClOraAgent.log on the client for a connection error to Oracle where TNS is blocking connections.

If this is the error, add a directive to the tnsnames.ora file which will prevent the connection from getting blocked when using a dynamic listener.

Go to the %ORACLE_HOME%\NETWORK\admin folder on the Oracle server and add (UR = A) to the connect_data directive.

ORCL0002: Data aging jobs complete with errors during Oracle crosscheck

Issue

When Oracle backup jobs are data aged, the RMAN crosscheck operation is initiated by default. During this process, a check determines which Oracle backup objects can be aged off the backup media according to the CommCell data retention rules. A remote Oracle crosscheck operation runs on the Oracle server against the objects that the CommCell storage policy retention criteria identify as ready to be aged (marked as available to overwrite).

When an Oracle backup piece is aged (marked as available to overwrite) in the CommServe database, by default an RMAN command is issued to make the same backup piece 'unavailable' in the RMAN Control File (Catalog). This keeps the RMAN Control File synchronized, so that Oracle knows which backup pieces are no longer available in the CommServe database.

This crosscheck operation may fail with any of the following errors:

Error Code 32:321: Failed to clean up database after Oracle Cross-Check.

Error Code 13:93: Failed to clean up database after Oracle cross-check due to database problem.

This may be due to various issues, but the most common is that the crosscheck operation runs out of time, taking longer than the default timeout of 600 seconds (10 minutes).

Often there are too many objects for RMAN to process within the default time. Sometimes network issues trigger the problem. When the crosscheck fails, the unprocessed objects that are waiting to be aged accumulate in the CommServe database, which makes the next data aging operation take even longer.

Resolution

To resolve this issue, set the cross check timeout to 1800 seconds instead of the default timeout period of 600 seconds.

  1. From the CommCell Browser, navigate to Client Computers | client | Oracle.
  2. Right-click the instance and then click Properties.
  3. Click the Details tab.
  4. In the Cross Check Timeout box, set the time interval to 1800 seconds.
  5. Click OK.

ORCL0003: Error 18:40 RMAN script execution failed for this job.

Issue

Oracle backups fail with the following error message:

RMAN Script execution failed for this job. Please check RMAN log file for job failure reason.

The following entry is in the RMAN log file:

ORA-19511: Error received from media manager layer, error text:

Resolution

When the error message indicates an error from the Media Manager Layer, it could be a media error or network error. Fix this issue and then rerun the RMAN job.

If you still encounter the same issue, open a case with the contracted support center and upload the following logs for troubleshooting:

  • All logs from the CommServe
  • All logs from the MediaAgent
  • All logs from the Client
  • The RMAN script or backup script

Include the following logs:

  • JobManager.log from CommServe (log file is normally included with a send logs job for CS)
  • ORASBT.log from Client (log file is normally included with a send logs job for CL)
  • cvd.log from MediaAgent (log file is normally included with a send logs job for MA)
  • RMAN log files

    The RMAN log files are in the SnapProtect JobResults directory. For a Windows configuration (where C: represents the drive on which SnapProtect resides), the location is:

     C:\Program Files\CommVault Systems\Simpana\iDataAgent\jobResults\CV_JobResults\2\0\<job_id>\backup.out

    For a UNIX configuration, the location is:

    /opt/simpana/iDataAgent/jobResults/CV_JobResults/2/0/<job_id>/backup.out

Additional Information

In a UNIX configuration, you must set the SBT_LIBRARY parameter to the location of the libobk.xx library. For information on how to set the SBT library parameter, see Using PARMS in the Oracle Allocate Channel Command.

You can create an RMAN script to test the SBT library. On the RMAN command line run the following sample script, substituting any required or optional Oracle SBT parameters. For information on required and optional SBT parameters, see SBT Parameters.

run {
allocate channel ch1 type 'sbt_tape'
PARMS="BLKSIZE=262144,
SBT_LIBRARY=/opt/galaxy/Base/libobk.so, ENV=(CvClientName=client_name,CvInstanceName=Instance001)"
TRACE 2 DEBUG 2;
}

ORCL0004: Oracle bug with available patch - Backup and recovery impacted

Issue

Oracle has released a patch titled "High SCN growth rate from ALTER DATABASE BEGIN BACKUP in 11g" under patch number 12371955.

This information is relevant to SnapProtect conventional Oracle backups and to SnapProtect Oracle online backups, because the ALTER DATABASE BEGIN BACKUP command is issued when tablespaces are put into hot backup mode.

Resolution

Please review the issue and apply the patch to the CommCell Oracle clients.

Access to the Oracle link requires an Oracle Support account:

"Bug 12371955 Hot Backup can cause increased SCN growth rate leading to ORA-600 [2252] errors" (Modified 16-FEB-2012 Type PATCH Status PUBLISHED)

https://support.oracle.com/CSP/main/article?cmd=show&type=NOT&doctype=PATCH&id=12371955.8

ORCL0005: Log files required to troubleshoot Oracle Agent related errors

Issue

Log files are required for backup troubleshooting.

Resolution

When you encounter errors during backup/restore, make sure to view the following logs for troubleshooting:

  • SrvOraAgent.log on the CommServe
  • ClOraAgent.log on the client
  • ORASBT.log on the client.

    This log file is required if you encounter errors during data transfer from/to the MediaAgent.

  • <job_results_directory>/2/0/<jobid>/<backup|restore>.out for RMAN specific errors on the client.

If the information in the log files is not sufficient to determine the failure reason, increase the debugging level in the EventManager/.properties file and rerun the job.

In addition, you can view the RMAN logs from the Oracle Agent. From the Job Controller window or Job History window, right-click the specific job and click View RMAN Log.

The RMAN logs are stored in the job results directory as backup.out (for backup jobs) and restore.out (for restore jobs). If there is an issue with data aging, the file in the job results directory will be called crosscheck.out.

ORCL0006: Performance statistics for Oracle backup and restore

The Oracle Agent logs the performance statistics in the ORASBT.log file with the following throughput information:

Backup

Oracle I/O Throughput: Amount of data read by Oracle from disk in GB / Hour
MediaAgent I/O Throughput: Amount of data written by the MediaAgent in GB / Hour
I/O Throughput: Net amount of data read from Oracle and written to the MediaAgent (i.e. amount of data backed up) in GB / Hour

(This value includes time taken by Oracle as well as the MediaAgent.)

Restore

Oracle I/O Throughput: Amount of data written by Oracle to disk in GB / Hour
MediaAgent I/O Throughput: Amount of data read from the MediaAgent in GB / Hour
I/O Throughput: Effective amount of data read from the MediaAgent and written by Oracle (i.e. amount of data restored) in GB / Hour.

(This value includes time taken by Oracle as well as the MediaAgent.)
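As a worked illustration of the GB / Hour figures above, the following Python sketch converts a byte count and an elapsed time into a throughput value. It is not how the Agent computes its statistics: the function names are hypothetical, it assumes 1 GB is counted as 2**30 bytes, and it assumes the combined figure simply divides the data by the sum of the Oracle time and the MediaAgent time.

```python
GIB = 1024 ** 3  # assumption: 1 GB is counted as 2**30 bytes


def gb_per_hour(bytes_moved: int, seconds: float) -> float:
    """Convert a byte count and an elapsed time into GB / Hour."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (bytes_moved / GIB) / (seconds / 3600.0)


def net_gb_per_hour(bytes_moved: int, oracle_seconds: float, ma_seconds: float) -> float:
    """Combined throughput, assuming the two time components add up."""
    return gb_per_hour(bytes_moved, oracle_seconds + ma_seconds)


data = 200 * GIB  # made-up job size
print(gb_per_hour(data, 3600))            # Oracle side: 200.0
print(gb_per_hour(data, 3600))            # MediaAgent side: 200.0
print(net_gb_per_hour(data, 3600, 3600))  # combined: 100.0
```

Under these assumptions the combined figure is lower than either individual figure, because it accounts for both the time spent in Oracle and the time spent in the MediaAgent.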

ORCL0007: Debugging allocate channel failures on UNIX clients

Use the following steps to troubleshoot allocate failure errors on UNIX clients:

Install/Permission issues

The Oracle user must belong to the snapprotect group that was entered during the Oracle Agent installation. Otherwise, Oracle will not be able to write to ORASBT.log or access the CommVault registry at /etc/CommVaultRegistry. A common cause is selecting the 'sys' group during the Oracle Agent installation; the Oracle user does not belong to this group, so the operation fails. Follow the installation instructions to create the snapprotect group, and then reinstall the software packages on the client.

Library loading errors

Oracle backup library loading errors are logged in the temporary hook file libcvobk.log under the /tmp folder. If this file does not exist, the Oracle user (not the root user) should create it and run the backup again. Check this file for any library loading errors.

Create trace files:

Run the following RMAN command, and then collect the latest trace files from the udump directory of this Oracle instance. Also collect the Agent logs from the client.

rman target userid/password@instance nocatalog
run {
allocate channel ch1 type 'sbt_tape' trace=2 debug=2;
}
exit;

If the issue still exists, escalate to customer support with the output of the above steps.

If you are using a 32-bit Windows client, check whether the /3GB switch is set in the boot.ini file.

ORCL0009: Job fails due to sbtio.log size

Issue

Sometimes, jobs fail because the sbtio.log file in the $UDUMP directory grows too large.

Resolution

To resolve this, set the size limit for the sbtio.log file using the sMAXORASBTIOLOGFILESIZE additional setting. Once the specified size limit is reached, the sbtio.log file gets pruned automatically.

ORCL0010: Command line backup fails

Issue

Command Line backups fail.

Resolution

  • Make sure that the required media resource is available, and then run the backup again.
  • For on demand backups, you can run more than one script for an instance. However, backup jobs fail if there is more than one instance in the argument file.
  • For Oracle on Windows, do not use a space after a comma in the argument file. A backup job fails if you leave a space after a comma in the argument file.
  • An RMAN command line backup fails with the following error:

    "Unable to open lock file /opt/snapprotect/Base/Temp/locks/.dir_lock: Permission denied"

    This may occur if the umask parameter is set to 022 in the .profile file for the Oracle instance. As a workaround, change the umask to 000 or 002 and try the backup again.
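The effect of the umask values mentioned above can be shown with a few lines of Python. This is a sketch of how a umask masks permission bits in general; it assumes a file is requested with mode 666, which is a common default but is not confirmed for the lock file in the error message. The function name is hypothetical.

```python
def mode_after_umask(requested_mode: int, umask: int) -> int:
    """Return the permission bits left after the umask is applied."""
    return requested_mode & ~umask


# Assumption: the file is requested with rw-rw-rw- (octal 666).
for umask in (0o022, 0o002, 0o000):
    mode = mode_after_umask(0o666, umask)
    group_can_write = bool(mode & 0o020)
    print(f"umask {umask:03o} -> mode {mode:03o}, group write: {group_can_write}")
```

With a umask of 022 the group write bit is cleared (mode 644), whereas 002 and 000 leave it set (modes 664 and 666), so another user in the same group can still write to the file.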

ORCL0011: Command line backup fails for large backups

Issue

Sometimes, third-party command line jobs may hang when you perform large backups and restores.

Resolution

This happens because ClDBControlAgent updates the job manager after every 100 MB of data transferred, which can cause a thread failure during large backups or restores after some of the data has been transferred.

The following exception will be seen in the ClDBControlAgent.log:

5710030 304 02/22 03:47:23 608119 OraAgentBase::NotifyCommServeJobContinue() - m_jobObject->setUnCompBytesToAdd(105119744) ...
5710030 304 02/22 03:47:24 608119 CvThread::start_func() - Unhandled exception.
5710030 405 02/22 03:47:37 608119 ClOraControlAgent::OnClientTimeout() - Got timed out while waiting for msg from client 0

You can set the sBYTESDIFFMBS additional setting to <value> (in MB) in OracleAgent/.properties.

The job manager is then updated each time <value> MB of data is transferred.
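To get a feel for how the update interval affects the number of job manager updates, the arithmetic can be sketched in Python. This is a rough estimate only: the function name is hypothetical, 1 MB is assumed to be 2**20 bytes, and one update is assumed per full interval of data transferred.

```python
MIB = 1024 ** 2  # assumption: 1 MB is counted as 2**20 bytes


def estimated_updates(total_bytes: int, interval_mb: int) -> int:
    """Estimate job manager updates, assuming one per full interval."""
    if interval_mb <= 0:
        raise ValueError("interval_mb must be positive")
    return total_bytes // (interval_mb * MIB)


one_tib = 1024 ** 4  # made-up job size
print(estimated_updates(one_tib, 100))   # default 100 MB interval: 10485
print(estimated_updates(one_tib, 1024))  # hypothetical 1024 MB interval: 1024
```

Under these assumptions, a larger interval reduces the number of updates roughly in proportion to the interval size.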

ORCL0012: Offline backup with Lights Out script fails

Issue

Offline backup using lights out script fails with the following error:

RMAN error "ORA-12528: TNS:listener: all appropriate instances are blocking new connections"

Resolution

As a workaround, add a reference to the database in the listener.ora file as shown below:

SID_LIST_LISTENER =
(SID_LIST =
(SID_DESC =
(SID_NAME = PLSExtProc)
(ORACLE_HOME = C:\oracle\product\10.1.0\db_1)
(PROGRAM = extproc)
)
(SID_DESC =
(SID_NAME = rman10g)
(ORACLE_HOME = C:\oracle\product\10.1.0\db_1)
(SID = rman10g)
)
)

Oracle offline backup with the lights out option fails when you use the default number of retry attempts for the subclient. As a workaround, increase the retry attempts by setting the Tries value to 5 or more. See Configuring Subclients for Oracle Offline Backups for more details.

ORCL0013: Backup timeout failure

Issue

The backup fails because of a timeout.

Resolution

The default time for resources to allocate streams during RMAN command line backups is 86400 seconds (24 hours). If a backup fails due to a timeout being reached, you can configure the sALLOCATESTREAMSECS additional setting to increase the waiting time period.

ORCL0014: Backup fails because of $ORACLE_HOME/sqlplus/admin/glogin.sql

Issue

If the following line is present in the $ORACLE_HOME/sqlplus/admin/glogin.sql file, it may cause the SrvOraAgent server process on the CommServe to fail when browsing database contents or executing a backup.

set linesize 80

Resolution

To avoid such failures, comment out that line from the file and re-try the browse or backup operation.

  • Backup fails with following error:

    Character conversion not supported

    By default, the Agent sets the NLS_LANG environment variable to American_America.US7ASCII character set. However, if the Oracle database on the client uses a different NLS character set (for example, WE8MSWIN1252), the Agent’s backup operations may fail.

    In such cases, use the <oracle_SID>_NLS_LANG additional setting to set the NLS_LANG environment variable to American_America.<database_character_set> on the client computer.

ORCL0015: Database block corruption

Issue

The backup fails with the following error:

LISTING 2: r_20030520213618.log
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on d1 channel at 05/20/2003 21:36:26
ORA-19566: exceeded limit of 0 corrupt blocks for file
/u01/app/Oracle/oradata/MRP/sales_data_01.dbf

Resolution

Make sure that the maximum value for database block corruptions is set for the backup. It is recommended that you set this value to match the number of corrupted database blocks identified by RMAN for the database file being backed up.

ORCL0016: Backup fails intermittently on Linux clients

Issue

On Linux clients, if the libobk.so library fails to load, the backups may fail.

Resolution

As a workaround, do the following steps:

  1. Log in to the Oracle client computer as root.
  2. From the system prompt, enter the following command:

    ldconfig /<Base_directory_name>

    For example: # ldconfig <software installation path>/Base

This will ensure that the libobk.so library is loaded so that backups for Oracle on Linux can run successfully.

ORCL0017: Configuring an Instance or a Backup fails on Windows clients

Issue

Configuring an instance or a backup fails on Windows Clients.

Note: When using Oracle 12c, grant Full Control permission to the Oracle Home User for the SnapProtect folder.

Resolution

Grant Full Control Permission to the Oracle Home User.

ORCL0018: Backup fails on Red Hat Enterprise Linux 4 with Oracle version 10.1.0.5 32-bit

Issue

The backup may fail with the following error on Red Hat Enterprise Linux 4 with Oracle version 10.1.0.5 32-bit, because of a known Oracle issue with the libunwind.so.3 file:

channel ch1: starting piece 1 at Jul 12 2013 16:46:08
PID 30152, signal 6 (Aborted), address 0x75c8
[bt]: (1) /lib/tls/libpthread.so.0 [0x622890]
[bt]: (2) /lib/ld-linux.so.2 [0x3b07a2]
[bt]: (3) /lib/tls/libc.so.6(gsignal+0x55) [0x3f57a5]
[bt]: (4) /lib/tls/libc.so.6(abort+0xe9) [0x3f7209]
[bt]: (5) /soft/oracle/product/db/10.1.0.5/lib/libunwind.so.3(GetCurrentFrame32+0xdc) [0xb7ffd0ce]
[bt]: (6) /soft/oracle/product/db/10.1.0.5/lib/libunwind.so.3(_Unwind_RaiseException+0x5b) [0xb7ffc86b]
[bt]: (7) ./libstdc++.so.6(__cxa_throw+0x5d) [0xb60a126d]
[bt]: (8) ./libCvLib.so(_ZN10CvFwDaemonC1EPKcbii+0x2ee) [0xb6207c00]
[bt]: (9) ./libCvLib.so(_ZN10CvFwClient7connectEPKcS1_iiiiPFvR9CQiSocketPvES4_b+0xf6f) [0xb6211acb]
[bt]: (10) ./libCvSession.so(_ZN9CVSession16socketConnectionEPKcS1_+0x261) [0xb72cd4f1]
[bt]: (11) ./libCvSession.so(_ZN9CVSession9getSocketEPKcS1_+0x135) [0xb72cddd5]
[bt]: (12) ./libCvSession.so(_ZN9CVSession13getConnectionEPKvPKc+0x11b) [0xb72cdf1b]
[bt]: (51) oracleHWRHDEV(main+0xbb) [0x82816bf]
[bt]: (52) /lib/tls/libc.so.6(__libc_start_main+0xd3) [0x3e2de3]
[bt]: (53) oracleHWRHDEV(ldxsto+0x1d1) [0x828157d]
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on ch1 channel at 07/12/2013 16:46:33
RMAN-10038: database session for channel ch1 terminated unexpectedly
RMAN>
Recovery Manager complete.
]

Resolution

Upgrade your Oracle version from 10.1.x to 10.2 to avoid the backup failure on Red Hat Enterprise Linux 4.

ORCL0019: Log backup fails

Issue

If the Oracle database is configured to save the archive logs in the flash recovery area, and an Oracle subclient has both Protect backup recovery area and Archive Delete enabled at the same time, the backup fails.

Resolution

To resolve this, use two different subclients, one for Protect backup recovery area and the other for Archive Delete.

  • Log backup fails if you select the default USE_DB_RECOVERY_FILE_DEST entry as a log destination for the backup.

    To resolve this, make sure that the log destinations are included in the PFile (init<SID>.ora) or SPFile (spfile.ora). Also ensure that the correct log destination is selected for the backup.

ORCL0020: Backup fails on Linux clients because of unknown instance status

Issue

Backups may fail on Linux clients if the Oracle instance status is shown as UNKNOWN in the CommCell Console.

Resolution

To resolve this issue, make sure that the nproc value in the /etc/security/limits.d/90-nproc.conf file is greater than 1024.
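The check can be sketched in Python as follows. This is a minimal illustration that assumes the file uses the standard limits.conf layout of four whitespace-separated fields (domain, type, item, value); the function names and the sample content are hypothetical, and the values unlimited, infinity and -1 are treated as "no limit".

```python
from typing import List, Tuple

NO_LIMIT = {"unlimited", "infinity", "-1"}


def nproc_entries(limits_text: str) -> List[Tuple[str, str, str]]:
    """Return (domain, type, value) for every nproc line in the text."""
    entries = []
    for raw_line in limits_text.splitlines():
        line = raw_line.split("#", 1)[0]  # drop comments
        fields = line.split()
        if len(fields) == 4 and fields[2] == "nproc":
            domain, limit_type, _item, value = fields
            entries.append((domain, limit_type, value))
    return entries


def is_above_1024(value: str) -> bool:
    """True when the value is numeric and above 1024, or means no limit."""
    if value in NO_LIMIT:
        return True
    return value.isdigit() and int(value) > 1024


# Sample content only; read the real file on the client instead.
SAMPLE = """\
*          soft    nproc     1024
root       soft    nproc     unlimited
"""

for domain, limit_type, value in nproc_entries(SAMPLE):
    status = "ok" if is_above_1024(value) else "too low"
    print(f"{domain} {limit_type} nproc={value}: {status}")
```

To run it against a client, replace SAMPLE with the contents of /etc/security/limits.d/90-nproc.conf read from that machine.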

ORCL0021: Shared memory error

Issue

The backup failed because the shared memory on the HP-UX PA-RISC client has not been configured per operational guidelines.

Resolution

Add the DisableIPC_GLOBAL file in the /apps/simpana/Base directory on the client where the backup failed.

  1. Stop the SnapProtect software.
  2. Create an empty file called DisableIPC_GLOBAL in the /apps/simpana/Base directory. From the command line, type the following:

    touch /apps/simpana/Base/DisableIPC_GLOBAL

  3. Restart the SnapProtect software.

ORCL0034: Backup Fails with Permissions Issue

Issue:

The backup fails due to issues accessing the SnapProtect registry, log files and base directories.

The RMAN backup fails because it cannot load the CommVault SBT Media Management library.

Solution

Run the Database Readiness Check.

ORCL0037: Multiple Jobs for Oracle Third Party Command Line Operations

Issue:

For Oracle 12c, when you use multiple streams for third-party command line operations, multiple jobs may be started.

Solution

Add the user to the Local Security Policy.

  1. From Local Security Policy, navigate to User Rights Assignment.
  2. Right-click Act as part of the operating system and then select Properties.
  3. Click on Add User or Group and then click OK.
  4. Right-click Create a token object and then select Properties.
  5. Click on Add User or Group and then click OK.
  6. Right-click Replace a process level token and then select Properties.
  7. Click on Add User or Group and then click OK.

ORCL0038: First Archive Log Backup May Fail after Initial Deployment

Issue:

On new SnapProtect deployments, or migrations from other vendors, the first archive log backup may fail because the logs may have been manually deleted.

The following Oracle error may be displayed.

RMAN-06059 expected archived log not found, loss of archived log compromises recoverability

Solution

  1. Run the crosscheck command to check for missing archive logs.

    crosscheck archivelog all;

  2. Run the RMAN delete command to remove the expired entries and synchronize the RMAN catalog files with the database files.

    delete noprompt expired archivelog all;

ORCL0039: Oracle crosscheck can take a long time to finish

Issue:

If a large number of Oracle backups are available because of a long retention period, crosschecking these backups can take a long time.

Solution

  1. Limit the crosscheck scope to backups completed after a specified time (for example, within the last 40 days).

    crosscheck backup completed after 'SYSDATE - 40';
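To see which date a given window corresponds to, and to build the command for a different number of days, a small Python sketch can help. It is illustrative only: the function names are hypothetical, and the date is computed from the local clock, whereas SYSDATE in the RMAN command is evaluated by Oracle on the database server.

```python
from datetime import date, timedelta


def crosscheck_cutoff(today: date, days: int) -> date:
    """Approximate the calendar date that 'SYSDATE - days' refers to."""
    return today - timedelta(days=days)


def crosscheck_command(days: int) -> str:
    """Build the RMAN crosscheck command for a window of the given size."""
    if days <= 0:
        raise ValueError("days must be positive")
    return f"crosscheck backup completed after 'SYSDATE - {days}';"


print(crosscheck_command(40))
print("Backups completed after about", crosscheck_cutoff(date.today(), 40))
```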

ORCL0040: Third-party backup jobs go to a waiting state with "No resources"

Issue:

Third-party backups go to waiting state with a delay reason in the job controller. This can happen when the required number of streams could not be allocated.

Solution

If there are errors on the streams, this may block the streams from being allocated.

View the errors for the individual streams and then increase the number of streams for the storage policy.

To view the individual stream information:

  1. From the CommCell Console ribbon, click the Home tab, and then click Job Controller.
  2. Right-click the backup job and select Details.

    The Backup Job Details for Job ID n dialog box displays the details of the selected job.

  3. On the Streams tab, view the status for the individual streams. The failure or delay reason is displayed.
  4. Click OK.

Increase the number of streams for the storage policy that is used for the job. For information on setting the streams for a storage policy, see Specifying the Device Streams.

ORCL0041: Oracle Backup Fails When the New Oracle CommCell Console Instance Name Is Greater than 18 Characters

Issue:

An Oracle backup fails when you rename the CommCell Console instance to a name that is greater than 18 characters and is different from the Oracle database instance name.

Note: The Oracle database installation limits the Oracle SID name to 8 characters and the Oracle database instance name to 12 characters.

Solution

  1. Configure the SnapProtect software so that it ignores the CommCell Console instance name.
  2. Use a connect string to connect to the Oracle database.