SnapProtect - Troubleshooting - Oracle

SnapProtect Backup Failures

Failed in Indexing Operation (Create or Update Index)

Run the RMAN CROSSCHECK ARCHIVELOG ALL command, and then retry the SnapProtect backup operation.

Failed during VSS SnapProtect Backup

If an error occurs during the VSS SnapProtect backup operation, check the Windows Event Viewer for details.

The Oracle VSS Writer has a known memory leak (Metalink ID 1358570.1). If Oravssw.exe consumes excessive memory, restart the Oracle VSS Writer service.

SnapProtect Backup Fails

SnapProtect backup operations fail if the database is in NOARCHIVELOG mode. Switch the database to ARCHIVELOG mode, and then run the SnapProtect backup. If the database is open, shut it down cleanly before mounting it:

SQL> shutdown immediate;

SQL> startup mount;

SQL> alter database archivelog;

SQL> alter database open;

SnapProtect backup operations may also fail if the SSKIPBACKUPBROWSE Additional Setting is enabled. Disable the setting, and then retry the backup.

Mount Operation Fails on Windows

During an RMAN backup copy operation on the proxy, the mount operation fails with the following error message:

OS mount failed : [Mount failed: E_FAIL (MM.60611)]

To resolve this issue, make sure that the mount directory is empty and that no drives matching the source data drives exist on the proxy.

SnapProtect restores of high transaction rate databases may fail

Do not use SnapProtect to back up the archive log volume of high transaction rate databases. The following errors may occur when you restore from those snaps or run the backup copy of those SnapProtect jobs.

ORA-00283: recovery session canceled due to errors

ORA-00354: corrupt redo log block header

ORA-00353: log corruption near block 86016 change 7963079914 time 01/28/2014 09:54:46

Instead, back up the archive logs using RMAN. Select the Use Rman for Log Backup check box on the Logs Backup tab of the Subclient Properties dialog box.

SnapProtect backup job runs indefinitely

SnapProtect backup operations may run indefinitely for the following reasons:

  • The archive log location is full. In this case, either clear archive logs to free enough space or specify a different archive log location.
  • There are faulty or failed multipath devices. If faulty devices exist, delete them using the following commands on a Linux computer:

    dmsetup message <faulty multipath device> 0 "fail_if_no_path"
    multipath -f <faulty multipath device>
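Before deleting anything, it can help to list which multipath maps actually report bad paths. The following is a minimal sketch, not part of the product. It assumes the usual multipath -ll layout, in which each map starts with an unindented header line containing its dm-N device and each path line carries a state such as failed or faulty. The function name list_faulty_maps is hypothetical.

```shell
# Print the name of every multipath map that has at least one
# failed or faulty path. Reads "multipath -ll" output on stdin.
# Assumption: map header lines are unindented and contain " dm-N ".
list_faulty_maps() {
    awk '
        # Header line of a map: remember its name (first field).
        / dm-[0-9]+ / && $0 !~ /^[ |`]/ { map = $1; next }
        # Path line reporting a bad state: print the map once.
        /failed|faulty/ && map != "" && !(map in seen) {
            seen[map] = 1
            print map
        }
    '
}

# Typical use on the Linux client (requires root):
#   multipath -ll | list_faulty_maps
```

Each name printed is a candidate for the <faulty multipath device> placeholder in the commands above. Review the list before removing a device.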

Oracle RMAN snap to tape incremental copy fails on the proxy computer

When performing an Oracle RMAN incremental snap to tape copy, note the following:

  1. The Oracle database installed on the proxy computer must be the same version as the source. For example, if Oracle 10.2.0.4 is installed on the source, the proxy must also run 10.2.0.4.
  2. The Oracle user ID and group ID must be the same on the source and the proxy. Otherwise, the RMAN backup copy fails with permission errors.
  3. Copy the Oracle parameter file (pfile) from the source to the proxy. If the source instance uses only an spfile, create a pfile from it first:

    sqlplus <username/password@servicename> as sysdba << EOF
    Create pfile from spfile;
    Exit;
    EOF

    Copy the pfile init<instance name>.ora to the $ORACLE_HOME/dbs/ directory on the proxy computer, with oracle user permissions. Also, copy the Oracle password file from the source to the $ORACLE_HOME/dbs/ directory on the proxy computer.

  4. Create the bdump, udump, adump, cdump and diagnostic_dest directories. The directories must be in the same locations as on the source.
  5. Create the directories DB_CREATE_FILE_DEST, LOG_ARCHIVE_DEST and any other directory required for starting the database in NOMOUNT mode. If there are multiple archive destinations, then create the directories for each of the archivelog destinations.
  6. Copy $ORACLE_HOME/network/admin/tnsnames.ora configuration from source to proxy. If the entire content cannot be copied then copy at least the configuration related to catalog connection.
  7. Start the proxy instance in NOMOUNT mode.
  8. Configure the proxy Oracle instance on the CommServe Console and verify that its status is Started. You can now run the Oracle RMAN snap to tape incremental copy.
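The directory creation in steps 4 and 5 can be sketched as a small script that reads the copied pfile. This is an illustration under stated assumptions, not a product utility: it assumes one name=value entry per line, optionally prefixed with *. or <sid>., and it only handles the parameters listed in the pattern. LOG_ARCHIVE_DEST_n values (which use the LOCATION= form) are not parsed and still need to be created by hand. The function names pfile_dirs and make_pfile_dirs are hypothetical.

```shell
# List directory-valued parameters found in a pfile (init<SID>.ora).
# Assumption: one "name=value" per line, optionally prefixed by "*."
# or "<sid>.", with the value optionally wrapped in single quotes.
pfile_dirs() {
    awk '
        /^[ \t]*#/ { next }                      # skip comment lines
        {
            eq = index($0, "=")
            if (eq == 0) next
            name = tolower(substr($0, 1, eq - 1))
            value = substr($0, eq + 1)
            gsub(/[ \t]/, "", name)
            sub(/^[^.]*\./, "", name)            # drop "*." or "<sid>."
            sub(/^[ \t]*\047?/, "", value)       # strip opening quote
            sub(/\047?[ \t]*$/, "", value)       # strip closing quote
            if (name ~ /^(audit_file_dest|background_dump_dest|user_dump_dest|core_dump_dest|diagnostic_dest|db_create_file_dest)$/)
                print value
        }
    ' "$1"
}

# Create each listed directory (run as the oracle user on the proxy).
make_pfile_dirs() {
    pfile_dirs "$1" | while IFS= read -r dir; do
        mkdir -p "$dir"
    done
}

# Example:
#   make_pfile_dirs "$ORACLE_HOME/dbs/init<instance name>.ora"
```

Running pfile_dirs on its own first shows which paths would be created, which is a quick way to compare them with the source.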

Note that for an incremental backup, the snap clone is mounted at the same mount point as on the source database. For example, if the data mount point is /netapp/data, the clone is also mounted at /netapp/data on the proxy. The same applies to the archive log location chosen for the SnapProtect backup. Therefore, make sure that this mount point is free on the proxy: the directory must either not exist or be empty.
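The mount point requirement can be checked on the proxy before the job runs. The following is a minimal sketch. The function name mount_point_is_free is hypothetical, and /netapp/data is the example path used above.

```shell
# Succeed if the path does not exist or is an empty directory.
mount_point_is_free() {
    [ ! -e "$1" ] && return 0      # nothing there: free
    [ -d "$1" ] || return 1        # exists but is not a directory
    # ls -A lists every entry except "." and ".."
    [ -z "$(ls -A "$1")" ]
}

# Example:
#   mount_point_is_free /netapp/data || echo "/netapp/data is in use on the proxy"
```

The check only looks at directory contents. It does not detect a file system that is already mounted there but happens to be empty.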

SnapProtect backup copy on proxy fails

Sometimes, the Backup copy on proxy fails with the following error: ORA-7217 sltln: environment variable cannot be evaluated.

For example, the error occurs if the RMAN configuration parameters contain environment variables (names prefixed with $), as shown below:

RMAN> show all;

RMAN configuration parameters for database with db_unique_name are:

CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 0 DAYS;

CONFIGURE BACKUP OPTIMIZATION ON;

CONFIGURE DEFAULT DEVICE TYPE TO DISK; # default

CONFIGURE CONTROLFILE AUTOBACKUP OFF;

CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO '$ORACLE_BKUP/$ORACLE_SID/controlfile%F.f';

CONFIGURE DEVICE TYPE DISK PARALLELISM 2 BACKUP TYPE TO COMPRESSED BACKUPSET;

CONFIGURE DATAFILE BACKUP COPIES FOR DEVICE TYPE DISK TO 1; # default

CONFIGURE ARCHIVELOG BACKUP COPIES FOR DEVICE TYPE DISK TO 1; # default

CONFIGURE CHANNEL DEVICE TYPE DISK FORMAT '$ORACLE_BKUP/$ORACLE_SID/%d_df_%t_b%s_p%p.rmf' MAXPIECESIZE 2000 M;

CONFIGURE MAXSETSIZE TO UNLIMITED; # default

CONFIGURE ENCRYPTION FOR DATABASE OFF; # default

CONFIGURE ENCRYPTION ALGORITHM 'AES128'; # default

CONFIGURE COMPRESSION ALGORITHM 'BASIC' AS OF RELEASE 'DEFAULT' OPTIMIZE FOR LOAD TRUE ; # default

CONFIGURE ARCHIVELOG DELETION POLICY TO NONE; # default

CONFIGURE SNAPSHOT CONTROLFILE NAME TO '$ORACLE_BKUP/$ORACLE_SID/snapcf.f';

ORA-7217 sltln: environment variable cannot be evaluated

Perform one of the following steps to resolve this issue:

  • Configure "/" as the connect string on the proxy, and set the environment variables used in the snapshot control file path in the cvprofile file in the Base directory:
    1. Edit /opt/simpana/Base/cvprofile.
    2. Add an export line for each variable, for example: export ORACLE_BKUP=/tmp
    3. Restart the services: simpana restart
  • Change the configuration parameters on the source to remove the environment variables ($ORACLE_BKUP and so on).
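To find out which variables need to be set on the proxy, the saved output of show all can be scanned for $NAME references. The following is a minimal sketch. The function names are hypothetical, and the script only inspects text, so the RMAN output has to be captured separately and passed on stdin.

```shell
# Extract the names of environment variables ($NAME) referenced in
# RMAN "show all" output read from stdin, one per line, de-duplicated.
rman_env_vars() {
    grep -oE '\$[A-Za-z_][A-Za-z0-9_]*' | tr -d '$' | sort -u
}

# Print the referenced variables that are not exported in this shell.
missing_rman_env_vars() {
    rman_env_vars | while IFS= read -r name; do
        # printenv exits non-zero when the variable is not in the environment
        printenv "$name" >/dev/null || printf '%s\n' "$name"
    done
}

# Example (show_all.txt is an assumed file holding the saved output):
#   missing_rman_env_vars < show_all.txt
```

RMAN format specifiers such as %d or %F are ignored because they do not start with a dollar sign. Any name that missing_rman_env_vars prints is a candidate for an export line in cvprofile.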

Sometimes, the backup copy on proxy fails with the following error even though the control file copy is cataloged into recovery catalog:

RMAN-03002: failure of backup command at 04/11/2012 13:44:03

RMAN-06004: ORACLE error from recovery catalog database: RMAN-20220: control file copy not found in the recovery catalog

RMAN-06090: error while looking up control file copy: +DATA/backup.ctl.galaxy

RMAN>

Perform the following to resolve this failure:

  • Unregister and then reregister the source database with the recovery catalog.
  • Resume the backup copy job.

Sometimes, the SnapProtect operation and the backup copy on the proxy for an ASM database fail with the following errors:

For Snap:

Error Code: [19:1335]

Description: Oracle Backup [GetASMLogDisks Failed.]

For Backup Copy:

Error Code: [19:1335]

Description: Oracle Backup [Mounting snap or renaming ASM DiskGroup operation failed with an error. Please check the logs for more details.]

30156 f48b7410 04/18 13:00:37 97550 OraObject::GetOraMode() - oraMode = SHUTDOWN.

30156 f48b7410 04/18 13:00:37 97550 OraObject::GetOraMode() - oraMode = SHUTDOWN: return Error.

30156 f48b7410 04/18 13:00:37 97550 OraInfoBase::GetInfo() - CheckOraMode() failed: oraError=301989906

30156 f48b7410 04/18 13:00:37 97550 ASMSnapUtil::runSqlWithScript() - Failed while getting the Oracle version

30156 f48b7410 04/18 13:00:37 97550 ASMSnapUtil::runSqlWithScript() - Writing into file [@/opt/simpana/Base/Temp/tmp_asm_30156.sql] sql = [select 'U,'|| state from v$asm_diskgroup where name = 'DATADG1'
/
]
30156 f48b7410 04/18 13:00:37 97550 ASMSnapUtil::runSqlWithScript() - Executing SQL select 'U,'|| state from v$asm_diskgroup where name = 'DATADG1'
/
failed with an error Database is in SHUTDOWN mode
30156 f48b7410 04/18 13:00:37 97550 ASMSnapUtil::isASMDiskGroupMounted() - Failed while executing the sqlscript [select 'U,'|| state from v$asm_diskgroup where name = 'DATADG1'
/
] output = []
30156 f48b7410 04/18 13:00:37 97550 ASMDiskGroup::renameASMDiskGroup() - Child change user=oracle, gid=501, uid=501
30156 f48b7410 04/18 13:00:37 97550 ClOraSnapAgent::RenameAndMountASMDiskGroup() - Successfully Renamed ASM DISK GROUPS
30156 f48b7410 04/18 13:00:37 97550 OraObject::GetOraMode() - strictSID = 0
30156 f48b7410 04/18 13:00:37 97550 OraChildProcess::SetPostForkParam() - Parent path = /oracle11gr2/product/11.2.0/dbhome_1/bin/sqlplus
30156 f48b7410 04/18 13:00:37 97550 OraChildProcess::SetPostForkParam() - Parent oraUser = oracle
 

Perform the following to resolve this failure:

  1. Log in to the CommCell Console.
  2. Verify the status of the Oracle +ASM instance. The status must be Started.
  3. Resume the SnapProtect operation or the backup copy job that you need to perform.

Latency occurs for releasing file descriptors by ASM instance during dismounting of ASM diskgroups

On Oracle version 11.2.0.2, the ASM instance is slow to release file descriptors when ASM diskgroups are dismounted.

Apply the following patch on Oracle version 11.2.0.2 to resolve the latency issue:

Patch:11666137

https://updates.oracle.com/download/11666137.html

SnapProtect backup copy of an Oracle ASM database fails if you perform a SnapProtect operation without disabling Snap Integrity for persistent snap engines

If you perform a SnapProtect operation without disabling the Snap Integrity for persistent snap engines, the following error is displayed:

19935 b7f476d0 03/01 12:20:03 11972 ASMDiskGroup::mountASMDisk() - Failed while excuting sql [alter system set asm_diskstring='/ora_snap*/asm*','/ora_snap*/*'

/

alter diskgroup DATA mount

/

] output = [ SQL*Plus: Release 11.2.0.1.0 Production on Thu Mar 1 12:20:03 2012,Copyright (c) 1982, 2009, Oracle. All rights reserved.,SQL> SQL> SQL> SQL> Connected.,SQL> SQL> SQL> SQL> SQL> SQL> SQL> ,System altered.,alter diskgroup DATA mount,ERROR at line 1:,ORA-15032: not all alterations performed,ORA-15017: diskgroup "DATA" cannot be mounted,ORA-15063: ASM discovered an insufficient number of disks for diskgroup "DATA",SQL> SQL> Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production,With the Automatic Storage Management option,]

19935 b7f476d0 03/01 12:20:03 11972 ASMDiskGroup::mountASMDisk() - One of the diskgroups could not be mounted

19935 b7f476d0 03/01 12:20:03 11972 ClOraSnapAgent::RenameAndMountASMDiskGroup() - mountASMDiskGroup failed

Perform the following steps to resolve this issue:

  1. Mount the snaps from the CommCell console (List Snaps-->Mount) on Proxy and then use the following command:

    /opt/simpana/Base/diskgroup_rename.pl -y <original source diskgroup name> <mounted disk path>

    For example, if the snap is mounted on /tmp and the original diskgroup name is DATA01, run the following command:

    [oracle@Hindu ~]$ /opt/simpana/Base/diskgroup_rename.pl -y DATA01 /tmp/volume_1335990104/DATA01

    renaming disks to: DATA01

    current name for /tmp/volume_1335990104/DATA01: DS2995770

    new name for /tmp/volume_1335990104/DATA01: DATA01

  2. Unmount the Snap. Perform the same for all the snaps in that SnapProtect operation.
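When several snaps from the same SnapProtect operation are mounted at once, the rename command can be repeated in a loop. The following is a minimal sketch, assuming the snaps are already mounted from the CommCell Console. The function name rename_snap_disks and the RENAME_CMD override are hypothetical, and the /tmp/volume_*/DATA01 pattern is only inferred from the example path above.

```shell
# Path to the rename script as given above; set RENAME_CMD to another
# command (for example "echo") to preview the calls without running them.
RENAME_CMD=${RENAME_CMD:-/opt/simpana/Base/diskgroup_rename.pl}

# Usage: rename_snap_disks <original diskgroup name> <mounted disk path>...
rename_snap_disks() {
    dg=$1
    shift
    for path in "$@"; do
        # Stop at the first failure so it can be investigated.
        "$RENAME_CMD" -y "$dg" "$path" || return 1
    done
}

# Example (run as the oracle user on the proxy):
#   rename_snap_disks DATA01 /tmp/volume_*/DATA01
```

Mounting and unmounting the snaps remains a CommCell Console operation. The loop covers only the rename step.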

Changing the Snapshot control file and SP file location to snappable volume

Run the following RMAN command to change the path of the snapshot control file:

CONFIGURE SNAPSHOT CONTROLFILE NAME TO '<Snappable volume>\SNCFVSSDB.ora';

Either remove the spfile from the $ORACLE_HOME/database location, or move the spfile to a snappable volume and add its new location to the pfile.

Oracle VSS SnapProtect job fails on Windows

XML Document is Long

The Oracle VSS snap job fails with "failed to finalize shadow", and the Windows events show the following error message:

XML document is too long.

Oracle stores the redo logs in the BCD XML, so the XML file can become large when there are many transactions. Rerun the SnapProtect operation when there is less I/O activity on the Oracle database.

Device Not Filer Volume

When you run a VSS SnapProtect job after restoring the control file, the job may fail with one of the following errors:

Device is not filer volume

Device Not Found

This happens because the Oracle VSS writer includes a snapshot control file for RMAN, which by default is in the %ORACLE_HOME%/database location and may not be on a snappable volume.

To resolve this, run the following RMAN script to change the location of the snapshot control file:

RMAN> CONFIGURE SNAPSHOT CONTROLFILE NAME TO '<Snappable Volume>\SNAPSHOT.CTL';

Example:

RMAN> CONFIGURE SNAPSHOT CONTROLFILE NAME TO 'F:\SNAPDB_LOG\SNAPTEST\SNAPSHOT.CTL';

Output:

using target database control file instead of recovery catalog

old RMAN configuration parameters: CONFIGURE SNAPSHOT CONTROLFILE NAME TO 'F:\SNAPDB_LOG\SNAPTEST\SNAPSHOT.CTL';

new RMAN configuration parameters: CONFIGURE SNAPSHOT CONTROLFILE NAME TO 'F:\SNAPDB_LOG\SNAPTEST\SNAPSHOT.CTL';

new RMAN configuration parameters are successfully stored

RMAN> exit

Tablespace and Data file restore from snap using file system fails on Windows

During partial tablespace/data file restore from snap using File System, if the database is open, the data file restore operation may fail with busy errors.

To resolve this issue, make sure the database is in mount mode before running partial restore from Snap using File System.

Oracle 11.2.0.3 and higher

When using Oracle version 11.2.0.3 or higher, if the database ID of the auxiliary instance is changed using the NID script, table restores from snap or File System backup copy and database cloning restores fail with the following error message:

NID-00106: LOGIN to target database failed with Oracle error:

ORA-01017: invalid username/password; logon denied

To resolve this issue, install the Oracle update 13366202 on the destination database.

SP file creation fails on proxy

Backup copy operations fail with the following error message on the proxy computer:

Failed to create spfile on proxy

To resolve this issue, ensure the following:

  • The listener is configured and running.
  • The pfile/spfile does not contain any invalid parameters.
  • All directories mentioned in the pfile/spfile exist on the proxy computer.

Completed with one or more errors

Backup jobs from Oracle iDataAgent will be displayed as "Completed w/ one or more errors" in the Job History in the following cases:

  • When RMAN Script execution for the backup job completes with warnings.
  • When job is killed after backing up some data.
  • During offline backups, if the database cannot be opened after a backup.

Restore jobs from Oracle iDataAgent will be displayed as "Completed w/ one or more errors" in the Job History in the following cases:

  • During a table restore, if the export or import of table fails.
  • RMAN recovery is completed, but an incorrect open mode is selected for restore.

Catalog Errors during SnapProtect Backup

During SnapProtect backup job, you may notice the following catalog errors:

File Name: /opt/snapprotect/MediaAgent/SnapVolumeMounts/SnapMnt_1_2_26099/
oradata/ONLINE/dbconf.cfg
RMAN-07517: Reason: The file header is corrupted

File Name: /opt/snapprotect/MediaAgent/SnapVolumeMounts/SnapMnt_1_2_26099/
oradata/ONLINE/initONLINE.ora
RMAN-07517: Reason: The file header is corrupted

File Name: /opt/snapprotect/MediaAgent/SnapVolumeMounts/SnapMnt_1_2_26099/
oradata/ONLINE/spfileONLINE.ora
RMAN-07518: Reason: Foreign database file DBID: 0 Database Name:

File Name: /opt/snapprotect/MediaAgent/SnapVolumeMounts/SnapMnt_1_2_26099/
oradata/ONLINE/GalaxyControlFile.Conf
RMAN-07517: Reason: The file header is corrupted

These error messages can be ignored. As part of the backup job, files such as spfile, pfile and backup controlfiles are copied to the archive log location. Oracle does not recognize these files as archive log files and hence displays the error messages. 

Recover until time fails because archived redo log is corrupted

Issue

Snapshots of the archive log volume may not be consistent when the database has a high transaction rate and frequent log switches.

Symptom

The Recover Until Time operation fails with the Oracle error ORA-00353: log corruption near block. This can happen when there are frequent online redo log switches.

Resolution

Enable the Use Rman for Log Backup option on the subclient. Right-click the subclient and select Properties. On the Subclient Properties dialog box, navigate to the Logs Backup tab and select the Use Rman for Log Backup check box. Retry the operation.

Restore error when Switch Database mode is enabled

Issue

When you restore an Oracle database on Unix clients with the Switch database mode for restore option selected (to keep the database in the correct mode during the restore), the database may not restart after the mode switch. The restore operation may also fail with the following error message:

RMAN Script execution failed with error [RMAN-04014: startup failed: ORA-27137: unable to allocate large pages to create a shared memory segment]. Please check the Logs for more details.

Resolution

This issue occurs if the oracle user has a higher ulimit configuration than the root user. To resolve this issue, apply the ulimit value of Oracle user for the restore using the following steps:

  1. From the CommCell Browser, navigate to Client Computers.
  2. Right-click the <Client>, and then click Properties.
  3. Click Advanced.
  4. Click the Additional Settings tab.
  5. Click Add.
  6. In the Additional Settings dialog:
    • In the Name box, type OracleUser.
    • In the Category box, select OracleAgent from the list or type it.
    • In the Type box, select String.
    • In the Value box, type the Oracle user name (for example, oracle).
  7. Click OK.
  8. Restart SnapProtect Services on the client.

Note: If you change the shell limits after you add this Additional Setting, restart the SnapProtect services, so the limits take effect.
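To confirm that the limits differ as described, the values reported for the two users can be compared. The following is a minimal sketch. The function name limit_gt is hypothetical. The choice of ulimit -l (locked memory) is an assumption based on the large pages error above, and oracle is the assumed operating system user.

```shell
# Succeed if limit $1 is greater than limit $2.
# Values are what "ulimit" prints: a number or the word "unlimited".
limit_gt() {
    [ "$1" = "$2" ] && return 1
    [ "$1" = "unlimited" ] && return 0
    [ "$2" = "unlimited" ] && return 1
    [ "$1" -gt "$2" ]
}

# Example (run as root on the client):
#   root_limit=$(ulimit -l)
#   ora_limit=$(su - oracle -c 'ulimit -l')
#   limit_gt "$ora_limit" "$root_limit" && echo "oracle limit exceeds root limit"
```

If the comparison succeeds, the client matches the condition described in the Resolution, and the OracleUser Additional Setting applies.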

Recovering data associated with deleted clients and storage policies

Symptom

In a disaster recovery scenario, use the following procedure to recover data associated with the following entities:

  • Deleted storage policy
  • Deleted client, agent, backup set or instance

Before You Begin

This procedure can be performed when the following are available:

  • You have a Disaster Recovery Backup that contains information on the entity that you are trying to restore. For example, if you wish to recover a storage policy (and the data associated with the storage policy) that was accidentally deleted, you must have a copy of the disaster recovery backup that was performed before deleting the storage policy.
  • Media containing the data you wish to recover is available and not overwritten.
  • If a CommCell Migration license was available in the CommServe when the disaster recovery backup was performed, no additional licenses are required. If not, obtain the following licenses:
    • IP Address Change license
    • CommCell Migration license

    See License Administration for more details.

  • A standby computer, which is used temporarily to build a CommServe.

Recovering Deleted Data

  1. Locate the latest Disaster Recovery Backup that contains the information on the entity (storage policy, client, agent, backup set or instance) you are trying to restore.
    • Check the Phase 1 destination for the DR Set or use Restore by Jobs for CommServe DR Data to restore the data.
    • If the job was pruned and you know the media containing the Disaster Recovery Backup, you can move the media to the Overwrite Protect Media Pool. See Accessing Aged Data for more information. You can then restore the appropriate DR Set associated with the job as described in Restore by Jobs for CommServe DR Data.
    • If the job is pruned and you do not know the media containing the Disaster Recovery Backup, you can do one of the following:
      • If you regularly run and have copies of the Data on Media and Aging Forecast report, you can check them to see if the appropriate media is available.
      • If you do not have an appropriate report, and know the media that contains the DR Backup, catalog the media using Media Explorer. Once the cataloging process is completed, details of the data available in the media are displayed.
  2. On a standby computer, install the CommServe software. For more information on installing the CommServe, see Install the CommServe.
  3. Restore the CommServe database using the CommServe Disaster Recovery Tool from the Disaster Recovery Backup described in Step 1. (See CommServe Disaster Recovery Tool for step-by-step instructions.)
  4. Verify and ensure that the NetApp Client Event Manager NetApp Communications Service (EvMgrS) is running.
  5. If you did not have a CommCell Migration license available in the CommServe when the disaster recovery backup was performed, apply the IP Address Change license and the CommCell Migration license on the standby CommServe. See Activate Licenses for step-by-step instructions.
  6. Export the data associated with the affected clients from the standby CommServe as described in Export Data from the Source CommCell.

    When you start the Command Line Interface to capture data, use the name of the standby CommServe in the -commcell argument.

  7. Import the exported data to the main CommServe as described in Import Data on the Destination CommCell.

    This brings back the entity in the CommServe database and the entity is visible in the CommCell Browser. (Press F5 to refresh the CommCell Browser if the entity is not displayed after a successful merge.)

  8. You can now browse and restore the data from the appropriate entity.

    As a precaution, mark media (tape media) associated with the source CommCell as READ ONLY before performing a data recovery operation in the destination CommCell.