Restore Troubleshooting - PostgreSQL iDataAgent

PSQL0006: The latest log file in the pg_log directory displays errors after a successful restore operation

Symptom

The following error appears in the latest log file after a successful restore operation completes.

2014-03-04 18:13:47 IST DETAIL: The failed archive command was: test ! -f /vkmgk/PostgreSQL/9.2/wal/0000000700000001000000B9 && cp pg_xlog/0000000700000001000000B9 /vkmgk/PostgreSQL/9.2/wal/0000000700000001000000B9

2014-03-04 18:13:48 IST LOG: archive command failed with exit code 1

2014-03-04 18:13:48 IST DETAIL: The failed archive command was: test ! -f /vkmgk/PostgreSQL/9.2/wal/0000000700000001000000B9 && cp pg_xlog/0000000700000001000000B9 /vkmgk/PostgreSQL/9.2/wal/0000000700000001000000B9

2014-03-04 18:13:49 IST LOG: archive command failed with exit code 1

2014-03-04 18:13:49 IST DETAIL: The failed archive command was: test ! -f /vkmgk/PostgreSQL/9.2/wal/0000000700000001000000B9 && cp pg_xlog/0000000700000001000000B9 /vkmgk/PostgreSQL/9.2/wal/0000000700000001000000B9

2014-03-04 18:13:49 IST WARNING: transaction log file "0000000700000001000000B9" could not be archived: too many failures

Cause

This error appears in the log file when the archive command is configured with the test utility. Because the software backs up both the pg_xlog and WAL directories, the same copy of a transaction log file cannot exist in both directories.
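
Based on the failed command shown in the log above, the archive_command in postgresql.conf was of the following form (the WAL destination path matches the log and is specific to that environment):

archive_command = 'test ! -f /vkmgk/PostgreSQL/9.2/wal/%f && cp %p /vkmgk/PostgreSQL/9.2/wal/%f'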

Resolution

This error does not interfere with the backups and restores performed by the software, so you can ignore it.

However, restarting the PostgreSQL server will eliminate these warnings in the future.

To avoid the warnings, use the archive command without the test utility, as shown in the following example.

archive_command = 'cp %p /opt/PostgreSQL/9.1/archive_dir/%f'
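
After changing archive_command in postgresql.conf, reload the server configuration so the new setting takes effect. A minimal sketch, assuming the installation and data directory paths shown below (substitute your own):

# Reload the PostgreSQL configuration without restarting the server.
/opt/PostgreSQL/9.1/bin/pg_ctl -D /opt/PostgreSQL/9.1/data reload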

For more information on running a correct archive command, see PSQL0005.

PSQL0007: The file system restore job may sometimes complete with errors

Symptom 1

Failed to start the server

Cause 1

This occurs due to an improper shutdown of the PostgreSQL server before a restore job.

Resolution 1

Follow these steps to resolve the issue:

  1. Use the following command to manually start the PostgreSQL server:

    bash-3.2$ ./pg_ctl -D /postgres/PostgreSQL/<version>/data start

  2. The system will try to start the server and display the following message:

    pg_ctl: another server might be running; trying to start server anyway

    2011-02-15 09:57:59 GMT FATAL: pre-existing shared memory block (key 5432001, ID 688130) is still in use

    2011-02-15 09:57:59 GMT HINT: If you're sure there are no old server processes still running, remove the shared memory block or just delete the file "postmaster.pid".

    pg_ctl: could not start server

    Examine the log output.

  3. Now type the following command:

    [root@cherry ~]# ps -ef|grep data

  4. The system will display the following message:

    root 1789 1562 0 05:04 pts/6 00:00:00 grep data

  5. If the server is already running, stop the server and resume the job. Type the following command:

    [root@cherry ~]# ipcs -a

  6. The system will display the following:

    ------ Shared Memory Segments --------
    key shmid owner perms bytes nattch status
    0x00000000 65537 gdm 600 196608 2 dest
    0x00000000 688130 postgres 600 37879808 1 dest
    ------ Semaphore Arrays --------
    key semid owner perms nsems
    0x0052e2c3 983042 postgres 600 17
    0x0052e2c4 1015811 postgres 600 17
    0x0052e2c5 1048580 postgres 600 17
    0x0052e2c6 1081349 postgres 600 17
    0x0052e2c7 1114118 postgres 600 17
    ------ Message Queues --------
    key msqid owner perms used-bytes messages

  7. Remove all semaphores and rename postmaster.pid (a sketch of the semaphore removal command is shown after this list). Use the following command to rename postmaster.pid:

    bash-3.2$ mv postmaster.pid postmaster.pid-old
  8. Resume the pending job. The job will complete successfully and the server should also start successfully.
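
Step 7 calls for removing all semaphores but does not show a command. A minimal sketch, assuming the semaphore arrays owned by the postgres user (as reported by ipcs -a in step 6) are the ones to remove:

# Remove each semaphore array owned by the postgres user.
# Verify the IDs against the ipcs output before removing them.
[root@cherry ~]# for semid in $(ipcs -s | awk '/postgres/ {print $2}'); do ipcrm -s "$semid"; done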

Symptom 2

Failed to start the server

When you try to start the server manually, you get the following error.

$ ./pg_ctl -D /var/lib/postgresql/9.1/main start
server starting
$ postgres cannot access the server configuration file "/var/lib/postgresql/9.1/main/postgresql.conf": No such file or directory

Cause 2

This error occurs because the PostgreSQL configuration files are not present in the data directory.

Resolution 2

Create a hard link to the PostgreSQL configuration file under the data directory, as shown in the following example.

root@pgubun124:~# ln /etc/postgresql/9.1/main/postgresql.conf /var/lib/postgresql/9.1/main
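
If the server reports other configuration files as missing (for example, pg_hba.conf or pg_ident.conf in Debian/Ubuntu layouts), they can be linked the same way. The paths below follow the example above and may differ on your system:

root@pgubun124:~# ln /etc/postgresql/9.1/main/pg_hba.conf /var/lib/postgresql/9.1/main
root@pgubun124:~# ln /etc/postgresql/9.1/main/pg_ident.conf /var/lib/postgresql/9.1/main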

Now, start the server.

PSQL0009: Restore operation from a dump-based backup set may sometimes fail

Symptom 1

Database-level restore job remains in a pending state

Cause 1

This issue may occur when you restore a large database and do not have sufficient space in the WAL directory during the restore.

Resolution 1

Make sure to provide enough space in the WAL directory when you perform dump-based restores.
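
A quick way to verify the available space before starting the restore (the WAL directory path below is the one from the log example earlier; use the directory configured for your instance):

# Check free space on the file system that hosts the WAL directory.
df -h /vkmgk/PostgreSQL/9.2/wal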

Symptom 2

Table-level restores to an auxiliary database that does not exist on the server may fail

Cause 2

This issue may occur when you use template1 as the maintenance database and try to restore tables to an auxiliary database that does not exist on the server. The system tries to create the auxiliary database from template1. Because template1 is already in use as the maintenance database, the restore fails with the following error message:

createdb: database creation failed: ERROR: source database "template1" is being accessed by other users
DETAIL: There are 1 other session(s) using the database.

Resolution 2

Make sure to avoid using template1 as the maintenance database when you restore a database to an auxiliary database that does not exist on the server.
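
Before the restore, you can also confirm that no sessions are connected to template1. A minimal check, assuming local access as the postgres superuser; the database creation step fails if this returns anything other than 0:

# Count sessions currently connected to template1.
psql -U postgres -d postgres -c "SELECT count(*) FROM pg_stat_activity WHERE datname = 'template1';"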

Symptom 3

Restores of certain objects will fail if the objects are not dropped manually

Cause 3

This issue may occur if certain objects are not dropped before performing a restore operation.

Resolution 3

The following database objects will be dropped automatically from the PostgreSQL server during the table restore:
  • Table
  • View
  • Domain
  • Sequence

All database objects other than those listed above must be dropped manually before performing a restore operation.
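
For example, objects such as functions or custom types are not dropped automatically. A sketch with illustrative database and object names:

# Illustrative only: manually drop objects that the restore does not remove.
psql -U postgres -d mydb -c "DROP FUNCTION IF EXISTS my_function(integer);"
psql -U postgres -d mydb -c "DROP TYPE IF EXISTS my_type;"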

Symptom 4

A cross-machine restore of a database fails because pg_restore fails.

You can see the following error in the PostGresRestore.log file on the client:

10948 4127e940 06/06 13:30:40 215559 PostgresRestore::applyPostGres() - [/opt/PostgreSQL/9.1/bin/pg_restore -C -v -U postgres --port=5432 -d postgres < /opt/SnapProtect/iDataAgent/jobResults/2/2432/215559//fifo] pump the data, rett = 1

You can also see the following error in the PostgreSQL server log:

ERROR: role "testdb" does not exist

Cause 4

One or more roles that are present on the source server are not present on the destination server.

Resolution 4

On the destination server, manually create the missing roles with the same privileges as on the source server, and then run the restore operation.
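
One way to do this, assuming psql access as a superuser on both servers (the role name testdb comes from the error above; its attributes are illustrative):

# On the source server, list the existing roles and their attributes.
psql -U postgres -c "\du"

# On the destination server, recreate each missing role with matching privileges.
psql -U postgres -c "CREATE ROLE testdb LOGIN;"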

Error Code: [94:19] PostgreSQL table-level restore operation fails

Problem

PostgreSQL table-level restore operation to a staging location fails when the Do not import to server check box is selected. You can see the following error in the Job Controller window:

Error Code: [94:19]
Description: PostgreSQL Database: [~win1320422065~] Restore Failed with PostgreSQL Error: [~Make sure all dependent tables are selected for table level restore. PG Error -- ~].

Cause

If the Collect Object List During Backup check box was not selected during the backup operation, the software creates the object list during the restore operation. This also requires the schema information for the tables selected for restore.

Solution

Restore the tables from the schema level in the Browse window.

Browse Error Reporting

During a Browse operation, if one of the following error conditions occurs, an accurate problem description will be reported in the Browse window. This is extremely useful for troubleshooting.

  • MediaAgent Offline
  • Index Cache Inaccessible
  • Offline Requested Data Not Found

Recovering data associated with deleted clients and storage policies

Symptom

In a disaster recovery scenario, use the following procedure to recover data associated with the following entities:

  • Deleted storage policy
  • Deleted client, agent, backup set or instance

Before You Begin

This procedure can be performed when the following are available:

  • A Disaster Recovery Backup that contains information on the entity that you are trying to restore is available. For example, if you wish to recover a storage policy (and the data associated with the storage policy) that was accidentally deleted, you must have a copy of the disaster recovery backup that was performed before the storage policy was deleted.
  • Media containing the data you wish to recover is available and not overwritten.
  • If a CommCell Migration license was available in the CommServe when the disaster recovery backup was performed, no additional licenses are required. If not, obtain the following licenses:
    • IP Address Change license
    • CommCell Migration license

    See License Administration for more details.

  • A standby computer, which is used temporarily to build a CommServe.

Recovering Deleted Data
  1. Locate the latest Disaster Recovery Backup that contains the information on the entity (storage policy, client, agent, backup set or instance) you are trying to restore.
    • Check the Phase 1 destination for the DR Set or use Restore by Jobs for CommServe DR Data to restore the data.
    • If the job was pruned and you know the media containing the Disaster Recovery Backup, you can move the media into the Overwrite Protect Media Pool. See Accessing Aged Data for more information. You can then restore the appropriate DR Set associated with the job as described in Restore by Jobs for CommServe DR Data.
    • If the job is pruned and you do not know the media containing the Disaster Recovery Backup, you can do one of the following:
      • If you regularly run and have copies of the Data on Media and Aging Forecast report, you can check them to see if the appropriate media is available.
      • If you do not have an appropriate report, and know the media that contains the DR Backup, catalog the media using Media Explorer. Once the cataloging process is completed, details of the data available in the media are displayed.
  2. On a standby computer, install the CommServe software. For more information on installing the CommServe, see Install the CommServe.
  3. Restore the CommServe database using the CommServe Disaster Recovery Tool from the Disaster Recovery Backup described in Step 1. (See CommServe Disaster Recovery Tool for step-by-step instructions.)
  4. Verify and ensure that the NetApp Client Event Manager Communications Service (EvMgrS) is running.
  5. If you did not have a CommCell Migration license available in the CommServe when the disaster recovery backup was performed, apply the IP Address Change license and the CommCell Migration license on the standby CommServe. See Activate Licenses for step-by-step instructions.
  6. Export the data associated with the affected clients from the standby CommServe as described in Export Data from the Source CommCell.

    When you start the Command Line Interface to capture data, use the name of the standby CommServe in the -commcell argument.

  7. Import the exported data to the main CommServe as described in Import Data on the Destination CommCell.

    This brings back the entity in the CommServe database and the entity is visible in the CommCell Browser. (Press F5 to refresh the CommCell Browser if the entity is not displayed after a successful merge.)

  8. You can now browse and restore the data from the appropriate entity.

    As a precaution, mark media (tape media) associated with the source CommCell as READ ONLY before performing a data recovery operation in the destination CommCell.