
MediaAgent - Troubleshooting


MA0001: Client cannot use a MediaAgent with older version

Symptom

If the client is upgraded to the current version and the MediaAgent is still on the previous version, the backup job fails with the following error message:

Error Code: [32:406]
Description: Library [DiskLibrary], MediaAgent [ma01], Drive Pool [DrivePool], Media []: Client cannot use a MediaAgent with older version. Advice: Please upgrade the MediaAgent to the current version.
Source: delta, Process: JobManager

Resolution

Upgrade the MediaAgent to the current version and resume the backup.

MA0003: Tape Spanning failure on AIX MediaAgents with Native Drivers

Symptom

Tape spanning fails on AIX MediaAgents that use native drivers.

Resolution

This might occur for the following reason:

On AIX MediaAgents with an IBM library and ATAPE drivers, if the Use Native device driver for data transfer for tape media option in the MediaAgent Properties is enabled, the MediaAgent software sets the trailer_labels drive attribute to yes when a data protection operation is initiated. If this attribute is set to no (for example, by other applications sharing the library), data protection operations may fail when the operation spans to another tape.

Use the following command to view the trailer_labels attribute of the drive, where rmtX is the tape device name:

lsattr -El rmtX
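If you collect this output from several MediaAgents, checking the trailer_labels value can be scripted. The Python sketch below assumes that lsattr -El prints one attribute per line, with the attribute name in the first column and its current value in the second; the function names and the sample output are hypothetical.

```python
def parse_lsattr(output: str) -> dict[str, str]:
    """Map attribute name to current value from `lsattr -El` output.

    Assumes name in column 1 and value in column 2, separated by whitespace.
    """
    attrs = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            attrs[fields[0]] = fields[1]
    return attrs


def trailer_labels_enabled(output: str) -> bool:
    """True only if the drive reports trailer_labels set to yes."""
    return parse_lsattr(output).get("trailer_labels", "").lower() == "yes"


# Hypothetical, abbreviated output for an ATAPE drive (rmtX).
SAMPLE = """\
block_size      0   Block size (0=variable length)  True
trailer_labels  no  Trailer Label Processing        True
"""

if __name__ == "__main__":
    if not trailer_labels_enabled(SAMPLE):
        print("trailer_labels is not yes; tape spanning may fail")
```

On an AIX MediaAgent, the text could come from subprocess.run(["lsattr", "-El", "rmt0"], capture_output=True, text=True).stdout, where rmt0 is an example device name.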

MA0004: Job or Drive Validation Failures on Solaris MediaAgents Using SCSI-3 Reservation for Native (st) or WA Drivers

Symptom

Backup or drive validation jobs might fail on Solaris MediaAgents that use st or WA drivers with SCSI-3 reservations.

The native st driver is opened when the native or WA driver is used for data transfer, to prevent possible data corruption. By default, the tape driver on Solaris uses SCSI-2 reserve/release. When the native drivers are used, they conflict with the SCSI-3 reserve/release issued by the MediaAgent, and data protection operations wait indefinitely.

Resolution

To allow SnapProtect software to control tape device reservations, perform the following steps:

  1. Confirm that the st driver is bound to the connected drives by checking the /var/adm/messages log.
    • If the drives are not bound to the st driver in the message log, go to step 2.
    • If the drives are bound to the st driver in the message log, go to step 6.

    Example of a library with two drives of the same model type:

    Note: IBM ULTRIUM-TD7 drives are bound to st27 and st28 in the image.

  2. To configure st tape drive support on Solaris, add the st device nodes to /kernel/drv/st.conf and bind them to all the tape drive LUNs that will be used.

    Note: Make a copy of the st.conf file before making any edits. If you copy the file back into place, confirm that it retains its original permissions and owner/group definitions.

    If they are not already present, add the following lines to /kernel/drv/st.conf to bind the tape drives to the native st driver.

    name="st" class="scsi" target=0 lun=0;
    name="st" class="scsi" target=1 lun=0;
    name="st" class="scsi" target=2 lun=0;
    name="st" class="scsi" target=3 lun=0;
    name="st" class="scsi" target=4 lun=0;
    name="st" class="scsi" target=5 lun=0;
    name="st" class="scsi" target=6 lun=0;
    name="st" class="scsi" target=7 lun=0;
    #
    #In case of wide tape drives, use these targets
    #
    name="st" class="scsi" target=8 lun=0;
    name="st" class="scsi" target=9 lun=0;
    name="st" class="scsi" target=10 lun=0;
    name="st" class="scsi" target=11 lun=0;
    name="st" class="scsi" target=12 lun=0;
    name="st" class="scsi" target=13 lun=0;
    name="st" class="scsi" target=14 lun=0;
    name="st" class="scsi" target=15 lun=0;

  3. Reload the st driver using the following commands:

    rem_drv st
    add_drv st

  4. Confirm that the drives are bound to the st driver in the /var/adm/messages log.
  5. Update the SCSI class information for each driver with the following commands:

    update_drv -a -i "scsiclass,01" st
    update_drv -a -i "scsiclass,08" sgen
    add_drv -i "scsiclass,01 scsiclass,08" wa

  6. Run the following script to configure all the Fibre Channel devices to use the WA driver.

    Note: The WA driver must be configured for all the devices, even if the st driver is intended for tape device operations. SnapProtect software uses the WA driver to control the medium changer of libraries or autoloaders on Solaris MediaAgents.

    <software installation path>/WA/wa_sunqlc_add

  7. Run ScanScsiTool, found in the <software installation path>/Base folder, to obtain the native path for each connected drive type.

    Example of ScanScsiTool output for a library and two drives of the same model:

  8. Run ScanScsiTool with the -T option and the [native path] of the drive.

    For example:

    Note: If the library has mixed drive types, or if several libraries with different drive types are attached to the Solaris MediaAgent, you need the output of ScanScsiTool with -T and the [native path] for each drive type.

  9. Use the output of ScanScsiTool with -T and the [native path] to create or update the tape-config-list in the st.conf file, as described in the following examples.

    Note: Make a copy of the /kernel/drv/st.conf file before making any edits. If you copy the file back into place, confirm that it retains its original permissions and owner/group properties.

    st.conf file format for a single drive model type environment:

    Note: Every entry in the st.conf file follows the format below. The Vendor field must be exactly 8 characters long, so keep any padding spaces in it. Trailing spaces in the Product ID field can be removed. (Example: "|-VID--||-----PID------|")

    tape-config-list="VENDOR Product ID", "A Prettier Name to Display", "Config-Name";
    Config-Name = config information;

    Note the required semicolon (;) at the end of each line.

    Example of values defined for a single drive model type environment, using the output from step 8:

    VENDOR Product ID = IBM     ULTRIUM-TD7
    A Prettier Name to Display = IBM ULTRIUM-TD7
    Config-Name = CFGIBMULTRIUMTD7
    config information = 2,0x3B,0,0x1038619,4,0x5A,0x5A,0x5C,0x5C,3,60,1500,600,2040,780,780,24600

    Example of st.conf file entry:

    st.conf file format for a mixed drive type environment:

    tape-config-list="VENDOR Product ID1", "A Prettier Name to Display1", "Config-Name1", "VENDOR Product ID2", "A Prettier Name to Display2", "Config-Name2", "VENDOR Product ID3", "A Prettier Name to Display3", "Config-Name3", "VENDOR Product ID4", "A Prettier Name to Display4", "Config-Name4";
    Config-Name1 = config information1;
    Config-Name2 = config information2;
    Config-Name3 = config information3;
    Config-Name4 = config information4;

    Note the required semicolon (;) at the very end of the tape-config-list, as well as after each Config-Name = config information entry.

    Example of an st.conf file for a mixed drive model type environment:

  10. Reboot the Solaris OS.
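The entry format from step 9 can be sketched in Python to avoid padding the Vendor field by hand. The sketch below renders the tape-config-list line and one Config-Name line per drive type, pads the vendor ID to 8 characters, and rejects IDs wider than the |-VID--||-----PID------| template (8 and 16 characters). It also labels the 17 values of a config information string, assuming the version-2 field order documented in the Solaris st(7D) man page: version, drive type, block size, option flags, number of densities, four density codes, default density, then seven timeouts (non-motion, I/O, rewind, space, load, unload, erase). TapeDrive, st_conf_entries, and decode_config are hypothetical helpers and the field names are our own labels; check the field order against st(7D) for your Solaris release.

```python
from dataclasses import dataclass

# Version-2 field order, as we read the Solaris st(7D) man page.
# The names are descriptive labels, not official identifiers.
FIELDS_V2 = (
    "version", "drive_type", "block_size", "options", "number_of_densities",
    "density_0", "density_1", "density_2", "density_3", "default_density",
    "non_motion_timeout", "io_timeout", "rewind_timeout", "space_timeout",
    "load_timeout", "unload_timeout", "erase_timeout",
)


def decode_config(config: str) -> dict[str, int]:
    """Label each value of a version-2 config information string."""
    # int(..., 0) accepts both decimal ("60") and hex ("0x3B") literals.
    values = [int(part.strip(), 0) for part in config.split(",")]
    if len(values) != len(FIELDS_V2) or values[0] != 2:
        raise ValueError("expected 17 values starting with version 2")
    return dict(zip(FIELDS_V2, values))


@dataclass
class TapeDrive:
    vendor: str       # vendor ID from ScanScsiTool -T output (max 8 chars)
    product: str      # product ID (max 16 chars)
    pretty_name: str  # "A Prettier Name to Display"
    config_name: str  # Config-Name linking the list entry to its settings
    config: str       # config information string


def st_conf_entries(drives: list[TapeDrive]) -> str:
    """Render tape-config-list plus one Config-Name line per drive type."""
    triples = []
    for d in drives:
        if len(d.vendor) > 8 or len(d.product) > 16:
            raise ValueError(f"vendor or product ID too long: {d.pretty_name}")
        decode_config(d.config)  # reject malformed config information early
        # Pad the vendor to 8 characters; trailing product spaces are optional.
        vid_pid = d.vendor.ljust(8) + d.product.rstrip()
        triples.append(f'"{vid_pid}", "{d.pretty_name}", "{d.config_name}"')
    lines = ["tape-config-list=" + ", ".join(triples) + ";"]
    lines += [f"{d.config_name} = {d.config};" for d in drives]
    return "\n".join(lines)


LTO7_CONFIG = "2,0x3B,0,0x1038619,4,0x5A,0x5A,0x5C,0x5C,3,60,1500,600,2040,780,780,24600"

if __name__ == "__main__":
    lto7 = TapeDrive("IBM", "ULTRIUM-TD7", "IBM ULTRIUM-TD7",
                     "CFGIBMULTRIUMTD7", LTO7_CONFIG)
    # For a mixed drive type environment, pass one TapeDrive per model.
    print(st_conf_entries([lto7]))
    print(decode_config(LTO7_CONFIG)["erase_timeout"])
```

For a mixed drive type environment, the function emits a single tape-config-list followed by one Config-Name line per model, each ending in the semicolon that step 9 requires. Because it validates against the version-2 layout, it deliberately rejects config strings in any other form.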

MA0005: Adjust Timeout Value for SCSI Commands on MediaAgents

Symptom

A backup job fails, and a device timeout error appears in the MediaManager.log file.

Resolution

To resolve this issue, adjust the timeout value for SCSI commands sent to these devices.

  • On Windows, the system uses native drivers, and the timeout values are determined by the drivers.

    These SCSI timeout values apply only to the SCSI commands associated with the library. The following SCSI timeout registry values are available on Windows:

    HKEY_LOCAL_MACHINE\SOFTWARE\CommVault Systems\Galaxy\Instance<xxx>\ScsiTimeouts

    INITIALIZE_ELEMENT_STATUS    *600*
    INITIALIZE_ELEMENT_STATUS_WITH_RANGE *600*
    INQUIRY    *30*
    MOVE_MEDIUM    *1500*
    READ_ELEMENT_STATUS    *600*

  • On UNIX, you can specify timeout values for any SCSI command, as long as the command is delivered through the pass-through driver. Arm changers are always accessed through pass-through nodes on all UNIX platforms, so their command timeouts can always be customized.

    By default, tape drives are accessed using native drivers on AIX, Solaris, HP-UX, and Linux, and native drivers do not offer a way to customize SCSI timeouts. However, you can enable the pass-through mechanism from the CommCell Console by disabling the Use Native device driver for data transfer for tape media option in the MediaAgent Properties.

    Once this is done, pass-through SCSI timeouts can be customized by modifying the following registry values:

    /etc/CommVaultRegistry/Galaxy/Instance<xxx>/

    .internal.unique_id 1124467005_16773_40966_392845154
    DEFAULT *120*
    ERASE *18000*
    INITIALIZE_ELEMENT_STATUS *600*
    INITIALIZE_ELEMENT_STATUS_WITH_RANGE *600*
    LOAD *900*
    MOVE_MEDIUM *1500*
    READ *900*
    READ_ELEMENT_STATUS *600*
    RESERVE *1200*
    REWIND *1800*
    SEEK_BLOCK *900*
    SPACE *900*
    WRITE *900*
    WRITE_FILEMARKS *900*

Observe the following rules when making changes:

  • All timeouts are in seconds.
  • A timeout enclosed in asterisks (for example, *600*) is a default timeout. To change it, enter the new value without the asterisks; otherwise, the system reverts the timeout to the default. This convention simplifies upgrades.
  • To change the timeout for a command that is explicitly listed in the registry, enter the new value on that command's line.
  • To change the timeout for an unlisted command, use either of the following methods:
    • Change the value of the DEFAULT timeout.
    • Add a line for the command in the form "CDB_XX <timeout>", where XX is the command's operation code (cdb[0]) in hexadecimal.
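The rules above can be expressed as a small model. This Python sketch parses lines in the NAME value and NAME *value* forms shown above, lists entries that are still defaults, and resolves the timeout for a command: its own line if listed, otherwise a CDB_XX line, otherwise DEFAULT. That lookup order, the uppercase hex in CDB_XX names, and all function names are assumptions for illustration, not the MediaAgent's implementation; the sample values are hypothetical edits.

```python
import re

# "NAME seconds" or "NAME *seconds*"; \2 requires a closing asterisk
# exactly when an opening one is present.
_ENTRY = re.compile(r"^(\S+)\s+(\*?)(\d+)\2$")


def parse_timeouts(text: str) -> dict[str, tuple[int, bool]]:
    """Map command name to (seconds, is_default); skip non-timeout keys."""
    entries = {}
    for line in text.splitlines():
        match = _ENTRY.match(line.strip())
        if match:
            name, star, seconds = match.groups()
            entries[name] = (int(seconds), star == "*")
    return entries


def cdb_key(opcode: int) -> str:
    """Registry name for an unlisted command (uppercase hex is assumed)."""
    return f"CDB_{opcode:02X}"


def effective_timeout(entries, name=None, opcode=None) -> int:
    """Own line if listed, else a CDB_XX line, else DEFAULT (assumed order)."""
    if name in entries:
        return entries[name][0]
    if opcode is not None and cdb_key(opcode) in entries:
        return entries[cdb_key(opcode)][0]
    return entries["DEFAULT"][0]


def still_default(entries) -> list[str]:
    """Entries still in asterisks: edits to their values are reverted."""
    return sorted(name for name, (_, starred) in entries.items() if starred)


# Hypothetical registry contents after an administrator's edits.
SAMPLE = """\
.internal.unique_id 1124467005_16773_40966_392845154
DEFAULT 300
MOVE_MEDIUM *1500*
REWIND 2400
CDB_34 180
"""

if __name__ == "__main__":
    entries = parse_timeouts(SAMPLE)
    print(effective_timeout(entries, name="REWIND"))  # listed command
    print(effective_timeout(entries, opcode=0x34))    # READ POSITION via CDB_34
    print(effective_timeout(entries, opcode=0x00))    # TEST UNIT READY falls to DEFAULT
    print(still_default(entries))
```

In the sample, MOVE_MEDIUM is still enclosed in asterisks, so still_default flags it: changing its number without removing the asterisks would be undone.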

MA0006: Failed MediaAgent

Symptom

If a MediaAgent fails and there are other MediaAgents in the CommCell, you can re-establish backup or restore operations for the affected client computers. Use either of the following resolutions.

Resolution #1

  1. Create new storage policies where the primary copy transfers its backup data through a working MediaAgent and to a working library.
    • For tape libraries, ensure that the MediaAgent(s) controlling both the library and the target drive pool are in working order.

      For shared libraries, different MediaAgents can control the library and a given drive pool. To identify these MediaAgents, check both the Library Properties and Drive Pool Properties dialog boxes.

    • For disk libraries, ensure the MediaAgent that controls the disk library is in working order.

      To identify the MediaAgent, check the related Library Properties dialog box.

  2. For each of the subclients of the affected client computers, perform the following:
    • From the CommCell Browser, expand Client Computers | <Client> | <Backup Set>.
    • Right-click the appropriate subclient and then click Properties.
    • Click the Storage Devices tab and select one of the newly created storage policies.

      To use an existing storage policy instead, select one of the existing storage policies to back up the client computers affected by the failed MediaAgent.

    • Click OK.

      Once the subclients of the affected client computers are associated with working libraries, you can resume backup operations.

Resolution #2

If a storage policy has a secondary copy that points to another MediaAgent, and Auxiliary Copy jobs are run regularly for this copy (so the backed-up data is also available there), promote the secondary copy to the primary copy.

See Setting Up the Storage Policy Copy to be the Primary Copy for instructions.

MA0007: Jobs might fail when Client or MediaAgent is using E1000 or E1000E network adapter

Symptom

The following message appears in the Event Viewer window when backup, restore, synthetic full, or Auxiliary Copy jobs are running:

This Client is VMware hosted Windows 2012 and has Network adopter [Intel(R) 82574L Gigabit Network Connection]. It may corrupt the Network buffer. It is suggested to change VMware-network adapter to VMXNet3.

Also, one of the following messages is displayed in the cvd.log file on the Data Mover MediaAgent when restore, synthetic full, data verification, or Auxiliary Copy jobs are running:

Cannot decode packet header: invalid start signature

Error while reading pipeline buffer from MediaAgent

[UNCOMPRESS ] #UNCOMPRESSION ERROR
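When checking several MediaAgents, scanning cvd.log for these messages can be scripted. The following Python sketch flags lines that contain any of the three messages quoted above; the function names are hypothetical, and the caller must supply the log file path because its location depends on the installation.

```python
from pathlib import Path

# Substrings taken from the cvd.log messages listed above.
SIGNATURES = (
    "Cannot decode packet header: invalid start signature",
    "Error while reading pipeline buffer from MediaAgent",
    "#UNCOMPRESSION ERROR",
)


def find_buffer_errors(lines):
    """Yield (line_number, line) for lines containing a known signature."""
    for number, line in enumerate(lines, start=1):
        if any(sig in line for sig in SIGNATURES):
            yield number, line.rstrip("\n")


def scan_cvd_log(path: str):
    """Scan one cvd.log file; undecodable bytes are replaced, not fatal."""
    with Path(path).open(errors="replace") as log:
        return list(find_buffer_errors(log))
```

A non-empty result only shows that invalid buffers arrived; the cause still has to be confirmed against the source computer's network adapter type.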

Cause

These log entries are written to the cvd.log file on the Data Mover MediaAgent when the data buffer sent from the source computer is invalid.

The data corruption might occur when the source computer is a Windows 2008/2012 virtual machine created on ESX Server 5.x with an e1000 or e1000e network adapter. This is a known issue with ESX Server. For more information, see VMware KB article 2056468.

Resolution

Change the network adapter of the virtual machine to vmxnet3.

MA0008: Synthetic Full Backup goes to Pending State after Upgrading a MediaAgent

Symptom

Running a synthetic full backup returns the following error:

Error Code: [72:40]
Description: Failed to open Archive File. The connectivity between client and MediaAgent may be broken OR an invalid copy precedence may have been specified.
Source: <hostname>, Process: clRestore

Cause

An existing storage policy was changed so that incremental backups run from an older-version MediaAgent to a newer-version MediaAgent. This configuration is not supported.

Resolution

Upgrade the MediaAgent running the earlier version to the newer version.

MA0009: Auxiliary Copy job or Restore job fails with data integrity validation error

Symptom

If the Auxiliary Copy job or restore job encounters invalid or unreadable data on the disk, the job might fail with the following error message:

Data Integrity validation failed for the data read from media

Resolution

  1. Run a data verification job to identify the jobs with invalid data for a storage policy copy.

    See Data Verification for instructions.

    For deduplicated data, run a deduplicated data verification job to identify the jobs with invalid data for a storage policy copy. See Verifying Deduplicated Data for instructions.

    When the data verification job is complete, the backup jobs with invalid data show a data verification status of Failed.
  2. Run a new backup job.

    For more information on how to run backups, see the documentation for the specific agent.

    When the backup job runs, the new data blocks do not reference the invalid data. As a result, new baseline data is written to the storage media.