Cluster Configuration - Troubleshooting

Cluster configuration fails for one or more remote cluster nodes

The cluster group configuration may fail to update a remote client if the client has either of the following issues:

  • Network-related problems (client not reachable)
  • SnapProtect services not running
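
Both causes listed above can be checked from another machine before retrying the configuration. The following is a minimal sketch, not part of the product: it assumes the services on the remote node accept TCP connections on a known port. The function name is_reachable and the host name and port in the usage comment are placeholders chosen for illustration.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers name resolution failures, refused connections, and timeouts.
        return False


# Example call (hypothetical host name and port; substitute your own values):
#   is_reachable("node2.example.com", 8400)
```

A False result only shows that nothing answered on that port. It does not distinguish a network problem from stopped services, so both should still be checked on the remote node.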

A message reporting the cluster configuration failure for that client is displayed.

Investigate and fix the issue on the remote client, and then associate the client with the Cluster Group again from the Cluster Group client properties window. If you want to update the client later instead, use the steps below to force the cluster configuration on the remote client. This operation creates an update request for the client in the CommServe database.

  1. From the CommCell Browser, navigate to Client Computers | <Cluster Group Client>.
  2. Right-click the <Cluster Group Client> and select Properties.
  3. Click the Cluster Group Configuration tab.
  4. Select the Force Sync configuration on remote nodes check box.
  5. Click OK.
  6. Click OK in the Information dialog box.

Removal of cluster configuration from Client Completed with Errors

When a client computer is removed from the cluster group, all cluster settings are removed from both the CommServe and the client computer. If the CommServe fails to remove the cluster settings from the client, use the following steps to resolve the issue:

  1. Add the client back to the cluster group.
  2. Resolve the issues in the client that caused the failure. Check the failure reason stated in the error message received during the cluster configuration update. For example, <Connection to remote machine [machine_host_name] refused. Please check that the services are running on the remote machine.>
  3. Remove the client again from the cluster group.
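
When several nodes report failures, it can help to pull the affected host name out of each failure reason before working through the steps above. The sketch below is illustrative only: it assumes the message follows the wording of the example in step 2, and the function name refused_host is hypothetical.

```python
import re
from typing import Optional

# Pattern based on the example failure reason quoted in step 2 (an assumption;
# other failure reasons will not match).
_REFUSED = re.compile(r"Connection to remote machine \[([^\]]+)\] refused")


def refused_host(message: str) -> Optional[str]:
    """Return the host name from a 'connection refused' failure reason, or None."""
    match = _REFUSED.search(message)
    return match.group(1) if match else None
```

A return value of None means the failure reason has a different form and should be read manually.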

Backup fails during index restore due to MediaAgent failover

A backup may fail if the MediaAgent enters the failover state while the scan phase is restoring an index from media. The job's failure reason states that the index restore could not complete because of the failover. In this situation, kill the job and run the backup again.

Executable application errors during failover

Executable application errors have been observed on the originating active node during a failover. Once the new node takes over, the jobs continue and complete. On rare occasions, an archiveindex.exe application error may corrupt the index so that the backup cannot recover. In this situation, kill the job and run the backup again.

Services stop running after a failover on Linux clusters

During a failover on Linux clusters, services on the node that takes over may be killed by the cluster services. To ensure that the services are restarted on the new node, add commands that start the services to the failover scripts.
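
As a rough illustration of such a failover hook, the sketch below runs a start command and reports whether it succeeded. It is an assumption-laden example rather than a supplied script: the function name start_services is hypothetical, and the command in the usage comment is a placeholder for whatever start command your installation provides.

```python
import subprocess
import sys
from typing import Sequence


def start_services(command: Sequence[str], timeout: float = 300.0) -> bool:
    """Run the service start command; return True if it exits with status 0."""
    try:
        result = subprocess.run(command, timeout=timeout, check=False)
    except (OSError, subprocess.TimeoutExpired) as exc:
        # The command was missing, not executable, or did not finish in time.
        print(f"could not start services: {exc}", file=sys.stderr)
        return False
    return result.returncode == 0


# Example call from a failover script (placeholder path, not a real command):
#   if not start_services(["/path/to/start-services-command"]):
#       sys.exit(1)
```

Exiting with a non-zero status on failure lets the cluster software see that the services did not come up, assuming it checks the exit status of failover scripts.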