Data Replication

Overview

Data Replication is the process of copying specified, file-level content from one computer, the source computer, to another, the destination computer. This is achieved through an initial transfer of the specified data, after which the replicated copy is kept updated in real time with any changes that are made to the data on the source computer. This replicated copy on the destination computer provides on-going, nearly-real-time disaster recovery protection for the source computer, unlike most data protection solutions which require significant time to perform a complete data protection operation. In addition, data replication provides a basis for additional data protection activities, such as Recovery Points (snapshots) and backups of Recovery Points.

The content for replication can be defined at the directory or volume level on a source computer and replicated to a destination computer. Once the initial transfer is complete, a driver on the source computer performs the following:

  • continuously monitors changes to the files contained in the defined directories or volumes
  • logs all new files, and changes to existing files
  • automatically transfers the log to the destination computer, thus replicating all new files and changes to existing files, from the source computer to the destination computer in nearly real time. See Replication Logs for specific information about frequency and timing of data replication.

A persistent connection is used as the data transfer mechanism, optionally compressing and encrypting data across the network; through it, the destination computer is kept in sync with the defined content on the source computer. If the connection is interrupted at any point, the log continues to be maintained on the source computer, and once the connection is restored, ContinuousDataReplicator (CDR) automatically re-syncs with the destination computer, bringing the replica up to date. Note that re-syncing is time and disk space intensive and should be avoided where possible. For additional discussion of this subject, see Interruptions and Restarts. If multiple Replication Pairs are active, CDR uses multiple threads to perform these operations on all Replication Pairs in parallel. CDR operations on a T1 link are fully certified. The success of CDR operations on a slower link is not guaranteed.
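
The log-and-catch-up behavior described above can be illustrated with a minimal Python sketch. It is not CDR's implementation: the `Source` class, the `ship_log` function, and the in-memory dictionaries standing in for files are all hypothetical, and the real re-sync process involves more than replaying a backlog.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    """Hypothetical source side: applies writes locally and logs each one."""
    files: dict = field(default_factory=dict)
    log: list = field(default_factory=list)  # pending (operation, path, data) entries

    def write(self, path, data):
        self.files[path] = data
        self.log.append(("write", path, data))

    def delete(self, path):
        self.files.pop(path, None)
        self.log.append(("delete", path, None))


def ship_log(source, replica, connected):
    """Replay pending log entries on the replica, in order, if the link is up.

    Returns the number of entries replayed. While the link is down, the log
    simply keeps growing on the source.
    """
    if not connected:
        return 0
    replayed = 0
    for op, path, data in source.log:
        if op == "write":
            replica[path] = data
        else:
            replica.pop(path, None)
        replayed += 1
    source.log.clear()
    return replayed


source, replica = Source(), {}
source.write("a.txt", "v1")
ship_log(source, replica, connected=True)    # replica now holds a.txt

# Link drops: changes accumulate in the log on the source.
source.write("a.txt", "v2")
source.write("tmp.txt", "scratch")
source.delete("tmp.txt")
ship_log(source, replica, connected=False)   # nothing is sent

# Link restored: the backlog is replayed and the replica catches up.
ship_log(source, replica, connected=True)
print(replica == source.files)
```

Because entries are replayed in the order they were logged, the replica converges on the source's state once the backlog has been applied. The longer the interruption, the larger the backlog that has to be held on the source.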

Supported Configurations

Some of the scenarios for data replication are listed below, but this is not a complete list of all the possible data replication configurations.

Data Synchronization

The following options can be used to perform data transfer from source to destination:

Full Resync

Full Resync should be necessary only when no data is present on the destination. Full Resync copies all the files from the source to the destination computer. When you start data replication at the Replication Set or Replication Pair level, you can specify Full Resync, which causes the Replication Pair to begin at the Baseline Scan phase.

Smart Sync

Smart Re-Sync is the default behavior of CDR when activities are interrupted and cannot be seamlessly restarted from the same point. In this case, all new or modified data is transferred from the source to the destination.

Optimized Sync

If replication is interrupted and there is a chance that data on the destination has been manually deleted or modified, the destination path is considered inconsistent. In that case, Optimized Sync is recommended: it rebuilds the destination path from the current data in the source path, taking into account the data already present on the destination.

Optimized Sync transfers the new and modified files from the source computer to the destination computer, along with any data that is missing on the destination. If previous sync attempts had failures, the failed transfers are retried when Optimized Sync runs.

Optimized Sync can be used in the following scenarios:

  • If the filtering options are modified after an interruption in replication (for example, a previously applied filter is removed), so that pre-existing files become eligible for transfer to the destination
  • If some data was partially modified on the destination
  • If a previous sync had failures
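
To make the difference between the three options in this section concrete, here is a small Python sketch. It is an illustration under stated assumptions, not CDR's selection logic: the function name, the mode strings, and the idea of comparing file content through dictionaries are all hypothetical.

```python
def files_to_transfer(source, destination, mode, changed_since_interrupt=()):
    """Return the sorted paths a sync pass would copy for a given mode.

    source and destination map path -> content (a stand-in for real files).
    changed_since_interrupt lists source paths created or modified since
    replication stopped; how CDR tracks this is not modelled here.
    """
    changed = set(changed_since_interrupt) & set(source)
    if mode == "full":
        # Full Resync: copy everything, regardless of the destination.
        selected = set(source)
    elif mode == "smart":
        # Smart Re-Sync: only new or modified source data.
        selected = changed
    elif mode == "optimized":
        # Optimized Sync: new or modified data, plus anything missing from
        # or not matching the destination copy.
        mismatched = {p for p, data in source.items() if destination.get(p) != data}
        selected = changed | mismatched
    else:
        raise ValueError(f"unknown mode: {mode}")
    return sorted(selected)


source = {"a.txt": "1-new", "b.txt": "2", "c.txt": "3", "d.txt": "4"}
# On the destination, b.txt was edited by hand and c.txt was deleted.
destination = {"a.txt": "1", "b.txt": "edited", "d.txt": "4"}
changed = ["a.txt"]  # modified on the source during the interruption

for mode in ("full", "smart", "optimized"):
    print(mode, files_to_transfer(source, destination, mode, changed))
```

In this example only the optimized mode picks up `b.txt` and `c.txt` without also recopying the unchanged `d.txt`.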

See Start Data Replication Activity for step-by-step instructions.

For new installations, Optimized Sync is enabled by default. On upgraded clients, you must enable Optimized Sync manually by selecting the Include files that do not match with destination copy option. For step-by-step instructions, see Add a Replication Pair.

To change the state of one or more Replication Pairs at once from the Replication Set level, see Change the state of Replication Pair for step-by-step instructions.

Replication Prediction

Replication Prediction can be used to track the size of the data that is added or modified while a pair is active and monitoring. For Windows file systems, monitoring is performed at the volume or folder level; for UNIX, it is performed at the file system level. This information is used to estimate the data throughput required per hour, day, and so on, and thus whether the bandwidth of the current connection will be sufficient for the predicted data replication activity. For instance, to see how much data will be replicated for an Exchange Server during each workday or over a whole week, you can start monitoring all folders used by the Exchange Server (stores, logs, etc.). After 24 hours or a week, you can check the size of the modified data and use that information to estimate bandwidth requirements.

Replication Prediction reports the following for each monitored folder, volume, or file system:

  • the monitoring interval -- start and end time
  • the size of the data changed, in bytes and MB
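
As a rough illustration of how the reported interval and change size translate into a bandwidth estimate, consider the following Python sketch. The function name and the sample figures are assumptions made for the example; only the nominal T1 line rate of 1.544 Mbit/s is a fixed fact.

```python
from datetime import datetime

T1_BITS_PER_SECOND = 1_544_000  # nominal T1 line rate


def required_bits_per_second(bytes_changed, start, end):
    """Average throughput needed to replicate the observed change volume."""
    seconds = (end - start).total_seconds()
    if seconds <= 0:
        raise ValueError("monitoring interval must be positive")
    return bytes_changed * 8 / seconds


# Hypothetical result: 10 GiB changed over a 24-hour monitoring interval.
start = datetime(2024, 1, 1, 0, 0)
end = datetime(2024, 1, 2, 0, 0)
needed = required_bits_per_second(10 * 1024**3, start, end)
print(f"{needed / 1e6:.2f} Mbit/s on average, fits on a T1: {needed <= T1_BITS_PER_SECOND}")
```

This yields an average only. Change activity is usually bursty, so peak hours may need considerably more than the average suggests, and protocol overhead is ignored.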

For step-by-step instructions, see Perform Replication Prediction.

Replication Logs

CDR maintains logs on the source computer, recording all file write activity (new files and changes to existing files) involving the directories and volumes specified in the source paths of all the Replication Pair(s) on that computer. These replication logs are transferred to the destination computer and replayed, ensuring that the destination remains a nearly real-time replica of the source. For more information, see Replication Logs.

Throttling

Throttling enables you to monitor and control the data replication activities. It also allows you to configure the rate of data transfer over the network, based on the throttling parameters. The various throttling options (including throttling amount and rules) can be configured. For more information, see Throttling.
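
The effect of capping the transfer rate can be pictured with a generic pacing calculation. This Python sketch does not reflect CDR's actual throttling parameters or rules; the function name and the chunk-based model are hypothetical.

```python
def throttle_schedule(chunk_sizes, max_bytes_per_second):
    """Return the earliest send time (seconds from start) for each chunk.

    Each chunk is held back until all earlier chunks could have been sent at
    max_bytes_per_second, which caps the average transfer rate.
    """
    if max_bytes_per_second <= 0:
        raise ValueError("rate must be positive")
    send_times, sent_bytes = [], 0
    for size in chunk_sizes:
        send_times.append(sent_bytes / max_bytes_per_second)
        sent_bytes += size
    return send_times


# Three chunks of 1000, 1000 and 500 bytes at a cap of 500 bytes per second.
print(throttle_schedule([1000, 1000, 500], max_bytes_per_second=500))
```

A lower cap spreads the same data over a longer period, which protects other traffic on the link but lets the replication log backlog on the source grow.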

Orphan Files

Files that are in the destination directory, but not the source directory, are orphan files. You can choose to ignore, log, or delete such files that are identified in the destination path; these settings are configured in the Orphan Files tab of the Replication Set Properties.
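
The definition above amounts to a set difference between the destination and source paths. The following Python sketch shows that idea together with the three handling choices; the function name, the `action` values, and the sample paths are illustrative and do not correspond to actual product settings.

```python
def process_orphans(source_paths, destination_paths, action="log"):
    """Apply an orphan-file policy.

    Returns (remaining destination paths, log lines). action is one of
    "ignore", "log" or "delete", mirroring the three choices described above.
    """
    if action not in ("ignore", "log", "delete"):
        raise ValueError(f"unknown action: {action}")
    # Orphans exist on the destination but not on the source.
    orphans = sorted(set(destination_paths) - set(source_paths))
    remaining = set(destination_paths)
    log_lines = []
    if action == "log":
        log_lines = [f"orphan: {path}" for path in orphans]
    elif action == "delete":
        remaining -= set(orphans)
    return sorted(remaining), log_lines


source = ["docs/a.txt", "docs/renamed.txt"]
destination = ["docs/a.txt", "docs/old-name.txt", "docs/renamed.txt"]
print(process_orphans(source, destination, action="log"))
print(process_orphans(source, destination, action="delete"))
```

Here `docs/old-name.txt` stands for a stale copy that has no counterpart on the source, such as the one left behind after a rename.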

To configure Orphan File settings, see Configure Orphan File Processing for step-by-step instructions.

To view Orphan Files, see View Orphan Files for step-by-step instructions.

Things to Consider

  • A file that is created on the source and then deleted before it has been replicated will still be created on the destination and then deleted. This is because both the creation and the deletion of the file are captured in the log file, which is replayed on the destination computer. Such files are not treated as Orphan Files.
  • A renamed file will be replicated to the destination as a new file. The previous copy with the old name will remain on the destination and be treated according to your Orphan Files settings.
  • If you change the orphan file settings for an existing Replication Set, the change will only affect Replication Pairs that are created after the change, or Replication Pairs that are aborted and restarted. Currently active Replication Pairs will not be affected by the change until they are aborted and restarted.
  • It is strongly recommended that you do not replicate to the root of the destination filer or the filer volume. If for any reason you need to replicate to the root of the volume then ensure that the Orphan File Processing is turned off from the Replication Set Properties.

Data Replication Monitor

Replication is a continuous activity, and details of ongoing data replication activity are shown in the Data Replication Monitor in the CommCell Console. The process of starting data replication with CDR involves several job phases, as follows:

  • Baselining
  • SmartSync
  • Replication

For more detailed information about Job Phases and Job States, see Monitoring Data Replication.

All other job-based activity, such as Recovery Point creation, is reflected in the Job Controller. See Controlling Jobs in Job Management for comprehensive information.

Out of Band Sync

In cases where large amounts of data must be transferred from the Source computer to the Destination computer during Baselining, but the connection between the source and the destination is constrained, such as a slow WAN connection, you may not want to begin replication using the Baselining Phases. You may prefer, for instance, to back up the source and restore it to the destination to effect the initial transfer of data.

To perform the initial transfer of data without using baseline, see Out Of Band Sync from the Replication Set for step-by-step instructions. After the transfer of data, start the Replication Pair with Start, so that only the data that is new or modified since the backup will need to be replicated.

Replicate the Destination Data Back to the Source Computer (Windows Only)

It is recommended that you keep the following in mind when replicating data back to the source computer:

  • If data has been damaged on the source computer, perform a Copyback from the Live Copy on the destination, without Overwrite existing data... selected. See Copy Back File System Data from a Recovery Point or the Live Copy.
  • In a case of failure of the source computer, the Replication Pair(s) can be aborted, and the data on the destination computer can be used as the primary data set. Once the problem is solved on the original source computer, the Replication Pair(s) can be created in reverse, replicating the new and modified data back to the source computer, using Smart Re-Sync.

    To limit the replication to only the data newly created or modified on the replica while it was being used as the production data set, you must save the current USN (Update Sequence Number) on the destination volume(s) before actually using them as the production data set. This ensures that when you later start the Replication Pair to replicate data back to the source computer, CDR can use Smart Re-Sync, beginning from the USN that was saved.
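
Why a saved USN limits the amount of data sent back can be shown with a simplified model. In this Python sketch the journal is a hypothetical list of `(usn, path)` tuples; it is a stand-in for a volume change journal and says nothing about how CDR reads or stores the USN.

```python
def changes_since(journal, saved_usn):
    """Return paths touched by journal records newer than saved_usn.

    journal is a list of (usn, path) tuples in ascending USN order.
    Each path is reported once, in order of first appearance.
    """
    changed = []
    for usn, path in journal:
        if usn > saved_usn and path not in changed:
            changed.append(path)
    return changed


journal = [
    (100, "db/data.mdf"),
    (140, "db/log.ldf"),
    (180, "db/data.mdf"),
    (220, "reports/q3.xlsx"),
]
saved_usn = 140  # recorded just before the replica became the production copy
print(changes_since(journal, saved_usn))
```

Only records written after the saved value are considered, so data that was already identical on both computers at the time of the switch is not transferred again.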

To Replicate the Destination Data Back to the Source Computer, see Replicate the Destination Data Back to the Source Computer for step-by-step instructions.

Important Considerations

It is recommended that you keep the following in mind when performing data replication:

General

  • Ensure that the destination volume has sufficient space for all the data that will be replicated to it. If you are replicating data from multiple source volumes to the same destination volume (Fan-In), ensure that the destination volume is sufficiently large for the data which will be replicated from all the source volumes. If you are creating Recovery Points, you must also account for the space requirements of the snapshots that will be created on the Destination; see Recovery Points - Snapshot space requirements.
  • Individual failed files or folders will not necessarily fail the replication job. Such individual failures may just be logged and the data replication job will continue. Check the logs periodically for such failures. See View the Log Files of an Active Job. In some cases, the nature of such failures during replication may have an underlying cause which would in turn cause CDR to switch to SmartSync, or Abort replication altogether.
  • In a case of failure of the source computer, the data on the destination computer can be used temporarily as the primary data set. Once the problem is solved on the original source computer, the new and modified data can be replicated from the destination computer back to the source computer. For more information, see Replicate the Destination Data Back to the Source Computer.
  • If a SAN volume that is a source for any Replication Pair(s) is disconnected and re-connected again, you must abort and restart at least one of the Replication Pairs on the source computer.
  • On both the source and destination computers, it is recommended that you Configure Throttling for CDR Replication Activities.
  • It is recommended that you also configure alerts. For more information, see Alerts and Notifications.
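
The destination sizing advice in the first item above, including the Fan-In case, can be checked with simple arithmetic. The Python sketch below uses hypothetical names and figures; in particular, `snapshot_reserve` is whatever allowance you budget for Recovery Point snapshots and is not computed here.

```python
def destination_has_room(source_sizes, destination_free, snapshot_reserve=0):
    """Check whether a destination volume can hold every fan-in source.

    All values are in bytes. Returns (fits, required_bytes).
    """
    required = sum(source_sizes.values()) + snapshot_reserve
    return required <= destination_free, required


GIB = 1024**3
# Two hypothetical source volumes replicating to one destination volume.
sources = {"srv1:D": 200 * GIB, "srv2:E": 350 * GIB}
fits, required = destination_has_room(
    sources, destination_free=600 * GIB, snapshot_reserve=100 * GIB
)
print(fits, required // GIB)
```

In this example the two sources alone would fit in 600 GiB, but adding the snapshot allowance pushes the requirement to 650 GiB, so the check fails.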

Windows

  • When you replicate data that was encrypted on the source computer, it will not be accessible on the destination computer. To access the data, you must use Copyback to recover the data to the source computer, where you will be able to access it with the proper permissions. On the source computer, if you remove the encryption from the data after it has been replicated, the data will not be replicated again, so it will remain encrypted on the destination. Encrypted files are replicated in the Baseline and the SmartSync phases.
  • The virtual memory paging file (pagefile.sys) must be configured on a local, fixed disk.
  • When using QSnap, you may want to increase the minimum size of QSnap's COW cache beyond the default size, on both the source and destination computers, if sufficient space is available. Also, you may want to select an alternate location for the COW cache. For more information, see QSnap - Cache Considerations for ContinuousDataReplicator.
  • When replicating application data, see Change Account for Accessing Application Servers.
  • When using VSS or QSnap on a source computer it is recommended that you also see Configuring Space Check for ContinuousDataReplicator Agents and Configuring Alerts for Low Disk Space to provide warning that the source computer is running out of disk space, which will ultimately cause replication activity to be System Aborted.
  • If Windows compression is set at the root level of a drive letter, the compressed files will be replicated to the destination as uncompressed files.

Unix

  • ACLs for AIX 5.3 cannot be replicated to a destination running AIX 5.2, as the ACL format is not backward compatible. However, ACLs from AIX 5.2 can be replicated to a destination running AIX 5.3.
  • Sparse file attributes are not transferred during the Baselining and SmartSync phases; the files assume the attributes of regular files on the destination. During the Replicating phase, sparse files do retain their attributes on the destination.
  • To use QSnap, before you can begin creating Replication Sets and Replication Pairs, you must first configure source and/or destination volumes as CXBF devices. For more information, see QSnap for ContinuousDataReplicator.
  • To replicate files with non-ASCII character names, perform the procedure detailed in Configuring the Locale for Non-ASCII Characters.

Cross Platform Replication

Data replication across Unix platforms is supported. For example, you can replicate data from an AIX source computer to a Solaris destination computer, or from Solaris to Linux, and so on. However, ACLs and Extended Attributes will be lost.

Additional Setting for Data Replication

Use the following Additional Setting to modify the default behavior of Data Replication:

Topic | Registry Key(s) | Description
Access Control Files | nDoNotReplicateACLs | For Windows, the nDoNotReplicateACLs registry key can be used to disable the replication of the security stream of files. This stream includes user and group access control list (ACL) settings for file access. If this registry key is not present, ACLs will be replicated.