Content, Filters, and Regular Expressions - Backup


Overview

Data protection content is defined at the subclient level for File System iDataAgents. The default subclient, created during installation, has the unique characteristic of including all protected data not explicitly covered by other subclients within the backup set. Additional subclients can be created by the administrator to separate and manage a subset of the backup set data.

The subclient behavior is different for Windows and UNIX.

Windows
  • The default subclient skips the content of all user-defined subclients within the backup set.
  • The user-defined subclient skips the content of other subclients within the backup set, with one exception: if the content defined on a subclient matches content defined using wildcards on another subclient, that content is not skipped.
UNIX
  • The default subclient skips the content of all user-defined subclients within the backup set.
  • The user-defined subclient does not skip the content of other subclients within the backup set.

In the case of application subclients, the inclusive feature of the default subclient means it can automatically "discover" all data requiring its protection. In some iDataAgents, this automatic discovery feature can be disabled, filtered (for example: Lotus Notes), or otherwise configured to assign content to other subclients (for example: Exchange mailboxes and GroupWise databases).

File System Content

In File System iDataAgents, the default subclient content is defined as "\" or "/", the symbol for the root of all file systems. In this case, "all file systems" means:

  • All local file systems excluding those file systems on read-only devices such as CD, DVD or tape drives.
  • Windows mounted file systems are included by default, but can be excluded for a given job by clearing the Follow Mount Points option in the Advanced Backup Options.
  • NFS-mounted file systems are excluded by default, but can be included by specifically listing the mount points in the subclient content.
  • Raw character or block devices are not included by default but can be included in the subclient content by explicitly adding the device name.
  • Network shares can be explicitly included by adding their UNC paths along with a user name and password that has read access.
  • Data in Remote Storage (that is, data managed by Microsoft's Remote Storage Service) can be included by selecting the backup option to do so.

In most cases, manually editing the content of the default subclient disables its inclusive discovery feature. Add user-defined content to subclients other than the default subclient.

Adding System Folders for Windows File System to a Subclient

The system folders, such as My Documents, Desktop, and Music, are created by default for each user. To include all such folders in a subclient, add the following paths as subclient content:

Path          Folder
%Documents%   My Documents
%Desktop%     Desktop
%Music%       Music
%Pictures%    Pictures
%Videos%      Videos
%Dropbox%     Dropbox folder (created automatically when you install the Dropbox application)

You can also use a Qcommand to create the subclient and add the content to it. For more information, refer to the Command Line documentation.

Using Wildcards to Define Content

Regular expressions (or wildcard characters) can be used to define content in a subclient. Wildcards are characters such as * or ?, and regular expressions include patterns such as [a-f] or *.[l-n]df. In this documentation, the two terms are used interchangeably.

You can use regular expressions to define content at any level of the data path, for example: F:\Users\[A-L]* or *.pst.
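As a rough illustration only, the following Python sketch uses the standard fnmatch module, which supports the same *, ?, and [...] characters, to test single file or folder names against the example patterns in this section. It assumes the agent treats these characters the way fnmatch does; the product's own matcher (case sensitivity, handling of path separators) may differ.

```python
from fnmatch import fnmatchcase

# Example patterns from this section, each with a few candidate names.
# fnmatchcase is case-sensitive on every platform, so results are predictable.
examples = {
    "[A-L]*": ["Alice", "Kevin", "Mary"],                 # names starting with A through L
    "*.pst": ["archive.pst", "notes.txt"],
    "*.[l-n]df": ["log.ldf", "data.mdf", "report.pdf"],  # .ldf, .mdf or .ndf
}

for pattern, names in examples.items():
    for name in names:
        print(f"{pattern:<10} {name:<12} {fnmatchcase(name, pattern)}")
```

Because fnmatch's * also matches "/" and "\" when it is given a full path, the sketch deliberately tests one name at a time.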

Defining Filters

Filtering unnecessary data from data protection operations can reduce backup/migration time, storage space, and recovery time. Most, but not all, iDataAgents include some filtering capability at the subclient level. Subclient filters for File System iDataAgents can be defined at the path, directory, or file level.

Some application iDataAgents such as Exchange Mailbox, Lotus Notes Database, and Lotus Notes Document also allow filters to be defined.

There are four basic types of filters that can be used:

Exclusion Filters

Exclusion filters are defined at the subclient level and keep unnecessary data from being protected.

  1. From the CommCell Browser, expand Client Computers > client > File System > backup_set.
  2. Right-click the backup_set, point to All Tasks, and then click New Subclient.

    The Subclient Properties dialog box appears.

  3. Click Advanced.

    The Advanced Subclient Properties dialog box appears.

  4. In the Exclude these files/folder/patterns section, click Add.

    The Enter Path dialog box appears.

  5. Enter the paths to exclude.

    Tip: Each directory must be on a separate line.

  6. Click OK to close the Enter Path dialog box.
  7. Click OK to close the Advanced Subclient Properties dialog box.
  8. Click OK to close the Subclient Properties dialog box.

Exception Filters

Exception filters can only be defined at the subclient level for supported agents. An Exception filter allows you to define directory or file exceptions to a filter defined in the exclusion section. For example, you can exclude C:\ProgramData with the exception of the C:\ProgramData\Documents directory.

  1. From the CommCell Browser, expand Client Computers > client > File System > backup_set.
  2. Right-click the backup_set, point to All Tasks, and then click New Subclient.

    The Subclient Properties dialog box appears.

  3. Click Advanced.

    The Advanced Subclient Properties dialog box appears.

  4. In the Except these files/folder/patterns section, click Add.

    The Enter Path dialog box appears.

  5. Enter the paths to except from the exclusion filters.

    Tip: Each directory must be on a separate line.

  6. Click OK to close the Enter Path dialog box.
  7. Click OK to close the Advanced Subclient Properties dialog box.
  8. Click OK to close the Subclient Properties dialog box.

Subclient Policy Filters

Subclient Policy filters are made up of Exclusion and Exception filters and are applied by a Subclient Policy. You can choose not to include Subclient Policy filters in a subclient and instead define your own Exclusion and Exception filters for each subclient. See Filtering Rules for an overview.

Global Filters

Global Filters are made up of Exclusion filters that can be defined for each operating system, such as Windows, Unix, and NetWare, and for the Exchange application. Both Linux and Macintosh operating systems fall under the Unix global filter. Filters defined at the global level can be included in or excluded from a subclient filter definition.

  1. From the CommCell Browser, navigate to Client Computers | <Client> | <Agent>.
  2. Right-click the <Subclient> in the right pane and click Properties.
  3. Click the Filters tab and clear Include Policy Filters.
  4. From Include Global Filters, select ON.

    This allows filters set up under Control Panel (Global Filters) to exclude data from backups of the selected subclient.

    These filters are used in addition to the filters set at the subclient level.

  5. Click OK.

See Global Filters - Online Help for an overview.

Adding All Filters to Global Filters as a Group

The CommCell Console does not support the group addition of Global Filters. However, you can use Microsoft SQL Server to add them by editing the CommServ.dbo.GXGlobalParam table directly.

Direct modification of the metadata is not supported. Before you attempt any modification of the CommServ database, perform a Disaster Recovery backup and ensure that it can be recovered successfully.

The CommServ.dbo.GXGlobalParam table has two fields: name and value.

The Windows global filters are defined by the name "Windows FS Exclude Filters". The name is not present unless a global filter entry has been previously created.

Before continuing, add a global filter entry in the CommCell Console and click OK so that the "Windows FS Exclude Filters" entry exists. The procedure below overwrites that entry.

To add a group of global filters, run an SQL statement such as:

UPDATE GXGlobalParam SET value = '<filter list>' WHERE name = 'Windows FS Exclude Filters'

The <filter list> is a string of filter entries separated by single spaces, for example:

C:\Temp D:\Tmp E:\Junk

If a filter entry itself contains a space, as in

C:\Program Files\*

replace each space in that entry with "+1", for example:

C:\Program+1Files\*

Use the following steps to construct a string of filters:

  1. Copy the recommended filter list below into Microsoft Word, or enter your own set of filters. Each filter should be plain text on a separate line with no trailing spaces.
  2. Use Find and Replace to replace spaces in the filter text with "+1". Remove any extra "+1" strings at the end of each line.
  3. Use Find and Replace to look for the special character "Manual Line Break" and replace it with a space.
  4. Insert "UPDATE GXGlobalParam SET value = '" at the beginning of the text string and "' WHERE name = 'Windows FS Exclude Filters'" at the end.

    Save the document as an unformatted text document.

  5. Open SQL Server Query Analyzer and set it to use the CommServ database. Open the file created in step 4 and execute it.
  6. Open the Global Filters dialog in the CommCell Console and verify the filters you added.

Here's a sample of the SQL script:

UNIX Filters

For Unix, a single asterisk (*) matches within one directory level only, and two asterisks (**) match across any number of levels.
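A minimal Python sketch of this rule, assuming that a filter is compared with a path one directory level at a time, that *, ? and [...] never cross a "/", and that ** consumes zero or more whole levels (whether ** also matches zero levels in the product is an assumption). The function name unix_filter_match is hypothetical; this is not the agent's own matching code.

```python
from fnmatch import fnmatchcase


def _match_levels(pattern_parts: list[str], path_parts: list[str]) -> bool:
    """Match pattern components against path components, one level at a time."""
    if not pattern_parts:
        return not path_parts  # both exhausted: full match
    head, rest = pattern_parts[0], pattern_parts[1:]
    if head == "**":
        # ** may swallow zero or more complete directory levels.
        return any(_match_levels(rest, path_parts[i:])
                   for i in range(len(path_parts) + 1))
    if not path_parts:
        return False
    # A single-level pattern such as * or cl* is compared with one name only,
    # so it can never cross a "/".
    return fnmatchcase(path_parts[0], head) and _match_levels(rest, path_parts[1:])


def unix_filter_match(pattern: str, path: str) -> bool:
    """Hypothetical helper: does a UNIX-style filter match an absolute path?"""
    return _match_levels(pattern.strip("/").split("/"), path.strip("/").split("/"))


for path in ("/dir1/clients.txt", "/dir1/a/b/clone.log", "/dir1/a/notes.txt"):
    print(path,
          unix_filter_match("/dir1/*/cl*", path),   # exactly one level in between
          unix_filter_match("/dir1/**/cl*", path))  # any number of levels
```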

Example 1

Example 2

Only files whose names begin with "cl" will be backed up in /dir, /dir1, and /dir2, at every directory level.

Example 3

All files under /dir will be backed up.
Only files beginning with "cl" under /dir1 will be backed up, at every directory level.
All files under /dir2 will be backed up.

Example 4

Only file names beginning with "cl" under /dir1/dir1 and /dir1/dir2 will be backed up.

Editing Filters

Once data protection filters have been defined, entries can be added to or deleted from the Subclient Properties (Filters) tab. In addition, some agents allow you to modify existing filters without having to delete and re-add them.

Delete an Exclusion Filter

  1. From the CommCell Browser, navigate to Client Computers | <Client> | <Agent>.
  2. Right-click the <Subclient> in the right pane and click Properties.
  3. Click the Filters tab.
  4. Click the Delete button to the right of Exclude these files/folder/patterns.

    This will delete the path set in both Exclude these files/folder/patterns and Except for these files/folders.

  5. Click OK.

Delete an Exception Filter

  1. From the CommCell Browser, navigate to Client Computers | <Client> | <Agent>.
  2. Right-click the <Subclient> in the right pane and click Properties.
  3. Click the Filters tab.
  4. Click the Delete button to the right of Except for these files/folders.

    This will delete the path set in Except for these files/folders. However, the Exclusions will still remain valid.

  5. Click OK.

Using Wildcards to Define Filters

The use of wildcards or regular expressions in defining filters is supported for most iDataAgents, but support for specific wildcards can vary.

The following wildcards are supported for each agent type:

Exception filters that include wildcards and that point to folders or files below an excluded folder are not supported.

Keep in mind that vendors of NAS file systems implement filtering differently. For example, BlueArc and Network Appliance systems support only the "*" character either before or after the character string (e.g., *tmp or cache*).
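Because of this restriction, a filter intended for such a NAS system may be worth checking before it is added. The hypothetical Python helper below encodes one reading of the rule stated above: the pattern is a literal string with at most a single "*" at its start or its end, and no "?" or bracket expressions. Under this reading, patterns with a "*" at both ends are rejected; confirm the exact rules in your NAS vendor's documentation.

```python
# Characters that make a pattern a wildcard or regular expression.
WILDCARD_CHARS = set("*?[]")


def is_simple_nas_pattern(pattern: str) -> bool:
    """Return True for patterns such as *tmp or cache* (hypothetical check)."""
    core = pattern
    if core.startswith("*"):
        core = core[1:]    # allow one leading *
    elif core.endswith("*"):
        core = core[:-1]   # or one trailing *
    # The rest must be a non-empty literal with no further wildcard characters.
    return bool(core) and not (WILDCARD_CHARS & set(core))


for p in ("*tmp", "cache*", "*tmp*", "te*mp", "c?che"):
    print(p, is_simple_nas_pattern(p))
```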

Inclusions, Exclusions, and Exceptions to Exclusions

Subclient content serves as an include filter that lets you specify the portions of client data to include in data protection operations. The Exclude filter specifies which subset of that content should not be included in data protection operations. The following example illustrates these concepts:

Include= /space
Exclude= /space/temp

In this example, the volume called "space" is included in backups for the subclient, excluding the "temp" directory. If "temp" contains a subdirectory called "keep" that you want to include, you can set up an Exception to the Exclusion filter, as shown in the next example:

Include= /space
Exclude= /space/temp
Except=  /space/temp/keep

This would result in the volume "space" being included in backups, excluding the "temp" directory, except for the "temp/keep" subdirectory.

Exclude all removable media drives, including drives A and B, on both Unix and Windows clients, because the operating system might treat them as floppy or disk drives.

Exceptions to exclusions can be thought of as patterns that should overlap (that is, match the same paths as) exclusions. This is useful for protecting data that would otherwise be filtered out by the Exclude filter. The Exception filter is not intended for use as a stand-alone include filter: the objects in the Exception filter must be a subset of the objects in the Exclusion filter.

Regular expressions (or wildcards) offer additional flexibility in defining the content to be included, excluded and/or excepted. The next example illustrates how wildcards may be used in filtering:

Include= /space
Exclude= /space/tem*/**
Except=  /space/temp/keep

This example defines the content to be included as the "space" volume, excluding all subdirectories of "space" that begin with the letters "tem", except for the "temp/keep" subdirectory. Using wildcards in this way would exclude directories such as "space/temp", "space/templates", and "space/temporary_internet_files" from data protection operations. The Exception filter provides an exception to the exclusion of all subdirectories beginning with "tem", by allowing the "temp/keep" directory to be protected.

A more advanced example follows:

Include= C:\ and D:\ and E:\
Exclude= *:\**\temp*\**
Except=  C:\templates\

Drives C, D and E are the content to be included, excluding all directories that begin with "temp" from any level on any of those drives, except for the "templates" directory on drive C. This is just one example of the many possible combinations of wildcards that may be used in filtering.

Keep in mind the following rules with regard to filtering:

  • If the path matches the exclusion AND it does not match any exception, it is excluded (recursively if a directory). Wildcard exceptions are NOT matched against the file system to add content to the subclient. For example, exclude "/space/temp" (no exceptions) means do not back up "/space/temp" (recursively if it is a directory).
  • If the path matches an exclusion AND it matches an exception to exclusions, it is NOT excluded. An exclusion of "/space/tem*" does not match any of the same paths as an exception "/space/temp/keep", so the exception has no effect on what is backed up.
  • For Windows File System iDataAgents, when change journal is used to perform the scan, the same filter cannot be used to filter both a file and a directory at the same time. For example, C:\temp\pattern*\** would filter out all directories that match this pattern at a certain directory level, and C:\temp\pattern* would filter out all files that match this pattern at a certain directory level. Using classic file scan instead of change journal would allow you to use the same filter for both file and directory objects that match the same pattern.
  • For Unix File System iDataAgents, the same filter can be used to filter both files and directories at the same time. For example, "/etc/*.log" is valid for Unix to exclude both a file and a directory called "user.log".
  • We recommend against filtering subclient content so that it is limited to a single file type, such as *.pst or *.doc files. This sort of configuration would prevent other data on that volume from being protected. Because content is mutually exclusive across subclients, no other subclient would be able to back up the excluded data, which could result in data loss.
  • Wildcards may be used only in the last level of the content path when specifying subclient content, provided that the Treat Characters as Regular Expressions option is enabled.

Testing the Scan Phase of a Backup Job Without Running a Backup

Testing the scan phase allows you to determine which files and objects would be backed up, the number of objects, the time taken for the scan, and so on, based on your backup options and filters.

  1. Configure the subclient content and filters as desired, in a test backup set.
  2. Run a full backup in such a way that it will fail after the scan phase has completed. One way to do this is to add a post-scan or pre-backup script, on the Subclient Properties (Pre/Post Process) tab, that will exit with an error (e.g., fail_post_scan.bat with the contents exit 1).
  3. The pending job information in the Job Controller window provides the path of the collect file, which contains the list of objects scanned for backup.
  4. Navigate to the collect file on the client to see the list of files and directories that would have been backed up.
  5. After viewing the scan results, kill the suspended job from the Job Controller by right-clicking the job and selecting Kill. Click Yes to confirm.
  6. Make any changes as needed to the content and filters configuration to achieve the desired set of files, and re-run the process.
  7. Once you are satisfied with the configuration, you can apply these changes to a subclient that is used in production and remove the post-scan or pre-backup script.