Incremental backups with rsync

I have been thinking for a while about implementing incremental, cyclical backups on my home network. The problem with cyclical backups to tape is that they are slow; the problem with cyclical backups to disk is that they consume a great deal of space. I finally opted for cyclical backups to disk, since my DDS-3 SCSI tape drive is slow and can’t hold the many gigabytes of data I have, even with hardware or software compression.

I want to periodically branch my main backup tree so that I can keep several backups, ordered from the newest (backup.0) to the oldest (backup.n), where “n” could be the number of days or weeks, depending on the frequency of the backups.

The filesystem should look like this:

 |- backup.0
 |- backup.1
 |- backup.2
 |  ...
 `- backup.n

A simple way to reduce disk space usage is to use a UNIX feature called hard links. A hard link is an additional directory entry that points to the same file data (the same inode), so if a file’s contents do not change between backups, all the identical copies can be hard-linked together and the data is stored only once.
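To see what a hard link is in practice, the short shell session below creates a file, adds a second name for it with ln, and checks that both names report the same inode number and a link count of 2. It is a sketch assuming GNU coreutils (whose stat supports -c); the file names are arbitrary.

```shell
# Show that two hard-linked names share one inode, i.e. one copy of the data.
# Assumes GNU coreutils stat (-c format option); names are arbitrary examples.
set -e
demo=$(mktemp -d)
cd "$demo"

echo "unchanged data" > original.txt
ln original.txt linked.txt             # second name for the same inode

inode_a=$(stat -c '%i' original.txt)
inode_b=$(stat -c '%i' linked.txt)
links=$(stat -c '%h' original.txt)     # link count is now 2

echo "inodes: $inode_a $inode_b, link count: $links"
```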

Using rsync and cp, we can implement this very easily, thanks to the way rsync works. By default (that is, without the --inplace command-line switch), when rsync detects that a destination file differs from its source file, it does not modify the destination file directly by opening it, writing to it, and closing it. Instead, it writes the updated contents to a new temporary file and then renames that file over the old one. This has several advantages:

  1. Users can keep working with files even while rsync is syncing them underneath. Since rsync always creates a new file instead of modifying the existing one, a process that already has the old file open keeps seeing its complete, unmodified contents, rather than a mix of old and new data from an update in progress.
  2. Since rsync creates a new file, when the original destination file is hard-linked across several backup branches, syncing it won’t indirectly modify those other branches too. Instead, they are kept intact, and a new destination file, mirroring its source file, is created in its place.

    We don’t want an update to a file in the backup.0 branch to also update every file hard-linked to it, since that would destroy the incremental semantics.
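The difference can be reproduced without rsync at all. This sketch first overwrites a hard-linked file in place, which is roughly what --inplace does, and then replaces it by writing a temporary file and renaming it over the original, which mimics rsync’s default behaviour. It assumes a POSIX shell; the directory and file names are made up. Only the second approach leaves the backup.1 copy intact.

```shell
# Compare an in-place update with a "write a new file, then rename" update.
# Directory and file names are arbitrary examples.
set -e
demo=$(mktemp -d)
mkdir "$demo/backup.0" "$demo/backup.1"
echo "version 1" > "$demo/backup.0/file"
ln "$demo/backup.0/file" "$demo/backup.1/file"   # hard-linked snapshot

# In-place write (like --inplace): both names see the change.
echo "version 2" > "$demo/backup.0/file"
inplace_snapshot=$(cat "$demo/backup.1/file")    # "version 2": snapshot clobbered

# Put "version 1" back (in place again, so both names still share the inode).
echo "version 1" > "$demo/backup.0/file"

# Default rsync-style update: write a temporary file, then rename it into place.
echo "version 2" > "$demo/backup.0/.file.tmp"
mv "$demo/backup.0/.file.tmp" "$demo/backup.0/file"
rename_snapshot=$(cat "$demo/backup.1/file")     # still "version 1"
rename_current=$(cat "$demo/backup.0/file")      # "version 2"

echo "in-place: $inplace_snapshot / rename: $rename_snapshot"
```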

Thus, we can implement a really simple cyclical backup scheme using rsync and hard links:

  1. Things to run on the server.

    We run this periodically:

    # rm -fr backup.${n}
    # for i in $(seq ${n} -1 2); do mv backup.$((i-1)) backup.${i}; done
    # cp -al backup.0 backup.1

    This rotates all the backups, discarding the oldest one (backup.${n}). Then, the cp command replicates the main branch (backup.0) into backup.1 using hard links, so no file data is actually copied.

    NOTE for FreeBSD users: the cp command that comes with the FreeBSD base system supports neither the -a nor the -l command-line switch. -a means -dpR (copy recursively, preserving attributes and without dereferencing symbolic links), while -l means creating hard links instead of copying.

    Fortunately, the FreeBSD ports collection includes a port of the GNU coreutils package, which sports the full GNU cp program, supporting the -a and -l switches:

    # cd /usr/ports/sysutils/coreutils
    # make all install

    To avoid a name clash between the FreeBSD cp command and the GNU one, the GNU cp command is installed as gcp. So, in the commands listed before, the cp invocation should be changed to gcp.

  2. Things to run on the client.

    To perform the incremental backup against the server, we can run the following command:

    # rsync -a -E Users/ rsync://:/data/backup.0/

    It’s very important to preserve file modification times on the server, because rsync relies on them: by default (unless, for example, -c/--checksum is given), rsync considers a file unchanged when its size and modification time match on both sides, and then leaves it, and therefore its hard links, untouched. Preserving modification times is done with the -t command-line switch. Note that the -a (archive) command-line switch to rsync is equivalent to -rlptgoD, so we don’t have to specify -t separately.

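    The -E command-line switch is useful on Mac OS X machines: with the rsync shipped by Apple, it allows syncing files stored on an HFS+ volume that use resource forks, by using the AppleDouble format. Note that in upstream rsync 3.x, -E means --executability, and extended attributes are handled by -X instead.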
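The server-side rotation from step 1 can be wrapped in a small reusable function. The following is a sketch, not the exact script I run: the rotate_backups function, the CP variable and the scratch directory are hypothetical, and it assumes GNU cp is installed as gcp on FreeBSD and as plain cp elsewhere. Unlike the one-liner, it skips missing directories, so it also behaves on the first few runs.

```shell
# Pick the GNU cp binary (needed for -a and -l).
# Assumption: gcp on FreeBSD (sysutils/coreutils port), GNU cp as "cp" elsewhere.
case "$(uname -s)" in
    FreeBSD) CP=gcp ;;
    *)       CP=cp ;;
esac

# Hypothetical helper: rotate_backups ROOT N keeps backup.0 .. backup.N in ROOT.
# The body runs in a subshell, so the cd does not affect the caller.
rotate_backups() (
    root=$1
    n=$2
    cd "$root" || exit 1

    # Drop the oldest snapshot, if present.
    rm -rf "backup.$n"

    # Shift backup.(i-1) to backup.i, starting from the oldest.
    # Missing directories (early runs) are skipped.
    i=$n
    while [ "$i" -ge 2 ]; do
        if [ -d "backup.$((i - 1))" ]; then
            mv "backup.$((i - 1))" "backup.$i"
        fi
        i=$((i - 1))
    done

    # Snapshot the live tree as hard links; rsync then updates backup.0.
    "$CP" -al backup.0 backup.1
)

# Usage sketch on a scratch directory.
root=$(mktemp -d)
mkdir "$root/backup.0"
echo "data" > "$root/backup.0/file"
rotate_backups "$root" 3
rotate_backups "$root" 3
stat -c '%h' "$root/backup.0/file"   # prints 3: backup.0, .1 and .2 share one inode
```

After two runs with N=3, backup.0 through backup.2 exist and each unchanged file has a link count of 3; a third run creates backup.3, and from then on every run discards the oldest snapshot before rotating.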
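Why preserved modification times matter for the client-side rsync run can be illustrated with a simplified version of rsync’s default quick check. The quick_check_changed function below is hypothetical and only mimics the size-plus-mtime rule; it is not rsync’s code and ignores options such as --modify-window. It assumes GNU stat and touch.

```shell
# Hypothetical, simplified quick check: a destination file counts as unchanged
# when its size and modification time (whole seconds) match the source.
# Returns 0 ("changed, would transfer") or 1 ("unchanged, would skip").
quick_check_changed() {
    src=$1
    dst=$2
    [ -e "$dst" ] || return 0                        # missing: must transfer
    [ "$(stat -c '%s %Y' "$src")" != "$(stat -c '%s %Y' "$dst")" ]
}

demo=$(mktemp -d)
echo "same content" > "$demo/src"
cp "$demo/src" "$demo/dst"                           # plain cp: new mtime
touch -d '2020-01-01 00:00:00' "$demo/src"
touch -r "$demo/src" "$demo/dst"                     # copy mtime, like rsync -t

if quick_check_changed "$demo/src" "$demo/dst"; then result=changed; else result=unchanged; fi

touch -d '2021-01-01 00:00:00' "$demo/src"           # same bytes, new mtime
if quick_check_changed "$demo/src" "$demo/dst"; then result2=changed; else result2=unchanged; fi

echo "$result $result2"
```

With matching times the file is skipped and its hard links in older snapshots stay shared; a mismatched time alone is enough to make the file look changed.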


