Amazon S3? Not yet ready for me!

I’ve been thinking for a while about how to properly keep back-ups of all of my data while, at the same time, saving a few bucks. Since the “cloud computing” term is now floating all over the Internet, I thought that a distributed, remote back-up service might do the job for me.

I looked around and found quite a few different services, but most of them offer a ridiculously small amount of storage, like 5GB, or force me into using sub-par Web-based user interfaces that make using rsync complicated or unfeasible. I’m looking for services that offer 2TB+ of storage and, so far, the only solution I find promising is Amazon S3. The problem is price. Keeping 2,048GB of data stored in Amazon S3 costs me about $300 USD per month, plus a one-time cost for uploading the data. With what I would pay for a whole year, I could buy a QNAP TS-809 filled with 8 x 1.5TB disks 🙂
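
For reference, the rough arithmetic behind that figure (assuming the standard S3 storage tier cost about $0.15 per GB per month at the time, which is my own estimate) is simply 2,048 GB × $0.15/GB-month ≈ $307 per month, before adding the per-request and data-transfer charges for the initial upload.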

So, unfortunately for me, multi-terabyte back-up copies to the Internet are still too expensive. Perhaps in 5 years technology will have driven prices down enough that I can afford to keep my back-ups on the Internet.

Incremental backups with rsync

I have been thinking for a while about implementing incremental, cyclical backups on my home network. The problem with cyclical backups to tape is that they are slow. The problem with cyclical backups to disk is that they consume a great deal of space. I finally opted for cyclical backups to disk, since my DDS-3 SCSI tape drive is slow and can’t hold the many gigabytes of data I have, even with hardware/software compression.

I want to periodically branch my main backup tree so that I can keep several backups, ordered from the newest (backup.0) to the oldest (backup.n), where “n” could be the number of days or weeks, depending on the frequency of the backups.

The filesystem should look like this:

 |- backup.0
 |
 |- backup.1
 |
 |- backup.2
 |
 .
 .
 .
 |- backup.n

A simple way to reduce disk space usage is to use a UNIX filesystem feature called hard-links. The idea is that if a file’s contents do not change between backups, we can save space by having all the identical copies hard-linked together.
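
As a quick illustration (the file names here are just examples), two hard-linked names point to the same inode and the same data blocks, so the second “copy” takes up virtually no extra space:

    # echo "some contents" > file-a
    # ln file-a file-b
    # ls -li file-a file-b

Both names show the same inode number and a link count of 2; deleting or replacing one name does not affect the data visible through the other, which is exactly the property the backup scheme below relies on.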

Using rsync and cp we can implement this very easily, thanks to the way that rsync works. By default, when the --inplace command-line switch is not used, if rsync detects that a destination file differs from its source file, instead of modifying the destination file directly by opening it, writing to it, then closing it, rsync creates a brand-new file and moves it into place. This has several advantages:

  1. Users can keep working with their files even while rsync is synching them underneath. Since rsync always creates a new file instead of modifying the current one in place, users won’t see files changing under them mid-transfer, nor the strangeness that comes from multiple users/processes updating the same file at once.
  2. Since rsync creates a new file, when the original destination file is hard-linked across several backup branches, the synching process won’t indirectly sync up those backup branches too. Instead, they will be kept intact, and a new destination file, mirroring its source file, will be created.

    We don’t want an update to a file in the backup.0 branch to also update every file hard-linked to it, since that would destroy the incremental semantics (see the small demonstration below).
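
    To see the difference in action, here is a tiny, made-up demonstration (the directory and file names are hypothetical): writing through an existing hard-linked name changes every branch, while replacing the name with a brand-new file, as rsync does, leaves the other branches alone.

    # mkdir -p demo/backup.0 demo/backup.1 && cd demo
    # echo "v1" > backup.1/file
    # ln backup.1/file backup.0/file
    # echo "v2" > backup.0/file
    # cat backup.1/file
    v2
    # echo "v3" > file.new && mv file.new backup.0/file
    # cat backup.1/file
    v2

    The in-place write of “v2” shows up through both names because they share one inode, but the “v3” replacement, done the way rsync does it (new file, then rename), leaves backup.1/file untouched.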

Thus, we can implement a really simple cyclical backup scheme using rsync and hard-links:

  1. Things to run on the server.

    We run this periodically:

    # rm -fr backup.${n}
    # for i in $(seq ${n} -1 2); do mv backup.$((i-1)) backup.${i}; done
    # cp -al backup.0 backup.1

    This will rotate all the backups, discarding the oldest one (backup.${n}). Then, the cp command will replicate the main branch (backup.0) into backup.1 by using hard-links.
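
    To actually run this periodically, one option (just a sketch; the schedule and the script path are my own placeholders) is to wrap the three commands in a small script and call it nightly from root’s crontab; a sketch of the full script appears right after the FreeBSD note below:

    # run the rotation every night at 03:30 (the script path is hypothetical)
    30 3 * * * /usr/local/sbin/rotate-backups.sh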

    NOTE for FreeBSD users: the cp command that comes with the FreeBSD base system supports neither the -a nor the -l command-line switch. -a means -dpR (recursively copy and preserve attributes), while -l means not to copy the file contents, but to create hard-links instead.

    Fortunately, the FreeBSD ports collection includes a port of the GNU coreutils package, which sports the full GNU cp program, supporting the -a and -l switches:

    # cd /usr/ports/sysutils/coreutils
    # make all install

    To avoid the name clash between the cp command from the FreeBSD base system and the GNU one, the GNU cp command is installed as gcp. So, in the script listed before, we should change the cp invocation to gcp.
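
    Putting the pieces together, here is a minimal sketch of the rotate-backups.sh script mentioned above. The script name, the /data backup root and the value of n are all my own placeholders; it uses gcp when it exists (FreeBSD with coreutils installed) and falls back to the system cp otherwise:

    #!/bin/sh
    # rotate-backups.sh -- hypothetical wrapper around the rotation shown above.
    # The backup root and the number of generations are assumptions; adjust them.

    BACKUP_ROOT="/data"   # directory holding backup.0 .. backup.n
    N=7                   # number of generations to keep

    # Prefer GNU cp (installed as gcp by the FreeBSD coreutils port); fall back
    # to the system cp elsewhere (e.g. Linux, where /bin/cp is already GNU cp).
    CP=cp
    command -v gcp >/dev/null 2>&1 && CP=gcp

    cd "${BACKUP_ROOT}" || exit 1

    # Drop the oldest generation and shift the rest one step back.
    rm -fr "backup.${N}"
    for i in $(seq "${N}" -1 2); do
        [ -d "backup.$((i-1))" ] && mv "backup.$((i-1))" "backup.${i}"
    done

    # Branch the main tree into backup.1 using hard-links.
    [ -d backup.0 ] && ${CP} -al backup.0 backup.1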

  2. Things to run on the client.

    To perform the incremental backup against the server, we can run the following command:

    # rsync -a -E Users/ rsync://<host>:<port>/data/backup.0/

    It’s very important to keep the clocks synchronized on both the client and the server, so rsync can use file timestamps to decide which files have changed and which have not. Preserving timestamps is done with the -t command-line switch. Note that the -a (archive) command-line switch to rsync is like specifying -rlptgoD, so we don’t have to specify -t explicitly.

    The -E command-line switch is useful for Mac OS X-based machines: it allows synching files stored on an HFS+ volume that use resource forks, by encoding them in AppleDouble format.
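
    On the server side, for an rsync:// URL like the one above to work, the server has to run an rsync daemon that exports a module named data. A minimal rsyncd.conf sketch, where every value is an assumption of mine, might look like this:

    # /usr/local/etc/rsyncd.conf (path and values are assumptions)
    uid = root
    gid = wheel
    use chroot = yes

    [data]
        path = /data
        read only = no
        comment = incremental backup area

    The daemon is then started with rsync --daemon (or via inetd), and the /data/ part of the client URL refers to the [data] module, not to a literal path.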

Backup fails with an error message when trying to perform a backup

iBackup has problems backing up folders to iDisk if those folders’ icons have been customized.

BRIEF: It’s a problem caused by folders with customized icons.

DETAILED: Some time ago, I customized the icons for the “Library” and “Documents” folders as I didn’t like the default ones – I took the icons from the wonderful World of Aqua icon set. When you customize the icon of a folder, a zero-sized file named “Icon?” (shown as Icon^M from the command line) is created inside that folder.

It seems that iBackup doesn’t cope well with this kind of file, at least when backing up to the iDisk. When a folder being backed up has a customized icon (that is, contains a file named Icon?), iBackup will complain with the following error message:

“/Users/falfaro//Icon” not found on iDisk. This file does not exist on the iDisk.

SOLUTION: Remove the offending “Icon?” file from the folder in question and then retry the operation.
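
If you have customized the icons of many folders, tracking the offending files down by hand gets tedious. As a rough sketch (do a listing pass first; the starting path is just an example), the hidden “Icon” files, whose name really ends in a carriage return, can be found and removed from a Terminal running bash:

    $ find ~/Documents -name $'Icon\r' -print
    $ find ~/Documents -name $'Icon\r' -delete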