Theory
As the name suggests, rdiff-backup makes reverse incremental backups. What's that? Let's look at **incremental backups** first. Incremental backups work by taking a snapshot of a file on the first run, and storing only the differences (diffs) on subsequent runs. Unlike with traditional copies, you keep only one whole copy of the file, plus many diffs that describe how it looked at each run. This makes subsequent backups run quicker and take much less space, while still letting you restore any version of the file.
A reverse incremental backup is almost the same. It still stores only one whole copy of the file, but that copy is always the latest version, and you keep the diffs that let you go back to an earlier revision. This is tailored to the case where you accidentally overwrite or delete a file on the production server. Instead of taking the original file from the backup and applying N diffs to it, you can simply copy the file back, because the latest version is stored whole.
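To make the idea concrete, here is a small sketch that imitates a reverse incremental backup with plain `diff` and `patch`. This is only an illustration of the concept, not how rdiff-backup stores things internally, and all file names are made up for the demo.

```bash
#!/usr/bin/env bash
# Illustration only: plain diff/patch standing in for rdiff-backup's internals.
work=$(mktemp -d)
cd "$work"

printf 'Hello world\n' > v1.txt                      # state at the first backup
printf 'Hello world\nHere is the forum\n' > v2.txt   # state at the second backup

# Reverse incremental: keep the newest version whole...
cp v2.txt mirror.txt
# ...and store only the diff that leads back to the older version.
# diff exits with status 1 when the files differ, which is expected here.
diff -u v2.txt v1.txt > reverse.diff || true

# Restoring the latest version is a plain copy of the mirror.
cp mirror.txt restored_latest.txt

# Restoring the older version means applying the reverse diff to the mirror.
patch -s -o restored_old.txt mirror.txt reverse.diff

cmp v1.txt restored_old.txt && echo "older version restored"
```

The latest version never needs a diff applied, which is exactly the case you hit most often after an accident.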
Usage
Search for rdiff-backup in your package manager (aptitude, pacman, yum, etc.), and install it.
Let's create a project where we can try it out. I'm going to create two folders: a "website" folder that houses the project we want to back up, and a "backups" folder that stores the backup(s).
```bash
mkdir -p ~/backups ~/website/htdocs
cd ~/website/htdocs
echo "Hello world" > index.html
```
Taking a backup is as easy as specifying the source, and the destination folder:
```bash
rdiff-backup ~/website ~/backups
```

That's all, you now have a backup of your whole site in ~/backups/. Let's add an imaginary forum to our site, link to it from the front page (index.html), and then take another backup.

```bash
cd ~/website/htdocs/
echo "Here is the forum" > forum.html
echo 'Click <a href="/forum.html">here</a> for the forum' >> index.html
rdiff-backup ~/website ~/backups
```
If you take a look at ~/backups/htdocs you can see that the file structure nicely mirrors the website. Now let's delete the forum.html file, and take another backup:
```bash
rm forum.html
rdiff-backup ~/website ~/backups
```
forum.html has also vanished from ~/backups/htdocs/. That's because the mirror only holds the latest version; restoring any older version takes a little more work.
A few use cases
We have now made a few changes: we created a new file, modified an existing one, and deleted one. Let's look at a few scenarios where you might have to restore something.
Restoring a file, or a directory
If you need to restore the latest state of a file, e.g. after an accidental delete, you can just copy the file back from the backup folder to the production folder. Use mc, cp, or anything else you prefer; there is nothing more to do here.
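As a minimal sketch of that copy, the helper below uses `cp -a` so that permissions and modification times come back along with the contents. `restore_latest` is a name I made up for this example.

```bash
# Hypothetical helper: copy a file or directory back from the mirror.
# -a keeps permissions and modification times along with the contents.
restore_latest() {
    # $1 = path inside the backup folder, $2 = destination in the production folder
    cp -a -- "$1" "$2"
}

# Example with the folders used in this article:
# restore_latest ~/backups/htdocs/index.html ~/website/htdocs/index.html
```

Keeping the original mtime matters if you later compare the production folder against the backup, since that comparison looks at mtimes too.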
Restoring an older version of a file
If you need an older version, you first need to figure out the date when the file was in the state you need. You can list all the increments (the times when rdiff-backup saw a change to a file) with the -l switch, and restore a specific version with -r:
```bash
cd ~/backups/htdocs/
rdiff-backup -l index.html
# Found 1 increments:
#     index.html.2011-10-20T18:34:44+02:00.diff.gz   Thu Oct 20 18:34:44 2011
# Current mirror: Thu Oct 20 18:39:18 2011

rdiff-backup -r '2B' index.html /tmp/index.html
cat /tmp/index.html
# Hello world
```
You can restore both files and directories like this. 2B means restoring the file as it looked 1 change ago, 3B means 2 changes ago, and so on: nB means n-1 changes ago. Take note of the TIME FORMATS section in the manual, which describes more ways to specify the version to restore (there are 6 in total). Also, the manual is wrong on the B notation.
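Here is a small sketch of restoring with some of the other time formats. `restore_at` and the `RDIFF_BACKUP` variable are my own additions (the variable just lets you point at a different binary); the time strings are examples of formats described under TIME FORMATS, so double check them against the manual of your version.

```bash
# Hypothetical wrapper around "rdiff-backup -r".
# RDIFF_BACKUP is an assumed override for the binary name; it defaults to rdiff-backup.
restore_at() {
    # $1 = time spec, $2 = path inside the backup, $3 = where to write the restored copy
    "${RDIFF_BACKUP:-rdiff-backup}" -r "$1" "$2" "$3"
}

# A few time specs, using the paths from this article:
# restore_at now                       ~/backups/htdocs/index.html /tmp/index.html   # current mirror
# restore_at 3D                        ~/backups/htdocs/index.html /tmp/index.html   # 3 days ago
# restore_at 2011-10-20                ~/backups/htdocs/index.html /tmp/index.html   # midnight on that day
# restore_at 2011-10-20T18:34:44+02:00 ~/backups/htdocs/index.html /tmp/index.html   # exact timestamp
```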
List changes that have happened since a given time
You can use --list-changed-since [time] to list all the changes that have happened after a certain time. You can also use this to restore deleted files that are no longer present in the backups directory because another backup has taken place since. Find the filename with --list-changed-since, find the date of its last increment with -l, and restore it with -r. It will work as expected, even though the file seems to be missing.
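The three steps could be wrapped up roughly like this. The function names and the `RDIFF_BACKUP` and `BACKUP` variables are assumptions for the sake of the example; the flags are the ones mentioned above.

```bash
RDIFF_BACKUP=${RDIFF_BACKUP:-rdiff-backup}   # assumed override for the binary name
BACKUP=${BACKUP:-$HOME/backups}              # backup folder used in this article

# Step 1: list everything that changed (deletions included) since a given time.
changed_since() {
    "$RDIFF_BACKUP" --list-changed-since "$1" "$BACKUP"
}

# Step 2: list the increments recorded for one path, even if it is gone from the mirror.
increments_of() {
    "$RDIFF_BACKUP" -l "$BACKUP/$1"
}

# Step 3: restore that path as it was at the chosen time.
restore_deleted() {
    # $1 = time spec, $2 = path relative to the backup folder, $3 = target
    "$RDIFF_BACKUP" -r "$1" "$BACKUP/$2" "$3"
}

# Example session (the time values are illustrative):
# changed_since 1W
# increments_of htdocs/forum.html
# restore_deleted 2011-10-16T21:10:01+02:00 htdocs/forum.html /tmp/forum.html
```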
Comparing the production folder to the backup
You have made some changes to the site, but forgot which files you changed, or you just need to make sure that the backup is fresh. You can use the --compare option for this:
```bash
cd ~/website/htdocs/
echo "A new change" >> index.html
rdiff-backup --compare ~/website/ ~/backups/
# changed: htdocs
# changed: htdocs/index.html
```
Keep in mind that this comparison also takes mtime into account, so if the files are perfectly identical, but have different mtimes, rdiff-backup will report it as changed. You can also use --compare-at-time to specify the time you wish to compare against.
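A rough sketch of the compare variants follows. `compare_at` and `compare_by_hash` are names I invented, and `SITE`, `BACKUP` and `RDIFF_BACKUP` are assumed variables. To my knowledge `--compare-hash` additionally checks regular files against the checksums stored in the backup, but verify that against the manual of your version before relying on it.

```bash
SITE=${SITE:-$HOME/website}                  # production folder used in this article
BACKUP=${BACKUP:-$HOME/backups}              # backup folder used in this article
RDIFF_BACKUP=${RDIFF_BACKUP:-rdiff-backup}   # assumed override for the binary name

# Compare the site against the backup as it was at an earlier time, e.g. 1D.
compare_at() {
    "$RDIFF_BACKUP" --compare-at-time "$1" "$SITE" "$BACKUP"
}

# Compare regular files by checksum (assumption: your version supports --compare-hash).
compare_by_hash() {
    "$RDIFF_BACKUP" --compare-hash "$SITE" "$BACKUP"
}

# Example:
# compare_at 1D
# compare_by_hash
```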
Recovering a file, without knowing the file name
You deleted a file three months ago, and now it turns out that you need it. Of course, you no longer remember the name of the file. If you can at least guess part of the filename, you can use find to try to match an increment file:
```bash
cd ~/backups
find . -iname '*forum*' -type f
# ./rdiff-backup-data/increments/htdocs/forum.html.2011-10-16T21:10:01+02:00.snapshot.gz
# ./rdiff-backup-data/increments/htdocs/forum.html.2011-10-20T18:34:44+02:00.missing
# ./rdiff-backup-data/increments/htdocs/forum.html.2011-10-16T21:05:59+02:00.missing
```
Another option, if you know some of the contents of the file, is to grep the increments folder. Even though the increment files are gzipped, grep can often still find a match in them:
```bash
grep -ir 'forum' .
# Binary file ./rdiff-backup-data/increments/htdocs/forum.html.2011-10-16T21:10:01+02:00.snapshot.gz matches
```
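Matching against compressed bytes is not guaranteed to work, so a more dependable variant is to decompress each snapshot before searching it. The sketch below does that for `*.snapshot.gz` files only (the `.diff.gz` files hold deltas rather than whole files); `grep_snapshots` is a made-up helper name.

```bash
# Hypothetical helper: print every *.snapshot.gz under a directory whose
# decompressed contents match a pattern (case-insensitive).
grep_snapshots() {
    # $1 = directory to search, $2 = pattern
    find "$1" -type f -name '*.snapshot.gz' | while IFS= read -r file; do
        if gunzip -c "$file" | grep -qi -- "$2"; then
            printf '%s\n' "$file"
        fi
    done
}

# Example:
# grep_snapshots ~/backups/rdiff-backup-data/increments 'here is the forum'
```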
If you don't remember anything, you need to look through the increments folder, under rdiff-backup-data. It contains all of the files that rdiff-backup has encountered so far, even the deleted ones, so you can look up the filename there. For example, I have deleted forum.html, but it still exists in the increments folder:
```
$ tree ~/backups/rdiff-backup-data/increments
~/backups/rdiff-backup-data/increments
├── htdocs
│   ├── forum.html.2011-10-16T21:05:59+02:00.missing
│   ├── forum.html.2011-10-16T21:10:01+02:00.snapshot.gz
│   ├── forum.html.2011-10-20T18:34:44+02:00.missing
│   ├── index.html.2011-10-16T21:05:59+02:00.diff.gz
│   ├── index.html.2011-10-16T21:10:01+02:00.diff.gz
│   └── index.html.2011-10-20T18:34:44+02:00.diff.gz
├── htdocs.2011-10-16T21:05:59+02:00.dir
├── htdocs.2011-10-16T21:10:01+02:00.dir
└── htdocs.2011-10-20T18:34:44+02:00.dir
```
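Once you have spotted the increment file, you still need to turn it back into a real file. Here is a sketch of two ways to do that, assuming increment names follow the `name.TIMESTAMP.type[.gz]` pattern seen above. Both helper names are hypothetical.

```bash
# Hypothetical helper: pull the timestamp out of an increment file name,
# so it can be passed to "rdiff-backup -r".
increment_time() {
    printf '%s\n' "$1" | sed -E 's/^.*\.([0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}[+-][0-9]{2}:[0-9]{2})\.(snapshot|diff|missing|dir)(\.gz)?$/\1/'
}

# Hypothetical helper: a .snapshot.gz increment holds a gzipped full copy of
# the file, so it can be unpacked directly.
extract_snapshot() {
    # $1 = path to the .snapshot.gz file, $2 = where to write the plain file
    gunzip -c "$1" > "$2"
}

# Example with the increment found above:
# inc=~/backups/rdiff-backup-data/increments/htdocs/forum.html.2011-10-16T21:10:01+02:00.snapshot.gz
# extract_snapshot "$inc" /tmp/forum.html
# rdiff-backup -r "$(increment_time "$inc")" ~/backups/htdocs/forum.html /tmp/forum.html
```

Unpacking by hand only works for snapshots; for `.diff.gz` increments, let rdiff-backup do the restore with the extracted timestamp.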
Taking it further
You can automate this. Drop the command into cron.daily, into your own crontab, or create a user just for handling backups, which is, in my opinion, the best solution.

Start thinking about offsite backups now. Chances are that if your site gets hacked, or the HDD breaks down, you will not be able to access your backups, so always have them copied to a different place. You can mount the remote site with sshfs, transfer the backups with scp, tell rdiff-backup to connect through ssh, or you can install rdiff-backup on the remote side, and launch it in daemon mode.
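As a starting point, a script for cron.daily could look something like the sketch below. The script name, the paths, the host name and the `RDIFF_BACKUP` variable are all placeholders of mine, so adjust them to your setup.

```bash
#!/bin/sh
# Hypothetical /etc/cron.daily/website-backup; every path and host here is a placeholder.
SRC=${SRC:-/home/youruser/website}
DEST=${DEST:-/home/youruser/backups}
# For an offsite copy over ssh, rdiff-backup accepts host::path destinations, e.g.
# DEST=backupuser@backup.example.com::/srv/backups/website
# (rdiff-backup has to be installed on the remote host as well)
RDIFF_BACKUP=${RDIFF_BACKUP:-rdiff-backup}

run_backup() {
    "$RDIFF_BACKUP" "$SRC" "$DEST"
}

# Only back up when the source folder is really there, and complain on failure
# so that cron can mail the message to you.
if [ -d "$SRC" ]; then
    run_backup || echo "website backup failed: $SRC -> $DEST" >&2
else
    echo "source folder $SRC not found, skipping backup" >&2
fi
```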
You will also have to monitor the disk usage. Since we are only storing the diffs, we are very space efficient, but if your site has any user generated content, like uploaded files or forum avatars, it can get out of hand quickly. You can remove old versions with --remove-older-than [time]. Skim through the manual at least once, to know all the options that are available.
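A minimal sketch for keeping an eye on the size and pruning old increments is below. The function names and the `BACKUP` and `RDIFF_BACKUP` variables are assumptions; `3M` in the example stands for three months.

```bash
BACKUP=${BACKUP:-$HOME/backups}              # backup folder used in this article
RDIFF_BACKUP=${RDIFF_BACKUP:-rdiff-backup}   # assumed override for the binary name

# Show how much space the mirror and each increment take up.
increment_sizes() {
    "$RDIFF_BACKUP" --list-increment-sizes "$BACKUP"
}

# Delete increments older than the given age.
# --force is needed when more than one increment would be removed at once.
prune_older_than() {
    "$RDIFF_BACKUP" --remove-older-than "$1" --force "$BACKUP"
}

# Example:
# increment_sizes
# prune_older_than 3M
```

Pruning is destructive, so it is worth running `increment_sizes` first and trying the command on a copy of the backup before putting it into cron.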