Rsync: Enough Rope...

by Steven J. Owens (unless otherwise attributed)

Rsync can be a complex topic. This tutorial is mainly some examples of how to use rsync in a limited, fairly safe, reliable way.

I'm not sure rsync really needs to be as complex a topic as it currently is. It may just be that somebody needs to take a metal file to the user interface and remove some of the rough edges, make it a bit less accident-prone. However, I've managed to learn how to do some basic tasks, so here's a quick tutorial on the "shallow end" of the rsync pool.

This was a decent tutorial on the more complicated stuff:

If you don't know what rsync is for, go google it. Well, okay, a quick summary: rsync is for synchronizing files, e.g. for making sure a file or set of files on machine A is identical to the set on machine B. rsync was originally developed to minimize use of bandwidth across network connections, hence the "r" for "remote".

The key point is that rsync does a "block by block" file comparison. That means it's looking at the file chunk by chunk, calculating checksums of the chunks (blocks) on both the local and remote version, and comparing the checksum results to see which chunks have changed. This minimizes the amount of data that the rsync program has to upload. So if you have, say, a 1 gigabyte file, and only a few kilobytes have changed, you only have to send those few kilobytes over.
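To make the idea concrete, here is a rough sketch using only standard tools (head, dd, split, cksum). It illustrates comparing fixed-size blocks by checksum; it is not rsync's actual algorithm, which is smarter about matching blocks that have shifted position within a file. The file names and the 16 KiB block size are arbitrary choices for the demo.

```shell
#!/bin/sh
# Illustration only: compare two files block by block using checksums.
work=$(mktemp -d)

# Make a 64 KiB "old" file and a "new" copy with a single byte changed.
head -c 65536 /dev/zero > "$work/old.bin"
cp "$work/old.bin" "$work/new.bin"
printf 'X' | dd of="$work/new.bin" bs=1 seek=40000 conv=notrunc 2>/dev/null

# Cut both files into 16 KiB blocks (blk.aa, blk.ab, ...).
mkdir "$work/old" "$work/new"
split -b 16384 "$work/old.bin" "$work/old/blk."
split -b 16384 "$work/new.bin" "$work/new/blk."

# Checksum each pair of blocks and note which ones differ.
changed=""
for f in "$work"/old/blk.*; do
    name=$(basename "$f")
    if [ "$(cksum < "$f")" != "$(cksum < "$work/new/$name")" ]; then
        changed="$changed $name"
    fi
done
echo "blocks that would need sending:$changed"

rm -rf "$work"
```

Only one of the four blocks differs, so only that block's worth of data would have to cross the wire.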

These days, with the huge disks we have and the large and complex file hierarchies we use, I sometimes find rsync useful for local file manipulations. Sometimes for backing up stuff from one drive to another. Sometimes just for moving a large file or hierarchy around: rsync from original location to new location, then remove the old location. Using rsync means I can interrupt the process if I have to; the original is still intact and usable, and I can resume the process later without having to start over again from scratch.
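A minimal sketch of that move-by-rsync routine, run against a throwaway directory so it is safe to try. The paths under $work are made up for the demo, and the diff -r check before deleting is an extra bit of caution, not something rsync requires.

```shell
#!/bin/sh
work=$(mktemp -d)
mkdir -p "$work/oldplace/sub" "$work/newplace"
echo "big important data" > "$work/oldplace/sub/data.txt"

# Copy. If this gets interrupted, rerunning the same command skips
# whatever has already been copied.
rsync -a "$work/oldplace/" "$work/newplace/"

# Remove the original only once the two trees compare as identical.
if diff -r "$work/oldplace" "$work/newplace" >/dev/null; then
    rm -rf "$work/oldplace"
    echo "moved"
else
    echo "trees differ, keeping the original"
fi
```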

Using rsync as a better cp or mv

Rsync is quite tricksy to get right, but basic use of it as a better version of cp/mv/scp can be mastered by always using directories for arguments and always leaving a trailing slash on the arguments. For example:

$ rsync -avz /olddrive/home/ /newdrive/home/

The trailing slash on the source tells rsync to copy the contents of that directory rather than the directory itself (the recursion comes from -a). It gets quite maddening if you don't have both old and new with trailing slashes, as it's fairly counterintuitive whether it will create a new subdirectory or not, put the created directory inside the existing directory, etc.

If you leave off the trailing slash on the source, like this:

$ rsync -avz /olddrive/home /newdrive/home/

then rsync copies the directory itself into the destination. If the destination directory ("home") already existed and you were just updating it, you now have a hairy mess to sort out: you end up with a second copy nested inside it, at /newdrive/home/home/.
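It is the slash on the source that decides this, and the effect is easy to see in a scratch directory. The sketch below uses made-up paths under a temporary directory and runs the same sync twice, once with and once without the trailing slash on the source.

```shell
#!/bin/sh
work=$(mktemp -d)
mkdir -p "$work/olddrive/home/puff" "$work/newdrive/home"
echo "notes" > "$work/olddrive/home/puff/notes.txt"

# Trailing slash on the source: the contents of home/ land in home/.
rsync -a "$work/olddrive/home/" "$work/newdrive/home/"

# No trailing slash on the source: the directory itself is copied
# into the destination, giving home/home.
rsync -a "$work/olddrive/home" "$work/newdrive/home/"

layout=$(cd "$work/newdrive" && find . -type f | sort)
echo "$layout"
# ./home/home/puff/notes.txt
# ./home/puff/notes.txt
rm -rf "$work"
```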


Using rsync as a better scp

You can also use rsync like ssh/scp by adding the username and domain name:

$ rsync -avz /olddrive/home/

This turned out to be fairly easy to do. I didn't need to get an rsync daemon running on the receiving machine or anything. I just needed the receiving machine to have the rsync package installed and to be running an openssh server for incoming connections. This uses ssh to run the rsync command on the remote host, and the remote rsync receives the data that the local rsync sends over ssh.
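The general shape of the remote form is the usual scp-style user@host:path destination. The user name, host name, and remote path below are hypothetical placeholders, and the sketch only prints the command so you can look it over before running it for real.

```shell
#!/bin/sh
# Hypothetical values: substitute your own account, host, and path.
remote_user="user"
remote_host="example.com"
remote_dir="/newdrive/home/"

# scp-style destination: user@host:path
dest="${remote_user}@${remote_host}:${remote_dir}"

# Print the command instead of running it; drop the "echo" to run it.
echo rsync -avz /olddrive/home/ "$dest"
```

The same trailing-slash habit applies here: both the local source and the remote path end in a slash.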

Using rsync with a single file

$ rsync -avz /olddrive/home/porn.html

The -avz options and the -n (dry run) option

You'll note that all of my examples use -avz.

The "-a" option is for "archive", which the rsync man page says "ensures that symbolic links, devices, attributes, permissions, ownerships, etc. are preserved in the transfer." Archive also implicitly means to use recursion, so all subdirectories and files under the source are sync'd.

The "-z" flag tells rsync to use compression when sending the data, to save bandwidth.

The "-v" flag is the easiest one to explain; it's for "verbose". This is a bit spammy to use, but I like to know everything that's going on. In fact, I quite frequently run my rsync commands twice, once to do the rsync and again to make sure everything went well (it just compares files, then tells me it didn't have to actually copy things).

Using rsync to compare but not actually alter

Speaking of running rsync without actually doing anything, the "-n" or "--dry-run" option runs rsync but doesn't actually change anything. This can be useful for testing your rsync command.

$ rsync -avz --dry-run /olddrive/home/porn.html
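Here is a small sketch of a dry run followed by a real run, using throwaway directories with made-up names. The dry run prints what it would transfer but leaves the destination empty.

```shell
#!/bin/sh
work=$(mktemp -d)
mkdir "$work/src" "$work/dst"
echo "<html></html>" > "$work/src/page.html"

# Dry run: rsync lists page.html as something it would send...
rsync -avz --dry-run "$work/src/" "$work/dst/"
after_dry=$(ls -A "$work/dst" | wc -l | tr -d ' ')

# ...and only the real run actually copies it.
rsync -avz "$work/src/" "$work/dst/"
after_real=$(ls -A "$work/dst" | wc -l | tr -d ' ')

echo "files after dry run: $after_dry, after real run: $after_real"
rm -rf "$work"
```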

Using rsync to back up your drive

There are supposedly better solutions for this, but every time I've started digging into it, I end up giving up in exhaustion a while later. So, with the kind and gracious help of user mohero on freenode, I cobbled together some scripts to back up to a USB drive via rsync.

One problem is that apparently running rsync on the entire file system can run out of memory. I've seen this problem before, but that was on a fairly low-memory machine.

But, in any event, I wanted to rsync my partitions separately, because some of them see a lot of change and others (especially the largest, /mcgee, the bulk data partition) see few changes.

Also, I had to avoid the chicken-and-egg problem of the rsync script trying to back up the USB drive it was rsyncing to, which was, of course, mounted under /.

I ended up with a directory /home/puff/backupscripts, containing three backup scripts and three exclude files. The second and third exclude files are currently empty, but after going around and around trying to figure out how to get exclude working properly, I decided to leave them there so I can easily add exclusions later.

Note: The problem I had with exclude was getting rsync to not try to rsync the USB drive. We ended up solving that with the "-x" option, which tells rsync not to recurse across filesystem boundaries.

Making A List of your Apt Packages

Before I backed up, I made sure to get the list of my currently installed packages:

$ sudo dpkg --get-selections > ~/backupscripts/oldpackages.txt

I could have dug around in the backup of my root partition to get these, but this is a bit easier to reference, later. I'm not going to mass-install the packages from this list, I'm just going to use it to see what packages I used to have installed, and selectively reinstall them.
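One way to use the saved list later is to compare it against what the new system has. The sketch below assumes the usual dpkg --get-selections layout (package name, whitespace, then a state such as "install"). The package names are made-up sample data and wanted_packages is a hypothetical helper name.

```shell
#!/bin/sh
# Print the names of packages marked "install" in a
# dpkg --get-selections listing, sorted.
wanted_packages() {
    awk '$2 == "install" { print $1 }' "$1" | sort
}

work=$(mktemp -d)
# Made-up sample data standing in for the saved list and the new system's list.
printf 'nano\tdeinstall\nopenssh-server\tinstall\nrsync\tinstall\nvim\tinstall\n' > "$work/oldpackages.txt"
printf 'rsync\tinstall\nvim\tinstall\n' > "$work/newpackages.txt"
# On a real machine the second file would come from:
#   dpkg --get-selections > newpackages.txt

wanted_packages "$work/oldpackages.txt" > "$work/old.names"
wanted_packages "$work/newpackages.txt" > "$work/new.names"

# Lines only in the first file: packages the old system had that the new one lacks.
missing=$(comm -23 "$work/old.names" "$work/new.names")
echo "$missing"
rm -rf "$work"
```

With the sample data this prints openssh-server, the one package that was installed before and is not installed now.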

My Rsync Backup Scripts

# first / without /home, /media or /mcgee (the bulk directory)
echo "Doing: sudo rsync -v -aHx --numeric-ids --delete --exclude-from=/home/puff/backupscripts/excludes1.txt --delete-excluded / /media/disk/backups/redbitter/rootdir/ 1>&2"
sudo rsync -v -aHx --numeric-ids --delete --exclude-from=/home/puff/backupscripts/excludes1.txt --delete-excluded / /media/disk/backups/redbitter/rootdir/ 1>&2

Here is the first exclude file, excludes1.txt. Note that the excluded paths are each on a line by themselves, with a dash (-) and a space at the beginning of the line.

- /home
- /mcgee
- /media
- /dev
- /sys
- /proc
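
To check that an exclude file does what you expect before pointing it at a real drive, you can try it on a scratch tree. This sketch uses made-up directories under a temporary location and a two-line exclude file in the same format.

```shell
#!/bin/sh
work=$(mktemp -d)
mkdir -p "$work/root/etc" "$work/root/home/puff" "$work/root/proc" "$work/backup"
echo "config" > "$work/root/etc/hosts"
echo "mine" > "$work/root/home/puff/notes.txt"

# Same format as the exclude file: "- " followed by the path.
# A leading / anchors the pattern at the top of the directory being transferred.
printf '%s\n' '- /home' '- /proc' > "$work/excludes.txt"

rsync -a --exclude-from="$work/excludes.txt" "$work/root/" "$work/backup/"

copied=$(ls "$work/backup")
echo "$copied"
rm -rf "$work"
```

Only etc should show up in the backup; home and proc are skipped.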

# now /home
echo "Doing: sudo rsync -v -aHx --numeric-ids --delete --exclude-from=/home/puff/backupscripts/excludes2.txt --delete-excluded /home /media/disk/backups/redbitter/homedir/ 1>&2"
sudo rsync -v -aHx --numeric-ids --delete --exclude-from=/home/puff/backupscripts/excludes2.txt --delete-excluded /home /media/disk/backups/redbitter/homedir/ 1>&2

# now /mcgee
echo "Doing: sudo rsync -v -aHx --numeric-ids --delete --exclude-from=/home/puff/backupscripts/excludes3.txt --delete-excluded /mcgee /media/disk/backups/redbitter/mcgee/ 1>&2"
sudo rsync -v -aHx --numeric-ids --delete --exclude-from=/home/puff/backupscripts/excludes3.txt --delete-excluded /mcgee /media/disk/backups/redbitter/mcgee/ 1>&2

