Migrating Linux To New Hardware

by Steven J. Owens (unless otherwise attributed)

Migrating From Old Hardware To New Hardware

This document assumes you're using debian. This should be fairly useful for ubuntu users as well, but there may be some minor differences. I'll mention any that I know about, as I go along.

Disk Image Copy

If I literally wanted to just copy the installation over to a new set of hardware:

a) rsync -DragtopvH --exclude=/proc/ --exclude=/tmp/target / /tmp/target

b) install a master boot record

c) make some minor edits and tweaks

Note that this would certainly depend on the hardware configuration being almost identical.

Rebuilding from Scratch

However, what I want is to build a fresh system, with all the old user data, etc., moved over.

The following assumes that your box is freshly installed and basic network configuration, etc., is set up. There are too many variables to document that process here, and in any event there are a zillion "how to install" tutorials out there; this one is about migrating your old stuff to your new box.

Get root access on your freshly installed box

The particulars of this are up to your hosting arrangement, but note that security-conscious sysadmins quite routinely deny remote root login (e.g. you have to be at the console to login as root, or ssh in via some user account and then su to root). If you can't ssh in as root@yourdomain.com, a properly security-conscious sysadmin is a likely explanation.

What Release

First, check which release the new box is running. Ubuntu has "lsb_release -a" (also available on debian if the lsb-release package is installed), but otherwise on debian you just have to check various spots:

newdarksleep:~# cat /etc/debian_version
4.0
puff@newdarksleep:~$ uname -a
Linux darksleep.com 2.6.18 #2 Mon Aug 24 17:07:06 UTC 2009 i686 GNU/Linux
puff@newdarksleep:~$ cat /etc/apt/sources.list
deb http://mirrors.kernel.org/debian etch main
puff@newdarksleep:~$
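
Here's a rough sketch of a helper that gathers those spots in one go. The function name show_release is my own invention, and /etc/os-release and /etc/lsb-release are assumptions about newer systems (an etch-era box won't have os-release); the function simply skips whatever isn't there. Pass a different root, like the /old mount point mentioned further down, to look at a mounted old disk.

```shell
#!/bin/sh
# Print whatever release information a Debian-style root filesystem offers.
# The root defaults to "/"; pass another path to inspect a mounted disk.
show_release() {
    root="${1:-/}"
    root="${root%/}"            # strip one trailing slash so paths join cleanly
    for f in etc/debian_version etc/os-release etc/lsb-release; do
        if [ -r "$root/$f" ]; then
            echo "== /$f"
            cat "$root/$f"
        fi
    done
    # sources.list shows which release apt is actually tracking
    if [ -r "$root/etc/apt/sources.list" ]; then
        echo "== /etc/apt/sources.list"
        # drop comment lines and blank lines
        grep -v '^[[:space:]]*#' "$root/etc/apt/sources.list" |
            grep -v '^[[:space:]]*$' || true
    fi
    return 0
}

show_release /
```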

The Actual Change-Over

Here's a high-level outline of the big steps in this process:

Always Back Up Before Mucking Up

I'm going to include the backup step in each task below, but in general, before messing with any important config file, make a backup, just in case. In many cases, the default config file can be restored by reinstalling the package, but why go to all that trouble?
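
If you get tired of typing the backup names by hand, something like the following might save a few keystrokes. This is only a sketch: backup_file is a name I made up, and the date suffix just mimics the _backup_MM_DD_YY naming used in the examples in this document. Since sudo can't call a shell function, define it in a root shell (or adapt it into a small script) for files like /etc/shadow.

```shell
# Copy FILE to FILE_backup_MM_DD_YY and print the name of the backup.
# Refuses to overwrite an existing backup from the same day.
backup_file() {
    src="$1"
    dest="${src}_backup_$(date +%m_%d_%y)"
    if [ -e "$dest" ]; then
        echo "backup_file: $dest already exists, not overwriting" >&2
        return 1
    fi
    # -p preserves mode, ownership (when run as root) and timestamps
    cp -p "$src" "$dest" && echo "$dest"
}

# Usage, as root:
#   backup_file /etc/sudoers
```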

Set up your working user account

I prefer to use sudo to do most important things, just because it helps me avoid staying logged in as root and being lazy and sloppy. Plus, I can check /var/log/auth.log to see when and who installed something.
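
As a rough illustration of that last point, here's one way to fish the package-related sudo commands out of the log. The function name sudo_installs is made up, and the pattern assumes the usual "sudo: user : ... COMMAND=..." line format; reading /var/log/auth.log normally requires root or membership in the adm group.

```shell
# Print auth.log lines where somebody ran aptitude, apt-get or dpkg via sudo.
# Each line carries the timestamp and the invoking user.
sudo_installs() {
    log="${1:-/var/log/auth.log}"
    grep 'sudo:' "$log" | grep -E 'COMMAND=.*(aptitude|apt-get|dpkg) '
}

# Usage:
#   sudo_installs                      # default log
#   sudo_installs /var/log/auth.log.0  # an older, rotated log
```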

Logged in as root, do adduser:

newdarksleep:~# adduser mig<enter>
...enter username, password, etc...

I used the username "mig" for migration. I have a regular user account on the old box, and I don't want to complicate things by editing the same account I'm using to do the editing. Probably I'm being overly paranoid, but just because you're paranoid doesn't mean they aren't out to get you!

Backup /etc/sudoers. You could probably get away with not backing it up, since we're going to edit /etc/sudoers with the visudo command, which syntax-checks your changes and will refuse to write a broken sudoers file to /etc/sudoers. But why not be paranoid?

newdarksleep:~# cp /etc/sudoers /etc/sudoers_backup_11_24_09

Used to be you could just run an editor on /etc/sudoers, but nowadays /etc/sudoers is chmodded to be not-writable, and sudo itself won't run if /etc/sudoers is writable. This doesn't matter as much when you're using the actual root account, but if you're using sudo to run the command to edit /etc/sudoers, it creates a catch-22 and you need to jump through hoops to fix it.

Okay, so now edit sudoers and add a line to give privileges to your user.

newdarksleep:~# visudo 
----------------------------------------------------------------------
...
mig	ALL=(ALL) ALL
----------------------------------------------------------------------

Note: Ordinarily I'd use emacs to edit sudoers, not vi, but emacs isn't installed yet. More about that below.

Install Basic Amenities

Aptitude is now standard on debian, which is good for a variety of reasons that google will be happy to tell you about.

Remember, always update apt before installing.

mig@newdarksleep:~$ sudo aptitude update

You could lump the following all into a single, one-line aptitude install, and I recommend you do so. I'm going to leave them on separate lines so you can pick and choose more easily.

I'm going to assume you know why screen and rsync are a good thing, otherwise this whole topic is probably a bit too much for you to handle.

Obviously, I like emacs better than vi, for all the obvious reasons. If you like vi better, well, there's still hope for you :-), and remember, "vi vi vi == 666, QED vi is The Editor of the Beast" :-).

mig@newdarksleep:~$ sudo aptitude install screen
mig@newdarksleep:~$ sudo aptitude install emacs
mig@newdarksleep:~$ sudo aptitude install rsync

We probably don't need mutt right off the bat, but I use it, so it'll have to get installed sooner or later, and it might come in handy in the meantime:

mig@newdarksleep:~$ sudo aptitude install mutt

Adding Further sudoers

Now that we've installed the basic amenities, including a civilized text editor, we'll use emacs when editing /etc/sudoers.

If you're doing it as root you can specify the editor when you invoke visudo:

newdarksleep:~# EDITOR=emacs visudo 

On some systems (ubuntu) you can specify the entire above command when invoking visudo via sudo:

puff@redbitter:~$ sudo EDITOR=emacs visudo

However, that doesn't work with the debian version of sudo that I'm working with. Here's one fast and dirty way to do it:

mig@newdarksleep:~$ sudo bash -c "EDITOR=emacs visudo"

Migrate the user accounts and groups

Edit the following files (see below for how to edit them): /etc/passwd, /etc/shadow, /etc/group, /etc/gshadow, /etc/aliases, and /etc/sudoers.

Remember, step one, back up the files:

mig@newdarksleep:~$ sudo cp /etc/passwd /etc/passwd_backup_11_25_09
mig@newdarksleep:~$ sudo cp /etc/shadow /etc/shadow_backup_11_25_09
mig@newdarksleep:~$ sudo cp /etc/group  /etc/group_backup_11_25_09
mig@newdarksleep:~$ sudo cp /etc/gshadow /etc/gshadow_backup_11_25_09
mig@newdarksleep:~$ sudo cp /etc/aliases /etc/aliases_backup_11_25_09
mig@newdarksleep:~$ sudo cp /etc/sudoers /etc/sudoers_backup_11_25_09

Use the "last" command on your old box to review the list of recent user logins and see if you want to prune any who don't login. Bear in mind that the machine may be configured with services (like squirrelmail, or domain forwarding, etc) that never require the user to actually log into the shell account, even though they use the system every day.
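
The raw output of "last" gets long, so here's a small sketch that boils it down to a login count per account. The name login_counts is hypothetical. Bear in mind that "last" may truncate long usernames, and that accounts which never logged in won't show up at all, so compare the result against /etc/passwd.

```shell
# Read `last` output on stdin and print "count user" per account, busiest first.
# Skips blank lines, the "reboot" pseudo-user and the trailing "wtmp begins" line.
login_counts() {
    awk 'NF && $1 != "reboot" && $1 != "wtmp" { print $1 }' |
        sort | uniq -c | sort -rn
}

# Usage on the old box:
#   last | login_counts
```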

Don't copy the whole files, rather copy and insert the entries from the old box into the new box.

Be careful to make sure all userids and groupids are >= 1000 (debian starts numbering regular accounts at 1000), to avoid conflicts with users/groups created by the install process (for example, the bind user).
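
Here's a rough sketch of how you might check that before pasting entries in. Both function names are made up, and the file arguments are whatever copies of the passwd files you have on hand (for example the old one from your /etc snapshot). Note that a freshly created account like "mig" will usually have grabbed uid 1000 on the new box, which is exactly the kind of collision this is meant to catch.

```shell
# List "uid:name" for ordinary accounts (UID >= 1000, excluding nobody at
# 65534) in a passwd-format file.
regular_users() {
    awk -F: '$3 >= 1000 && $3 != 65534 { print $3 ":" $1 }' "$1" | sort -n
}

# Report old-box accounts whose UID is already taken by a different name on
# the new box. $1 = old passwd file, $2 = new passwd file.
uid_collisions() {
    awk -F: '
        FILENAME == ARGV[1] { owner[$3] = $1; next }   # first file read: new box
        $3 >= 1000 && $3 != 65534 && ($3 in owner) && owner[$3] != $1 {
            print "uid " $3 ": " $1 " (old) vs " owner[$3] " (new)"
        }
    ' "$2" "$1"
}

# Usage:
#   regular_users /home/mig/olddarksleep_etc/passwd
#   uid_collisions /home/mig/olddarksleep_etc/passwd /etc/passwd
```

The same awk approach works on /etc/group if you compare the third field there as well.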

Use the following commands to edit the files: vipw for /etc/passwd, vipw -s for /etc/shadow, vigr for /etc/group, and vigr -s for /etc/gshadow.

Or in my case, since I'm using emacs:

mig@newdarksleep:~$ sudo EDITOR=emacs vipw
mig@newdarksleep:~$ sudo EDITOR=emacs vipw -s
mig@newdarksleep:~$ sudo EDITOR=emacs vigr
mig@newdarksleep:~$ sudo EDITOR=emacs vigr -s

Remember, if sudo doesn't take your EDITOR=emacs, use:

mig@newdarksleep:~$ sudo bash -c "EDITOR=emacs vipw"
mig@newdarksleep:~$ sudo bash -c "EDITOR=emacs vipw -s"
mig@newdarksleep:~$ sudo bash -c "EDITOR=emacs vigr"
mig@newdarksleep:~$ sudo bash -c "EDITOR=emacs vigr -s"

Edit /etc/aliases normally, with emacs, but make sure to "sudo newaliases" after saving the file:

mig@newdarksleep:~$ sudo emacs /etc/aliases
...edit the file...
mig@newdarksleep:~$ sudo newaliases

Use visudo to add sudo permissions to any users you usually give special admin privileges to:

mig@newdarksleep:~$ sudo bash -c "EDITOR=emacs visudo"

Copy The Home Directories

puff@olddarksleep:~$ rsync -avz -e ssh /home/ mig@newdarksleep:/home/

One caveat here: I decided I really didn't want to do the whole migration process as root, so I created a user account first, before migrating /home. However, to make life easier, and avoid any possible issues in merging olddarksleep:/home/puff and newdarksleep:/home/puff, I created a disposable "mig" (for migration) account.

The /home directory isn't all the data; we're going to have to dig out more stuff, which I'll discuss further down.

Note: Okay, so as I'm actually going through and doing this, I noticed that, of course, I can't just rsync as mig, even though mig has full sudo privs, because the stuff in /home/ belongs to all sorts of different users. Unless I figure out some workaround to have rsync invoke mig's sudo privs, I'm going to have to enable remote login for root (mainly by just resetting the root password) and rsync /home/ as root.
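
One possible workaround, which I haven't tested on this setup, is rsync's --rsync-path option: it lets you tell the sending side to start the remote rsync as "sudo rsync". The sketch below only prints the command unless you set DRY_RUN=0. It ASSUMES that mig is allowed to run rsync through sudo on newdarksleep without a password prompt (for example via a NOPASSWD entry in sudoers), which is my assumption and not something configured earlier in this document. The run and copy_home names are made up.

```shell
# Print the command instead of running it unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# sudo on the old box lets rsync read every user's files; --rsync-path makes
# the receiving rsync on the new box run under mig's sudo so it can set
# ownership. ASSUMPTION: "sudo rsync" works for mig without a password.
copy_home() {
    run sudo rsync -avz -e ssh --rsync-path="sudo rsync" /home/ mig@newdarksleep:/home/
}

copy_home
```

The printed command loses the quotes around "sudo rsync"; they are needed if you retype it by hand.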

Take A Snapshot Of /etc For Reference

And, of course, we're going to have to re-do the configuration changes so we should take a snapshot of /etc for future reference:

puff@olddarksleep:~$ rsync -avz -e ssh /etc/ mig@newdarksleep:/home/mig/olddarksleep_etc/

Take The Easy Way Out and Snapshot /var and /htdocs

Okay, I had planned to be more precise and step-by-step about this, but the new box has more disk space, and I was running out of time at the old hosting center. So I just made a brute-force backup of /etc, /home, /var/lib/, and /var/www/htdocs to the new machine, and I will then merge them in from there:

puff@olddarksleep:~$ rsync -avz -e ssh /var/lib/ mig@newdarksleep:/home/mig/olddarksleep_var/lib/
puff@olddarksleep:~$ rsync -avz -e ssh /home/ mig@newdarksleep:/home/mig/olddarksleep_home/
puff@olddarksleep:~$ rsync -avz -e ssh /var/www/htdocs/ mig@newdarksleep:/home/mig/olddarksleep_htdocs/

Strictly speaking, I could have stuck htdocs under var on the box, but I like to be able to look in /home/mig and make sure I've snagged all the big ticket items, without having to grovel through the hierarchy.

Create A List Of Installed Packages

First, on the old machine, use dpkg --get-selections to dump the list of packages and their selection states to a file.

Note: In this example, I have the dpkg files located in /old/var/lib/dpkg. This assumes that you've mounted the old machine's drive on the new machine, under the mount point "/old". If you're actually running the command on the old machine, just use "--admindir=/var/lib/dpkg/".

puff@olddarksleep:~$ sudo dpkg --get-selections --admindir=/old/var/lib/dpkg/ > oldpackages.txt

Review the package selections and see if there are any you want to skip in the new machine, then rsync it across to the new machine:

puff@olddarksleep:~$ rsync -avz -e ssh oldpackages.txt mig@newdarksleep:/home/mig/
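
Once the list is on the new machine, one way to review it is to compare it against what the new box already has. Here's a sketch; the function names are made up, and newpackages.txt is a hypothetical file you'd create on the new box with "dpkg --get-selections > newpackages.txt".

```shell
# Print package names marked "install" in a `dpkg --get-selections` dump.
selected_packages() {
    awk '$2 == "install" { print $1 }' "$1" | sort -u
}

# Packages selected on the old box that the new box does not have selected.
# $1 = old dump (oldpackages.txt), $2 = dump taken on the new box.
packages_to_add() {
    old_list=$(mktemp) && new_list=$(mktemp) || return 1
    selected_packages "$1" > "$old_list"
    selected_packages "$2" > "$new_list"
    comm -23 "$old_list" "$new_list"    # lines only in the old list
    rm -f "$old_list" "$new_list"
}

# Usage on the new box:
#   packages_to_add oldpackages.txt newpackages.txt | less
```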

Then, on the new machine, pump the packages into dpkg --set-selections. The --set-selections flag doesn't actually download or install anything; it just marks the packages as selected in the dpkg database.

mig@newdarksleep:~$ cat oldpackages.txt | sudo dpkg --set-selections

To actually fetch and install them, I have to do something like:

However, I'd prefer to do it via aptitude, and I'd prefer to have aptitude download all the package files ahead of time.

I'm told that I should be able to just feed the packages list into aptitude instead of into dpkg --set-selections, but that turns out to be incorrect. I can, however, do:

mig@newdarksleep:~$ cat oldpackages.txt | sudo dpkg --set-selections
mig@newdarksleep:~$ sudo aptitude --download-only install

And this does what I want, causing aptitude to go out and download all of the packages that I set with dpkg --set-selections.

However, this brings up a whole new can of worms, which is that my package setup is broken and has all sorts of dependency conflicts. Hm...

This should show me the list of manually installed packages:

mig@newdarksleep:~$ sudo aptitude search ~i\!~M

However, in my case that just gives me a list of 638 packages, instead of 817. Probably because the existing system was built via dpkg --set-selections.

Some advice from some #debian folks:

 <Foo> puff for instance...if you set selections...then you do apt-get install toilet...nothing will happen [23:24-12/06]
 <Bar> puff:  each package has a 'desired status' in the dpkg database, /var/lib/dpkg/status
 <puff> Foo: Okay, I guess this means that --clear-selections is safe to use for me, then.
 <Foo> if you clear selections and do apt-get install moo ...nothing will happen
 <puff> Bar: Ah, cool.
 <Foo> puff first do get-selections though

I'm leaning, at this point, towards making a list of the packages I'm pretty sure I want, and installing them and configuring them, and adding anything I've missed later. So far my hand-composed package list is:

<themill> !aptitude clone
<dpkg> To clone a Debian machine using aptitude (or install your favourite 
 packages) use 
aptitude search -F '%100p' '~i!~M' > package_list; 
on the reference machine; 
xargs aptitude --schedule-only install <  package_list; aptitude install; 
on the other machine.  
This preserves information about "automatically installed" packages that other methods 
do not.  See also <reinstall>, <things to backup>. <debian clone>
<puff> !reinstall
<dpkg> methinks reinstall is 
aptitude reinstall '~i' ; 
or
COLUMNS=200 dpkg -l | awk '/^[hi]i/{print $2}' | xargs apt-get -y --reinstall install
or
dpkg --get-selections > my_packages.txt
then later, 
dpkg --set-selections < my_packages.txt && apt-get install
See also  <aptitude clone> <debian clone>
<puff> !things to backup
<dpkg> A list of some of the things you should back up on your box is: 
/etc
/home
/root
/usr/local 
/usr/src 
/opt 
/srv.
Tailor this list to your own purposes.  If you think you don't need /var, 
make sure you don't forget
/var/lib/dpkg 
/var/lib/apt* 
/var/lib/mysql 
/var/spool/mail
/var/www 
/var/cache/debconf ...

<puff> !debian clone [16:27-12/07]
<dpkg> One method of cloning Debian installs is to take a current Debian 
machine that is set up with the packages you want and run the command 
dpkg --get-selections > ~/selectionfile
Then, after the base install on other machines use that file and do: 
dpkg --set-selections < ./selectionfile && apt-get dselect-upgrade
Also ask me about <aptitude clone>, <reinstall>, <things to backup>.

Customize Package Configurations

Maildir setup

This should already be handled when rsyncing /home, but check to make sure each user in /home has a /home/username/Maildir.

On my system, I have postfix configured to deliver messages into a /home/username/Maildir. This works fine, although it has two slight drawbacks. One is that I have to create a shell account for each web mail user, even if they never log in. The other is that I have to make sure the account has a Maildir created before they try to use the webmail interface, Squirrelmail, or Squirrelmail will spaz out when they try to log in. This is easy enough to do by just sending a test email to the new user, which will cause Postfix to create /home/username/Maildir.
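
Here's a quick sketch for spotting the accounts that still need a Maildir. The name missing_maildirs is made up, and it simply treats every directory directly under /home as a user, so things like lost+found will show up too.

```shell
# Print each directory directly under HOME_ROOT (default /home) that has no
# Maildir subdirectory.
missing_maildirs() {
    home_root="${1:-/home}"
    for dir in "$home_root"/*/; do
        [ -d "$dir" ] || continue          # the glob matched nothing
        dir="${dir%/}"
        [ -d "$dir/Maildir" ] || echo "$dir"
    done
}

# Usage:
#   missing_maildirs
```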

Look For More Data

The following are all just sketched in with "to be done". It's more to make sure I remember what needs to be done, than anything else, because each topic certainly has tons of other tutorials out there.

However, I'm going to try to take notes as I go through the process, and flesh these sections out a bit.

Configure Apache Virtual Domains

In debian this is handled via /etc/apache2/sites-available and /etc/apache2/sites-enabled.

Also, compare the apache config files and see if I made any tweaks to the default configs, like enabling .htaccess (if that's not enabled by default).
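
A minimal sketch of that comparison, assuming the old box also kept its config under /etc/apache2 and that you took the /etc snapshot into /home/mig/olddarksleep_etc as above. The function name is made up.

```shell
# List files that differ, or exist on only one side, between the old box's
# Apache config snapshot and the live config on the new box.
apache_config_diff() {
    old_dir="${1:-/home/mig/olddarksleep_etc/apache2}"
    new_dir="${2:-/etc/apache2}"
    # diff exits 1 when differences exist; that is not an error here
    diff -rq "$old_dir" "$new_dir" || [ $? -eq 1 ]
}

# Usage:
#   apache_config_diff
# then run plain "diff -u" on any pair of files that looks interesting.
```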

To Be Done

Configure Bind Domains

My Bind server is also responsible for providing secondary DNS for a few other domains.

To Be Done

Configure Any Custom Cron Jobs

To Be Done

There will, of course, be cron jobs added by various packages. But check to make sure you copy any custom cron jobs over. I know I have a couple, but they may or may not be necessary on the new box.
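
For the system-wide cron directories, here's a sketch that lists what the old box had and the new one doesn't, working from the /etc snapshot taken earlier. The function name is made up. Also note that per-user crontabs live under /var/spool/cron/crontabs on debian, which none of the snapshots above cover, so check those on the old box as well.

```shell
# List cron files present in the old /etc snapshot but absent on the new box.
# $1 = old /etc snapshot, $2 = new /etc (defaults follow this document's paths).
cron_only_in_old() {
    old_etc="${1:-/home/mig/olddarksleep_etc}"
    new_etc="${2:-/etc}"
    for sub in cron.d cron.hourly cron.daily cron.weekly cron.monthly; do
        [ -d "$old_etc/$sub" ] || continue
        for f in "$old_etc/$sub"/*; do
            [ -e "$f" ] || continue        # the glob matched nothing
            name="${f##*/}"
            [ -e "$new_etc/$sub/$name" ] || echo "$sub/$name"
        done
    done
}

# Usage:
#   cron_only_in_old
#   diff -u /home/mig/olddarksleep_etc/crontab /etc/crontab
```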

Configure Any MTA Customizations

I have postfix set up to deliver to /home/username/Maildir.

To Be Done

Configure Any MTA Virtual Domains

As with apache virtual domains, an MTA virtual domain is where the MTA server is expected to handle incoming email for domains other than the primary domain.

To Be Done

Configure Your Database Installations

Most databases install with network access (ODBC and the like) limited to local traffic only. Make sure you have that opened up if you need it. Consult the docs for your particular databases.

Most databases install with no root password. Make sure you set a database root password.

Import any databases from your old machine; make sure you issue the right GRANT statements.

It's possible to copy the entire binary database files over. I'm not sure if that's reliable/clean enough.

Migrate Mailman

Found several almost-comprehensive posts and docs on this.

http://mail.python.org/pipermail/mailman-users/2007-January/055211.html

http://mail.python.org/pipermail/mailman-users/2007-January/055208.html

http://www.mail-archive.com/mailman-users@python.org/msg46099.html

http://www.mail-archive.com/mailman-developers@python.org/msg03127.html

http://wiki.list.org/display/DOC/How+do+I+move+a+list+to+a+different+server-Mailman+installation.

http://www.debian-administration.org/article/Migratingmailmanlists

For the most part, it seems quite simple (copying the contents of a few directories over), but there seem to be some passing references to things that aren't clearly explained, like fix_url. It's also not clear whether these are important, or not.

Note: msapiro on freenode's #mailman was kind enough to explain: "fix_url is necessary step if either the web host name or email host name or URL pattern changed. So you should run 'bin/withlist -l -a -r fix_url' after you put the new lists in."

Annoyingly enough, the debian-administration.org article doesn't match the debian mailman layout, but it appears that most of mailman is still under /var/lib/mailman.

Incoming mail: There's one gotcha, which is incoming messages getting stuck in the mailman queue during the transition. Most of my lists are low enough traffic that I don't have to worry about that anyway. But just in case, I temporarily shut down postfix on my old server for the move, which helps me avoid incoming messages during the move. Mail servers trying to send messages should just retry in a little while.

This message suggests setting your mail server to accept incoming mail from localhost only, which will stop incoming mail, but allow outgoing mail to trickle out until the queue is empty:

http://www.mail-archive.com/mailman-developers@python.org/msg03127.html

Step 1, of course, is to back up the old data, which I already did when I made a backup copy of /var. Also, since I made a backup copy of /etc, I have a backup copy of both /etc/mailman/mm_cfg.py and /etc/aliases.

Step 2, is to install mailman normally, using apt. Early on, in the larger migration process, I used dpkg --get-selections and dpkg --set-selections to do a bulk install of all of the packages that the old box was running, which includes mailman.

Step 3, make a backup copy of your new mailman installation, just in case. This is particularly important because one of the changes below is going to overwrite the new mailman install's site-wide configs. You'll need to be able to undo this if something goes wrong.

$ cp -a /var/lib/mailman /home/mig/newmailmanbackup

Step 4, now we get to the nitty gritty, copying the data around. Fortunately, since we're staying on the same distro, the files are pretty much in the same place. Also, according to this:

http://www.mail-archive.com/mailman-users@python.org/msg46099.html

It should not be necessary to install the "same" version and then upgrade. In this example, it should be OK to just install Mailman 2.1.9 directly on the new system.

Mailman is aware enough to update a newly encountered, older version config.pck (or even config.db from 2.0.x) to the current format. A lot of what bin/update does when you update to a new release is stuff that Mailman will do on the fly when you drop an 'old' list into a working Mailman, or it is generic stuff having to do with file locations, queue entry formats and other things not directly relevant to a list.

Thus, it is normally just fine to drop a 2.1.4 config.pck into a working 2.1.9 installation.

One thing to bear in mind is that the last copy in the list below will overwrite the site-wide configs. This includes the site-wide password, so make sure you have a backup, just in case.

copy the lists directory:
$ sudo cp -a /home/mig/var/lib/mailman/lists/* /var/lib/mailman/lists/
copy the public archives:
$ sudo cp -a /home/mig/var/lib/mailman/archives/public/* /var/lib/mailman/archives/public/
copy the private archives:
$ sudo cp -a /home/mig/var/lib/mailman/archives/private/* /var/lib/mailman/archives/private/
copy the site-wide configs:
$ sudo cp -a /home/mig/var/lib/mailman/data/sitelist.cfg /var/lib/mailman/data/sitelist.cfg

Okay, now the even nittier-grittier:

Hand-merge in the mailman aliases, if necessary, from /home/mig/etc/aliases to /etc/aliases. I've heard that mailman can now handle this with a single over-arching alias and do all the list-specific alias handling inside mailman. However, I don't know the details, and I can't find anybody on #mailman to ask about them, but I do remember that leaving the old, detailed aliases in place should be safe, so that's what we'll do.

Or you can just run genaliases, which will generate a complete set of aliases for you, and append the output to /etc/aliases. The redirect has to happen as root too, so wrap the whole thing in a root shell:

$ sudo bash -c "/var/lib/mailman/bin/genaliases >> /etc/aliases"

Don't forget to run newaliases to regenerate the aliases DB:

$ sudo newaliases

Now hand-merge any necessary changes from /home/mig/etc/mailman/mm_cfg.py into /etc/mailman/mm_cfg.py. In my case, I did a diff to see if there were any differences:

$ sudo diff /home/mig/etc/mailman/mm_cfg.py /etc/mailman/mm_cfg.py
59c59
< DEFAULT_URL_PATTERN = 'http://%s/cgi-bin/mailman'
---
> DEFAULT_URL_PATTERN = 'http://%s/cgi-bin/mailman/'
95a96,101
> #-------------------------------------------------------------
> # Uncomment if you want to filter mail with SpamAssassin. For
> # more information please visit this website:
> # http://www.daa.com.au/~james/articles/mailman-spamassassin/
> # GLOBAL_PIPELINE.insert(1, 'SpamAssassin')
> 

Hm, that doesn't seem like a hugely important difference, so I'm going to skip it and see how it goes.

Note: msapiro later informed me that DEFAULT_URL_PATTERN should always end with a trailing /, and that since the mailman install my old lists were under didn't have one, but the new install does, I should run fix_url to update the lists' URLs, even though the hostname didn't change.

Finally, restart mailman, then send a test message through and... no (sad horn music plays here). Not only did the message not get through, the bounce didn't get through:

post.1:May 11 01:35:01 2010 (4055) post to mailmantest from puff@darksleep.com, size=2967, message-id=<20100511052832.GQ14047@darksleep.com>, 21 failures
post:May 11 16:53:14 2010 (4055) post to mailmantest from mailmantest-bounces@darksleep.com, size=1109, message-id=, 1 failures

I grepped /var/log/mail.* and /var/log/mailman/ for the listname, which turned up that the message got received and handed off to mailman:

mail.log.0:May 11 01:28:39 darksleep amavis[2895]: (02895-03) Passed CLEAN,  -> ,, Message-ID: <20100511052832.GQ14047@darksleep.com>, mail_id: UR3sTaCO4K1B, Hits: -0.001, queued_as: 633781A0EBB, 6953 ms
mail.log.0:May 11 01:28:39 darksleep postfix/smtp[4431]: 9FBD81A0EBD: to=, relay=127.0.0.1[127.0.0.1]:10024, delay=7.2, delays=0.18/0.02/0.01/7, dsn=2.6.0, status=sent (250 2.6.0 Ok, id=02895-03, from MTA([127.0.0.1]:10025): 250 2.0.0 Ok: queued as 633781A0EBB)
mail.log.0:May 11 01:28:39 darksleep postfix/local[4435]: 633781A0EBB: to=, relay=local, delay=0.53, delays=0.19/0.05/0/0.29, dsn=2.0.0, status=sent (delivered to command: /var/lib/mailman/mail/mailman post mailmantest)

However, when mailman tried to return the favor and hand the post back off to postfix, no dice:

May 11 17:56:27 2010 (4055) Low level smtp error: (110, 'Connection timed out'), msgid: 
May 11 17:56:27 2010 (4055) delivery to puff@darksleep.com failed with code -1: (110, 'Connection timed out')

The problem, it turns out, is that debian by default installs postfix in a chrooted jail. See /usr/share/doc/postfix/README.Debian:

puff@darksleep:~ $ cat /usr/share/doc/postfix/README.Debian 
There are some significant differences between the Debian Postfix packages,
and the source from upstream:

1. The Debian install is chrooted by default. [...]

Fortunately, looks like there's plenty of docs on how to get it working. What's odd is, I don't remember ever having to deal with this in the past.

http://wiki.debian.org/Postfix#MailmanwithPostfix

Check Other Places Besides /home For User Data

Right now this is just a random list of thoughts on where data might be squirreled away.

- any subversion and CVS repositories
- databases
- web stuff from /var/www/htdocs

- check /opt
- check /var/mail unless you use ~username/Maildir
- check for subversion and cvs repos

Also, just in case...

- check /usr/local
- check /var/spool
- check /var/lib/???
- check /var/lib/mysql
- check /var/lib/postgres
- check /var/lib/amavis ??
- check /var/lib/apache ??
- check /var/lib/apt
- check /var/lib/bugzilla??
- check /var/lib/cvs??
- check /var/lib/svn??
- check /var/lib/defoma??
- check /var/lib/dpkg?
- check /var/lib/lire??
- check /var/lib/squirrelmail??

DNS and Bind

http://www.netadmintools.com/art232.html

http://www.zytrax.com/books/dns/apa/ttl.html

creating a basic zone file: http://www.tech-recipes.com/rx/305/dnsbind-create-a-basic-zone-file/

http://www.tech-recipes.com/rx/310/dnsbind-set-ttl-for-individual-resource-records/

http://tldp.org/HOWTO/DNS-HOWTO-5.html

Other Notes

Single-User Mode

If you're not seeing the bootloader prompt, either hold the shift key down, or increase the delay.

To reboot debian into single-user mode, at the bootloader prompt enter:

init=/sbin/init 1

This will boot into runlevel 1.

Or try this:

init=/bin/bash

This will just start a bash shell without starting up processes, etc.

Look in /etc/inittab for what runlevel is what.

Otherwise, boot normally, log into the console, su to root and enter:

# telinit 1

You may need to remount stuff as read-only, after that.

Making a dd image

If you have file corruption, you can use dd to make a bit-for-bit copy of the disk image, in case your recovery efforts further fuck it up. Read:

http://www.codecoffee.com/tipsforlinux/articles/036.html

If you have file corruption that is not caused by hardware failure (if it is, see the next section), you can also try to tar or rsync specific subsets of the hierarchy, to back those up, working around the corrupted stuff.

Remember to unmount the partition you're reading from:

$ sudo umount /old/partition/mountpoint
$ sudo dd if=/dev/youroldpartition of=/some/file/ona/bigger/disk

The file that dd creates will be as big as the old partition, even if the old partition is mostly empty. If you're tight on space, you can pipe it through gzip.

$ gzip -c /dev/oldpartition > foofile.gz

The above is faster than the otherwise-obvious approach of simply doing:

dd if=/dev/part | gzip > backup.dd.gz
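
Whichever way you make the compressed image, it's worth confirming that it decompresses back to exactly what's on the partition before you start poking at the original. Here's a sketch; both function names are made up, and it reads the device on stdin, which sidesteps any question of whether your gzip will accept a device file as an argument. Run it as root, with the partition unmounted so the contents don't change underneath you.

```shell
# Compress SRC (a partition device, or any file) into the gzip file DEST.
image_to_gz() {
    gzip -c < "$1" > "$2"
}

# Succeed only if DEST decompresses to exactly the current contents of SRC.
verify_gz_image() {
    gunzip -c "$2" | cmp -s - "$1"
}

# Usage, as root:
#   image_to_gz /dev/oldpartition foofile.gz
#   verify_gz_image /dev/oldpartition foofile.gz && echo "image matches"
```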

You can mount the dd image (unless it's zipped) so it shows up as a partition, using a "loop mount". This creates a virtual block device out of a file on the filesystem. You can use it to mount ISO images or, in this case, a dd image.

With reiserfs, use this command (but you should probably add an ro (read only) option):

# mount -t reiserfs -o loop fileimagename mountpoint

For example:

mount -t reiserfs -o loop /home/mcgee/oldmcgee.dd /oldmcgee

For ext2/ext3 filesystems, there's a debian package called e2tools for extracting files from the dd image.

Don't use dd with hardware failure

When I had a dying-but-not-quite-dead disk, my hardware-savvy friend suggested the best strategy, assuming I didn't care enough to spend several thousand dollars sending it to a computer forensics specialist, was to:

0) skip dd if you think it's a hardware problem

1) extract the disk from the original hardware and put it in an external USB enclosure

2) rsync selected file hierarchy subsets off the USB-enclosed, dying disk, trying to work around the suspected corrupted areas.

3) yank the USB enclosure's power at the first sign of errors, clicking noises, etc.

Various other folks have suggested sealing the drive up in plastic to prevent moisture from getting in, then freezing it. The theory is heat expansion is somehow a factor in the problem, or perhaps that contraction caused by the chill will counteract looseness caused by wear and/or abuse.

Rsync'ing

Rsync is really easy to use for limited stuff, and quite handy for that sort of thing, but a pain in the ass if you try to do anything fancy (see below for examples of the easy stuff).

As a result, generally I try to just reorganize the files I need to move around to reduce the problem to the easy case (see below) rather than try to do anything tricky. E.g. if I want to skip copying a large subdirectory out of a file set, I'll just temporarily move it elsewhere, rather than trying to get rsync to skip it.

Also, because of that pain-in-the-ass factor, I highly recommend testing each rsync command first with --list-only, which lists the files rsync would consider, or with --dry-run (-n), which goes through the motions without copying anything. You're usually moving large chunks of data around, so it's a good idea to check, first.

Rsync's verbose output is voluminous and useful, but that can make it hard to be certain that nothing went wrong. So I often like to repeat the rsync command a third time, just to confirm that everything is copied and there's nothing left to do.

The Easiest, Most Useful Rsync Case, locally

For a local rsync:

$ rsync --list-only -avz /home/puff/sourcedir/ /home/puff/destdir/
... review the proposed changes ...
$ rsync -avz /home/puff/sourcedir/ /home/puff/destdir/
NOTE THAT BOTH DIRECTORY PARAMETERS HAVE TRAILING SLASHES.

The trailing slashes are important. A trailing slash on the source tells rsync to copy the contents of sourcedir into destdir. If you leave the trailing slash off of the source, you get annoying behavior where rsync creates destdir/sourcedir and puts everything under it. (The slash on the destination doesn't change anything, but typing both is an easy habit to keep.)
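
If you want to see the difference for yourself without risking real data, here's a small throwaway demo. It works entirely inside a temporary directory; the function name and the with_slash/without_slash directory names are made up for the illustration.

```shell
trailing_slash_demo() {
    work=$(mktemp -d) || return 1
    mkdir -p "$work/sourcedir" "$work/with_slash" "$work/without_slash"
    echo "hello" > "$work/sourcedir/file.txt"

    # Trailing slash on the source: copy the CONTENTS of sourcedir
    rsync -a "$work/sourcedir/" "$work/with_slash/"
    # No trailing slash on the source: copy the directory itself
    rsync -a "$work/sourcedir" "$work/without_slash/"

    # Show where file.txt ended up in each case
    (cd "$work" && find with_slash without_slash -type f | LC_ALL=C sort)
    rm -rf "$work"
}

if command -v rsync >/dev/null 2>&1; then
    trailing_slash_demo
fi
```

This should print with_slash/file.txt and without_slash/sourcedir/file.txt, the second being the extra directory level described above.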

I like to use rsync when doing a partition-to-partition copy or move, since those don't work too well if you interrupt them, whereas rsync can deal with being interrupted and reissued quite well.

The Easiest, Most Useful Rsync Case, Across the Network

Here's how to use rsync across the network, tunneling through ssh:

$ rsync --list-only -avz -e ssh /home/puff/sourcedir/ remoteuser@remotehost:/home/puff/destdir/
... review the proposed changes ...
$ rsync -avz -e ssh /home/puff/sourcedir/ remoteuser@remotehost:/home/puff/destdir/

Fixing Your Mistakes

Leaving off one of the trailing slashes is easier to do than you might think. And by the time you notice, you may have moved considerable amounts of data across the network. Rather than just delete it and spend another ten minutes waiting, you can rearrange stuff on the destination machine and then rsync locally.

$ mv ./destdir/sourcedir ./tempsourcedir
$ rsync -avz ./tempsourcedir/ ./destdir/
... review the output and make sure everything went well ...
$ rm -rf ./tempsourcedir

And if you interrupted the earlier rsync, now you can repeat it (with the typos corrected) and not have wasted much time.

Fancy Schmancy Stuff

Here's a decent tutorial on rsync, (but I still wasn't able to get it to do what I wanted):

http://troy.jdmz.net/rsync/index.html

Rsync is supposed to have support for parameters or config files to exclude and include stuff, but I haven't found those to be too useful. That may be because I was dealing with a partially corrupted disk; maybe rsync was looking at the corrupted sectors even though it was planning to skip them. I don't know.
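
For what it's worth, here's a minimal sketch of the simplest exclude case, on a healthy disk and inside a throwaway temporary directory. The function name and the bigsubdir/keep names are made up. The part that tends to trip people up is that a leading slash in the pattern is anchored at the top of the transfer (the source directory), not at the filesystem root.

```shell
exclude_demo() {
    work=$(mktemp -d) || return 1
    mkdir -p "$work/sourcedir/keep" "$work/sourcedir/bigsubdir" "$work/destdir"
    echo a > "$work/sourcedir/keep/a.txt"
    echo b > "$work/sourcedir/bigsubdir/b.txt"

    # "/bigsubdir/" means: the directory bigsubdir at the top of sourcedir/
    rsync -a --exclude='/bigsubdir/' "$work/sourcedir/" "$work/destdir/"

    # List what actually got copied
    (cd "$work/destdir" && find . -type f | LC_ALL=C sort)
    rm -rf "$work"
}

if command -v rsync >/dev/null 2>&1; then
    exclude_demo
fi
```

Only ./keep/a.txt should be listed; bigsubdir never makes it to the destination.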

