Gentoo Wiki

Backup


Backup Types

There are three basic types of backups: full, differential and incremental. A full (complete) backup makes a copy of all the files in the set to be backed up (such as your home directory, your web site, or a complete filesystem). Incremental and differential backups copy only part of the set, which reduces the time each backup takes and the disk space that repeated backups use.

What is an incremental Backup?

An incremental backup is a partial backup: only the files that have changed since the last backup (full or incremental) are copied. Consequently, a series of incremental backups must be preceded by a full backup. All backups are kept in case the data needs to be restored (restoring is the reverse of backing up). Restoring to a certain point in time requires locating the last valid full backup and all the incremental ones that followed it, up to the point in time to which the system should be restored. This model makes it very likely that something can be restored, and it works with removable media such as tapes and optical disks. The drawbacks are having to handle a long series of incrementals and the storage needed to keep all of them.

What is a differential Backup?

A differential backup is also a partial backup, but differs from an incremental one in that it copies the files that have changed since the last full backup. This means that the list of files grows over time, and each differential backup is usually larger than the previous one. Its advantage is that a restore involves recovering only the last full backup and then overlaying it with the last differential backup. The disadvantage is that with each day that passes since the last full backup, more data needs to be backed up, especially if most of the data has changed.
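The only difference between the two schemes is which reference time the "changed since" test uses: the last backup of any kind for incrementals, the last full backup for differentials. Below is a minimal sketch, assuming GNU find, touch and tar, with two hypothetical stamp files (full.stamp and last.stamp) that record when the last full backup and the last backup of any level started. It only compares modification times and does not notice deleted files. Real tools such as flexbackup or GNU tar's --listed-incremental keep better state.

```shell
#!/bin/sh
# Sketch: pick the files for a full, incremental or differential backup.
# SRC is the tree to back up; STATE is a directory holding two stamp files
# (hypothetical names):
#   full.stamp  start time of the last successful full backup
#   last.stamp  start time of the last successful backup of any level
# Only modification times are compared; deleted files are not tracked, and
# file names containing newlines would break the list passed to tar -T.

select_files() {
    level=$1 src=$2 state=$3
    case $level in
        full)         find "$src" -type f ;;
        incremental)  find "$src" -type f -newer "$state/last.stamp" ;;  # since last backup
        differential) find "$src" -type f -newer "$state/full.stamp" ;;  # since last full
        *) echo "unknown level: $level" >&2; return 2 ;;
    esac
}

# Archive the selected files of SRC into the tar file OUT.
run_backup() {
    level=$1 src=$2 state=$3 out=$4
    touch "$state/started.stamp"                   # remember when this run began
    select_files "$level" "$src" "$state" > "$state/filelist" || return
    tar -cf "$out" -T "$state/filelist" || return
    # The new stamps carry the start time (cp -p keeps it), so files changed
    # while this run was in progress are picked up by the next run.
    cp -p "$state/started.stamp" "$state/last.stamp"
    if [ "$level" = full ]; then
        cp -p "$state/started.stamp" "$state/full.stamp"
    fi
}
```

Start with a full run so that both stamp files exist; after that, for example, run_backup incremental /home/me /var/lib/mybackup /backups/home-inc.tar archives only what changed since the previous run (all paths hypothetical).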

For most home (non-professional) users, differential backups are the best alternative: start with a full backup, then repeat the full backup weekly, monthly, or as often as required, with differential backups in between. The best time for a new full backup is when the differential backup becomes too large. The disadvantage is that the differential backup file is always overwritten, so only the state at the latest differential can be restored. The advantage is that it saves time and space.

Important Considerations

When you're making your backup, you don't want the filesystem to change while you're copying it (this can lead to inconsistent copies). You should only run a backup on a partition that is either unmounted or mounted read-only. This means you may have to switch to single-user mode, remount the partition read-only, or both.

A short example:

# Go into single user-mode
init 1
# Remount /home read-only
mount -o remount,ro /home
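Before starting the backup, a script can check that the filesystem really is read-only. This is a minimal sketch for Linux that reads the mount table in /proc/mounts. The function name is hypothetical. The optional second argument exists only so the function can be tried on a sample file. Mount points containing spaces (which /proc/mounts encodes as \040) are not handled.

```shell
#!/bin/sh
# is_readonly MOUNTPOINT [MOUNTS_FILE]
# Succeeds if the last mount table entry for MOUNTPOINT has the "ro" option.
# MOUNTS_FILE defaults to /proc/mounts (Linux); its fields are:
#   device mountpoint fstype options dump pass
is_readonly() {
    awk -v m="$1" '
        $2 == m {
            ro = 0                            # a later entry overrides an earlier one
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "ro") ro = 1   # exact match: "errors=remount-ro" does not count
        }
        END { exit (ro ? 0 : 1) }
    ' "${2:-/proc/mounts}"
}

# Example guard at the top of a backup script:
#   is_readonly /home || { echo "/home is not mounted read-only" >&2; exit 1; }
```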

Large backup solutions

Backup server software designed to back up multiple computers over a network.

Lightweight solutions

Software designed mainly for single computers.

Local backups

Software that can be used without a network.

Keep

Keep is a graphical frontend to rdiff-backup (see below) designed for KDE.

Flexbackup

Flexbackup is a Perl tool to back up important directories. As the name implies, flexbackup is very flexible.

# emerge -av flexbackup

This is what the important part of my /etc/flexbackup.conf file looks like:

File: /etc/flexbackup.conf
$set{'research'} = "/home/david/research /var/cvsroot/python /var/cvsroot/latex";
$set{'mail'} = "/home/david/.thunderbird";
$set{'etc'} = "/etc /home/david /var/www/davidgrant.ca/htdocs";

$prune{'/home/david'} = ".jpi_cache konserve-backup .cxoffice .wine .mozilla .kde3.1 .thunderbird";
$prune{'/home/david/.thunderbird'} = "Junk News";
 
$compress = 'gzip'; # one of false/gzip/bzip2/lzop/zip/compress/hardware
$compr_level = '6';

$device = '/mnt/sata/backup';

The "research", "mail", and "etc" names are just convenient labels that you use on the command line to tell flexbackup what to back up. Each directory within "research" gets backed up to its own tarball, however: "home-david-research.0.tar.gz", "var-cvsroot-python.0.tar.gz", and "var-cvsroot-latex.0.tar.gz".

$prune is a useful feature which allows you to mask out certain directories which you don't want to be included in the tarballs.

I set $compress to gzip but you can use bzip2 if you want smaller tarballs. bzip2 is a bit slower to pack and unpack, however. $compr_level=6 also makes things a bit quicker, yet still makes the tarballs quite compact.

I use the following crontab to create my backups:

0 3 1-7 * * flexbackup -set all -full -w 7
0 3 * * 6 flexbackup -set all -differential
0 3 * * 1-5 flexbackup -set all -incremental

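This performs a full backup on the first Sunday of every month (cron starts the job on days 1-7 of the month, and the -w 7 option limits it to the Sunday among them), a differential backup every Saturday, and an incremental backup every weekday (Monday to Friday). All backups run at 3 a.m.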
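Cron treats a restricted day-of-month field and a restricted day-of-week field as "either matches". A line like 0 3 1-7 * 0 would therefore run on days 1-7 *and* on every Sunday, which is why the crontab above needs an extra restriction. If you prefer to keep that logic out of flexbackup's options, a small check in a wrapper script can do it. This is a sketch assuming GNU date (for -d). The function name is hypothetical.

```shell
#!/bin/sh
# is_first_sunday [DATE]
# Succeeds if DATE (default: today) is the first Sunday of its month,
# i.e. a Sunday that falls on day 1-7. DATE is parsed by GNU date -d.
is_first_sunday() {
    if [ -n "$1" ]; then
        dow=$(date -d "$1" +%u) dom=$(date -d "$1" +%d)
    else
        dow=$(date +%u) dom=$(date +%d)
    fi
    # %u: 1 = Monday ... 7 = Sunday; %d: day of month, zero-padded
    [ "$dow" -eq 7 ] && [ "${dom#0}" -le 7 ]
}

# Example in a wrapper that cron starts every Sunday:
#   is_first_sunday && flexbackup -set all -full
```

Directly in a crontab line, the % signs have to be escaped, e.g. 0 3 * * 0 [ "$(date +\%d)" -le 7 ] && flexbackup -set all -full.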

Note: If /tmp is mounted with the noexec option in /etc/fstab, you must remove that option or change flexbackup.conf to point to another directory.


Note: When backing up to disk, flexbackup does not delete older backup files. To do this, you can add something like the following to your flexbackup cron script, which might run weekly:
# $MYSET holds the name of the backup set, e.g. "all"
if flexbackup -set "$MYSET" -level full; then
    # the full backup succeeded: remove archives last modified more than 24 hours ago
    find /path/to/backups -name '*.gz' -mtime +0 -exec basename '{}' \; |
    while read -r i; do
        flexbackup -rmfile "$i"
    done
fi
In the preceding example, all backup files in the /path/to/backups directory (matched by *.gz, assuming you tell flexbackup to gzip the backup archive) that were last modified more than 24 hours ago (the -mtime +0 test) are deleted, but only if the full backup of the set succeeds.

Using ZFS and rsync

You can create a ZFS partition on your hard disk, use a ZFS loop device, or put ZFS on a removable device (e.g. an SD card or USB device). Then you can rsync your data to the ZFS filesystem and take a snapshot of it periodically. For a complete description, read HOWTO Backup using ZFS.
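The rsync-then-snapshot cycle could look roughly like this. It is a sketch only. The dataset tank/backup mounted at /tank/backup is a hypothetical example, and the snapshot naming scheme is an arbitrary choice. Setting DRY_RUN=1 makes the script print the commands instead of running them, which is a handy way to check the paths first.

```shell
#!/bin/sh
# Print the command instead of running it when DRY_RUN=1.
maybe_run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# zfs_backup SRC DATASET MOUNTPOINT
# Mirror SRC into the ZFS dataset mounted at MOUNTPOINT, then snapshot the
# dataset with a date-based name such as tank/backup@2024-09-01_0300.
zfs_backup() {
    src=$1 dataset=$2 mnt=$3
    snap="$dataset@$(date +%Y-%m-%d_%H%M)"
    # --delete keeps the mirror exact; older states survive in earlier snapshots
    maybe_run rsync -a --delete "$src/" "$mnt/" &&
        maybe_run zfs snapshot "$snap"
}
```

For example, DRY_RUN=1 zfs_backup /home/me tank/backup /tank/backup shows what would be run. Without DRY_RUN, the same call does the backup (typically from cron).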

Partitions

dd

To back up the master boot record (MBR), the trick is to copy it into a file:

dd if=/dev/hda of=/root/hda.mbr bs=512 count=1

This assumes, of course, that /dev/hda is the boot drive (replace it with the drive you boot from). Now that the MBR is a file, rsync will copy it as well. You can put it back with:

dd if=/root/hda.mbr of=/dev/hda bs=512 count=1
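Before writing a saved MBR back, it is worth checking that the file really is one. A valid MBR ends with the two bytes 0x55 0xAA at offsets 510 and 511. This is a minimal sketch using only dd, od and tr. The function name is hypothetical.

```shell
#!/bin/sh
# has_mbr_signature FILE
# Succeeds if bytes 510-511 of FILE are 0x55 0xAA, the boot signature
# that ends a valid MBR.
has_mbr_signature() {
    sig=$(dd if="$1" bs=1 skip=510 count=2 2>/dev/null | od -An -tx1 | tr -d ' \n')
    [ "$sig" = 55aa ]
}

# Example: has_mbr_signature /root/hda.mbr && dd if=/root/hda.mbr of=/dev/hda bs=512 count=1
```

Keep in mind that the 512 bytes include the partition table (bytes 446-509). To restore only the boot code and keep the current partition table, write back just the first 446 bytes: dd if=/root/hda.mbr of=/dev/hda bs=446 count=1.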


Remote backups

Filesystem

rdiff-backup

rdiff-backup is a simple-to-use yet powerful backup utility. It can be used to build a mirror of a local source directory on a remote machine. The program automatically keeps an archive of differences with respect to previously mirrored versions, so that old files or older versions of existing files can be restored. All the underlying network traffic is handled by ssh. See the rdiff-backup main page for a detailed description.

rdiff-backup is in the Gentoo Portage tree; to install it, simply run:

# emerge rdiff-backup

To back up local files to a remote machine, rdiff-backup must be installed both locally and on the remote machine, and the two installations must be compatible versions. Matching major and minor version numbers is the safe choice (e.g., 0.13.x is incompatible with 1.0.x). In principle, no root privilege is necessary, since the program runs with user privileges both locally and remotely.

Note: You needn't have rdiff-backup on the remote server if your client can make use of sshfs (or any other network file system). See this for details.

Let us assume that you properly installed rdiff-backup on the remote machine remotehost.remotedomain. Then the use of this utility can be as simple as

rdiff-backup ~/mydir remoteuser@remotehost.remotedomain::mydir-backup

For this to work, rdiff-backup must be installed both locally and on remotehost.remotedomain.

This creates a new subdirectory named mydir-backup in the home directory of remoteuser on remotehost.remotedomain. If this directory is already present on the remote machine, it is updated to reflect the present content of mydir, and the differences with respect to the previous version are also stored. Examples of how to manage the remote mirror and, in particular, how to recover past versions of the whole directory or part of it can be found here.

Depending on the nature of the ssh user authentication scheme in use on the remote machine, the previous command can stop and ask the user for a password or a passphrase. The interactive behavior of the program can of course be avoided if an authentication agent (ssh-agent) is properly configured on the local machine. However, if one wants to implement the backup activity in an automatic, unattended way, for instance as a cron job, it is quite possible that such an agent is not available to the program when it is launched.

Let us see how this problem can be easily solved.

Unattended backup with rdiff-backup

First of all you need a passphrase-less key pair to be used specifically with rdiff-backup. You can create it with:

ssh-keygen -t dsa

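When asked, choose .ssh/backup_dsa as the file in which to save the new key, and press Enter when asked for a passphrase. (Recent OpenSSH releases disable DSA keys by default; in that case, use ssh-keygen -t ed25519 and adjust the key file names and the key type below accordingly.)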

Now copy the newly generated public key to the remote host:

scp .ssh/backup_dsa.pub remoteuser@remotehost.remotedomain:

Then log in remotely and add the new key to the list of authorized keys.

ssh remoteuser@remotehost.remotedomain
cat backup_dsa.pub >> .ssh/authorized_keys
rm backup_dsa.pub

The last line is not necessary, but it is always safer to remove unnecessary files. Now you have to tell the remote machine that the new key can only be used for specific tasks. For this purpose, edit the file .ssh/authorized_keys with your preferred editor and add a command specification in front of the newly added key (it will be the last line) to obtain something like:

File: .ssh/authorized_keys on remote machine
command="/usr/bin/rdiff-backup --server",no-pty,no-port-forwarding ssh-dss <...public key characters follow...>
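Instead of editing authorized_keys by hand, you can generate the restricted line when the key is appended. This is a small sketch. restrict_key is a hypothetical helper, and its default forced command is the one shown above. Check the path to rdiff-backup on the remote machine before using it.

```shell
#!/bin/sh
# restrict_key PUBKEY_FILE [COMMAND]
# Print the public key prefixed with a forced command and the no-pty and
# no-port-forwarding options, ready to append to ~/.ssh/authorized_keys.
restrict_key() {
    cmd=${2:-/usr/bin/rdiff-backup --server}
    printf 'command="%s",no-pty,no-port-forwarding %s\n' "$cmd" "$(cat "$1")"
}
```

On the remote host, restrict_key backup_dsa.pub >> .ssh/authorized_keys would then replace the plain cat step.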

Now it's time to leave the remote host and configure the local machine. You have to make sure that when rdiff-backup is started, it uses the just-generated, passphrase-less key pair. You can do that with the powerful ssh host-aliasing feature. Edit .ssh/config and add:

File: .ssh/config on local machine
Host remote-backup
   Hostname remotehost.remotedomain
   IdentityFile ~/.ssh/backup_dsa
   IdentitiesOnly yes

This way, when you use remote-backup as the host name in an ssh or scp command, the host actually contacted will be remotehost.remotedomain, but the key stored in ~/.ssh/backup_dsa will be used instead of the default one (if any).

The previous configuration allows for a straightforward use of rdiff-backup as a cron job. If you want an automatic backup, one minute after 1 a.m., of ~/mydir add the following line to your cron table:

File: cron table on local machine
01 1 * * *      rdiff-backup /home/localuser/mydir remoteuser@remote-backup::mydir-backup

Notice that the alias remote-backup is used instead of the real host name remotehost.remotedomain.

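Note: If the rdiff-backup command fails with "Host key verification failed", the remote host key is probably not yet in the known_hosts file of the user running the backup; connect once interactively as that user and accept the key. If that does not help, try adding the local backup user to the tty group.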

rsnapshot

# emerge rsnapshot

Then copy the example config file:

# cp /etc/rsnapshot.conf.default /etc/rsnapshot.conf

Config File Syntax

The config file is fairly straightforward, but here are some basic rules and common problems that people run into. Above all, the elements of each line must be separated by tabs, not spaces.
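rsnapshot.conf requires tabs, not spaces, between the elements of each line, and a stray space is easy to miss in an editor. rsnapshot configtest is the authoritative check. The sketch below is just a quick, hypothetical lint that reports non-comment lines whose first separator is not a tab.

```shell
#!/bin/sh
# check_rsnapshot_tabs [CONFIG]
# Report lines whose first separator is not a tab; exit 1 if any are found.
check_rsnapshot_tabs() {
    awk '
        /^[ \t]*#/ || /^[ \t]*$/ { next }   # skip comments and blank lines
        {
            pos = match($0, /[ \t]/)        # position of the first space or tab
            if (pos == 0 || substr($0, pos, 1) != "\t") {
                printf "line %d: fields not separated by a tab: %s\n", NR, $0
                bad = 1
            }
        }
        END { exit bad }
    ' "${1:-/etc/rsnapshot.conf}"
}
```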


setting times in the config file

The interval lines in rsnapshot.conf do not schedule any cron jobs; they only say how many snapshots with that name are kept.

You have to set up the cron jobs yourself.

"interval monthly 3" means that you'll keep three monthly snapshots. "interval hourly 6" means that six hourly snapshots will be retained.

rsnapshot.conf has nothing to do with the frequency of the snapshots; that's determined by cron.

"0 */6 * * * /usr/local/bin/rsnapshot hourly" means that hourly snapshots will be performed every six hours, at 0000, 0600, 1200 and 1800.

backup paths

Rsnapshot's default rsync_long_args setting (--delete --numeric-ids --relative --delete-excluded) includes rsync's --relative option, which recreates the full source path under the backup target (e.g. backing up /home/ to localhost/ gives localhost/home/ instead of putting the contents directly in localhost/).

To avoid this, set your own rsync_long_args without --relative in the configuration file, e.g.:

rsync_long_args	--delete --numeric-ids --delete-excluded

excluding files

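To exclude, e.g., all files and directories whose names start with a dot, use

exclude	.*

This is an rsync pattern (shell-style wildcards), not a regular expression such as ^\. ; a pattern ending in / (e.g. .*/) matches directories only. More details are in the rsync man page, in the section on include/exclude pattern rules.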

spaces in file names

If you are backing up a Windows directory, there will be directory and file names with spaces in them, which can cause problems. The current best solution is to replace each space with a ?, which matches any single character:

exclude /mnt/winXP/John/My?Documents/folders/holiday?pictures/

Backing up TO a VFAT (Windows) partition generates a lot of "can't chown" errors, because FAT filesystems cannot store Unix file ownership.

ssh backups

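ssh_args	-p 22 -i /root/.ssh/backupserver_dsa

or add the relevant information to the ssh client configuration (/etc/ssh/ssh_config, or ~/.ssh/config of the user running rsnapshot). ssh_args can also be overridden for a single backup point: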

backup	root@example.com:/home/	example.com/	ssh_args=-i example.key

or add to ssh_args on a per-backup basis, like this:

backup	root@example.com:/home/	example.com/	+ssh_args=-i example.key

You may want to put this into crontab.

00 00 * * *     /usr/bin/rsnapshot daily

will run it at midnight

cron automatically emails you anything that rsnapshot prints to STDOUT or STDERR. By default this mail goes to root on the server, where you might never read it. You can change where cron sends mail by setting the MAILTO variable in the crontab file like so:

MAILTO=myaddress@example.com
00 00 * * *     /usr/bin/rsnapshot daily
[other cron jobs go here]

tips and tricks

You can use the -links option of find to list the files in the newest snapshot that are not hard-linked to any older snapshot, i.e. new or changed files: find daily.0 -type f -links 1
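rsnapshot stores unchanged files as hard links to the previous snapshot, so a file in daily.0 with a link count of 1 is new or has changed since the last snapshot. This is a sketch of two hypothetical helpers built on that idea. The second sums the sizes of those files, which roughly shows how much space the latest snapshot added (-printf is a GNU find extension).

```shell
#!/bin/sh
# changed_in_snapshot DIR: files in DIR not hard-linked anywhere else
changed_in_snapshot() {
    find "$1" -type f -links 1
}

# bytes_changed_in_snapshot DIR: total size in bytes of those files
bytes_changed_in_snapshot() {
    find "$1" -type f -links 1 -printf '%s\n' | awk '{ total += $1 } END { print total + 0 }'
}
```

For example, bytes_changed_in_snapshot /.snapshots/daily.0 (path depending on your snapshot_root) shows roughly how much the last run cost.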

You can also set up several cron jobs that call rsnapshot with different config files (option -c).

a shell script with rsync

# Abort before backing up if the remote path cannot be listed
# (host down, ssh failure, missing path...): rsync then exits non-zero
if ! rsync root@1.2.3.4:/my/path/ > /dev/null 2>&1; then
	exit 1
fi

or do the check directly in the cron job, so rsnapshot only runs when the laptop answers (cron emails you any error output):

ping -c 1 laptop > /dev/null && rsnapshot -c /etc/rsnapshot-laptop.conf daily

Bacula Network Backup System

Bacula is a set of computer programs that permit you (or the system administrator) to manage backup, recovery, and verification of computer data across a network of computers of different kinds.

In technical terms, it is a network based backup program.

Bacula is relatively easy to use and efficient, while offering many advanced storage management features that make it easy to find and recover lost or damaged files.

# emerge -av bacula


Incremental Backups Using Rsync

There is an excellent tutorial regarding incremental backups available here.

Using incremental backups, you can create multiple snapshots of your data while conserving disk space and saving processing time. It's a very fast and cheap way of creating snapshot-style backups of your data.

The script is fairly easy to use, even for a non-expert. I just had to make a few simple changes to the backup script on the web site to get it working on my system.
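The core trick of such scripts is rotating hard-link snapshots. Below is a minimal sketch of that idea. It uses rsync's --link-dest option in place of the cp -al step of the classic approach, so files that did not change are hard-linked to the previous snapshot instead of copied. The snap.N directory names and the default of four snapshots are assumptions of this sketch, not taken from Mike Rubel's script.

```shell
#!/bin/sh
# snapshot_rotate SRC DEST [KEEP]
# DEST/snap.0 is the newest snapshot, DEST/snap.(KEEP-1) the oldest.
snapshot_rotate() {
    src=$1 keep=${3:-4}
    mkdir -p "$2" || return
    dest=$(cd "$2" && pwd) || return       # absolute path, needed for --link-dest
    # drop the oldest snapshot, then shift snap.N to snap.N+1
    rm -rf "$dest/snap.$((keep - 1))"
    i=$((keep - 1))
    while [ "$i" -gt 0 ]; do
        prev=$((i - 1))
        if [ -d "$dest/snap.$prev" ]; then
            mv "$dest/snap.$prev" "$dest/snap.$i"
        fi
        i=$prev
    done
    # files identical to those in snap.1 become hard links instead of copies
    if [ -d "$dest/snap.1" ]; then
        rsync -a --delete --link-dest="$dest/snap.1" "$src/" "$dest/snap.0/"
    else
        rsync -a --delete "$src/" "$dest/snap.0/"
    fi
}
```

Run from cron, e.g. snapshot_rotate /home /mnt/backup/home 7 once a day (hypothetical paths), this keeps a week of browsable snapshots that each cost only the changed files.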

Programs That Use Rsync To Create Incremental Backups

If you need more functionality than Mike Rubel's script can give you, or if you're uncomfortable editing a bash script, then you may be interested in one of these programs:

A manual example

You can:

Duplicate a partition (good for a one-off copy):

# rsync --progress --stats -avxz --exclude "/mnt/usbharddrivemain/" --exclude "/mnt/usbharddriveboot/" \
    --exclude "/usr/portage/" --exclude "/proc/" --exclude "/root/.ccache/" --exclude "/var/log/" \
    --exclude "/sys" --exclude "/dev" --exclude "tmp/" /* /mnt/usbharddrivemain

Duplicate a partition and delete outdated files (which have since been deleted on the primary partition):

# rsync --progress --stats --delete -avxzl --exclude "/mnt/usbharddrivemain/" --exclude "/mnt/usbharddriveboot/" \
    --exclude "/usr/portage/" --exclude "/proc/" --exclude "/root/.ccache/" --exclude "/var/log/" \
    --exclude "/sys" --exclude "/dev" --exclude "tmp/" /* /mnt/usbharddrivemain

For my boot partition I use:

# rsync --progress --stats -avxzl /boot /mnt/usbharddriveboot

while for another partition I use:

# rsync --progress -avxzl --stats --delete /boot /mnt/usbharddriveboot

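To restore, either boot from the secondary disk or use a Gentoo Live CD. Repeat the above commands (adjusted for the new mount locations) with the source and destination swapped, e.g. /mnt/usbharddrivemain/ /mnt/driveToRestoreTo rather than the other way around. Note the trailing slash on the source: it makes rsync copy the contents of the directory instead of creating a usbharddrivemain subdirectory in the target.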

ssh + tar scripts

In this method the contents of the backup travel (compressed) across the network, but through ssh. You can choose to back up any part (or all) of a filesystem. This is also useful if you have run out of space on a machine that needs backing up: this method doesn't require an intermediate tar file to be stored on the hard drive.

Assume two machines, server.homelinux.com and desktop.homelinux.com. Let's say server.homelinux.com has run out of disk space and you need to back it up (or even just get some files) to desktop.homelinux.com. Easy!

Connect to server.homelinux.com (either ssh in, or use a terminal):

$ ssh server.homelinux.com

Now, from the server, tar and copy the desired directory:

$ tar -zcf - /var/backup | ssh desktop.homelinux.com "( cat > backup.gz )"

This will copy the contents of /var/backup on server.homelinux.com to ~/backup.gz on desktop.homelinux.com.

Take note of file size limits! You may run into problems at 2 GB on some old kernels or *nix variants. tar used to have a limit of about 8 GB per file, assuming the underlying kernel and filesystem allowed it. FAT32, for instance, only allows files up to 4 GB.


Tip: Check your destination tar file (on desktop.homelinux.com) with tar -tzf backup.gz before you delete the source.
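In the one-liner above, the shell only reports the exit status of ssh, so a failing tar (for example, because of an unreadable file) can go unnoticed. This POSIX sh sketch shows a hypothetical stream_tar helper that also checks tar's status. Unlike the one-liner, it runs tar from the parent directory (-C), so the archive contains backup/... rather than var/backup/....

```shell
#!/bin/sh
# stream_tar DIR CMD [ARGS...]
# Pipe a gzip-compressed tar of DIR into CMD; fail if either side fails.
stream_tar() {
    dir=$1; shift
    status_file=$(mktemp) || return
    # A pipeline only reports the status of its last command, so tar's own
    # exit status is written to a temporary file.
    { tar -czf - -C "$(dirname "$dir")" "$(basename "$dir")"; echo $? > "$status_file"; } | "$@"
    cmd_status=$?
    tar_status=$(cat "$status_file")
    rm -f "$status_file"
    [ "$tar_status" -eq 0 ] && [ "$cmd_status" -eq 0 ]
}
```

On server.homelinux.com, the equivalent of the example above would be stream_tar /var/backup ssh desktop.homelinux.com 'cat > backup.tar.gz'.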

For an extremely lean backup system (about 30 lines of Bash, and it could be even shorter had I not wanted to make it user-friendly) that relies only on the most basic tools (Bash, tar, xargs, etc.), take a look at this script.

You maintain a collection of backup lists that are effectively arguments to tar, and the script keeps n backups of that data on a hard disk. From there, you can do whatever you want with the tarball, such as scp it to other boxes; redundancy is nice in the backup business. People in the thread have offered extensions such as ISO images, splitting, etc.

This was designed to be a starting component in your larger backup system. I haven't looked into incremental backups with this, but it is possible using tar with some time and bash programming skills.

These examples are taken from this thread on the gentoo-user mailing list, which is a good read!
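The rotation idea (a list of paths per backup set, keeping only the n newest tarballs) can be sketched in a few lines of sh. This is not the script from the thread, just an illustration. backup_from_list, the <list>-<timestamp>.tar.gz naming and the use of GNU date's %N for unique timestamps are all assumptions.

```shell
#!/bin/sh
# backup_from_list LIST DEST KEEP
# Archive the paths named in LIST (one per line, read by tar -T) into
# DEST/<list name>-<timestamp>.tar.gz, then delete all but the KEEP newest
# archives made from that list. The timestamps sort lexically in time order.
backup_from_list() {
    list=$1 dest=$2 keep=$3
    name=$(basename "$list")
    stamp=$(date +%Y%m%d-%H%M%S-%N)    # %N (nanoseconds) is a GNU date extension
    tar -czf "$dest/$name-$stamp.tar.gz" -T "$list" || return
    # newest first; everything after the first KEEP names is removed
    ls -1 "$dest" | grep "^$name-[0-9-]*\.tar\.gz\$" | sort -r |
        tail -n +"$((keep + 1))" |
        while read -r old; do
            rm -f "$dest/$old"
        done
}
```

For example, backup_from_list /etc/backup-lists/home /mnt/backup 5 from cron (hypothetical paths) keeps the five newest archives of that set.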


Partitions

netcat + dd

A very simple but useful way to do a remote backup is using dd and netcat.

Note: Be aware that this is a rather liberal use of the word "remote". This method is neither efficient nor secure, and it is only suitable if both machines are connected to the same internal, private network.

Netcat is available in two flavors:

# emerge gnu-netcat

or

# emerge netcat

For a complete image of your /dev/hda1 partition, start netcat in listening mode on the remote machine:

# nc -l -p 10000 > image.gz

On your machine, run dd to read the partition, gzip to compress the content and netcat to transfer it to the other machine:

# dd if=/dev/hda1 | gzip | nc -w 5 remote_ip 10000

See How to clone a Linux box using netcat for additional information.
Although this document mentions both netcat and gnu-netcat, make sure you use the same one on both machines: netcat on one and gnu-netcat on the other will hang. I had this problem with gnu-netcat on Linux and netcat on Cygwin, and resolved it by using netcat on the Linux side as well.
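Since the image is only useful if it matches the partition, it is worth comparing checksums once the transfer has finished, while the partition is still unmounted. This is a minimal sketch with two hypothetical helpers using sha256sum from GNU coreutils. Run device_sum /dev/hda1 on the imaged machine and image_sum image.gz on the remote one; the two hashes should match.

```shell
#!/bin/sh
# device_sum DEVICE_OR_FILE: SHA-256 of the raw data, read with dd
device_sum() {
    dd if="$1" bs=1M 2>/dev/null | sha256sum | cut -d' ' -f1
}

# image_sum IMAGE.gz: SHA-256 of the data inside a gzip-compressed image
image_sum() {
    gunzip -c "$1" | sha256sum | cut -d' ' -f1
}
```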

One tip to reduce the size of the dd image is to mount /dev/hda1 read-write and zero out its unused blocks. Runs of zeroes compress extremely well, so gzip works much better; this way I could reduce an 18 GB full KDE amd64 install (with a 32-bit chroot) to a 2.7 GB image.

One way to accomplish that is:

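mount /dev/hda1 /mnt/gentoo
# free space in MB; -P keeps each filesystem on a single output line
mblocks=`df -P -B 1M /mnt/gentoo | awk 'NR == 2 {print $4}'`
# leave 256MB alone
mblocks=$((mblocks-256))
cd /mnt/gentoo
# create highly compressible file which spans most of the free space in the partition
if [ "$mblocks" -gt 0 ]; then
    time dd if=/dev/zero bs=1048576 count=$mblocks of=file-with-zeroes
    sync
fi
# delete it
/bin/rm -f file-with-zeroes
# leave the mount point, otherwise umount fails because the filesystem is busy
cd /
umount /mnt/gentoo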

and proceed with the netcat method above. The dd image is now much smaller than the partition. The beauty of this method (netcat/dd) is that it can all be nicely automated.

-devsk

Backup Wrapper Scripts

Sometimes you want to back up things that aren't 'ready' to be backed up: /boot on systems where it isn't automounted, for example, or a running MySQL daemon. It would be nice to mount /boot, or put MySQL into read-only mode, for the duration of the backup. That is where wrapper scripts come into play. The script below uses flexbackup as an example; modify it as needed.

File: /etc/scripts/flexwrapper.sh on local machine
#!/bin/sh

# mount /boot for backup
mount /boot

# MySQL Read Only
# NOTICE: the following method is an EXAMPLE!
# you might not want your passwd in plain-text in a wrapper script.
mysql -u root --password='yourpass' -e "SET GLOBAL read_only=1;"

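# Run flexbackup, passing all arguments through unchanged, and keep its exit status
flexbackup "$@"
status=$?

# give the system 10 seconds to clean up
sleep 10

# MySQL Read/Write
mysql -u root --password='yourpass' -e "SET GLOBAL read_only=0;"

# umount /boot
umount /boot

# report flexbackup's exit status to the caller (e.g. cron)
exit $status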

The basic idea is to do the preparation work in your wrapper, run the actual backup, and then return the system to its normal operating state.
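The same prepare/run/restore pattern can be wrapped in a small reusable function. This sketch uses a hypothetical with_prepare helper. The preparation and cleanup steps are passed as shell strings, and the cleanup step runs whether or not the backup command succeeds. The backup command's exit status is passed back so cron can report failures.

```shell
#!/bin/sh
# with_prepare PREP CLEANUP CMD [ARGS...]
# Evaluate PREP, run CMD, evaluate CLEANUP, and return CMD's exit status.
with_prepare() {
    prep=$1 cleanup=$2
    shift 2
    eval "$prep" || return       # if preparation fails, run nothing else
    "$@"
    status=$?
    eval "$cleanup"              # runs whether or not CMD succeeded
    return "$status"
}
```

For example: with_prepare 'mount /boot' 'umount /boot' flexbackup -set all -incremental.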

The above mentioned crontab for flexbackup would then become:

0 3 1-7 * * /etc/scripts/flexwrapper.sh -set all -full -w 7
0 3 * * 6 /etc/scripts/flexwrapper.sh -set all -differential
0 3 * * 1-5 /etc/scripts/flexwrapper.sh -set all -incremental


Using Tapes as a Backup Medium

Using tapes is still the de facto standard in most corporate environments. Don't forget to check the condition of the tapes you use (e.g. most DAT tapes can be reused up to 99 times), and DO clean your drives regularly with the appropriate cleaning cartridges (just insert the cleaning cartridge and wait until it is ejected).

The minimum set of software that is needed:

# emerge -auv app-arch/{mt-st,tar}

Put appropriate tape in the drive and check its status:

# mt -f /dev/st0 status

Save some files (e.g. the /home directory):

# tar -cvp -f /dev/st0 /home

Check if everything is OK:

# tar -tv -f /dev/st0

Eject the tape:

# mt -f /dev/st0 eject

To later restore from the tape, load it, check its status and then:

List the contents:

# tar -tv -f /dev/st0

Restore (NB: First change to the appropriate directory)

# tar -xvp -f /dev/st0

Using tapes is not the fastest process, but it helps organize your backups and you can still sip your coffee while the tape is running :-)
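tar -tv only shows that the archive can be read. GNU tar's --compare (-d) goes further and checks each archived file against the filesystem. The sketch below is a hypothetical tape_backup helper that writes and then verifies. It reads the device from a TAPE variable (defaulting to /dev/st0), so it can also be tried on an ordinary file. It expects absolute paths, because the compare step runs from / after tar has stripped the leading slash.

```shell
#!/bin/sh
TAPE=${TAPE:-/dev/st0}

# tape_backup ABSOLUTE_PATH...
tape_backup() {
    # write the archive; GNU tar strips the leading / from member names
    tar -cpf "$TAPE" "$@" || return
    # /dev/st0 rewinds when closed, so this second pass reads from the start;
    # --compare (-d) re-reads every member and checks it against the files under /
    tar -d -f "$TAPE" -C /
}
```

For example, tape_backup /home writes /home to the tape and exits non-zero if anything differs on re-reading.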

NB: Before a full backup or restore, remember to mount partitions like /boot that are not automounted! Create the list of files not to back up:

# cd /root
# echo -n "" > tar.exclude
# echo "/dev/*" >> tar.exclude
# echo "/proc/*" >> tar.exclude
# echo "/mnt/*/*" >> tar.exclude
# echo "/sys/*" >> tar.exclude
# echo "/var/tmp/ccache/*" >> tar.exclude


To make a full backup use:

# tar -cvp -f /dev/st0 --wildcards --exclude-from /root/tar.exclude /

To restore later:

# tar -xvp -f /dev/st0 -C /

TIP: To use the tape drive on HP DL380 servers see /usr/src/linux/Documentation/cciss.txt

# echo "engage scsi" > /proc/driver/cciss/cciss0


Retrieved from "http://www.gentoo-wiki.info/Backup"

Last modified: Mon, 08 Sep 2008 05:11:00 +0000