I have just finished spending this weekend completely replacing my backup process. Seeing as a few people have asked and it’s all fresh in my head I’ll try and write down some key lessons/decisions I came across along the way… OK, OK, I want to show off a bit too :-p
I have a lot of very old boxes at home, so the brief was to build one system that I could use as a web & mail server at home (I have pro hosting out on the internet for big projects), a file server (NAS) for home machines, a music server for home entertainment and a backup server to back up *all* my machines. I decided on an AMD64 box with 4 x 400GB SATA drives and an Adaptec hardware RAID controller set up as RAID 5. I chose CentOS for the OS and the now open-source Cobalt GUI for mail & web (for those of you who don’t know, there is a very nice combined installer). A combination of Ampache and mt-daapd works brilliantly as a music server.
But anyway, back to the backups!
I tried very hard to find an off-the-shelf package that was cheap/free but failed to find one that ticked all the boxes. In the end I picked rsync as my backup tool of choice and decided to roll my own. Here is why…
Firstly, rsync only transfers data that has changed since the last backup, which makes backing up an entire server a lot quicker and lighter on both CPU and bandwidth. By shell scripting it I can make my backups do exactly what I want. My scripts are now over 700 lines, but they do exactly what I need and I can easily force an extra backup of one or all machines at any time.
Using SSH (the --rsh=ssh option), all communication between the backup server and the server being backed up can be done securely without opening any extra insecure ports on each system to be backed up.
SSH can also be used to automatically (and securely) log on to the remote system and run a system-specific pre-backup script. I use this to do such things as export any MySQL databases, so they can be imported back in instead of having to recover raw DB files.
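To make the two SSH steps concrete, here is a minimal sketch of the idea (not the actual 700-line scripts). The host name fred, the remote script path /usr/local/sbin/pre-backup.sh, the root login and the /backup destination are all made-up names for illustration. Setting DRY_RUN just prints the commands instead of running them.

```shell
# Hypothetical names throughout: host "fred", the remote pre-backup script
# path and the destination directory are illustrative assumptions.

# Print the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# backup_host HOST DEST
backup_host() {
    host=$1
    dest=$2
    # Step 1: let the remote machine prepare itself (e.g. dump its databases).
    run ssh "root@$host" /usr/local/sbin/pre-backup.sh || return 1
    # Step 2: pull the filesystem over SSH, skipping /proc and /dev.
    run rsync -a --numeric-ids --rsh=ssh \
        --exclude=/proc --exclude=/dev \
        "root@$host:/" "$dest/"
}
```

Running `DRY_RUN=1 backup_host fred /backup/fred/current` shows the two commands without touching anything, which is a handy way to check a new host before letting it loose.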
The most powerful option is --link-dest, which lets you hard link unchanged files to a previous backup. This saves HUGE amounts of space on the backup server. For example, say I back up a machine (Fred). Fred has a 20GB HD and I copy the entire drive, excluding /proc and /dev. I now have a 20GB directory on my backup server. The next day I back up Fred again and point --link-dest at yesterday’s backup. Rsync compares the remote files with yesterday’s copy and, if they are exactly the same, does not bother transferring them but hard links the new file to yesterday’s file. Any files that have changed are copied down afresh (or partially copied, using yesterday’s backup as a basis where possible). If only 100MB of files have changed since yesterday, I now have two directories that each appear to hold 20GB of files but together take up only 20.1GB of space on the backup server!
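The hard-link behaviour is easy to see on a local toy example. The sketch below is only a demonstration under assumptions: it works in a throwaway temporary directory, the file names a.txt and b.txt are invented, and it needs rsync installed. It relies on the `-ef` test (same file, i.e. same inode), which is not strictly POSIX but is supported by the common shells.

```shell
# Local demonstration of --link-dest; every path here is temporary and made up.
link_dest_demo() {
    work=$(mktemp -d) || return 1
    mkdir -p "$work/src"
    echo "unchanged" > "$work/src/a.txt"
    echo "version 1" > "$work/src/b.txt"

    # Day 1: a plain full copy.
    rsync -a "$work/src/" "$work/day1/"

    # Change one file (its size changes too, so rsync's quick check notices).
    echo "version 2, edited" > "$work/src/b.txt"

    # Day 2: files identical to day1 become hard links instead of new copies.
    rsync -a --link-dest="$work/day1" "$work/src/" "$work/day2/"

    # -ef is true when both names refer to the same file on disk.
    for f in a.txt b.txt; do
        if [ "$work/day1/$f" -ef "$work/day2/$f" ]; then
            echo "$f: shared"
        else
            echo "$f: separate"
        fi
    done
    rm -rf "$work"
}
```

Calling `link_dest_demo` should report a.txt as shared and b.txt as separate: both day directories look complete, but the unchanged file is stored only once.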
My new backup scripts now do this every day, and a separate script then prunes the results: it keeps every day’s backup for a month, one backup a week for the month before that, and one backup a month for everything older. Hopefully this means I can recall previous work I have done, all while saving huge amounts of space.
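The retention rule can be sketched as one small decision function. The exact cut-offs are assumptions for illustration: 31 and 62 days for the two "months", Sunday as the weekly keeper and the 1st as the monthly keeper. The caller is assumed to supply the snapshot’s age in days plus its day of week (as printed by `date +%u`, where 7 is Sunday) and day of month (`date +%d`).

```shell
# keep_snapshot AGE_DAYS DAY_OF_WEEK DAY_OF_MONTH
# Exit status 0 means keep, 1 means the snapshot may be deleted.
# The thresholds and the choice of Sunday / the 1st are illustrative assumptions.
keep_snapshot() {
    age=$1
    dow=$2
    dom=$3
    if [ "$age" -le 31 ]; then
        return 0              # most recent month: keep every day
    elif [ "$age" -le 62 ]; then
        [ "$dow" -eq 7 ]      # the month before: keep Sundays only
    else
        [ "$dom" -eq 1 ]      # anything older: keep the 1st of each month
    fi
}
```

For example, `keep_snapshot 40 3 20` fails (a 40-day-old Wednesday backup can go), while `keep_snapshot 200 4 1` succeeds. Because unchanged files are hard links, deleting an old snapshot directory only frees the data that no other snapshot still points to.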
While --link-dest is the most powerful option, -n (dry run) is definitely my favourite. A while ago I was in the unfortunate situation of having one of my internet hosting servers hacked. As I am sure you are aware, when this happens you can trust NOTHING on the server: any binary could have been changed, including ls & ps. By running a backup of the server with the -n option added, rsync did not actually back anything up but could show me every file that had changed since the last backup, no matter how small the change. Very powerful when trying to find out what happened.
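Here is a small local sketch of that kind of audit, under some assumptions. The post only mentions -n; the -i (itemize changes) and -c (compare by checksum) options are additions on my part. The reason for -c is that rsync normally decides a file is unchanged if its size and modification time match, and an attacker can fake both. The fake ls and ps files and the temporary directory are invented for the demo, which needs rsync installed.

```shell
# Local demonstration of a dry-run audit; all paths and files are made up.
dry_run_demo() {
    work=$(mktemp -d) || return 1
    mkdir -p "$work/live"
    echo "original" > "$work/live/ls"
    echo "untouched" > "$work/live/ps"
    rsync -a "$work/live/" "$work/backup/"

    # Simulate tampering: same size, and the timestamp is put back afterwards.
    echo "0riginal" > "$work/live/ls"
    touch -r "$work/backup/ls" "$work/live/ls"

    # -n: change nothing, -i: list what would change, -c: compare contents.
    rsync -a -n -i -c "$work/live/" "$work/backup/"

    # The backup copy is still the old one, because -n transferred nothing.
    cat "$work/backup/ls"
    rm -rf "$work"
}
```

The itemized output should flag ls and stay silent about ps, and the final line shows the backup still holds the original content. Checksumming every file is slow on a big server, so this is something to run when investigating rather than every night.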
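Other options I should mention: --bwlimit, which caps the transfer rate and is useful for backing up a live server over the internet without maxing out its connection; -z, which compresses the data in transit and makes backing up over a slow link even quicker; and -a with --numeric-ids, which are good for backups because -a preserves permissions, ownership, timestamps and symlinks, while --numeric-ids stores the raw user and group numbers instead of trying to match names between machines.
Net result: I do full backups of 8 machines each night while only transferring the files that have changed, and I have full backups going back for as long as I want.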
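One way to keep those options straight is a tiny helper that picks them per host. This is only a sketch: the "lan"/"wan" labels and the default limit of 500 are invented, and the idea of skipping -z and --bwlimit on the local network is my assumption rather than something from the original scripts. The --bwlimit value is in units of 1024 bytes per second.

```shell
# rsync_opts LINK [KBPS]: print rsync options for a "lan" or "wan" host.
# The labels and the default limit are illustrative assumptions.
rsync_opts() {
    link=$1
    kbps=${2:-500}
    opts="-a --numeric-ids --rsh=ssh"
    if [ "$link" = "wan" ]; then
        # Over the internet: compress and cap the transfer rate.
        opts="$opts -z --bwlimit=$kbps"
    fi
    echo "$opts"
}
```

It would be used unquoted so the options split into separate words, e.g. `rsync $(rsync_opts wan 200) root@fred:/ /backup/fred/today/`.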
Ooooo, I wrote a lot! Sorry about that. I hope it’s of use to people!