Backup and Version Control

Backup and versioning are two important parts of managing one's files, with some overlap of duties depending on the particular choice of backup and versioning programs. I, for example (as of writing this, in 2008), make my own manual copies of `versions' of documents and programs, using no other versioning system. For backup, I run automatic backups with rdiff-backup: a full home-directory backup every night, keeping nightly versions for a fortnight and weekly versions for several months. I also have a half-hourly rdiff-backup of the $HOME/work/ directory, going back some months, which can be handy if something is removed or modified a little injudiciously. Others seem happy with occasional CD backups, but want versioning systems. Others seem happy with nothing at all, until the failure of a single hard disk (among the least reliable parts of a computer) leaves them destitute!

Backup

Backup may be as simple as a copy or archive-file of a tree of files, or may take advantage of recording small differences since a previous backup, to make efficient use of storage space. For simple purposes, a plain copy (`cp -av source/ destination/') is all that's needed.

Plain copy

This could be done in a GUI file-manager such as Konqueror. Alternatively (easier, more scriptable) the command `cp' can be used. For example, to back up a directory called `name', one might use `cp -av name name_backup_2007-01-02', in which the `-a' is essential for a recursive copy that preserves file times (and, where possible, ownership and permissions), and the `-v' causes the name of every file copied to be displayed on the screen (often better omitted, so that any error messages are not lost among the file names).
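As a minimal sketch of how such a dated copy might be scripted: the scratch directory, the file in it and the variable names below are made up for illustration; only `cp -a', `date' and `diff -r' are the real commands.

```shell
# Hypothetical scratch area with a small directory called 'name' to back up.
demo=$(mktemp -d)
mkdir -p "$demo/name/sub"
echo 'hello' > "$demo/name/sub/file.txt"

# Build a backup name containing today's date, e.g. name_backup_2007-01-02.
stamp=$(date +%Y-%m-%d)
backup="$demo/name_backup_$stamp"

# -a: recursive copy that preserves file times and other attributes.
cp -a "$demo/name" "$backup"

# Compare the two trees; diff -r prints nothing and succeeds if they match.
diff -r "$demo/name" "$backup" && echo "backup matches source"
```

Putting the date in the name keeps successive copies from overwriting each other, at the cost of a full copy every time.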

tar and other archivers

One convenient form of backup for long-term storage or for transfer via web, email, CD-ROM etc. is an archive: a single file containing the data and details (`metadata') of a tree of files.

The GUI program `ark' is one of several that can create and unpack many archive formats.

The command `zip' deals with a popular multiplatform format for archiving and compressing data, often called just `zip' (e.g. pkzip, winzip). See the manual for more detail: `man zip'. For example, `zip -r name name', where `name' is the name of a directory, will create an archive `name.zip' containing the contents of the directory `name'.

The more usual Unix way with archives is to use the command `tar' to pack/unpack files to/from an archive file, and optionally a compression program to compress/decompress the archive file. The main advantage is good support for the many metadata attributes and file types (ownership, mode, timestamps, symbolic links, device files, etc.), some of which are mainly useful for the administrator in backing up the system, but others of which are of general use. For example, the following would create an archive `name.tar' from directory `name': `tar -c -f name.tar name'; the following would unpack the archive: `tar -x -f name.tar'. Adding `-z' to the options (e.g. `tar -c -z -f name.tar.gz name') causes gzip compression to be used; `-j' instead causes bzip2 compression to be used. Other compressors, slower but giving higher compression, also exist, such as p7zip.
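The create, list and unpack steps might be tried together as below. This is only a sketch: the scratch directory and file names are invented for the example, and `-C' (change to a directory before packing or unpacking) is used so that the archive stores the relative path `name/...'.

```shell
# Hypothetical scratch area with a small directory called 'name'.
demo=$(mktemp -d)
mkdir -p "$demo/name" "$demo/restore"
echo 'hello' > "$demo/name/file.txt"

# Create a gzip-compressed archive of 'name' (-c create, -z gzip, -f archive file).
tar -c -z -f "$demo/name.tar.gz" -C "$demo" name

# List the contents without unpacking (-t), a quick check that the archive is readable.
tar -t -z -f "$demo/name.tar.gz"

# Unpack into a separate directory (-x extract), leaving the original untouched.
tar -x -z -f "$demo/name.tar.gz" -C "$demo/restore"
```

Unpacking into a fresh directory, rather than over the original, is a cautious way to confirm that a backup can actually be restored.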

rsync: efficient update of a local or remote copy

rsync updates one directory tree, or just one file, so that it becomes identical to another, while transferring a minimum of data. This allows, for example, a backup directory to be updated in a few seconds even when it contains many megabytes of data, if not much has changed since last time. See the manual page (`man rsync', with `q' to exit), or try `rsync -av --delete work/ backup/' or, for remote use through ssh, `rsync -avz --delete work/ user@remotehost:~/backup/'. Note that `--delete' removes from the destination anything that is no longer in the source. The rsync algorithm is distinctly clever. It is described in the PhD thesis of its author, Andrew Tridgell, who is responsible for several other useful programs.
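A local mirror of this kind could be wrapped in a small shell function, roughly as follows. The function name `mirror_dir' and the scratch directories are hypothetical, and the sketch assumes rsync is installed; only the `-a' and `--delete' options are taken from rsync itself.

```shell
# mirror_dir SRC DST: make DST an exact mirror of SRC (hypothetical helper name).
mirror_dir() {
    # -a preserves times, modes and links; --delete removes files from DST
    # that no longer exist in SRC. The trailing slashes mean "the contents of".
    rsync -a --delete "$1/" "$2/"
}

# Hypothetical scratch area: a work directory and a backup holding one stale file.
demo=$(mktemp -d)
mkdir -p "$demo/work" "$demo/backup"
echo 'draft' > "$demo/work/notes.txt"
echo 'stale' > "$demo/backup/old.txt"
```

Running `mirror_dir "$demo/work" "$demo/backup"' should then copy notes.txt into the backup and remove old.txt from it. Adding `-n' (dry run) to the rsync options is a cautious way to see what would be changed or deleted before doing it for real.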

rdiff-backup: multiple versions, low disk-use, small transfers

rdiff-backup is a set of Python modules that uses librsync to sync the source to the destination (just as with rsync) while keeping all the information about what was changed; this allows many previous versions to be stored in little disk space and retrieved if needed, while keeping the main mirror up to date. Again, see the manual (`man rdiff-backup') or experiment. For example: `rdiff-backup dirname/ backup/', then change something in the source directory `dirname/' and run the same command again, then check the versions available with `rdiff-backup -l backup/', then restore some old version with `rdiff-backup -r date-and-time backup/ restore/'.

Version Control | Revision Control

Version control systems allow the storage of different versions of a set of files, for example the parts of a program, the parts of a document, pages of a website, etc.

In its basic form such a system allows old versions to be kept every time a change is made, probably with some efficient reduction in storage space by keeping just the changes rather than storing every file every time (so, rather like the rdiff-backup mentioned above).

In more advanced forms, different users can `check-out' parts of a project, work on them alone, then `check-in' the changes. Concurrent changes to a single file can often be merged automatically (if they are not in the same part of the file), and records are kept of who did what. Various editors, including Emacs and (recently) the Matlab Editor, have support for automated checking in and out.

Subversion

Subversion (SVN) (see also Wikipedia) is a widely used and widely supported (by programming GUI editors, for example) version control system. We have programs that allow users to create and use SVN `repositories' in their own filespace. There are also more exotic ways of accessing a repository: by SSH, through a dedicated SVN server (we don't run this) with a GUI client, or by web. As usual, `man svn' is a good start.

CVS, RCS

CVS is another, older versioning system, still widely used; it is a development of RCS.

diff and patch

These are not a conventional backup or versioning system, but two complementary, widely used and widely applicable commands that work with changes between versions of files.

`diff' compares two files (or two trees of files, with the `-r' option) and reports the differences in a compact, human-readable format. The program `patch' can interpret this output and automatically apply the same changes to another copy of the file. This is often useful for sending efficiently the small changes needed to a large project, in a format that can be checked to see that nothing bad is being included in the changes. A saved diff can also be kept locally, simply as a record of what changes one has made between different versions of a file.
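A round trip through the two commands might look like the sketch below. The file names and contents are invented for the example; the `-u' option asks diff for the `unified' format, one of the formats that patch understands.

```shell
# Hypothetical demo files: an old version, a new version, and a second copy of the old one.
demo=$(mktemp -d)
printf 'line one\nline two\nline three\n' > "$demo/old.txt"
printf 'line one\nline 2\nline three\n'   > "$demo/new.txt"
cp "$demo/old.txt" "$demo/copy.txt"

# Record the changes from old to new in unified format.
# diff exits with status 1 when the files differ, which is expected here.
diff -u "$demo/old.txt" "$demo/new.txt" > "$demo/change.patch" || true

# Apply the recorded changes to the other copy, naming the file to patch explicitly.
patch "$demo/copy.txt" < "$demo/change.patch"
```

The file change.patch is plain text, with removed lines marked `-' and added lines marked `+', so it can be read and checked before it is applied.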


Page started: 2007-11-xx
Last change: 2008-12-01