Backup and Restore by wanghonghx




An important part of any security policy is the backup and restoration of
data. If data is lost or damaged, a backup should be available.

Occasionally computer files become lost – these losses have many causes:

    A user may delete a file accidentally
    A bug can cause corruption to data
    Hardware faults
    External events such as a fire or flood

The damage may be minor or huge, and can be expensive and time
consuming to repair. For example, if a user deletes a Windows file
accidentally, he/she can usually restore it from the recycle bin. Obviously it
is also necessary to have up-to-date anti-virus software installed. Even
relatively inexperienced users quickly realise the benefit of backup, as there
is nothing more frustrating than having to redo a task because we forgot to
save it or we used a corrupt floppy disk.

To guard against loss, an administrator may need to plan and implement a
system that periodically copies everything on the system to another location.
This is effectively an insurance policy which represents time expended to
prevent future losses. The backup should be performed at regular intervals
and the backup media stored safely and securely.


The following factors should be considered when developing a backup
strategy:
What should be backed up? Usually everything, although certain files may
be regarded as more important and possibly need extra attention. Also,
system files and applications are probably already permanently backed up
at installation time, and only the data files (which may change) need to be
backed up.

Who backs up the files?

Where, when and under what conditions should backups be performed?

How often should backups be performed?

How quickly does a missing or damaged file need to be restored?

How long should the data be retained?

Where should backup media be stored? Near for quick restoration or far
in case of physical damage like fire?

Budget considerations? How valuable is the software/data, and what is the
cost of the actual backup media, of paying somebody to do it (much can be
automated), and of storage locations (e.g. valuable data should be stored in
an alternative location in case of fire)?


    Backups should ideally be performed on idle systems (e.g. after
        working hours).
    How much backup media is required? One needs to consider both
        capacity and input/output transfer rates.
    All the data to be backed up should be accessible to the backup
        system.
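Scheduling a backup for idle hours is usually done with cron. As an illustration, a crontab entry might look like the following; the script path, log file, and timing are assumptions for the example, not part of the text above:

```
# Hypothetical crontab entry: run a backup script at 02:30 every night,
# when the system is likely to be idle.
# minute hour day-of-month month day-of-week  command
30 2 * * * /usr/local/bin/backup >> /var/log/backup.log 2>&1
```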

Implementing backup in a UNIX / Linux environment

The amount and frequency of backup depends on the value of the data and
the risk of losing it.
UNIX provides a number of tools – compression and archiving which are
very useful when it comes to performing backup. Compression means that
less disk space (and possibly bandwidth if performed over an open network)
is required. Archiving is a convenient method of bundling files together.

Archiving and Compressing files in UNIX / Linux
tar is a multipurpose tool. The program was originally created for archiving
files to tape—the name stands for “tape archiver.” Because UNIX (and
hence Linux) treats hardware devices as files, a tape archiving program like
tar can be used to create archives as files on disk. These files can then be
compressed, copied to removable media, sent over a network, and so on.
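Because a tape drive is just another file under UNIX, the same tar syntax serves both media. A short demonstration follows; the tape device name is an assumption (it varies by system), and the demo directory is invented for illustration:

```shell
# Writing an archive straight to a tape device (device name is
# system-dependent; /dev/st0 is typical for the first SCSI tape on Linux):
#   tar cvf /dev/st0 /home/mjb

# Exactly the same syntax creates the archive as an ordinary disk file:
mkdir -p demo && echo "hello" > demo/file.txt
tar cvf demo.tar demo            # c = create, v = verbose, f = file name
ls -l demo.tar                   # the archive is just a regular file
```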


How big is a file?
Type ls -l and the directory listing tells you how many bytes are in each
file. In the example listing below, minutes.txt is 3 bytes long and note.txt is
1201 bytes long.

$ ls -l
total 6
-rw-r--r-- 1 mjb         group      3 Feb 04 23:31 minutes.txt
-rw-r--r-- 1 mjb         group   1201 Feb 04 23:25 note.txt

But those are the file sizes, not the amount of space used on the disk. To see
the space used on disk, add the -s switch by typing ls -ls. The new listing
(shown below) includes an initial column that contains the number of blocks
used on the disk by the file. A block is a unit of 512 bytes. The first file,
minutes.txt, uses 2 blocks or 1024 bytes, and note.txt uses 4 blocks, or 2048
bytes.
$ ls -ls
total 6
  2 -rw-r--r-- 1 mjb        group      3 Feb 04 23:31 minutes.txt
  4 -rw-r--r-- 1 mjb        group    1201 Feb 04 23:25 note.txt
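The byte count and block count can also be read directly with the stat utility. This sketch assumes GNU stat and du; note that on a modern filesystem the allocation unit is typically 4096 bytes, so the block figures will differ from the 1024-byte example above:

```shell
# Create a 3-byte file and compare its logical size with its disk usage.
printf 'ab\n' > tiny.txt                    # exactly 3 bytes of content
stat -c 'bytes=%s  512B-blocks=%b' tiny.txt # %b counts 512-byte blocks
du -k tiny.txt                              # allocated space in 1K units
```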

Earlier UNIX systems used an allocation unit of 512 bytes. These 512 bytes
made up 1 block. As disk sizes grew, the basic allocation unit was increased
to 1024 bytes on most systems (larger on some), but many utilities, such as
ls, continue to report file sizes or disk use in 512-byte blocks. This block
size remains the standard for many utilities, even though the actual size of an
allocation unit has increased to 2 or more blocks.

Let us now look at the ls -ls listing again. A 3-byte file will occupy a 512-
byte block, but more importantly for disk usage, it will occupy 1 allocation
unit, which on the system in the example below is 2 blocks, or 1024 bytes.
The ls -ls listing correctly indicates 2 blocks used on the disk. Similarly, a
1,201-byte file fills 2 complete 512-byte blocks (1024 bytes) plus an
additional 177 bytes in a third block; but because space is handed out in
1024-byte allocation units, it is given 2 units, and the listing shows 4 blocks.

$ ls -ls
total 6
  2 -rw-r--r-- 1 mjb     group     3 Feb 04 23:31 minutes.txt
  4 -rw-r--r-- 1 mjb     group   1201 Feb 04 23:25 note.txt

This seems dreadfully wasteful. In fact 99.7 percent of the space allocated
for minutes.txt is unused, and 41.4 percent of the space for note.txt is wasted.
Multiply this by the number of files on the system, and you'll begin to
imagine vast black holes of disk space that cannot be reached except by
forcing all users to create and fill files that are multiples of 1024. This
wasted space is known as internal fragmentation.
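The waste figures quoted above can be verified with a quick calculation, assuming the 1024-byte allocation unit used in the text:

```shell
# Internal fragmentation for a 3-byte and a 1201-byte file with a
# 1024-byte allocation unit: round the size up to a whole unit, then
# work out what fraction of the allocation goes unused.
awk 'BEGIN {
    unit = 1024
    for (i = 1; i < ARGC; i++) {
        size  = ARGV[i] + 0
        alloc = int((size + unit - 1) / unit) * unit
        printf "%d bytes -> %d allocated, %.1f%% wasted\n",
               size, alloc, 100 * (alloc - size) / alloc
    }
}' 3 1201
```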

Remember, the high percentage of waste only occurs on very small files, so
the larger the file, the more efficient the allocation system is. The allocation
system is a good compromise between disk allocation and speed of disk
access.

Note that ls -l, ls -s, and similar variations display a total line as the first
record in a directory display. The total 6 in the listing below is in fact the
sum of the blocks displayed by typing ls -ls.

$ ls -ls
total 6
  2 -rw-r--r-- 1 mjb     group       3 Feb 04 23:31 minutes.txt
  4 -rw-r--r-- 1 mjb     group     1201 Feb 04 23:25 note.txt

The gzip utility works well on large files but performs poorly on small files.
In the sample listing below, gzip is applied to each of the files and the results
are displayed. The gzip utility correctly recognizes that it can't do any good
on minutes.txt, and leaves it alone. It does, however, compress note.txt to
188 bytes. Note that gzip appends .gz to a file when it compresses it.
The effects of gzip are reversed by using gzip -d file.ext. You don't
need to include the .gz in the file name.
In this case we've eliminated 2 blocks, as note.txt compressed down to 2
blocks from 4. If you follow this logic through, you begin to realize that a
small file can never be compressed below 2 blocks (or the default allocation
unit for your system).
$ gzip minutes.txt
$ gzip note.txt
$ ls -ls
total 4
  2 -rw-r--r-- 1 mjb     group       3 Feb 04 23:31 minutes.txt


  2 -rw-r--r-- 1 mjb       group   188 Feb 04 23:25 note.txt.gz

Compressing files with tar
If you have a directory of small files that are little used, but need to remain
on the system, one way to handle them is to combine them into one file, then
remove the originals. If the files can be strung together, all the little files can
be packed into one larger file. The obvious candidate for this combining
action is the tar utility.
Study the following listing for a moment. The tar command uses key letters
to signify actions to be performed. These are a bit like command line
switches, but are not preceded by -. In this instance, the tar arguments are:

c         create a new archive

v         verbose, provide information on what is being done

f         the next argument is the name of the archive to create

txt.tar   the archive that is being created

*.txt     the list of files to include in the archive

Immediately after you type the tar command, tar informs you that it has
appended minutes.txt, which takes 1 tape block (the output line a
minutes.txt 1 tape block), and appended note.txt, which takes 3 tape blocks
(a note.txt 3 tape blocks). So tar reports its results in 512-byte blocks
rather than 1024-byte double blocks.
However, there is a bit of a shock in the ls -ls command issued after the tar
is complete. The new archive txt.tar is 8 blocks long. That's longer than the
original 6 blocks used by the two files. The tar utility strings blocks end to
end. It also has to add directory information into txt.tar, so it's not unusual
(in fact it's common) for a tar archive to be larger than the sum of its parts.

$ ls -ls
total 6
  2 -rw-r--r-- 1 mjb       group    3 Feb 04 23:31 minutes.txt
  4 -rw-r--r-- 1 mjb       group   1201 Feb 04 23:25 note.txt
$ tar cvf txt.tar *.txt
a minutes.txt 1 tape block
a note.txt 3 tape blocks
$ ls -ls
total 14
  2 -rw-r--r-- 1 mjb       group    3 Feb 04 23:31 minutes.txt
  4 -rw-r--r-- 1 mjb       group   1201 Feb 04 23:25 note.txt
  8 -rw-r--r-- 1 mjb       group   4096 Feb 05 01:40 txt.tar

Fortunately, tar fills the empty space in those blocks, usually with hex
zeroes or NULs. This makes a tar archive an excellent candidate for
compression. Proceeding to the next logical step, the following listing
compresses the tar archive. The resulting file txt.tar.gz is 404 bytes long
(1 allocation unit, or 2 blocks). Then, by removing the original text files,
the directory contents are reduced to only 2 blocks, saving 66 percent of the
space previously used.


$ ls -ls
total 14
  2 -rw-r--r-- 1 mjb     group    3 Feb 04 23:31 minutes.txt
  4 -rw-r--r-- 1 mjb     group   1201 Feb 04 23:25 note.txt
  8 -rw-r--r-- 1 mjb     group   4096 Feb 05 01:40 txt.tar
$ gzip txt.tar
$ ls -ls
total 8
  2 -rw-r--r-- 1 mjb     group    3 Feb 04 23:31 minutes.txt
  4 -rw-r--r-- 1 mjb     group   1201 Feb 04 23:25 note.txt
  2 -rw-r--r-- 1 mjb     group   404 Feb 05 01:40 txt.tar.gz
$ rm *.txt
$ ls -ls
total 2
  2 -rw-r--r-- 1 mjb     group   404 Feb 05 01:40 txt.tar.gz
The following listings show you how to reverse the tar-and-compress
process. The tar key argument for extracting from an archive is x. The other
key arguments are the same as in the earlier tar command.
$ ls -ls
total 2
  2 -rw-r--r-- 1 mjb     group   404 Feb 05 01:40 txt.tar.gz
$ gzip -d txt.tar
$ ls -ls
total 8
  8 -rw-r--r-- 1 mjb     group   4096 Feb 05 01:40 txt.tar
$ tar xvf txt.tar


x minutes.txt, 3 bytes, 1 tape block
x note.txt, 1201 bytes, 3 tape blocks
$ ls -ls
total 14
  2 -rw-r--r-- 1 mjb     group     3 Feb 04 23:31 minutes.txt
  4 -rw-r--r-- 1 mjb     group   1201 Feb 04 23:25 note.txt
  8 -rw-r--r-- 1 mjb     group   4096 Feb 05 01:40 txt.tar
$ rm txt.tar
$ ls -ls
total 6
  2 -rw-r--r-- 1 mjb     group     3 Feb 04 23:31 minutes.txt
  4 -rw-r--r-- 1 mjb     group   1201 Feb 04 23:25 note.txt

So you can use tar and gzip to save lots of disk space on files that are rarely
used.

Once you've located stashes of files that are rarely used, but which must
remain available, you should archive and compress them.
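One way of locating such stashes is to search by access time. The 90-day threshold below is an arbitrary choice, and the demonstration fakes an old access time so there is something to find:

```shell
# Demonstration: give a file an old access time, then find files not
# read in the last 90 days.  (Accuracy depends on the filesystem
# recording access times; relatime/noatime mounts limit this.)
mkdir -p demo_old
touch -a -t 202001010000 demo_old/stale.txt   # set atime to Jan 2020
find demo_old -type f -atime +90 -print

# Applied for real, the candidates under a home directory would be:
#   find "$HOME" -type f -atime +90 -print
# and could then be bundled:  tar cvf old.tar ...  &&  gzip old.tar
```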


Backup Exercise

Task 1 Backup
Write a backup script which can be used by any user to back up the contents
of their home directories. For the purpose of this exercise, assume the
backup is stored on a remote drive at /media/backup. In addition, remove
any unwanted scripts from the home drive after the backup process has been
completed.

The first decision is where to locate this script – it must be placed in a
directory to which all users have access.

The following script is a relatively simple and effective solution for this
task:
                 cd $HOME
                 tar cvf backup.tar *
                 gzip backup.tar
                 cp backup.tar.gz /media/$LOGNAME
                 rm backup.tar.gz

cd $HOME This changes to the current user’s home directory. HOME is an
environment variable.

tar cvf backup.tar * This archives everything in the current directory - * is
a wildcard similar to its use in Windows.

gzip backup.tar This compresses the archive file.

cp backup.tar.gz /media/$LOGNAME This copies the compressed archive
to its designated location. It uses the environment variable LOGNAME to
ensure that each user’s backup has a unique name.


rm backup.tar.gz This removes the original on the user’s home drive.

Remember to grant all users execute permission, e.g. chmod 755 backup.
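A slightly more defensive variant of the same idea is sketched below. The demo_home and demo_media directories are stand-ins for the user's home directory and the backup device, so the sketch is self-contained; the error handling and fallback name are my additions, not part of the exercise:

```shell
#!/bin/sh
# Defensive sketch of the exercise's backup script.  demo_home and
# demo_media stand in for $HOME and the backup device; LOGNAME falls
# back to "user" if it is unset.
SRC=demo_home
DEST=demo_media
NAME=${LOGNAME:-user}
mkdir -p "$SRC" "$DEST"
echo "some data" > "$SRC/report.txt"

cd "$SRC" || exit 1                  # stop if the directory is missing
tar cf backup.tar * || exit 1        # archive everything in it
gzip backup.tar                      # produces backup.tar.gz
cp backup.tar.gz "../$DEST/$NAME.tar.gz" && rm backup.tar.gz
cd ..
ls -l "$DEST"
```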

Task 2 Security
Enhance the security of your backup by making sure only its owner can
retrieve it or is aware of its existence.

The script above is adequate for creating a backup but it does not include
any protection. Home drives in UNIX are owned by their users – hence users
can control access to their own data. Making the archived backup accessible
only to its owner prevents others from reading it, e.g. chmod 600
backup.tar.gz. Another layer of protection might be to have a dedicated
directory for every user on the backup device, e.g. the user smithj would
have a directory called smithj which would be owned by that user. This
would prevent others from even knowing whether a backup exists on that
device.

The fourth line of the script would change to:
     cp backup.tar.gz /media/$LOGNAME/$LOGNAME
The first $LOGNAME refers to the directory – the second becomes the
name of the backup.
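Creating those per-user directories on the backup device would be an administrator's task. A sketch follows; the directory name demo_backup stands in for the real device path, and the user names are invented:

```shell
#!/bin/sh
# Sketch: one private directory per user on the backup device.
# demo_backup stands in for /media/backup; the user list is invented.
BACKUP_ROOT=demo_backup
for user in smithj jonesa; do
    mkdir -p "$BACKUP_ROOT/$user"
    chmod 700 "$BACKUP_ROOT/$user"   # owner-only: others cannot even
done                                 # list the directory's contents
# chown would also be needed (as root):  chown smithj demo_backup/smithj
ls -l "$BACKUP_ROOT"
```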

Task 3 Retrieval
Write a script to retrieve the backup.


       cd $HOME
       cp /media/$LOGNAME/$LOGNAME backup.tar.gz
       gzip -d backup.tar
       tar xvf backup.tar

cd $HOME – Assume the restored files are to be placed in the user’s home
directory.

cp /media/$LOGNAME/$LOGNAME backup.tar.gz Retrieves the archived,
compressed file.

gzip -d backup.tar Decompresses the archive.

tar xvf backup.tar De-archives the file.
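Before extracting a retrieved backup, it is worth listing the archive with tar's t key as a sanity check that it is readable. A self-contained demonstration, with the file names invented:

```shell
# Build a throwaway archive, then list it (t key) before extracting.
mkdir -p demo_restore && echo "hello" > demo_restore/note.txt
tar cf restore.tar demo_restore
gzip restore.tar                 # compress to restore.tar.gz
gzip -d restore.tar              # the .gz suffix may be omitted
tar tvf restore.tar              # t = list contents without extracting
tar xf restore.tar               # x = extract, overwriting demo_restore
```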

