How do du and tar handle links?

Introduction

The most convenient way to archive Unix file systems is the tar utility, as it is designed to handle special file system entries such as symbolic links, sockets and pipes.

However, there is often uncertainty about how to use tar to either preserve or resolve links in the archive. This is partly because the concepts of hard and soft links are not widely understood, and partly because the required command line options differ between versions of Unix.

This short summary illustrates the use of du and tar on Solaris and Linux. The du command is included because it is often used first to estimate the size of the media needed to store the tar archive.

The following example directory is used throughout the discussion.

$ ls -liFR directory
directory:
total 88
    586399 drwxr-xr-x   2 rhinz    sanusers      80 Dec 14 18:26 dira/
    586589 drwxr-xr-x   2 rhinz    sanusers      96 Dec 14 18:16 dirb/
    586591 drwxr-xr-x   2 rhinz    sanusers     128 Dec 14 18:18 dirc/
   1528385 -rw-r--r--   1 rhinz    sanusers   43438 Nov 17 11:16 file2

directory/dira:
total 0
    586483 lrwxrwxrwx   1 rhinz    sanusers       6 Dec 14 18:26 dangling_link -> ../nil

directory/dirb:
total 120
    585197 -rw-r--r--   1 rhinz    sanusers   36028 Oct  5 09:46 file0
   3479578 -rw-r--r--   2 rhinz    sanusers   22575 Nov  6 18:09 file1

directory/dirc:
total 48
   3479578 -rw-r--r--   2 rhinz    sanusers   22575 Nov  6 18:09 hard_link_to_file1
    586597 lrwxrwxrwx   1 rhinz    sanusers      13 Dec 14 18:17 symbolic_link_to_file0 -> ../dirb/file0

It contains three subdirectories (dira, dirb and dirc) and a plain file, file2, of 42 kB.

The subdirectory dira contains a dangling link, i.e. a symbolic link to a file which does not exist.

The subdirectory dirb contains two plain files, file0 (35 kB) and file1 (22 kB).

The subdirectory dirc contains a hard link to file1, named hard_link_to_file1, and a symbolic link to file0, named symbolic_link_to_file0. Note that both file1 in dirb and hard_link_to_file1 in dirc are shown in the listing with i-node number 3479578 in the first column and with a link count of 2.
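
A layout like this can be rebuilt in a scratch directory to experiment with the commands discussed here. The following is a minimal sketch, not the way the original tree was created: the file contents are zero-filled placeholders, the sizes are rounded to whole kB, and the variable name work is an assumption made for illustration.

```shell
#!/bin/sh
# Sketch: rebuild a tree with the same link structure as the example.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir -p directory/dira directory/dirb directory/dirc

# Plain files with placeholder contents (sizes rounded to whole kB).
dd if=/dev/zero of=directory/dirb/file0 bs=1024 count=36 2>/dev/null
dd if=/dev/zero of=directory/dirb/file1 bs=1024 count=23 2>/dev/null
dd if=/dev/zero of=directory/file2 bs=1024 count=43 2>/dev/null

# Hard link: a second directory entry for the i-node of file1.
ln directory/dirb/file1 directory/dirc/hard_link_to_file1

# Symbolic links: the target path is stored as text and is resolved
# relative to the directory that holds the link.
ln -s ../dirb/file0 directory/dirc/symbolic_link_to_file0
ln -s ../nil directory/dira/dangling_link

# Both names of the hard-linked file show the same i-node number.
ls -li directory/dirb/file1 directory/dirc/hard_link_to_file1
```

The i-node numbers will differ from those in the listing above, but the two names of file1 should again share one i-node and a link count of 2.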

The disk usage command du

Preserving the symbolic links

A simple call of du with the -k option, which reports sizes in units of 1024 bytes (kB), produces the following listing.

$ du -k directory
0       directory/dira
60      directory/dirb
0       directory/dirc
104     directory

This means that the entire directory contains 104 kB of data. The subdirectory dirc is reported as 0 kB because the file with i-node number 3479578 has already been counted under dirb, which du visited first. Within a single run, du counts each i-node only once, no matter how many names refer to it.

Calling du only on directory/dirc generates a different result.

$ du -k directory/dirc
24      directory/dirc

Because this run never sees dirb, the file hard_link_to_file1 is now counted under dirc.
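
The effect can be reproduced in a scratch directory. This is a minimal sketch under the assumption that the local du counts a multiply-linked file once per run, as described above; the variable name work is hypothetical, and the exact numbers depend on the block size of the file system.

```shell
#!/bin/sh
# Sketch: one file with two names in two sibling directories.
work=$(mktemp -d)
mkdir "$work/dirb" "$work/dirc"
dd if=/dev/zero of="$work/dirb/file1" bs=1024 count=23 2>/dev/null
ln "$work/dirb/file1" "$work/dirc/hard_link_to_file1"

# One run over the parent: the shared i-node is charged to whichever
# of the two directories du happens to visit first.
du -k "$work"

# A separate run on dirc alone knows nothing about dirb,
# so the file is charged to dirc in full.
du -k "$work/dirc"

# List regular files with more than one link; these are the files
# whose size may be attributed to a different directory.
multi=$(find "$work" -type f -links +1)
echo "$multi"
```

The find call at the end is a quick way to spot, before running du, which files may be attributed to a directory other than the one of interest.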

Resolving the symbolic links

The -L option tells du to process symbolic links by using the file or directory which the symbolic link references, rather than the link itself.

$ du -kL directory
0       directory/dira
60      directory/dirb
0       directory/dirc
104     directory

On Solaris, this is exactly the same output as for the simple du -k call above, because the symbolic link symbolic_link_to_file0 in dirc points to file0 in dirb, which has already been counted. In this case du -L treats the symbolic link the same way plain du treats a hard link.

Calling du -L only on directory/dirc produces:

$ du -kL directory/dirc
60      directory/dirc

Now both file0 (through the symbolic link) and file1 (through the hard link) are counted.

To complicate matters, the behaviour of du -L on Linux differs from the above.

$ du -kL directory
0       directory/dira
60      directory/dirb
36      directory/dirc
140     directory

On Linux the symbolic link symbolic_link_to_file0 is resolved and counted even though it points to a file which is already reported in the directory/dirb listing. Consequently, directory is now reported as 140 kB in total instead of the 104 kB reported on Solaris.

Note that the dangling link in dira did not cause an error message, although it obviously could not be resolved because it points to a non-existent file.

On Solaris, the -r option causes du to print messages about directories that cannot be read, files that cannot be opened, and so forth, rather than staying silent. If an error is encountered, du exits with a non-zero exit code.
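
Since the behaviour depends on the implementation, and possibly on its version, it may be simplest to check the local du directly. The following sketch compares the totals with and without -L on a small scratch tree; the variable names work, physical and logical are assumptions made for illustration, and no particular result is implied for any given system.

```shell
#!/bin/sh
# Sketch: does this du count a symlink target again under -L?
work=$(mktemp -d)
mkdir "$work/dirb" "$work/dirc"
dd if=/dev/zero of="$work/dirb/file0" bs=1024 count=36 2>/dev/null
ln -s ../dirb/file0 "$work/dirc/symbolic_link_to_file0"

# Total with the link counted as a link, then with the link resolved.
physical=$(du -sk "$work" | awk '{print $1}')
logical=$(du -skL "$work" | awk '{print $1}')
echo "without -L: ${physical} kB, with -L: ${logical} kB"

if [ "$logical" -gt "$physical" ]; then
    echo "this du counts the link target a second time"
else
    echo "this du does not count the link target a second time"
fi
```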

$ du -kLr directory
du: directory/dira/dangling_link: No such file or directory
0       directory/dira
60      directory/dirb
0       directory/dirc
104     directory

$ echo "Exit $?"
Exit 1
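
Where du offers no such option, dangling links can be located beforehand with find. This is a minimal sketch assuming a POSIX find and a test utility that supports -e; the scratch tree and the names work, target and good_link are hypothetical.

```shell
#!/bin/sh
# Sketch: list symbolic links whose target does not exist.
work=$(mktemp -d)
mkdir "$work/dira"
: > "$work/target"
ln -s ../nil "$work/dira/dangling_link"
ln -s ../target "$work/dira/good_link"

# -type l selects symbolic links. "test -e" follows the link and fails
# when the target is missing, so only dangling links are printed.
dangling=$(find "$work" -type l ! -exec test -e {} \; -print)
echo "$dangling"
```

Only dangling_link should be listed, since good_link resolves to an existing file.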

The archive command tar

Preserving the symbolic links

By default, tar archives symbolic links as links, so that they can be restored when the archive is unpacked, and stores hard-linked files only once within the archive.

$ tar cvf archive.tar directory
a directory/ 0K
a directory/dira/ 0K
a directory/dira/dangling_link symbolic link to ../nil
a directory/dirb/ 0K
a directory/dirb/file0 36K
a directory/dirb/file1 23K
a directory/dirc/ 0K
a directory/dirc/symbolic_link_to_file0 symbolic link to ../dirb/file0
a directory/dirc/hard_link_to_file1 link to directory/dirb/file1
a directory/file2 43K

$ ls -lh archive.tar
-rw-r--r--   1 rhinz    sanusers    107K Dec 15 12:11 archive.tar

From the size of archive.tar (107 kB) it can be seen that each of the three files is stored only once: file0 (36 kB), file1 (23 kB) and file2 (43 kB).

The annotations to the right of each file or directory name ('xyzK', 'symbolic link to' or 'link to') are printed by tar's verbose flag v on Solaris only.

Using tar to archive only the subdirectory dirc, which contains a hard link and a soft link:
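
Where v prints only the names while the archive is created, the table of contents of the finished archive shows how each entry was stored. The following is a minimal sketch assuming a tar whose tvf listing marks symbolic links with '->' and hard links with 'link to', as GNU tar does; the scratch tree and the variable names work and listing are hypothetical.

```shell
#!/bin/sh
# Sketch: inspect how links were stored in an archive.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir -p directory/dirb directory/dirc
dd if=/dev/zero of=directory/dirb/file0 bs=1024 count=36 2>/dev/null
dd if=/dev/zero of=directory/dirb/file1 bs=1024 count=23 2>/dev/null
ln directory/dirb/file1 directory/dirc/hard_link_to_file1
ln -s ../dirb/file0 directory/dirc/symbolic_link_to_file0

# dirb is named first so that file1 is stored with its data and the
# second name in dirc is stored as a link entry.
tar cf archive.tar directory/dirb directory/dirc

# t lists the archive; v adds mode, owner, size and link information.
listing=$(tar tvf archive.tar)
echo "$listing"
```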

$ tar cvf dirc.tar directory/dirc
a directory/dirc/ 0K
a directory/dirc/symbolic_link_to_file0 symbolic link to ../dirb/file0
a directory/dirc/hard_link_to_file1 23K

$ ls -lh dirc.tar
-rw-r--r--   1 rhinz    sanusers     25K Dec 15 12:50 dirc.tar

Because dirb is not part of this archive, hard_link_to_file1 is stored as a regular file of 23 kB. On Solaris, the function modifier l makes tar print an error message if it cannot resolve all links to the files being archived.

$ tar cvlf dirc.tar directory/dirc
a directory/dirc/ 0K
a directory/dirc/symbolic_link_to_file0 symbolic link to ../dirb/file0
a directory/dirc/hard_link_to_file1 23K
tar: missing links to directory/dirc/hard_link_to_file1

$ echo "Exit $?"
Exit 1

In that case tar prints an error message and returns a non-zero exit code.

Resolving the symbolic links

Specifying the function modifier h requests tar to follow symbolic links as if they were normal files or directories.

$ tar cvhf archive.tar directory
a directory/ 0K
a directory/dira/ 0K
a directory/dirb/ 0K
a directory/dirb/file0 36K
a directory/dirb/file1 23K
a directory/dirc/ 0K
a directory/dirc/symbolic_link_to_file0 36K
a directory/dirc/hard_link_to_file1 link to directory/dirb/file1
a directory/file2 43K

$ ls -lh archive.tar
-rw-r--r--   1 rhinz    sanusers    142K Dec 15 13:05 archive.tar

As a result, archive.tar grew from 107 kB to 142 kB because file0 (36 kB) is now stored in it twice.

Note that on Linux tar issues an error message if it is unable to resolve a symbolic link.

$ tar cvhf archive.tar directory
directory/
directory/dira/
tar: directory/dira/dangling_link: Cannot stat: No such file or directory
directory/dirb/
directory/dirb/file0
directory/dirb/file1
directory/dirc/
directory/dirc/symbolic_link_to_file0
directory/dirc/hard_link_to_file1
directory/file2
tar: Error exit delayed from previous errors

$ echo "Exit $?"
Exit 2

As seen before with hard links, on Solaris the function modifier l must be specified for tar to print an error message when it is unable to resolve all links to the files being archived.

$ tar cvhlf archive.tar directory
a directory/ 0K
a directory/dira/ 0K
tar: directory/dira/dangling_link: No such file or directory
a directory/dirb/ 0K
a directory/dirb/file0 36K
a directory/dirb/file1 23K
a directory/dirc/ 0K
a directory/dirc/symbolic_link_to_file0 36K
a directory/dirc/hard_link_to_file1 link to directory/dirb/file1
a directory/file2 43K

$ echo "Exit $?"
Exit 1

References

The examples were created on a machine running Solaris 10 with the /usr/bin/du and /usr/bin/tar commands.
Where a reference to Linux was made, du (GNU coreutils) 5.93 and tar (GNU tar) 1.15.1 were used.

The GNU tar manual of the Free Software Foundation.

The information on this page was last updated on 15 December 2009.
© Rainer Hinz.