There are a number of options. One really useful option is "-i"
(inhibit), which causes relink to do no relinking at all, but only
to look for the duplicates. Combined with the "-v" (verbose) option,
it produces a list of all the duplicates. The "-r" (recursive)
option means to descend into directories. The "-s
When identical files are found, a decision must be made as to which
one is preserved as the "real" file, and which is replaced by a link
to the other.
If two files are actually the same file (i.e., they have the same
device and inode numbers), then they are already linked. The
compared list element is removed and the count of files is
decremented.
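That check is just a matter of comparing inode numbers with stat().
Here is a minimal sketch; the helper name same_file is hypothetical,
not from relink's source. Note that the device numbers must match
too, since inode numbers are unique only within a single filesystem:

    #include <sys/stat.h>

    /* Hypothetical helper: nonzero if the two paths already name the
     * same file, i.e. the same inode on the same device, in which
     * case they are already hard links to each other. */
    int same_file(const char *a, const char *b)
    {
        struct stat sa, sb;

        if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
            return 0;   /* can't stat one; treat as different */
        return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
    }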
We then read and compare the two files, and if they are identical,
we unlink one of them and establish a hard link to the "real" one.
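The comparison might look roughly like the sketch below; the helper
name identical is hypothetical, and a production version would also
check ferror() after the read loop rather than treating every short
read as end-of-file:

    #include <stdio.h>
    #include <string.h>

    /* Hypothetical helper: 1 if the files' contents are
     * byte-for-byte identical, 0 otherwise (or if either file
     * can't be opened). */
    int identical(const char *a, const char *b)
    {
        FILE *fa = fopen(a, "rb");
        FILE *fb = fopen(b, "rb");
        int same = 0;

        if (fa && fb) {
            char bufa[8192], bufb[8192];
            size_t na, nb;

            same = 1;
            do {
                na = fread(bufa, 1, sizeof bufa, fa);
                nb = fread(bufb, 1, sizeof bufb, fb);
                if (na != nb || memcmp(bufa, bufb, na) != 0) {
                    same = 0;   /* lengths or contents differ */
                    break;
                }
            } while (na == sizeof bufa);    /* short read: EOF */
        }
        if (fa) fclose(fa);
        if (fb) fclose(fb);
        return same;
    }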
The criteria for deciding which file is unlinked are as follows: if
one is multiply linked and the other is singly linked, the singly
linked file is discarded and the multiply-linked one gains a new
link. If both are multiply linked, the newer one is discarded and
becomes a link to the older. However, if the "-n" (newer) option is
used, the newer one is kept and the older is replaced by a link.
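Taken together, the replacement step might look like this sketch.
The name relink_pair is hypothetical; treating the case where both
files are singly linked with the same older-versus-newer rule is an
assumption; and a more careful version would link() to a temporary
name and rename() it into place, so a failure between unlink() and
link() can't lose the name:

    #include <unistd.h>
    #include <sys/stat.h>

    /* Hypothetical sketch: decide which of two identical files to
     * keep, unlink the other, and replace it with a hard link.
     * "newer_wins" corresponds to the "-n" option. */
    int relink_pair(const char *a, const char *b, int newer_wins)
    {
        struct stat sa, sb;
        const char *keep, *drop;

        if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
            return -1;

        if (sa.st_nlink > 1 && sb.st_nlink == 1) {
            keep = a;   /* the multiply-linked file wins */
            drop = b;
        } else if (sb.st_nlink > 1 && sa.st_nlink == 1) {
            keep = b;
            drop = a;
        } else {
            /* otherwise keep the older file, or the newer with -n */
            int keep_a = sa.st_mtime <= sb.st_mtime;
            if (newer_wins)
                keep_a = !keep_a;
            keep = keep_a ? a : b;
            drop = keep_a ? b : a;
        }

        if (unlink(drop) != 0)
            return -1;
        if (link(keep, drop) != 0)
            return -1;  /* caveat: the old name is already gone */
        return 0;       /* the old name is now a link to "keep" */
    }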
This program isn't guaranteed to be totally successful. If two files
exist, each with several links, it is possible that not all of the
links to one will be replaced by links to the other; this depends on
the order in which the links are discovered and paired. To catch
such cases, repeat the relink command. In actual experience, such
misses happen less than once per 1000 files. Correcting this would
require a much more complicated algorithm, and it is probably not
worth the bother.
This program uses John Chambers' dbg package for verbosity; you'll
need a copy of dbg to compile it. Or you can replace the D*() and
P*() macros with your own favorite wrappers for fprintf().
Recursion added by Gerry Feldman.
Ported to random computers by John and Gerry. Your mileage may vary.