Hi Admin Wizards,
A big thanks to the list: once again, helpful responses within hours (or
even minutes).
The best solution for my problem was (1), but the other responses are
highly appreciated; I learned much from them.
Credits go to (in order of appearance):
James Sainsbury, Joerg Bruehe, Alan_at_Nabeth, Charles Ballowe
Solutions:
-----------------------------------------------------------------------
1) make fdupes "makeable" (by James Sainsbury):
The reason the compile doesn't proceed is that the header and object files
for getopt_long() do not exist on tru64.
One solution:
go to your source/compile directory for GNU tar (1.13)
look in lib for
getopt.h
getopt.o
getopt1.o
copy these to your fdupes compile directory
edit the fdupes Makefile
....
#EXPERIMENTAL_RBTREE = -DEXPERIMENTAL_RBTREE
INCLUDES=-I.
LIBES=getopt.o getopt1.o
DEBUG=-O2
CFLAGS=$(INCLUDES) $(LIBES) $(DEBUG)
....
fdupes: fdupes.c md5/md5.c
	$(CC) fdupes.c md5/md5.c $(CFLAGS) -o fdupes -DVERSION=\"$(VERSION)\" $(EXTERNAL_MD5) $(EXPERIMENTAL_RBTREE)
-----------------------------------------------------------------------
2) Shellscript that could do the job (Charles Ballowe):
something like:
find /directory-of-big-storage -type f -exec cksum {} \; >> /tmp/output
sort -n /tmp/output > /tmp/output.sorted
and then something to go over and compare the first two fields of
each line with the previous line to determine if the files are likely
to be the same. It's going to take a while and beat on the disk a bit,
but it will get the job done. (and since it's sorted output - you only
have to worry about the lines next to the current line.)
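The "compare with the previous line" step could be sketched in awk roughly as follows. This is only an illustration, not Charles's script: the function name list_candidates is made up here, and it assumes input lines in the usual cksum layout "checksum size name" with no newlines in file names. It prints pairs that are merely likely to be identical; a cmp is still needed to be sure.

```shell
# list_candidates (hypothetical name): read sorted "checksum size name" lines
# on stdin and print "previous<TAB>current" whenever the checksum and size of
# a line equal those of the line before it.
list_candidates() {
    awk '
    {
        key = $1 " " $2
        # Take the name from the raw line so embedded spaces survive.
        name = $0
        sub(/^[^ ]+ +[^ ]+ +/, "", name)
        if (NR > 1 && key == prevkey)
            print prevname "\t" name
        prevkey = key
        prevname = name
    }'
}
```

It would be fed the sorted list, for example: list_candidates < /tmp/output.sorted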
-----------------------------------------------------------------------
3) Joerg Bruehe offered a script he has:
I have a tool (shell script) that traverses two trees, "old"
and "new", gets the file names from "old", looks for a file
with identical name in "new", compares them, and (if equal
contents) replaces the file in "new" by a hardlink to that
in "old".
(Mail me if you want that script.)
Once things were even worse: We had changed several names.
In that moment I did (unchecked - from memory):
cd common_tree_root
find . -type f -print | \
xargs ls -ldi | sort -k 6,6n -k 1,1n > list
where the "sort" is on the size field first, inode second.
From there I proceeded manually ("cmp" on files with same
size but different inodes, possibly followed by "ln -f"),
but this might be a starting point for automation.
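As a possible first step towards that automation, the manual "cmp, then ln -f" check for one pair of files might look like the sketch below. This is not Joerg's script; the function name link_if_same is invented for illustration, and it assumes both files live on the same file system (otherwise ln fails).

```shell
# link_if_same OLD NEW (hypothetical helper): if NEW has the same contents as
# OLD but is a separate file, replace NEW by a hard link to OLD.
# Returns 0 only when a link was made.
link_if_same() {
    old=$1
    new=$2
    # Only handle regular files, and leave symbolic links alone.
    [ -f "$old" ] && [ -f "$new" ] || return 1
    [ ! -h "$old" ] && [ ! -h "$new" ] || return 1
    # Skip pairs that already share an inode.
    old_inode=$(ls -di "$old" | awk '{print $1}')
    new_inode=$(ls -di "$new" | awk '{print $1}')
    [ "$old_inode" != "$new_inode" ] || return 1
    # cmp -s is silent and exits 0 only when the contents are identical.
    cmp -s "$old" "$new" || return 1
    ln -f "$old" "$new"
}
```

Usage would be along the lines of: link_if_same old/path/file new/path/file
Note that "ln -f" removes the existing target first, so try it on a copy of the tree before trusting it.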
-----------------------------------------------------------------------
4) and Alan_at_Nabeth suggested to:
1. Get a list of all the regular file names on the
target file system:
find /file-system -type f -print > list
2. Run the checksum program against each file. This
will give you tuples of checksum, size and file
name. Files with the same size and checksum may be the
same.
xargs < list sum > checksums
The next substep is a bit more complicated. You want
to compare the checksums and sizes. You can almost do
this with sort and uniq, but since the name is part of
the line everything will be different. A custom awk
or perl script might do the job.
3. For files that had the same size (non-zero) and checksum,
compare them with diff or cmp. The checksum calculation
with sum(1) is only 16 bits. You can have files with the
same checksum that are different. cksum(1) uses a 32 bit
checksum, which should give a better first cut separation
before having to do detailed compares.
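Steps 2 and 3 could be combined in a small shell function like the one below. It is a sketch under assumptions, not Alan's code: the name report_dups is hypothetical, it expects cksum output ("checksum size name") rather than sum output, and it assumes no newlines in file names. Each file is compared only against the first file of its checksum/size group, so a group that mixes a checksum collision with real duplicates may be reported incompletely.

```shell
# report_dups (hypothetical name): read "checksum size name" lines on stdin,
# skip empty files, and print "first<TAB>name" for every later file that has
# the same checksum and size as `first` and is identical according to cmp.
report_dups() {
    sort -k1,1n -k2,2n | {
        prev_key=
        first=
        while read -r sum size name; do
            # Zero-length files all match each other; ignore them.
            [ "$size" -gt 0 ] || continue
            if [ "$sum $size" = "$prev_key" ]; then
                # Same checksum and size: confirm byte for byte.
                if cmp -s "$first" "$name"; then
                    printf '%s\t%s\n' "$first" "$name"
                fi
            else
                prev_key="$sum $size"
                first=$name
            fi
        done
    }
}
```

With the file list from step 1 it might be run as: xargs cksum < list | report_dups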
-----------------------------------------------------------------------
YS, CW
------------------------------------------
Dr. Christian Wessely
christian.wessely_at_uni-graz.at
url: www-theol.uni-graz.at
Received on Fri Jun 14 2002 - 06:46:06 NZST