A comparison of hard links and soft links:
A soft link (or symbolic link) is a pointer to the real file. Symbolic links can cross filesystems (mount points). A soft link becomes useless (it is left dangling) when the original file it points to is deleted.
A hard link, unlike a soft link, is another name for the file itself, so it appears as a real file rather than as a pointer. Hard links may not cross filesystems. A file is truly removed from the filesystem only when its last hard link is removed.
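To make the difference concrete, here is a minimal sketch in Python, assuming a POSIX-style filesystem that supports both link types. The function name demo_links and the file names are made up for illustration. It creates one file, gives it a hard link and a soft link, deletes the original name, and then checks what is left.

```python
import os
import tempfile


def demo_links(workdir):
    """Create a file, hard-link and soft-link it, then delete the original name.

    Returns (content read through the hard link, whether the soft link dangles).
    """
    original = os.path.join(workdir, "original.txt")
    hard = os.path.join(workdir, "hard.txt")
    soft = os.path.join(workdir, "soft.txt")

    with open(original, "w") as f:
        f.write("hello\n")

    os.link(original, hard)     # a second name for the same inode
    os.symlink(original, soft)  # a pointer to the path of the original

    os.remove(original)         # remove the first name only

    with open(hard) as f:
        hard_content = f.read()  # the data is still reachable

    # os.path.exists() follows symlinks, so it is False for a dangling link.
    soft_dangling = os.path.islink(soft) and not os.path.exists(soft)
    return hard_content, soft_dangling


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(demo_links(d))
```

The hard link still reads the data because the data is only freed when the last name goes away, while the soft link is left pointing at a path that no longer exists.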
Q. So why hard links rather than soft links?
A. An example of some of the benefits is as follows:
Imagine a daily snapshot system that puts each day's backup in its own directory. Here we have September 14th through September 18th:
20040914
20040915
20040916
20040917
20040918
Keep in mind that only one full backup's worth of space is used for the unchanged data. Every other day's directory takes up only the space equivalent to a differential backup.
For simplicity, let's say we only want 5 days' worth of backups. If a backup is older than 5 days, we just dump it (in a real scenario this might be 6 months or a year, or never).
The next time our backup runs, we would drop anything older than 5 days (20040914) and the next backup would go into 20040919. If this method used soft links, the original content would disappear when we dropped 20040914, and every later day's links to that content would be left pointing at nothing. All of our remaining backups would be useless.
By contrast, since we are using hard links, we lose nothing when we drop 20040914. The other days are not affected at all.
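The drop-off step can be sketched as follows. This is only an illustration, assuming snapshot directories are named YYYYMMDD as in the listing above. The function name prune_snapshots and its keep parameter are hypothetical, not part of any existing tool.

```python
import os
import shutil


def prune_snapshots(backup_root, keep=5):
    """Delete all but the newest `keep` snapshot directories.

    Returns the list of directory names that were removed.
    """
    # Names such as 20040914 sort chronologically as plain strings.
    names = sorted(
        n for n in os.listdir(backup_root)
        if len(n) == 8 and n.isdigit()
        and os.path.isdir(os.path.join(backup_root, n))
    )
    to_remove = names[:-keep] if keep > 0 else names
    for name in to_remove:
        # Removing a directory only removes names. Files that are hard linked
        # from newer snapshots keep their data.
        shutil.rmtree(os.path.join(backup_root, name))
    return to_remove
```

With six directories from 20040914 to 20040919 and keep=5, only 20040914 is removed, and files that later days share with it by hard link remain readable.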
The next day 20040919 is then backed up and hard linked (when applicable) to the previous day (20040918). Anything that has changed is obviously not linked to the previous day, but rather a fresh copy of this changed file is stored.
This should give you an example of how a rotating snapshot system could work. Each day's directory is just a different "view" to the same data set. Of course in reality, you will realize such space savings with this method that it may be unnecessary to drop off outdated directories.
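As a rough sketch of how one day's snapshot could be built, the Python below walks a source tree and, for every regular file, either hard links it to the copy in the previous day's directory (when the contents are identical) or stores a fresh copy. The function name make_snapshot and its arguments are assumptions made for this example. It is not how any particular backup tool is implemented, and it ignores symbolic links, ownership, and deleted files.

```python
import filecmp
import os
import shutil


def make_snapshot(source, previous, new):
    """Copy `source` into `new`, hard-linking files unchanged since `previous`.

    `previous` may be None for the very first (full) snapshot.
    Returns a (linked, copied) pair of file counts.
    """
    linked = copied = 0
    for dirpath, _dirnames, filenames in os.walk(source):
        rel = os.path.relpath(dirpath, source)
        os.makedirs(os.path.normpath(os.path.join(new, rel)), exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.normpath(os.path.join(new, rel, name))
            old = os.path.join(previous, rel, name) if previous else None
            # shallow=False forces a byte-for-byte comparison of the contents.
            if old and os.path.isfile(old) and filecmp.cmp(src, old, shallow=False):
                os.link(old, dst)        # unchanged: just add another name
                linked += 1
            else:
                shutil.copy2(src, dst)   # new or changed: store a fresh copy
                copied += 1
    return linked, copied
```

Because unchanged files are linked rather than copied, the new directory and the previous one must live on the same filesystem.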
Snapshotting is the method of creating backups at regular intervals that are hard linked to the previous backup in an effort to save space.
Snapshotting combines the benefits of full backups with those of differential backups.
Full backup Positives:
Complete backup
Only one archive is needed to do a restore
Full backup Negatives:
Differential backups Positives:
Differential backups Negatives:
Restores start with the last full backup and then proceed through all the differential backups until the desired point in time.
Restores take longer than restoring from a full backup because more steps are involved.
If any of the previous backups are unavailable then the restore will fail.
More on hard links:
You can take advantage of hard linking when 2 files are exactly the same (i.e. no differences).
Often checking md5sum and the file sizes can help with this determination.
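One way to make that check is sketched below in Python. The helper names file_md5 and probably_identical are invented for this example. Sizes are compared first because that is cheap, and the checksum is only computed when the sizes match. A matching MD5 is strong evidence rather than proof, so a cautious script would follow up with a byte-for-byte comparison before linking.

```python
import hashlib
import os


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def probably_identical(a, b):
    """True when two files have the same size and the same MD5 digest."""
    if os.path.getsize(a) != os.path.getsize(b):
        return False  # different sizes can never be identical
    return file_md5(a) == file_md5(b)
```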
2 files that are hard linked will:
have different filenames (or the same name in a different directory, which is really a different name)
take up space on the hard drive only once
show up as regular files
have the exact same inode
share a hard link counter that is incremented for each hard link
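These properties can be checked from a script. The sketch below uses os.stat from the Python standard library. The helper names link_info and are_hard_linked are made up for illustration. Two names refer to the same file when both the device number and the inode number match.

```python
import os


def link_info(path):
    """Return (device, inode, hard link count) for a path."""
    st = os.stat(path)
    return st.st_dev, st.st_ino, st.st_nlink


def are_hard_linked(a, b):
    """True when both names refer to the same inode on the same device."""
    sa, sb = os.stat(a), os.stat(b)
    return (sa.st_dev, sa.st_ino) == (sb.st_dev, sb.st_ino)
```

A freshly created file reports a link count of 1. After one hard link is added, both names report 2 and the same inode, while an ordinary copy reports its own inode and a count of 1.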
Hard linking is the linking of 2 duplicate files together so that the 2 different names point to the same contents - i.e. the contents are stored once and pointed to twice.
Hard links appear as regular files and share the same inode, and the inode's link counter is incremented for each hard link.
Soft links can cross filesystems (mount points) and appear as a symbolic link (pointer).
snapshot
hard links
rsync -H
cp -al
full backups / differential backups
not cross filesystem
deletions ok, modifications not ok
not linked that should be - reconciliation.
In some cases you might have files that should be hard linked together but aren't. This can happen when your snapshotting technology isn't working just right or if you have many backups from many different systems, but would like to consume less space.
I have written, but not thoroughly tested, the following reconciliation script. It will search the current directory and below for any files that it feels should be hard linked together. It could use some improvements, but I believe it is usable as is. I have run it once on my system and I believe it did a good job (usage went from 17 GB down to 13 GB).
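The script itself is not reproduced in this text. As a stand-in, here is a minimal sketch of the same idea in Python. It is not the original script, and the function name reconcile and the ".relink-tmp" suffix are assumptions made for this example. It groups regular files by device and size, then by MD5, confirms each match byte for byte, and replaces duplicates with hard links. Keep in mind that hard linked names share one set of permissions, ownership, and timestamps, so try it on a scratch copy first.

```python
import filecmp
import hashlib
import os
from collections import defaultdict


def _md5(path):
    """Hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def reconcile(root):
    """Hard-link identical regular files under `root`; return bytes reclaimed."""
    # Candidates must be on the same device and have the same size.
    groups = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks and special files
            st = os.stat(path)
            if st.st_size > 0:  # empty files are not worth linking
                groups[(st.st_dev, st.st_size)].append(path)

    saved = 0
    for (_dev, size), paths in groups.items():
        if len(paths) < 2:
            continue
        by_hash = defaultdict(list)
        for p in paths:
            by_hash[_md5(p)].append(p)
        for same in by_hash.values():
            keep = same[0]
            for dup in same[1:]:
                if os.path.samefile(keep, dup):
                    continue  # already hard linked together
                if not filecmp.cmp(keep, dup, shallow=False):
                    continue  # checksum collision: contents differ
                # Space is only freed if this was the last name of the duplicate.
                last_name = os.stat(dup).st_nlink == 1
                tmp = dup + ".relink-tmp"
                os.link(keep, tmp)    # create the new name first ...
                os.replace(tmp, dup)  # ... then swap it in atomically
                if last_name:
                    saved += size
    return saved
```

Calling reconcile(".") would process the current directory and everything below it, and the return value is an estimate of the bytes freed.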