SUMMARY: ls -l and du -k report very different size for same file

From: Charles Vachon <cvachon_at_mrn.gouv.qc.ca>
Date: Mon, 06 Apr 1998 12:03:17 -0400

Many thanks to the following people who brought quick answers to my
question (posted at the end of this message):

Jim Belonis <belonis_at_dirac.phys.washington.edu>
alan_at_nabeth.cxo.dec.com (Alan Rollow - Dr. File System's Home for Wayward Inodes.)
Brian_ONeill_at_uml.edu (Brian O'Neill)
Jerome M Berkman <jerry_at_uclink.berkeley.edu>
Lucien_HERCAUD_at_paribas.com
Girish Phadke <PGIRISH_at_binariang.maxisnet.com.my>

All explanations pointed to the same thing: SPARSE FILES.

Here is what fellow DU-ers had to say regarding these:

Jim Belonis:

Some Unix filesystems support files with 'holes'.
I.e. the file size reported by ls may include bytes that are not really there.
That is, you can create a file with one byte at seek-location 2,000,000,
and the file size indicated by ls -l would be about 2,000,000 bytes, but
there is actually only one block allocated, out at byte 2,000,000.
So du would show the size of one block.
You should be able to verify whether this explains the situation
by using some kind of inode examination program.
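
For example (a minimal sketch, not any particular vendor's tool): a few
lines of C can compare the byte size stat(2) reports in st_size against
the space actually allocated in st_blocks, which is counted in 512-byte
units on most Unixes:

    #include <stdio.h>
    #include <sys/stat.h>

    /* Print apparent size vs. allocated size for each file argument;
       flag files whose allocation is smaller than their length. */
    int main(int argc, char *argv[])
    {
        int i;
        for (i = 1; i < argc; i++) {
            struct stat st;
            if (stat(argv[i], &st) != 0) {
                perror(argv[i]);
                continue;
            }
            printf("%s: %ld bytes, %ld allocated%s\n", argv[i],
                   (long)st.st_size, (long)st.st_blocks * 512L,
                   (long)st.st_blocks * 512L < (long)st.st_size
                       ? "  <- sparse" : "");
        }
        return 0;
    }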

Most programs will treat the file as if all the missing blocks are nulls,
which explains your cat results.

This kind of file is mainly used for sparse databases, where you want the
byte number in the file to be related simply to the record number, but you
don't have very many records and the desired record numbers span a wide
range (like social security numbers for a small number of employees).
So you can save lots of disk space by not bothering to allocate disk blocks
for record numbers you don't use.

***************
Alan Rollow:

They're called sparse files, and it is likely that every UNIX
supports them. Quite simply, the file systems don't allocate
blocks to some of the file space. This space reads as NULs
and gets space allocated to it when written. Some applications
like to take advantage of the feature; others abhor it. You can
find out how to create a sparse file by reading the lseek(2)
manual page.
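
As a rough illustration of what the lseek(2) page describes (file name
and size invented for the example): open a file, seek well past
end-of-file, and write a single byte; everything skipped over becomes
a hole.

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Create a ~10 MB file that should occupy only a block or two:
       the 10 MB skipped by lseek() is never allocated and reads
       back as NULs. */
    int main(void)
    {
        int fd = open("sparse.demo", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) { perror("open"); return 1; }
        if (lseek(fd, 10L * 1024 * 1024, SEEK_SET) == (off_t)-1) {
            perror("lseek"); return 1;
        }
        if (write(fd, "x", 1) != 1) { perror("write"); return 1; }
        close(fd);
        return 0;   /* ls -l: 10485761 bytes; du -k: a few KB */
    }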

***************
Brian O'Neill:

They are called "sparse" files. UNIX can save on disk space by not
bothering to allocate disk blocks for data blocks consisting of all
zeros. Certain operations, like the one you performed, WILL allocate
those blocks.

Crash dumps are often sparse files.

***************
Jerome M Berkman:

You have files with unallocated blocks. Core dumps often appear
that way. If you actually read such a block, it comes back as all
null bytes. That is why copying the small file created a big file.
Just remember that "du" gives the actual disk usage of the file,
while "ls -l" just gives the offset of the last byte of the file.

****************
Lucien HERCAUD:

Hello Charles,

This is what we call - on UNIX - a SPARSE file.
Inside the file there are HOLES that do not really occupy space on
disk but which read back as zeros.

To create this kind of file, the lseek() system call is used and the
offset is specified PAST THE END of the file (see the man page for
lseek(2)).

So "ls -l" will return the number of bytes one can read in the file
(the same as "wc") including the holes, while "du" will look to the
file inode to return the actual disk usage for that file.

Of course, when you copy that file with "cat" into another one, the
resulting file is created without holes, and that is why you see its
full size in both cases.

To create a sparse file with standard utilities, you can use the "dd
conv=sparse" command (the hole sizes will depend on the number of
zeros in the input and on the "obs=" - or "bs=" - you specify on dd's
command line; instead of writing a sequence of "obs" zeros into the
output file, dd will do an lseek()).
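
The idea behind that dd behaviour can be sketched in C (a simplified
copy loop with a fixed block size, not dd itself): whenever an input
block is entirely zeros, seek forward in the output instead of
writing, leaving a hole.

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    #define BS 8192

    int main(int argc, char *argv[])
    {
        char buf[BS], zero[BS];
        ssize_t n;
        int in, out;

        if (argc != 3) {
            fprintf(stderr, "usage: %s src dst\n", argv[0]);
            return 1;
        }
        memset(zero, 0, sizeof zero);
        in  = open(argv[1], O_RDONLY);
        out = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (in < 0 || out < 0) { perror("open"); return 1; }

        while ((n = read(in, buf, BS)) > 0) {
            if (n == BS && memcmp(buf, zero, BS) == 0)
                lseek(out, BS, SEEK_CUR);   /* skip: leaves a hole */
            else if (write(out, buf, n) != n) {
                perror("write"); return 1;
            }
        }
        /* If the input ends in zeros, the final lseek() alone does
           not extend the file; ftruncate() pins the final size. */
        ftruncate(out, lseek(out, 0, SEEK_CUR));
        close(in); close(out);
        return 0;
    }

(Real dd also handles partial blocks and many other conversions; this
only shows the seek-instead-of-write trick.)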

Regards,

/*****************************/
/ ORIGINAL MESSAGE FOLLOWS /
/*****************************/

Hello DU managers,

Doing a little bit of housekeeping on a DU 4.0b server I have recently
begun administering, I encountered a few files right off the root
directory (an ADVFS filesystem). These files consist of only binary
zeroes (nulls), and I'm told they are of no use anymore. I was about to
delete them when I found something strange:

ls -l 4* reports files of multi-megabyte size:

-rw------- 1 root daemon 10469376 Dec 23 1996 41988
-rw------- 1 root daemon 8806400 Dec 23 1996 44330

but du reports very different sizes for these same files:

du -k 4* gives:

8 41988
8 44330

du reports a consumption of only 8 1024-byte blocks for the same files!

I did an experiment with one of these files: "cat 44330 one > 44444",
where "one" is a file containing a single byte. This created a 44444
file with the characteristics one would expect:

ls -l 44444 gives:
-rw-r--r-- 1 root system 8806401 Apr 3 14:33 44444

and du -k 44444 says:
8608 44444

both commands report a consistent file size, since 8608 1024-byte blocks is
roughly the same size as 8806401 bytes.

I figured that maybe there is a special way DU/ADVFS treats files
containing all nulls. So I did another experiment:
"dd if=/dev/zero of=/tmp/test.file bs=1024 count=50". This creates an
all-nulls file of 51200 bytes:

-rw-r--r-- 1 root system 51200 Apr 3 15:04 test.file

du -k reports a block count consistent with this file size, even though
the file contains only nulls:

50 test.file

I'm at a loss trying to explain what I see. Does anyone have an idea of
what could explain this discrepancy?

Thanks in advance!


--
Charles Vachon -- System Administrator
Fonds de la réforme cadastrale du Québec
Ministère des Ressources Naturelles du Québec
cvachon_at_mrn.gouv.qc.ca