Strange discrepancy of file sizes from ls

I've been using ls -sh to check file sizes ever since 1997 or so, but today something strange happened:

ninja@vm:foo$ ls -sh
total 98M
1,0M app   
64M app_fake_signed.sbp  
800K loader  
804K loader_fake_signed.sbp  
1,0M web   
32M web_fake_signed.sbp

The app and web files were not supposed to be much smaller than their signed counterparts, and I spent several hours debugging the signing program. After finding nothing, I happened to look at the files through a Samba share and found them very similar in size. I checked again:

ninja@vm:foo$ ls -lh
total 98M
-rw-rw-r-- 1 ninja ninja  63M lut  4 14:13 app
-rw-rw-r-- 1 ninja ninja  64M lut  4 14:13 app_fake_signed.sbp
-rw-rw-r-- 1 ninja ninja 800K lut  4 14:13 loader
-rw-rw-r-- 1 ninja ninja 801K lut  4 14:13 loader_fake_signed.sbp
-rw-rw-r-- 1 ninja ninja  31M lut  4 14:13 web
-rw-rw-r-- 1 ninja ninja  32M lut  4 14:14 web_fake_signed.sbp

I'm speechless. Why does ls -s show app and web as 1 MB in size, when they are actually 63 MB and 31 MB, respectively?

This was Xubuntu 14.04 running in VirtualBox on Windows, if it makes any difference.

Edit:
The files app, web and loader are all created by a bash script (not of my design) which runs dd if=/dev/urandom of=app bs=$BLOCK count=1 seek=... in a loop. The signing program, written in C, takes these files and writes their signed versions to the disk, prepending and appending a binary signature to each.

Best Answer

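You're using the -s option to ls, which prints the disk space allocated to each file rather than the file's size (its length in bytes, which is what ls -l shows).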

A file's size and the amount of disk space it takes up may differ. For example, if you open a new file, seek 1G into it, and write something, the OS doesn't allocate 1G (plus the space for something) on disk; it allocates only the space for something. The skipped region is a "hole" that reads back as zeros, and a file containing holes is called a "sparse file".

I wrote a small C program to create such a file:

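#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    // O_TRUNC so that rerunning the program starts from an empty file
    int fd = open("/tmp/foo.dat", O_CREAT | O_WRONLY | O_TRUNC, 0600);

    if (fd == -1) {    // 0 is a valid descriptor; only -1 means failure
        perror("open");
        return 1;
    }

    const off_t GIG = (off_t)1024 * 1024 * 1024;

    // Seek 1G into the file without writing anything before that point
    if (lseek(fd, GIG, SEEK_SET) == (off_t)-1) {
        perror("lseek");
        close(fd);
        return 1;
    }

    // Write something; only the block holding it gets allocated
    if (write(fd, "hello", sizeof "hello") == -1) {
        perror("write");
        close(fd);
        return 1;
    }

    close(fd);
    return 0;
}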

Running that program I get:

$ ls -lh /tmp/foo.dat
-rw------- 1 user group 1.1G Feb  4 15:25 /tmp/foo.dat

But using -s, I get:

$ ls -sh /tmp/foo.dat
4.0K /tmp/foo.dat

So a 4K block was allocated on disk to store "hello" (and 4K is the smallest unit of allocation for my filesystem).

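In your case, it looks like app and web are such sparse files. Your script builds them with dd using seek= in a loop, so any block the loop seeks past without writing is left as a hole, which is why ls -s reports only about 1M for each. The signing program, on the other hand, writes out every byte (the holes read back as zeros), so the signed copies are fully allocated and ls -s shows roughly their full size.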