Linux – Generate distribution of file sizes from the command prompt

Tags: bash, command line, linux, unix

I've got a filesystem which has a couple million files and I'd like to see a distribution of file sizes recursively in a particular directory. I feel like this is totally doable with some bash/awk fu, but could use a hand. Basically I'd like something like the following:

1KB: 4123
2KB: 1920
4KB: 112
...
4MB: 238
8MB: 328
16MB: 29138
Count: 320403345

I feel like this shouldn't be too bad given a loop and some conditional log2 filesize foo, but I can't quite seem to get there.

Related Question: How can I find files that are bigger/smaller than x bytes?

Best Answer

This seems to work pretty well:

find . -type f -print0 | xargs -0 ls -l | awk '{size[int(log($5)/log(2))]++}END{for (i in size) printf("%10d %3d\n", 2^i, size[i])}' | sort -n

Its output looks like this:

         0   1
         8   3
        16   2
        32   2
        64   6
       128   9
       256   9
       512   6
      1024   8
      2048   7
      4096  38
      8192  16
     16384  12
     32768   7
     65536   3
    131072   3
    262144   3
    524288   6
   2097152   2
   4194304   1
  33554432   1
 134217728   4
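
In each row, the number on the left is the lower bound of a range that runs from that value up to (but not including) twice that value, and the number on the right is the number of files in that range. The first row (0) most likely counts empty files: log(0) evaluates to negative infinity in awk, so those files end up in a bucket of their own.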
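
If GNU find is available, the ls step can be skipped by asking find for each file's size directly. Below is a minimal sketch of that variant, not a drop-in replacement: it assumes GNU find (for -printf), and the function name size_histogram is made up for illustration. It buckets by repeated doubling instead of log(), so empty files are handled explicitly, and it prints a total count at the end, as the question asked.

```shell
#!/usr/bin/env bash
# size_histogram DIR
# Count regular files under DIR (default: .) in power-of-two size buckets.
# Assumptions: GNU find for -printf; "size_histogram" is a hypothetical name.
size_histogram() {
    find "${1:-.}" -type f -printf '%s\n' |
    awk '
        {
            # Bucket = largest power of two <= size; empty files go to bucket 0.
            bucket = 0
            if ($1 > 0) {
                bucket = 1
                while (bucket * 2 <= $1) bucket *= 2
            }
            count[bucket]++
            total++
        }
        END {
            # %.0f instead of %d, since some awk builds limit %d to 32 bits.
            for (b in count)
                printf "%12.0f %d\n", b, count[b] | "sort -n"
            close("sort -n")   # wait for the sorted rows before the total
            printf "%12s %d\n", "Total:", total
        }
    '
}
```

Running size_histogram /some/dir prints one row per non-empty bucket, sorted by bucket size, followed by a Total: row. As in the one-liner above, the left column is the lower bound of each range.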
