How to get the distribution of file sizes

I'd like to know the distribution of file sizes under a certain directory.

Please note: I want the distribution of file sizes, not the size of a directory. That is, I want to know that there are 25 files of 60 bytes, 50 files of 12587 bytes, 2 files of 57 kB, and so on.

Bonus points if the data could be gathered via the command line (e.g. on a remote system) in a format that is easy to use for producing graphs.

Best Answer

List the files, extract the size in bytes from the list, sort it and count the occurrence of every size:

find /my/directory -type f -exec ls -ln {} + | awk '{ print $5 }' | sort -n | uniq -c
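  • not terribly efficient, since the full list of sizes has to be sorted before it can be counted
  • if there are very many files, it may be better to save the intermediate results in a temp file, sort that into another temp file, then run "uniq -c" on it
  • here I use a numeric sort so the output is ordered by ascending file size (nice), but any sort will do as long as equal lines end up grouped together
  • the size is the fifth whitespace-separated column of the long listing; ls pads its columns with a variable number of spaces, so split on runs of whitespace (as awk does) rather than on single spaces
  • pipe the results into awk '{ print $1 "," $2 }' to get CSV lines (count,size) for your graphing tool of choice (even spreadsheet tools will do)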
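
The points above can be folded into one small function. This is only a sketch, not a drop-in tool: the function name size_histogram is made up for illustration, and it assumes file names contain no newlines and that ls prints the size as the fifth column of its long listing. It counts the sizes inside awk, so only the (usually much shorter) list of distinct sizes needs sorting, and it prints "size,count" rather than "count,size" so the size can go straight onto the x axis.

```shell
#!/bin/sh
# size_histogram DIR  (hypothetical helper name, for illustration only)
# Prints one "size_in_bytes,count" line per distinct file size found
# under DIR, ordered by ascending size.
size_histogram() {
    dir=${1:-.}
    # ls -n prints numeric owner/group IDs, so those columns can never
    # contain spaces and the size stays in column 5.
    find "$dir" -type f -exec ls -ln {} + |
        awk '$5 ~ /^[0-9]+$/ { count[$5]++ }
             END { for (size in count) print size "," count[size] }' |
        sort -t, -k1,1n
}
```

For example, size_histogram /my/directory > sizes.csv writes a file that can be loaded directly into a spreadsheet or plotting tool.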