I work on a cluster shared with colleagues. Disk space is limited (and the disk has been full on some occasions), so I clean up my part occasionally. I want to do this quickly, so until now I have done it by listing files larger than 100 MB and older than 3 months, and checking whether I still need them.
But now I suspect there could be a folder with more than 1000 smaller files that I miss, so I want an easy way to check whether this is the case. Given the way I generate data, it would help to get a list of total size per extension. In the context of this question, 'extension' means everything after the last dot in the filename.
Suppose I have multiple folders with multiple files:
folder1/file1.bmp 40 kiB
folder1/file2.jpg 20 kiB
folder2/file3.bmp 30 kiB
folder2/file4.jpg 8 kiB
Is it possible to make a list of total filesize per file extension, so like this:
bmp 70 kiB
jpg 28 kiB
I don't care about files without extension, so they can be ignored or put in one category.
I already went through the man pages of ls, du and find, but I don't know what the right tool for this job is…
Best Answer
On a GNU system:
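A minimal sketch of such a pipeline, assuming GNU find (for -printf) and an awk that accepts a NUL record separator (GNU awk does). The function name size_per_ext and the column widths are illustrative choices, not part of any tool. It prints disk usage in bytes, the number of files, and the extension, sorted by size:

```shell
# Sum disk usage (in bytes) and file count per extension under a directory.
# Assumes GNU find (-printf) and an awk that accepts RS='\0' (e.g. GNU awk).
size_per_ext() {
  # %b = allocated 512-byte blocks, %f = file name without leading directories.
  # Records are NUL-terminated so unusual file names do not split a record.
  find "${1:-.}" -name '?*.*' -type f -printf '%b.%f\0' |
    awk -F . -v RS='\0' '
      # $1 is the block count, $NF is everything after the last dot.
      { blocks[$NF] += $1; count[$NF]++ }
      END {
        for (ext in blocks)
          printf "%15d %6d %s\n", blocks[ext] * 512, count[ext], ext
      }' |
    sort -n
}
```

Calling size_per_ext with no argument scans the current directory. The pattern '?*.*' skips files without an extension as well as dot files such as .bashrc that contain no other dot.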
Or the same with perl, avoiding the -printf extension of GNU find (still using a GNU extension, -print0, but this one is more widely supported nowadays):
It gives an output like:
If you want KiB, MiB... suffixes, pipe to numfmt --to=iec-i --suffix=B.
%b*512 gives the disk usage, but note that if files are hard linked several times, they will be counted several times, so you may see a discrepancy with what du reports.
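For the numfmt step, a small sketch on made-up sample lines. The byte values are hypothetical, chosen to match the 70 KiB and 28 KiB totals from the question, and to_iec is only an illustrative helper name. It assumes numfmt from GNU coreutils, which converts the first whitespace-separated field by default:

```shell
# Convert the first field (a byte count) to IEC units such as KiB or MiB.
# Assumes GNU coreutils numfmt; to_iec is an illustrative helper name.
to_iec() {
  numfmt --to=iec-i --suffix=B
}

# Hypothetical sample lines in "bytes count extension" layout.
printf '%15d %6d %s\n' 28672 2 jpg 71680 2 bmp | to_iec
```

Only the first column is rewritten, so the count and extension columns pass through unchanged.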