Shell – How to show only total file size of particular extension by `du` command

disk-usage shell wildcards

I have hundreds of PDF and HTML files in a directory, and I want to know the total size of the PDF files.

With du -ch /var/foo I can see the total file size, but I only need the last line, the total.

If the directory contained only PDF files I could use the -s option, but that doesn't work here.

How can I get only the total size of a particular file type?

Best Answer

With GNU du (i.e. on non-embedded Linux or Cygwin), you can use the --exclude option to exclude the files you don't want to match.

du -s --exclude='*.html' /var/foo

If you want to positively match *.pdf files, you'll need to use some other method to list the files, and du will display at least one output line per argument, plus a grand total with the option -c. You can call tail to remove all but the last line, or sed to remove the word “total” as well. To enumerate the files in that one directory, use wildcards in the shell.

du -sc /var/foo/*.pdf | tail -n1
du -sc /var/foo/*.pdf | sed -n '$s/\t.*//p'
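
As a minimal sketch, the tail approach above can be wrapped in a small function. The name pdf_total is a hypothetical helper, not part of du, and -k is added here only so that the unit is always kibibytes; cut keeps the size field and drops the word “total”.

```shell
# pdf_total DIR: print the combined size, in KiB, of the *.pdf files
# directly inside DIR (hypothetical helper name, used for illustration).
pdf_total() {
    # -c appends a grand total line, tail keeps only that line,
    # and cut keeps the first tab-separated field (the number).
    du -sck -- "$1"/*.pdf | tail -n 1 | cut -f 1
}

# Example call:
#   pdf_total /var/foo
```

If the directory contains no *.pdf file, the pattern is passed to du unexpanded and du reports an error for it.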

If you need to traverse files in subdirectories as well, use find, or use a **/ pattern if your shell supports that. For **/, in bash, first run shopt -s globstar, and note that bash versions up to 4.2 will traverse symbolic links to directories; in zsh, this works out of the box.

du -sc /var/foo/**/*.pdf | tail -n1
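
A hedged sketch of the recursive variant that works regardless of which shell you call it from: it starts a bash child process with the globstar option already enabled through bash's -O switch. It assumes bash 4.0 or later is on the PATH; pdf_total_recursive is a hypothetical name chosen for illustration.

```shell
# pdf_total_recursive DIR: print the combined size, in KiB, of all *.pdf
# files in DIR and its subdirectories (hypothetical helper name).
pdf_total_recursive() {
    # bash -O globstar enables ** for this one child shell only;
    # "_" fills $0, so DIR arrives inside the quoted script as $1.
    bash -O globstar -c 'du -sck -- "$1"/**/*.pdf | tail -n 1 | cut -f 1' _ "$1"
}

# Example call:
#   pdf_total_recursive /var/foo
```

The expanded file list is still passed to a single du invocation, so the command line length limit applies here as well.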

An added complication with the find version is that if there are too many files, find will run du more than once to stay under the command line length limit, so du prints several “total” lines that have to be added up. With the wildcard method, the shell fails with an error instead (typically “Argument list too long”). The following code assumes that no matching file name contains a newline.

find /var/foo -name '*.pdf' -exec du -sc {} + |
awk '$2 == "total" {total += $1} END {print total}'
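
With GNU du and a find that supports -print0 (both assumptions, as with --exclude above), one possible alternative is to feed the file names to du on standard input through the --files0-from option. du then runs only once, so there is a single total line, and the NUL separators cope with newlines in file names. This is a sketch with a hypothetical function name, not the only way to do it.

```shell
# pdf_total_find DIR: print the combined size, in KiB, of all *.pdf files
# under DIR (hypothetical helper name; requires GNU du).
pdf_total_find() {
    # find emits NUL-terminated names; du --files0-from=- reads them
    # from stdin, so the argument length limit never comes into play.
    find "$1" -type f -name '*.pdf' -print0 |
        du -ck --files0-from=- |
        tail -n 1 | cut -f 1
}

# Example call:
#   pdf_total_find /var/foo
```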