Bash – Difference Between ‘du -sh *’ and ‘du -sh ./*’


What's the difference between du -sh * and du -sh ./* ?

Note: What interests me is the * and ./* parts.

Best Answer

$ touch ./-c $'a\n12\tb' foo
$ du -hs *
0       a
12      b
0       foo
0       total

As you can see, the -c file was taken as an option to du and is not reported (and you see the total line because of du -c). Also, the file called a\n12\tb is making us think that there are files called a and b.

$ du -hs -- *
0       a
12      b
0       -c
0       foo

That's better. At least this time -c is not taken as an option.

$ du -hs ./*
0       ./a
12      b
0       ./-c
0       ./foo

That's even better. The ./ prefix prevents -c from being taken as an option, and the absence of ./ before b in the output shows that there is no file called b, only a file whose name contains a newline character (but see note 1 under "Further digressions" below).

It's good practice to use the ./ prefix when possible. When it isn't, and whenever you pass arbitrary data to a command, you should always use:

cmd -- "$var"

or:

cmd -- $patterns

If cmd doesn't support -- to mark the end of options, you should report it as a bug to its author (except when it's by choice and documented like for echo).
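As a minimal sketch of the two protections described above, the snippet below creates a file whose name looks like an option and reads it back both ways. The function names `read_with_dashdash` and `read_with_prefix` and the file name `-n` are made up for the illustration, and it assumes a `cat` that honors `--` and a system that provides `mktemp -d`.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the original answer.

# Protect the name with the end-of-options marker.
read_with_dashdash() {
  cat -- "$1"
}

# Protect the name by making it start with ./ (absolute paths are left alone).
read_with_prefix() {
  file=$1
  case $file in
    /*) ;;
    *)  file=./$file ;;
  esac
  cat "$file"
}

dir=$(mktemp -d) || exit 1
(
  cd "$dir" || exit 1
  printf 'hello\n' > ./-n      # a file name that looks like an option
  read_with_dashdash -n        # prints: hello
  read_with_prefix -n          # prints: hello
)
rm -rf "$dir"
```

Note that a plain `cat -n` here would treat the name as an option and then wait for input on stdin, which is why the unprotected form is not run.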

There are cases where ./* solves problems that -- doesn't. For instance:

awk -f file.awk -- *

fails if there is a file called a=b.txt in the current directory (sets the awk variable a to b.txt instead of telling it to process the file).

awk -f file.awk ./*

Doesn't have the problem because ./a is not a valid awk variable name, so ./a=b.txt is not taken as a variable assignment.

cat -- * | wc -l

fails if there is a file called - in the current directory, as that tells cat to read from its stdin (- is special to most text processing utilities and to cd/pushd).

cat ./* | wc -l

is OK because ./- is not special to cat.
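A rough way to check this, assuming a throwaway directory and hypothetical helper names (stdin comes from /dev/null in the first helper so that cat does not wait on the terminal):

```shell
#!/bin/sh
# Illustrative helpers: count the lines of all files in directory $1.
count_lines_dashdash() {
  (cd "$1" && cat -- * </dev/null | wc -l)
}
count_lines_prefix() {
  (cd "$1" && cat ./* | wc -l)
}

dir=$(mktemp -d) || exit 1
printf 'in dash\n' > "$dir/-"     # a file literally named -
printf 'in foo\n'  > "$dir/foo"

# "-" makes cat read the (empty) stdin instead of the file: 1 line counted.
count_lines_dashdash "$dir"
# "./-" is an ordinary path: both files are read, 2 lines counted.
count_lines_prefix "$dir"
rm -rf "$dir"
```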

Things like:

grep -l -- foo *.txt | wc -l

to count the number of files that contain foo are wrong because they assume file names don't contain newline characters (wc -l counts the newline characters, both those output by grep after each file name and those in the file names themselves). You should use instead:

grep -l foo ./*.txt | grep -c /

(counting the number of / characters is more reliable as there can only be one per filename).
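The two counts can be compared on a directory that contains a name with an embedded newline. This is a sketch with hypothetical function names; the newline is put into a variable so that the script stays within plain POSIX sh syntax.

```shell
#!/bin/sh
# Illustrative helpers: count the .txt files in directory $1 that contain foo.
count_matches_naive() {
  (cd "$1" && grep -l -- foo *.txt | wc -l)
}
count_matches_slash() {
  (cd "$1" && grep -l foo ./*.txt | grep -c /)
}

nl='
'
dir=$(mktemp -d) || exit 1
echo foo > "$dir/a.txt"
echo foo > "$dir/b${nl}c.txt"     # one file, with a newline in its name

count_matches_naive "$dir"   # reports 3: the embedded newline is counted too
count_matches_slash "$dir"   # reports 2: one / per file name
rm -rf "$dir"
```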

For recursive grep, the equivalent trick is to use:

grep -rl foo .//. | grep -c //

./* may have some unwanted side effects though.

cat ./*

adds two more characters per file, so would make you reach the limit of the maximum size of arguments+environment sooner. And sometimes you don't want that ./ to be reported in the output. Like:

grep foo ./*

Would output:

./a.txt: foobar

instead of:

a.txt: foobar
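If the prefix is only unwanted for display, one option is to strip it afterwards. The filter below is a sketch (`strip_dot_slash` is a made-up name); it removes a leading ./ from each line, which also throws away the marker that made the output unambiguous, so it is best kept for output meant for a human reader.

```shell
#!/bin/sh
# Hypothetical display filter: drop a leading "./" from every input line.
strip_dot_slash() {
  sed 's|^\./||'
}

# Sample line taken from the example above.
printf './a.txt: foobar\n' | strip_dot_slash   # prints: a.txt: foobar
```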

Further digressions

1. I feel like I have to expand on that here, following the discussion in comments.

$ du -hs ./*
0       ./a
12      b
0       ./-c
0       ./foo

Above, that ./ marking the beginning of each file means we can clearly identify where each filename starts (at ./) and where it ends (at the newline before the next ./ or the end of the output).

What that means is that the output of du ./* (contrary to that of du -- *) can be parsed reliably, albeit not that easily in a script.
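A minimal sketch of such parsing, assuming the input comes from du -s ./* (one record per operand, each file name containing exactly one /, right after the leading dot). The function name `join_du_records` is hypothetical. It glues continuation lines back onto their record and shows each embedded newline as a backslash followed by n, so a name that itself contains those two characters would still look ambiguous in this display form.

```shell
#!/bin/sh
# Hypothetical filter: print one line per du record.
# A line containing "/" starts a new record; any other line is the
# continuation of a file name that contains a newline.
join_du_records() {
  awk '
    /\// { if (n) print rec; n++; rec = $0; next }
         { rec = rec "\\n" $0 }
    END  { if (n) print rec }
  '
}

# Simulated du output from the example above (tab-separated).
printf '0\t./a\n12\tb\n0\t./-c\n0\t./foo\n' | join_du_records
```

With that input the filter prints three records instead of four lines, the first one covering both the a and b lines.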

When the output goes to a terminal though, there are plenty more ways a filename may fool you:

  • Control characters, escape sequences can affect the way things are displayed. For instance, \r moves the cursor to the beginning of the line, \b moves the cursor back, \e[C forward (in most terminals)...

  • Many characters are invisible on a terminal, starting with the most obvious one: the space character.

  • There are Unicode characters that look just the same as the slash in most fonts:

     $ printf '\u002f \u2044 \u2215 \u2571 \u29F8\n'
     / ⁄ ∕ ╱ ⧸
    

(see how it goes in your browser).

An example:

$ touch x 'x ' $'y\bx' $'x\n0\t.\u2215x' $'y\r0\t.\e[Cx'
$ ln x y
$ du -hs ./*
0       ./x
0       ./x
0       ./x
0       .∕x
0       ./x
0       ./x

Lots of x's but y is missing.

Some tools like GNU ls would replace the non-printable characters with a question mark (note that ∕ (U+2215) is printable though) when the output goes to a terminal. GNU du does not.

There are ways to make them reveal themselves:

$ ls
x  x   x?0?.∕x  y  y?0?.?[Cx  y?x
$ LC_ALL=C ls
x  x?0?.???x  x   y  y?x  y?0?.?[Cx

See how ∕ turned to ??? after we told ls that our character set was ASCII.

$ du -hs ./* | LC_ALL=C sed -n l
0\t./x$
0\t./x $
0\t./x$
0\t.\342\210\225x$
0\t./y\r0\t.\033[Cx$
0\t./y\bx$

$ marks the end of the line, so we can tell "x" from "x ". All non-printable and non-ASCII characters are represented by a backslash sequence (backslash itself would be represented with two backslashes), which means the output is unambiguous. That was GNU sed; it should be the same in all POSIX compliant sed implementations, but note that some old sed implementations are not nearly as helpful.

$ du -hs ./* | cat -vte
0^I./x$
0^I./x $
0^I./x$
0^I.M-bM-^HM-^Ux$

(not standard but pretty common, also cat -A with some implementations). That one is helpful and uses a different representation but is ambiguous ("^I" and <TAB> are displayed the same for instance).

$ du -hs ./* | od -vtc
0000000   0  \t   .   /   x  \n   0  \t   .   /   x      \n   0  \t   .
0000020   /   x  \n   0  \t   . 342 210 225   x  \n   0  \t   .   /   y
0000040  \r   0  \t   . 033   [   C   x  \n   0  \t   .   /   y  \b   x
0000060  \n
0000061

That one is standard and unambiguous (and consistent from implementation to implementation) but not as easy to read.
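The same idea can be scripted. The sketch below flags names in the current directory that contain anything other than printable ASCII and dumps them with od. `has_unprintable` is a hypothetical helper, and the check assumes that the shell's pattern matching follows LC_ALL=C; it does not catch trailing spaces, since the space character is printable.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the name in $1 contains a byte that is not
# printable ASCII (a control character, or any non-ASCII byte in the C locale).
has_unprintable() {
  (
    LC_ALL=C
    export LC_ALL
    case $1 in
      *[![:print:]]*) exit 0 ;;
      *)              exit 1 ;;
    esac
  )
}

for f in ./*; do
  # Skip the unexpanded pattern when the directory is empty.
  [ -e "$f" ] || [ -L "$f" ] || continue
  if has_unprintable "$f"; then
    printf 'suspicious name:\n'
    printf '%s' "$f" | od -An -c
  fi
done
```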

You'll notice that y never showed up above. That's a completely unrelated issue with du -hs * that has nothing to do with file names but should be noted: because du reports disk usage, it doesn't report other links to a file already listed (not all du implementations behave like that though when the hard links are listed on the command line).
