I have a directory filled with files with names like logXX, where XX is a two-character, zero-padded, uppercase hex number, such as:
log00
log01
log02
...
log0A
log0B
log0C
...
log4E
log4F
log50
...
Generally there will be fewer than, say, 20 or 30 files in total. The date and time on my particular system cannot be relied upon (it is an embedded system with no reliable NTP or GPS time source). However, the filenames will reliably increment as shown above.
I wish to grep through all the files for the single most recent log entry of a certain type. I was hoping to cat the files together, such as:
cat /tmp/logs/log* | grep 'WARNING 07 -' | tail -n1
However, it occurred to me that different versions of bash or sh or zsh etc. might have different ideas about how the * is expanded.
The man bash page doesn't say whether the expansion of * is guaranteed to be an ascending alphabetical list of matching filenames. It does seem to be ascending every time I've tried it on all the systems I have available to me, but is it DEFINED behaviour or just implementation-specific?
In other words, can I absolutely rely on cat /tmp/logs/log* to concatenate all my log files together in alphabetical order?
Best Answer
In all shells, globs are sorted by default. They already were by the /etc/glob helper called by Ken Thompson's shell to expand globs in the first version of Unix in the early 70s (and which gave globs their name).

For sh, POSIX does require them to be sorted by way of strcoll(), that is, using the sorting order in the user's locale, like for ls, though some shells still do it via strcmp(), that is, based on byte values only.

In shells that sort based on the locale, on a GNU system with an en_GB.UTF-8 locale, a - in file names is ignored for sorting (as most punctuation characters would be), ó is sorted in a more expected way (at least to British people) rather than after all the ASCII letters, and case is ignored (except when it comes to deciding ties).

However, you may notice some inconsistencies for names such as log① and log②. That's because the sorting order of ① and ② is not defined in GNU locales (currently; hopefully it will be fixed some day). They sort the same, so you get random results.
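Applied to the question's file names, the sorted expansion can be checked with a small sketch like the one below. The sample files and log lines are made up for illustration, and mktemp -d is assumed to be available. Zero-padded names that differ only in digits and the uppercase letters A to F should sort the same way under strcmp() and in the locales I am aware of, since digits collate before letters.

```shell
# Scratch directory with hypothetical sample logs (names and contents are made up).
logdir=$(mktemp -d)
printf 'WARNING 07 - first\n'  > "$logdir/log00"
printf 'INFO 01 - boot\n'      > "$logdir/log0A"
printf 'WARNING 07 - second\n' > "$logdir/log4F"
printf 'WARNING 07 - third\n'  > "$logdir/log50"

# The shell expands the glob into a sorted list before cat runs,
# so the files are concatenated as log00 log0A log4F log50.
latest=$(cat "$logdir"/log* | grep 'WARNING 07 -' | tail -n1)
printf '%s\n' "$latest"   # prints: WARNING 07 - third

rm -rf "$logdir"
```

The last matching line comes from log50 because that file is concatenated last.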
Changing the locale will affect the sorting order. You can set the locale to C to get a strcmp()-like sort.

Note that some locales can cause confusion even for all-ASCII, all-alphanumeric strings, like Czech ones (on GNU systems at least), where ch is a collating element that sorts after h. Or, as pointed out by @ninjalj, even weirder ones in Hungarian locales.
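One way to take the locale out of the picture is to have the shell itself run in the C locale while it expands the glob. A minimal sketch, assuming a POSIX-like shell and mktemp -d; the file names are hypothetical. The assignment is made as its own command inside a subshell, because a prefix assignment such as LC_ALL=C cat log* is only guaranteed to change the environment of cat, not the way the shell sorts the expansion.

```shell
# Scratch directory with hypothetical names mixing case, digits and punctuation.
dir=$(mktemp -d)
touch "$dir/loga" "$dir/logB" "$dir/log-c" "$dir/log1"

# The command substitution runs in a subshell, so the locale change stays local.
# LC_ALL is assigned first, then the glob is expanded under the C locale.
c_order=$(cd "$dir" && LC_ALL=C && echo log*)
printf '%s\n' "$c_order"   # byte-value order: log-c log1 logB loga

rm -rf "$dir"
```

For the command in the question, the same idea would look like (LC_ALL=C; cat /tmp/logs/log*) | grep 'WARNING 07 -' | tail -n1.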
In zsh, you can choose the sorting with glob qualifiers. The numeric sort of echo *(n) can also be enabled globally with the numericglobsort option.

If you (as I was) are confused by that order in that particular instance (here using my British locale), see here for details.
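The zsh glob qualifiers and the numericglobsort option mentioned above can be tried with a sketch like this one. It is run through zsh -c so it can be pasted into any POSIX-like shell, and it does nothing if zsh is not installed. The file names are hypothetical, the (On) reverse-name qualifier is my own choice of extra example, and the locale is pinned to C so the output does not depend on the local collation rules.

```shell
# Scratch directory with hypothetical, non-zero-padded names.
dir=$(mktemp -d)
touch "$dir/log1" "$dir/log2" "$dir/log10"

if command -v zsh >/dev/null 2>&1; then
  # Default name order compares character by character: log1 log10 log2
  default_order=$(cd "$dir" && LC_ALL=C zsh -f -c 'print -r -- log*')
  # (n) qualifier: runs of digits are compared numerically: log1 log2 log10
  numeric_order=$(cd "$dir" && LC_ALL=C zsh -f -c 'print -r -- log*(n)')
  # (On) qualifier: reverse name order: log2 log10 log1
  reverse_order=$(cd "$dir" && LC_ALL=C zsh -f -c 'print -r -- log*(On)')
  # The same numeric sort, enabled globally through the option.
  option_order=$(cd "$dir" && LC_ALL=C zsh -f -c 'setopt numericglobsort; print -r -- log*')
  printf '%s\n' "$default_order" "$numeric_order" "$reverse_order" "$option_order"
fi

rm -rf "$dir"
```

With zero-padded names like those in the question, the default order and the numeric order coincide, so the qualifier only matters when the padding is missing.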