List Basenames in Directory – Sort by Modification Date

Tags: parameter, string, zsh

Given a directory containing:

  • note 1.txt, last modified yesterday
  • note 2.txt, last modified the day before yesterday
  • note 3.txt, last modified today

What is the best way to fetch the array note 3 note 1 note 2 (the basenames, without extensions, sorted newest first)?
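
(To reproduce the scenario, a throwaway test directory can be set up with something like the following; the directory name and the touch -t timestamps are arbitrary stand-ins for "the day before yesterday", "yesterday" and "today".)

mkdir testdir && cd testdir
touch -t 202301011200 'note 2.txt'   # oldest
touch -t 202301021200 'note 1.txt'
touch -t 202301031200 'note 3.txt'   # newest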

To define "best," I'm more concerned about robustness (in the context of ZSH in macOS) than I am about efficiency and portability.

The intended use case is a directory of hundreds or thousands of plain text files. At the risk of muddling the question, though, this is a specific case of a more general question I have: what are the best practices for performing string manipulations on file paths printed by commands like ls, find, and mdfind?


I've been using a macro which invokes this command to achieve the above:

ls -t | sed -e 's/.[^.]*$//'

It's never failed, but:

  • Greg's Wiki strongly recommends against parsing the output of ls. (Parsing ls; Practices, under "5. Don't Ever Do These").
  • Is invoking sed inefficient where parameter expansion would do?

Using find (safely delimiting file paths with NUL characters rather than newlines) and parameter expansion to extract the basenames, this produces an unsorted list:

find . -type f -print0 | while IFS= read -d '' -r l ; do print "${${l%.*}##*/}" ; done

But sorting by modification date would seem to require invoking stat and sort, because macOS's find lacks the -printf flag which might otherwise serve well.
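
Roughly, I imagine that stat-and-sort version would look something like the following (a sketch only, assuming BSD stat's -f format on macOS and a flat directory; it still breaks on file names containing newlines, and files modified within the same second sort arbitrarily):

find . -maxdepth 1 -type f -exec stat -f '%m %N' {} + |
  sort -rn |
  sed -e 's/^[0-9]* //' -e 's|^\./||' -e 's/\.[^.]*$//'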

Finally, using ZSH's glob qualifiers:

for f in *(om) ; do print "${f%.*}" ; done

Though not portable, this last method seems most robust and efficient to me. Is this correct, and is there any reason I shouldn't use a modified version of the find command above when I'm actually performing a search rather than simply listing files in a directory?

Best Answer

In zsh,

list=(*(Nom:r))

is definitely the most robust.
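
With the example files from the question, inspecting the result would show the expected order (the exact typeset -p output format varies a little between zsh versions):

list=(*(Nom:r))   # N: empty array rather than an error if nothing matches
typeset -p list   # something like: typeset -a list=( 'note 3' 'note 1' 'note 2' )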

print -rC1 -- *(Nom:r)

to print them one per line, or

print -rNC1 -- *(Nom:r)

as NUL-delimited records, so you can do anything with that output, since NUL is the only character not allowed in a file path.
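
For instance, a consumer can read those NUL-delimited records back reliably with a read -d '' loop (just a sketch of the pattern, mirroring the find loop from the question):

print -rNC1 -- *(Nom:r) |
  while IFS= read -r -d '' name; do
    print -r -- "got: $name"
  done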

Change to *(N-om:r) if you want the modification time to be considered after symlink resolution (mtime of the target instead of the symlink like with ls -Lt).

:r (for root name) is the history modifier (from csh) that removes the extension. Beware that it turns .bashrc into the empty string, which would only be a concern here if you enabled the dotglob option.

Change to **/*(N-om:t:r) to do it recursively (:t for the tail (basename), that is, to remove the directory components).
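
As a quick illustration of those modifiers on a made-up path:

p='dir/sub/note 3.txt'
print -r -- ${p:t}     # note 3.txt      (tail: directory components removed)
print -r -- ${p:r}     # dir/sub/note 3  (root: extension removed)
print -r -- ${p:t:r}   # note 3          (both)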

Doing it reliably for arbitrary file names with ls is going to be very painful.

One approach could be to run ls -td -- ./* (assuming the list of file names fits in the arg list limit) and parse that output, relying on the fact that each file name starts with ./, and generate either a NUL-delimited list or a shell-quoted list to pass to the shell; but doing that portably is also very painful unless you resort to perl or python.

But if you can rely on perl or python being there, you could have them generate and sort the list of files and output it NUL-delimited (though possibly not that easily or portably if you want to support sub-second precision).
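
As a rough sketch of the perl route (second granularity only, hidden files skipped, minimal error handling; treat it as illustrative rather than polished):

perl -e '
  opendir my $d, "." or die "opendir: $!";
  print map { "$_\0" }
        sort { (stat $b)[9] <=> (stat $a)[9] }
        grep { -f && !/^\./ } readdir $d'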

ls -t | sed -e 's/.[^.]*$//'

would not work properly for file names that contain newline characters (IIRC some versions of macOS did ship with such file names in /etc by default). It could also fail for file names that contain sequences of bytes not forming valid characters, as . or [^.] could fail to match on them. That may not apply to macOS though, and it could be fixed by setting the locale to C/POSIX for sed.

The . should be escaped (s/\.[^.]*$//), as unescaped it is the regexp operator that matches any character; otherwise, it turns dot-less file names like foobar into empty strings.

Note that to print a string raw, it's:

print -r -- "$string"

print "$string" would fail for values of $string that start with -, even introducing a command injection vulnerability (try for instance with string='-va[$(uname>&2)1]', here using a harmless uname command). And would mangle values that contain \ characters.

Your:

find . -type f -print0 | while IFS= read -d '' -r l ; do print "${${l%.*}##*/}" ; done

also has an issue in that you strip the .* before removing the directory components. So, for instance, ./foo.d/bar would become foo instead of bar, and ./foo would become the empty string.
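
Swapping the two expansions, i.e. removing the directory components before stripping the extension (and using print -r -- as noted above), would avoid that:

find . -type f -print0 | while IFS= read -d '' -r l ; do print -r -- "${${l##*/}%.*}" ; done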

For safe ways to process find's output in various shells, see Why is looping over find's output bad practice?