Find – How to Use Wildcards for New Decade Files


summary:

A given system has lots of text files with names ~= [type of file].[8-digit date].
To search these files, I like (and wanna keep) using this idiom: find /path/ -name 'file.nnnn*' -print | xargs -e fgrep -nH -e 'text I seek' (where nnnn == 4-digit year)
… and in the past decade I also made find glob across years like find /path/ -name 'file.201[89]*' -print | xargs ...
… but now I can't make find glob across 2019 and 2020 with find /path/ -name 'file.20{19,20}*' -print | xargs ...
… although that "curly-brace globbing" (correct term?) works fine with ls!

Is there a {concise, elegant} way to tell find what I want, without instead doing post-find cleanup (i.e., what I'm doing now) à la

find /path/ -name 'file.*' -print | grep -e '\.2019\|\.2020' | xargs ...

? FWIW, I'd prefer a solution that works with xargs.

details:

I work on a system with lotsa conventions which long precede me and which I cannot change. One of those is, it has lotsa text files with names ~= [type of file].[8-digit date], e.g., woohoo_log.20191230. When searching within these files for some given text, I typically (as in, almost always) use the find ... grep idiom (often using Emacs' M-x find-grep). (FWIW, this is a Linux system with

$ find --version
find (GNU findutils) 4.4.2
...
$ bash --version
GNU bash, version 4.3.30(1)-release (x86_64-pc-linux-gnu)

and I currently lack status to change either of those, if I wanted to.) I often kinda know the year range of the matter-at-hand, and so will try to constrain what find returns (to speed processing), with (e.g.)

find /path/ -type f -name 'file.nnnn*' -print | xargs -e fgrep -nH -e 'text I seek'

where nnnn == 4-digit year. This WFM, and I like (and wanna keep) using the above idiom … especially since I can also use it to search across years like

find /path/ -type f -name 'file.201[89]*' -print | xargs ...

But this new decade seems to be breaking that idiom, and (to me at least) most oddly. (I wasn't here when the last decade changed.) Suppose I choose text that I know is in a file from 2019 && a file from 2020 (as in, I can open the files and see the text). If I currently do

find /path/ -name 'file.20{19,20}*' -print | xargs ...

grep unexpectedly/annoyingly finishes with no matches found, because

$ find /path/ -name 'file.20{19,20}*' -print | wc -l
0

But if I do

find /path/ -type f -name 'file.*' -print | grep -e '\.2019\|\.2020' | xargs ...

grep returns the expected results. Which is nice, but … ummm … that's just ugly, esp since this "curly-brace glob" (please correct me if this usage is incorrect or otherwise deprecated) works from ls! I.e., this shows me the files in the relevant year range (i.e., 2019..2020)

ls -al /path/file.20{19,20}*

Hence I'd like to know:

Am I just not giving find the right glob for this usecase? What do I need to tell find to make it do what ls is capably/correctly doing?
Is this a problem with xargs? If so, I can live with a find ... -exec solution, but … my brain works better with xargs, so I'd prefer to stay with that if possible. (Call me feebleminded, but -exec's syntax makes my brain hurt.)

Best Answer

With zsh, you could use recursive globbing and its <x-y> glob operator which matches on ranges of decimal numbers:

grep -nHFe 'text I seek' /path/**/file.<2019-2020>*(D-.)

(The D qualifier also looks into hidden (Dot) directories, as find would; you can omit it if you don't want them. The -. restricts matches to regular files (.), determined after symlink resolution (-).)

Note that it would also match file.00002020 (as that's a decimal number between 2019 and 2020) and, like your approach, file.20201234, since file.2020 matches file.<2019-2020> and the trailing 1234 matches *.

The standard (POSIX sh and utilities) way to do it would be with:

find /path \( -name 'file.2019*' -o -name 'file.2020*' \) -type f \
  -exec grep -Fne 'text I seek' /dev/null {} +

(where adding /dev/null gets you the same effect as GNU grep's -H to force the file name to be displayed)
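As for why the original attempt returned nothing: brace expansion is performed by the shell (bash, zsh, ksh), not by the pattern matcher, and only when the braces are unquoted. Inside single quotes the braces reach find untouched, and -name only understands *, ? and [...], so it looks for a literal {19,20} in the file name. A minimal sketch of the difference, using a throwaway directory and hypothetical file names in place of /path:

```shell
# Scratch directory standing in for /path (hypothetical file names).
demo=$(mktemp -d)
: > "$demo/file.20191230"
: > "$demo/file.20200102"

# Quoted, the braces are passed to find as-is and matched literally,
# so no file name can match.
brace_count=$(find "$demo" -name 'file.20{19,20}*' | wc -l)

# One -name test per year, joined with -o, matches both files.
or_count=$(find "$demo" \( -name 'file.2019*' -o -name 'file.2020*' \) -type f | wc -l)

echo "brace pattern: $brace_count match(es); -o pattern: $or_count match(es)"
rm -rf "$demo"
```

With ls the braces are unquoted, so the shell rewrites them into two separate glob patterns before ls ever runs; that is why the same spelling works there.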

Note that the output of find -print is not compatible with the expected input format of xargs: xargs splits its input on blanks as well as newlines, and treats quotes and backslashes specially. With GNU utilities, you can use find -print0 and xargs -r0, but that's not needed as find -exec ... {} + has the same behaviour, is shorter and more portable.
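If you would still rather keep the pipeline shape from the question, the GNU variant could look roughly like the following. This is only a sketch: it assumes GNU find and GNU xargs (-print0 and -r0 are not in POSIX), and the scratch directory and file names are made up for the demonstration.

```shell
# Scratch directory standing in for /path (hypothetical file names).
demo=$(mktemp -d)
printf 'text I seek\n' > "$demo/file.20181231"
printf 'text I seek\n' > "$demo/file.20191230"
printf 'text I seek\n' > "$demo/file.20200102"

# -print0 / -0 pass NUL-delimited names, so spaces, quotes and newlines
# in paths are safe; -r skips running grep when find matches nothing.
matches=$(find "$demo" \( -name 'file.2019*' -o -name 'file.2020*' \) -type f -print0 |
  xargs -r0 grep -nHF -e 'text I seek')

printf '%s\n' "$matches"
rm -rf "$demo"
```

Only the 2019 and 2020 files should be reported; the 2018 file is filtered out by find before grep ever sees it.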
