Ubuntu – Confused about grep and the * wildcard

Tags: command-line, find, grep

I am running the following command in order to find all files/directories that do not have anything to do with "flash_drive_data":

find . -not -path './flash_drive_data*' | grep "./*flash*"

There are a few things which I tried that are confusing me:

1. When I run the above command, I get a few "partial" hits (i.e., they do not completely match the *flash* pattern). For example:

./.local/lib/python3.7/site-packages/jedi/third_party/typeshed/third_party/2and3/flask
./.local/lib/python3.7/site-packages/jedi/third_party/typeshed/third_party/2and3/flask/cli.pyi
./.local/lib/python3.7/site-packages/jedi/third_party/typeshed/third_party/2and3/flask/signals.pyi
./.local/lib/python3.7/site-packages/jedi/third_party/typeshed/third_party/2and3/flask/templating.pyi
./.local/lib/python3.7/site-packages/jedi/third_party/typeshed/third_party/2and3/flask/sessions.pyi
./.local/lib/python3.7/site-packages/jedi/third_party/typeshed/third_party/2and3/flask/json
./.local/lib/python3.7/site-packages/jedi/third_party/typeshed/third_party/2and3/flask/json/tag.pyi

The 3/flas at the end is being highlighted.

2. When I replaced grep "*flash*" with just grep "*", I expected to get all files returned by find, but I got none. Why? Then, when I did grep "**" I believe I got all the files (or at least I think I did). Again, why is that?

3. Finally, the objective of what I was doing above was to make sure that when I ran find . -not -path './flash_drive_data*' I was getting nothing related to flash_drive_data. It seemed like I did (with some unexpected behavior with grep as I explained above). However, when I ran:
find . -not -path './flash_drive_data*' -exec tar cfv home.tar.bz '{}' +

I was getting output including things like:

./flash_drive_data/index2/ask-sdk-core/dist/dispatcher/error/handler/

so flash_drive_data files were being included.

Best Answer

find . -not -path './flash_drive_data*' | grep "./*flash*"

The thing here is that grep uses regular expressions, while find -path uses shell glob style pattern matches. The asterisk has a different meaning in those two.

The regular expression ./*flash* matches first any character (.), then zero or more slashes (/*), then a literal string flas, then any number (zero or more) of h characters. 3/flas matches that (with zero times h), and so would e.g. reflash (with zero times /).
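One way to see this for yourself is to ask grep to print only the part of each line that the regex actually matched, using -o (available in GNU and BSD grep). This is a minimal sketch; the sample paths are made up for illustration.

```shell
# Hypothetical sample paths, one per line.
paths='./2and3/flask/cli.pyi
./tools/reflash.sh
./notes/readme.txt'

# -o prints only the matched part of each matching line.
matches=$(printf '%s\n' "$paths" | grep -o './*flash*')
printf '%s\n' "$matches"
# prints:
#   3/flas   (any char "3", one slash, "flas", zero h)
#   eflash   (any char "e", zero slashes, "flas", one h)
```

The third path prints nothing because it does not contain flas anywhere.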

You could just use grep flash instead: grep matches anywhere in the line, so leading and trailing "match anything" parts are unnecessary.

Or let find do the filtering itself: find . -path './*flash*' -and -not -path './flash_drive_data*'
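Both suggestions can be tried safely on a throwaway directory tree. The sketch below assumes a find that accepts -not and -and (GNU and BSD find do); every file and directory name in it is hypothetical.

```shell
# Build a small scratch tree (all names are made up for the demo).
tmp=$(mktemp -d)
mkdir -p "$tmp/flash_drive_data/index2" "$tmp/docs/flask" "$tmp/tools"
touch "$tmp/tools/reflash.sh" "$tmp/docs/flask/cli.pyi"

# Variant 1: filter with grep. A plain "flash" matches anywhere in the line.
via_grep=$(cd "$tmp" && find . -not -path './flash_drive_data*' | grep flash | sort)

# Variant 2: filter inside find. In a glob, * matches any run of characters,
# so './*flash*' means "the path contains flash somewhere".
via_find=$(cd "$tmp" && find . -path './*flash*' -and -not -path './flash_drive_data*' | sort)

printf 'grep: %s\nfind: %s\n' "$via_grep" "$via_find"

rm -rf "$tmp"
```

Both variants should report only ./tools/reflash.sh: the flask paths no longer show up, and everything under ./flash_drive_data is filtered out.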

When I replaced grep "*flash*" with just grep "*", I got [no matches].

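Since the asterisk means "any number of the previous atom", a leading asterisk has nothing to repeat. grep (following the POSIX rule for basic regular expressions) treats it as a literal asterisk, although arguably it should be an error. None of your file names contain an asterisk, so nothing matches. With "**", the first asterisk is literal and the second one means "zero or more of it"; zero asterisks can be found in any line, so every line matches.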
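A short experiment makes the difference between the two patterns visible. This sketch assumes GNU grep (POSIX leaves adjacent repetition operators such as ** undefined, so other implementations may differ); the two sample lines are made up.

```shell
# Two sample lines: one contains a literal asterisk, one does not.
sample='plain/path
odd*name'

# A leading * has nothing to repeat, so grep takes it literally:
# only lines that really contain an asterisk match.
star=$(printf '%s\n' "$sample" | grep '*')

# In '**' the first * is literal and the second repeats it zero or more
# times. Zero repetitions match anywhere, so every line matches.
double_star=$(printf '%s\n' "$sample" | grep -c '**')

printf 'single star matched: %s\n' "$star"
printf 'double star matched %s of 2 lines\n' "$double_star"
```

Here the single star should match only odd*name, while the double star should count both lines.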

However, when I ran: find . -not -path './flash_drive_data*' -exec tar cfv home.tar.bz '{}' + I was getting output including things like:

./flash_drive_data/index2/ask-sdk-core/dist/dispatcher/error/handler/

so flash_drive_data files were being included.

Note that tar stores directories recursively, and the first path that find prints is . (the current directory), so everything below it gets stored no matter what else find filters out. You may want to use ! -type d with find to exclude directories from the output, or (better) look at the --exclude=PATTERN option of tar.
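The following sketch shows both halves of that on a scratch tree: that find hands . to tar as its first path, and that --exclude keeps the unwanted subtree out. It assumes GNU tar; the directory layout and the archive name are hypothetical.

```shell
# Scratch tree standing in for the home directory (names are made up).
tmp=$(mktemp -d)
mkdir -p "$tmp/home/flash_drive_data/index2" "$tmp/home/docs"
touch "$tmp/home/flash_drive_data/index2/a.js" "$tmp/home/docs/notes.txt"

# The first path find prints is the starting point itself. Handing "." to
# tar makes it archive the whole tree recursively.
first=$(cd "$tmp/home" && find . -not -path './flash_drive_data*' | sed -n 1p)

# Let tar do the filtering instead. The archive is written outside the
# tree so that tar does not try to include it in itself.
(cd "$tmp/home" && tar -cf "$tmp/home.tar" --exclude='./flash_drive_data' .)

listing=$(tar -tf "$tmp/home.tar")
printf 'first path from find: %s\n' "$first"
printf '%s\n' "$listing"

rm -rf "$tmp"
```

The listing should contain ./docs/notes.txt but nothing under flash_drive_data.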
