Command Line – How Grep Knows When Run as Part of Glob Expansion

Tags: bash, command line, wildcards

As per my understanding, a glob wildcard is interpreted by the shell, which then runs the given command for each matching filename. Suppose I have files: abc1, abc2, and abc3 in my current directory. Then, for example, echo abc* will echo once for each filename starting with 'abc'.

However, if I run grep 'foo' abc*, I imagine this should run:

grep 'foo' abc1
grep 'foo' abc2
grep 'foo' abc3

Which means I should get the following output (assuming each file contains one line that says 'foo'):

foo
foo
foo

However, instead I get:

abc1:foo
abc2:foo
abc3:foo

So I figure there are two possible explanations for this. First, grep can somehow detect that it was invoked with a glob expression and responds by printing the filename before each match. Second, since you can pass multiple files to grep, the shell actually runs only one command:

grep 'foo' abc1 abc2 abc3

However, this only works because grep accepts multiple files at the end. It is possible that another command would only allow 1 file to be passed in. So if you wanted to run the command for multiple files matching the glob, it wouldn't work if globbing worked via the second method described above.

Anyways, can someone shed some light on this?

Thanks!

Best Answer

That's the trick: the command doesn't know. It's the shell that does the job.

Consider, for example, grep 'abc' *.txt. If we trace the system calls with strace, we see something like this:

bash-4.3$ strace -e trace=execve grep "abc" *.txt > /dev/null
execve("/bin/grep", ["grep", "abc", "ADDA_converters.txt", "after.txt", "altera_license.txt", "altera.txt", "ANALOG_DIGITAL_NOTES.txt", "androiddev.txt", "answer2.txt", "answer.txt", "ANSWER.txt", "ascii.txt", "askubuntu-profile.txt", "AskUbuntu_Translators.txt", "a.txt", "bash_result.txt", ...], [/* 80 vars */]) = 0
+++ exited with 0 +++

The shell expanded *.txt into all filenames in the current directory that end with the .txt extension, before grep was ever started. So effectively, your shell translates the grep 'abc' *.txt command into grep 'abc' file1.txt file2.txt file3.txt ... and runs it once. Thus, your second assumption is correct.

The first assumption is not correct: programs have no way of detecting that a glob was used, because all they receive is the already expanded list of arguments. It is possible to pass * as a literal string argument to a command (by quoting it), but then it's the command's job to decide what to do with it. Filename expansion itself, as already mentioned, is a feature of your shell.
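The difference between an expanded and a literal pattern can also be seen without strace. The following is only a minimal sketch: the scratch directory and the helper name count_args are assumptions made for illustration, while the file names abc1, abc2 and abc3 come from the question.

```shell
# Work in a throwaway directory so the glob matches exactly three files.
scratch=$(mktemp -d)          # hypothetical scratch directory
cd "$scratch" || exit 1
touch abc1 abc2 abc3

# Hypothetical helper: reports how many arguments it received.
count_args() {
    echo "$#"
}

unquoted=$(count_args abc*)   # the shell expands the glob before the call
quoted=$(count_args 'abc*')   # quotes suppress expansion: one literal argument

echo "unquoted: $unquoted"    # prints "unquoted: 3"
echo "quoted: $quoted"        # prints "quoted: 1"
```

In both cases count_args is called exactly once; only the argument list it sees differs, and it has no way of telling whether that list came from a glob or was typed out by hand.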

You wrote: "However, this only works because grep accepts multiple files at the end. It is possible that another command would only allow 1 file to be passed in."

Exactly right! Programs themselves don't have to limit the number of command-line arguments they accept (in C the arguments arrive as an array of strings, char *argv[], and in Python as sys.argv), but a program can check the length of that array, or whether something unexpected sits in a given position, and refuse to run. grep doesn't do that; it accepts multiple files by design, and when it is given more than one file it prefixes each match with the filename, which is the output you saw.
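If a command really does accept only one file, or you simply want one invocation per file as you first imagined, an explicit loop is the usual way to get it: the shell still expands the glob, but the loop hands the names over one at a time. A minimal sketch, assuming a scratch directory (hypothetical) that holds the three files from the question, each containing the single line foo:

```shell
# Recreate the situation from the question in a throwaway directory.
scratch=$(mktemp -d)          # hypothetical scratch directory
cd "$scratch" || exit 1
for name in abc1 abc2 abc3; do
    echo foo > "$name"
done

# One grep process for all files: each match is prefixed with its filename.
combined=$(grep 'foo' abc*)

# One grep process per file: with a single file operand there is no prefix.
per_file=$(for f in abc*; do grep 'foo' "$f"; done)

echo "$combined"
echo "$per_file"
```

The first echo shows the abc1:foo style output from the question, the second shows three bare foo lines. If only the prefix is unwanted, GNU grep's -h option suppresses it without the need for a loop.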


On a side note, an unquoted pattern combined with globbing can sometimes be a problem with grep. Consider this:

bash-4.3$ echo "one two" | strace -e trace=execve grep *est*
execve("/bin/grep", ["grep", "self_test.sh", "test.wxg"], [/* 80 vars */]) = 0
+++ exited with 1 +++

An unprepared user would expect grep to match any line coming from the pipe that contains the letters est, but the shell's filename expansion twisted everything around: grep now treats self_test.sh as the pattern and searches the file test.wxg, ignoring standard input entirely. I've seen this happen a lot with people who run something like ps aux | grep shell_script* and expect to find their running process; because they ran the command from the directory where the script lives, filename expansion made the grep command look completely different behind the scenes from what they expected.

The proper way is to quote the pattern (single quotes here), so that the shell passes it to grep unchanged:

bash-4.3$ echo "one two" | strace -e trace=execve grep '*est*'
execve("/bin/grep", ["grep", "*est*"], [/* 80 vars */]) = 0
+++ exited with 1 +++
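The same effect can be reproduced without strace. The sketch below assumes a scratch directory (hypothetical) holding two empty files named like those in the trace above; show_args is a made-up helper that prints each argument it receives on its own line, standing in for grep.

```shell
# Throwaway directory containing two files whose names match *est*.
scratch=$(mktemp -d)          # hypothetical scratch directory
cd "$scratch" || exit 1
touch self_test.sh test.wxg

# Hypothetical helper: prints every argument on a separate line.
show_args() {
    printf '%s\n' "$@"
}

# Unquoted: the shell replaces *est* with the matching filenames.
unquoted=$(show_args grep *est*)

# Quoted: the pattern reaches the command as one literal argument.
quoted=$(show_args grep '*est*')

echo "$unquoted"
echo "---"
echo "$quoted"
```

With default bash settings, a pattern that matches no file is passed through unchanged, which is why the unquoted form can appear to work until a matching file shows up in the current directory.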