Why does ‘grep -q’ consume the whole input file?


Consider the following input file:

1
2
3
4

Running

{ grep -q 2; cat; } < infile

doesn't print anything. I'd expect it to print

3
4

I can get the expected output if I change it to

{ sed -n 2q; cat; } < infile

Why doesn't the first command print the expected output?
It's a seekable input file and, per the standard, under OPTIONS:

-q
      Quiet. Nothing shall be written to the standard output, regardless of 
      matching lines. Exit with zero status if an input line is selected.

and further down, under APPLICATION USAGE (emphasis mine):

The -q option provides a means of easily determining whether or not a
pattern (or string) exists in a group of files. When searching several
files, it provides a performance improvement (because it can quit
as soon as it finds the first match) […]

Now, per the same standard (in the Introduction, under INPUT FILES):

When a standard utility reads a seekable input file and terminates
without an error before it reaches end-of-file, the utility shall
ensure that the file offset in the open file description is properly
positioned just past the last byte processed by the utility
[…]

tail -n +2 file
(sed -n 1q; cat) < file
...

The second command is equivalent to the first only when the file is
seekable.


Why does grep -q consume the whole file?


This is GNU grep, if it matters (though Kusalananda just confirmed the same happens on OpenBSD).

Best Answer

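grep does stop early, but it reads its input in buffer-sized chunks, so your four-line test file is too short to show it: the whole file fits in the first read. A longer input makes this visible (and yes, I realise my test is imperfect since a pipe is not seekable):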

seq 1 10000 | (grep -q 2; cat)

The output starts at 6776 on my system. That matches the 32 KiB buffer used by default in GNU grep:

seq 1 6775 | wc

outputs

   6775    6775   32768
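
The same check can be tried on a seekable file, which is closer to the case in the question. The following is only a rough sketch: the scratch file created with mktemp and the variable names are illustrative and not from the original post, and the number reported depends on the grep implementation and version in use.

```shell
#!/bin/sh
# Sketch: estimate how many bytes of a seekable file "grep -q" reads
# before it exits. The scratch file and variable names are illustrative.
tmp=$(mktemp) || exit 1
seq 1 100000 > "$tmp"

# Total size of the file in bytes (arithmetic expansion strips any padding
# that some wc implementations add).
total=$(( $(wc -c < "$tmp") ))

# grep and cat share one open file description, so cat starts reading at
# whatever offset grep left behind when it exited.
rest=$(( $( { grep -q 2; cat; } < "$tmp" | wc -c ) ))

consumed=$((total - rest))
echo "grep consumed $consumed of $total bytes"

rm -f "$tmp"
```

If the reported figure equals the total, grep read the whole file before exiting; a smaller figure suggests the size of its read buffer, assuming grep does not reposition the offset on exit.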

Note that POSIX only mentions performance improvements in one context:

When searching several files

That doesn't set up any expectation of a performance improvement from reading only part of a single file.
