Grep --exclude option doesn’t always skip named pipes

fifo grep

I have a directory that contains, among other files, 3 named pipes: FIFO, FIFO1, and FIFO11. If I try something like

grep mypattern *

in this directory, grep hangs forever on the named pipes, so I need to exclude them. Unexpectedly,

grep --exclude='FIF*' mypattern *

does not solve the problem; grep still hangs forever. However,

grep -r --exclude='FIF*' mypattern .

does solve the hanging problem (albeit with the undesired side effect of searching all the subdirectories).

I did some testing that shows that grep --exclude='FIF*' mypattern * works as expected if FIFO etc. are regular files, not named pipes.

Questions:

Why does grep honor --exclude in both cases when the files are regular files, and skip excluded named pipes in the recursive case, but not skip excluded named pipes in the non-recursive case?
Is there another way to format the exclusion that will skip these files in all cases?
Is there a better way to accomplish what I'm after? (EDIT: I just discovered the --devices=skip flag in grep, so that's the answer to this part, but I'm still curious about the first two parts of the question.)

Best Answer

It seems grep still opens files even if the --exclude pattern tells it to skip them:

$ ll
total 4.0K
p-w--w---- 1 user user 0 Feb  7 16:44 pip-fifo
--w--w---- 1 user user 4 Feb  7 16:44 pip-file
lrwxrwxrwx 1 user user 4 Feb  7 16:44 pip-link -> file

(Note: none of these have read permissions.)

$ strace -e openat grep foo --exclude='pip*' pip-file pip-link pip-fifo
openat(AT_FDCWD, "pip-file", O_RDONLY|O_NOCTTY) = -1 EACCES (Permission denied)
grep: pip-file: Permission denied
openat(AT_FDCWD, "pip-link", O_RDONLY|O_NOCTTY) = -1 ENOENT (No such file or directory)
grep: pip-link: No such file or directory
openat(AT_FDCWD, "pip-fifo", O_RDONLY|O_NOCTTY) = -1 EACCES (Permission denied)
grep: pip-fifo: Permission denied
+++ exited with 2 +++

Granting read permissions, it appears that it doesn't try to read them after opening if they are excluded:

$ strace -e openat grep foo --exclude='pip*' pip-file pip-link pip-fifo
openat(AT_FDCWD, "pip-file", O_RDONLY|O_NOCTTY) = 3
openat(AT_FDCWD, "pip-link", O_RDONLY|O_NOCTTY) = -1 ENOENT (No such file or directory)
grep: pip-link: No such file or directory
openat(AT_FDCWD, "pip-fifo", O_RDONLY|O_NOCTTY^Cstrace: Process 31058 detached
 <detached ...>

$ strace -e openat,read grep foo --exclude='pip*' pip-file
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0000\25\0\0\0\0\0\0"..., 832) = 832
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\240\r\0\0\0\0\0\0"..., 832) = 832
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0P\t\2\0\0\0\0\0"..., 832) = 832
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260`\0\0\0\0\0\0"..., 832) = 832
openat(AT_FDCWD, "pip-file", O_RDONLY|O_NOCTTY) = 3
+++ exited with 1 +++

$ strace -e openat,read grep foo --exclude='pipe*' pip-file
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0000\25\0\0\0\0\0\0"..., 832) = 832
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\240\r\0\0\0\0\0\0"..., 832) = 832
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0P\t\2\0\0\0\0\0"..., 832) = 832
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260`\0\0\0\0\0\0"..., 832) = 832
openat(AT_FDCWD, "pip-file", O_RDONLY|O_NOCTTY) = 3
read(3, "foo\n", 32768)                 = 4
foo
read(3, "", 32768)                      = 0
+++ exited with 0 +++

And since openat wasn't called with O_NONBLOCK, the open itself hangs (opening a FIFO read-only blocks until some process opens it for writing), so grep never reaches the part where it excludes the file from reading.
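To illustrate the O_NONBLOCK point, here is a minimal Python sketch (not grep's code). It creates a FIFO in a temporary directory and opens it read-only with O_NONBLOCK, which returns immediately even though no writer exists. The helper name open_fifo_nonblocking and the file name demo-fifo are made up for this example.

```python
import os
import tempfile


def open_fifo_nonblocking(path):
    """Open a FIFO read-only without waiting for a writer (hypothetical helper)."""
    # With O_NONBLOCK, a read-only open of a FIFO returns immediately.
    # Without it, open() blocks until some process opens the FIFO for writing.
    return os.open(path, os.O_RDONLY | os.O_NONBLOCK)


with tempfile.TemporaryDirectory() as tmp:
    fifo_path = os.path.join(tmp, "demo-fifo")  # hypothetical name
    os.mkfifo(fifo_path)
    fd = open_fifo_nonblocking(fifo_path)
    # No process has the FIFO open for writing, so read() reports end-of-file.
    data = os.read(fd, 100)
    os.close(fd)
```

Dropping os.O_NONBLOCK from the flags reproduces the hang seen in the strace output above.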

Looking at the source code, I believe the flow is like this:

If not recursive, call grep_command_line_arg on each file.
That calls grepfile if not on stdin.
grepfile calls grepdesc after opening the file.
grepdesc checks for excluding the file.

When recursive:

grepdirent checks for excluding the file before calling grepfile, so the failing openat never happens.
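As a rough model of the "check before opening" order used in the recursive path, the following Python sketch filters a list of names before anything is opened. It is not grep's implementation: the helper files_safe_to_grep is hypothetical, and matching the glob against the base name only is a simplifying assumption about how --exclude is applied.

```python
import fnmatch
import os
import stat
import tempfile


def files_safe_to_grep(names, exclude_glob):
    """Return names that are not excluded and are regular files (hypothetical helper)."""
    selected = []
    for name in names:
        # Assumption: match the glob against the base name only.
        if fnmatch.fnmatch(os.path.basename(name), exclude_glob):
            continue
        try:
            mode = os.stat(name).st_mode  # stat() inspects the file without opening it
        except OSError:
            continue  # for example, a dangling symlink
        if stat.S_ISREG(mode):
            selected.append(name)
    return selected


with tempfile.TemporaryDirectory() as tmp:
    os.mkfifo(os.path.join(tmp, "FIFO"))    # excluded by the glob
    os.mkfifo(os.path.join(tmp, "queue"))   # not excluded, but not a regular file
    for regular in ("FIFO1", "notes.txt"):
        with open(os.path.join(tmp, regular), "w") as handle:
            handle.write("mypattern\n")
    candidates = sorted(os.path.join(tmp, n) for n in os.listdir(tmp))
    kept = [os.path.basename(p) for p in files_safe_to_grep(candidates, "FIF*")]
```

Only the names in kept would then be handed to grep, so no FIFO is ever opened in the non-recursive case.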

Related Solutions

Using named pipes to send keypresses to an interactive program

It seems that -C causes mpg123 to read from the terminal, not from stdin. I see this, however, in my version of mpg123's man page:

-R, --remote
       Activate  generic  control interface.  mpg123 will then read and
       execute commands from stdin. Basic usage is ``load <filename> ''
       to  play some file and the obvious ``pause'', ``command.  ``jump
       <frame>'' will jump/seek to a given point (MPEG  frame  number).
       Issue ``help'' to get a full list of commands and syntax.

This may be what you are looking for; try mpg123 -vR <pipe. The interaction in your example would become something like the following (this sets the volume to 30%):

$ cat >pipe
load /some/song.mp3
volume 30

But then, what does -C do that -R doesn't that results in the former mode failing to read from stdin when a named pipe, rather than a terminal, is connected?

A quick look at the mpg123 source code indicates that it uses the termios facilities to read keypresses from the terminal, using tcsetattr to put it in the so-called "non-canonical mode", where keypresses are transmitted to the reader without further processing (in particular, without waiting for a complete line to have been typed):

struct termios tio = *pattern;
(...)

tio.c_lflag &= ~(ICANON|ECHO);
(...)

return tcsetattr(0,TCSANOW,&tio);

(This is the same as the GNU libc code sample.)

Then, in a loop, a function get_key is called, which uses select to tell whether file descriptor 0 (stdin) has data available and, if so, reads one byte from it (read(0,val,1)). But this still doesn't explain why a terminal works but a pipe doesn't! The answer lies in the terminal initialization code:

/* initialze terminal */
void term_init(void)
{
    debug("term_init");

    term_enable = 0;

    if(tcgetattr(0,&old_tio) < 0)
    {
        fprintf(stderr,"Can't get terminal attributes\n");
        return;
    }
    if(term_setup(&old_tio) < 0)
    {
        fprintf(stderr,"Can't set terminal attributes\n");
        return;
    }

    term_enable = 1;
}

Note that if either tcgetattr or term_setup fails, then term_enable stays 0. (The function to read keys from the terminal starts with if(!term_enable) return 0;.) And, indeed, when stdin isn't a terminal, tcgetattr fails, the corresponding error message is printed, and the keypress-handling code is skipped:

$ mpg123 -C ~/input.mp3 <pipe
(...)
Can't get terminal attributes
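The same tcgetattr failure can be reproduced outside mpg123. This small Python sketch, using the standard termios module, calls tcgetattr on the read end of an anonymous pipe; the helper name is_terminal is invented for the example.

```python
import os
import termios


def is_terminal(fd):
    """Return True if tcgetattr succeeds on fd, i.e. fd refers to a terminal."""
    try:
        termios.tcgetattr(fd)
    except termios.error:
        # tcgetattr fails on anything that is not a terminal, such as a pipe.
        return False
    return True


read_end, write_end = os.pipe()
pipe_is_terminal = is_terminal(read_end)
os.close(read_end)
os.close(write_end)
```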

This explains why attempting to send commands by piping into mpg123 -C fails. That's a debatable choice by the implementors; presumably, had they simply allowed tcgetattr / tcsetattr to fail (perhaps behind a switch for that purpose) instead of disabling the keypress-handling code, your attempt would have worked.

Shell – How to forward between processes with named pipes

If you get rid of the killing and shutdown stuff, child.py won't terminate, because your parent.py isn't behaving like a good UNIX citizen. (The killing is unsafe anyway: in an extreme but not unfathomable case, child.py dies before the (head -n 1 shutdown; kill -9 $parent) & subshell gets to its kill, and you end up kill -9ing some innocent process.)

The cat std_out & subprocess will have finished by the time you send the quit message, because the writer to std_out is child_original.py. It finishes upon receiving quit, at which point it closes its stdout (the std_out pipe), and that close makes the cat subprocess finish.

The cat > std_in isn't finishing because it's reading from a pipe originating in the parent.py process, and the parent.py process didn't bother to close that pipe. If it did, cat > std_in and consequently the whole child.py would finish by itself, and you wouldn't need the shutdown pipe or the killing part (killing a process that isn't your child on UNIX is always a potential security hole, should a race condition occur due to rapid PID recycling).

Processes at the right end of a pipeline generally only finish once they're done reading their stdin, but since you're not closing that (child.stdin), you're implicitly telling the child process "wait, I have more input for you" and then you go kill it because it does wait for more input from you as it should.
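The effect of closing the child's stdin can be seen in isolation with a minimal Python 3 sketch that uses plain cat as a stand-in child (an assumption made for illustration; it is not the child.py from the question):

```python
from subprocess import Popen, PIPE

# cat stands in for the child: it copies stdin to stdout until end-of-file.
child = Popen(["cat"], stdin=PIPE, stdout=PIPE)

child.stdin.write(b"hello\n")
child.stdin.flush()
first_line = child.stdout.readline()

# Closing our end of the pipe is what delivers end-of-file to cat.
# Without this close, wait() below would block forever.
child.stdin.close()
exit_status = child.wait()
child.stdout.close()
```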

In short, make parent.py behave reasonably:

from __future__ import print_function
from subprocess import Popen, PIPE

# universal_newlines=True puts the pipes in text mode, so the str writes
# below work on both Python 2 and Python 3.
child = Popen('./child.py', stdin=PIPE, stdout=PIPE, universal_newlines=True)

for letter in 'abcde':
    print('Parent writes to child: ', letter)
    child.stdin.write(letter+'\n')
    child.stdin.flush()
    response = child.stdout.readline()
    print('Response from the child:', response)
    assert response.rstrip() == letter.upper(), 'Wrong response'

child.stdin.write('quit\n')
child.stdin.flush()
child.stdin.close()  # signals end-of-file so the child can exit on its own
print('Waiting for the child to terminate...')
child.wait()
print('Done!')

And your child.py can be as simple as

#!/bin/sh
cat std_out &
cat > std_in
wait #basically to assert that cat std_out has finished at this point

(Note that I got rid of those fd dup calls because otherwise you'd need to close both child.stdin and the child_stdin duplicate.)

Since parent.py operates in a line-oriented fashion, GNU cat is unbuffered (as mikeserv pointed out), and child_original.py operates in a line-oriented fashion, you've effectively got the whole thing line-buffered.

Note on cat: Unbuffered might not be the luckiest term, as GNU cat does use a buffer. What it doesn't do is try to get the whole buffer full before writing things out (unlike stdio). Basically it makes read requests to the OS for a specific size (its buffer size), and writes whatever it receives without waiting to get a whole line or the whole buffer. (read(2) can be lazy and give you only what it can give you at the moment rather than the whole buffer you've asked for.)
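The "lazy" behavior of read(2) is easy to observe from Python, whose os.read is a thin wrapper around it. In this sketch (the 32768 request size is just an illustrative buffer size), only four bytes are available in the pipe, and the read returns those four bytes at once instead of waiting for the rest of the requested buffer:

```python
import os

read_end, write_end = os.pipe()
os.write(write_end, b"abc\n")

# Ask for up to 32768 bytes; read() returns what is available right now.
chunk = os.read(read_end, 32768)

os.close(read_end)
os.close(write_end)
```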

(You can inspect the source code at http://git.savannah.gnu.org/cgit/coreutils.git/tree/src/cat.c ; safe_read (used instead of plain read) is in the gnulib submodule and it's a very simple wrapper around read(2) that abstracts away EINTR (see the man page)).
