How to split an output to two files with grep

grep, io-redirection

I have a script, mycommand.sh, that I can't run twice. I want to split its output into two files: one containing the lines that match a regex and one containing the lines that don't. What I'd like is basically something like this:

./mycommand.sh | grep -E 'some|very*|cool[regex].here;)' --match file1.txt --not-match file2.txt

I know I can redirect the output to a file and then run two greps on it, with and without the -v option, redirecting their output to two different files. But I was just wondering whether it is possible to do it with one grep.

So, is it possible to achieve what I want in a single line?

Best Answer

There are many ways to accomplish this.

Using awk

The following sends any lines matching coolregex to file1. All other lines go to file2:

./mycommand.sh | awk '/coolregex/{print>"file1";next} 1' >file2

How it works:

/coolregex/{print>"file1";next}

Any lines matching the regular expression coolregex are printed to file1. Then next skips all remaining commands and awk starts over on the next line.

1

All other lines are sent to stdout. 1 is awk's cryptic shorthand for print-the-line.
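As a quick check of the behavior described above, here is a minimal sketch. The mycommand function is a hypothetical stand-in for ./mycommand.sh, and the sample lines, the regex cool, and the scratch directory are made up for illustration:

```shell
#!/bin/sh
# Hypothetical stand-in for ./mycommand.sh: emits four sample lines.
mycommand() {
    printf '%s\n' 'a cool line' 'a dull line' 'very cool' 'nothing here'
}

# Work in a scratch directory so no real files are overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Matching lines go to file1; "next" skips the trailing 1,
# so only non-matching lines reach stdout and end up in file2.
mycommand | awk '/cool/{print>"file1";next} 1' >file2

echo "file1:"; cat file1
echo "file2:"; cat file2
```

With this sample input, file1 should hold the two lines containing cool and file2 the other two, and the command is run only once.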

Splitting into multiple streams is also possible:

./mycommand.sh | awk '/regex1/{print>"file1"} /regex2/{print>"file2"} /regex3/{print>"file3"}'
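Note that this multi-stream form has no next, so a line that matches more than one regex is written to every matching file, and a line that matches none is dropped. A small sketch of that behavior, with a hypothetical mycommand function and made-up sample lines and regexes:

```shell
#!/bin/sh
# Hypothetical stand-in for ./mycommand.sh.
mycommand() {
    printf '%s\n' 'apple banana' 'banana' 'cherry' 'plain'
}

# Scratch directory so no real files are overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Each pattern is tested independently against every line.
mycommand | awk '/apple/{print>"file1"} /banana/{print>"file2"} /cherry/{print>"file3"}'

# "apple banana" lands in both file1 and file2; "plain" lands nowhere.
for f in file1 file2 file3; do
    echo "$f:"; cat "$f"
done
```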

Using process substitution

This is not as elegant as the awk solution but, for completeness, we can also use multiple greps combined with process substitution:

./mycommand.sh | tee >(grep 'coolregex' >File1) | grep -v 'coolregex' >File2

We can also split up into multiple streams:

./mycommand.sh | tee >(grep 'coolregex' >File1) >(grep 'otherregex' >File3) >(grep 'anotherregex' >File4) | grep -v 'coolregex' >File2

Related Solutions

Compare two files and print matches – large files

If the files are sorted (the samples you posted are) then it's as simple as

join -t : File1.txt File2.txt

join pairs up lines from two files where the join field is equal. By default, the join field is the first field, the fields are output in order except that the join field is not repeated, and non-pairable lines are skipped, which is exactly what you want.
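To make the default behavior concrete, here is a minimal sketch with made-up file contents (the original samples are not reproduced here). LC_ALL=C is set as an assumption so that sort order and join's comparison agree:

```shell
#!/bin/sh
# Use the C locale so the collation order is predictable (assumption).
LC_ALL=C; export LC_ALL

# Scratch directory with two small, already sorted sample files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'alice:red' 'bob:green' 'carol:blue' >File1.txt
printf '%s\n' 'alice:1' 'carol:3' 'dave:4' >File2.txt

# Lines are paired on the first ":"-separated field;
# bob and dave have no partner and are skipped.
join -t : File1.txt File2.txt >joined.txt
cat joined.txt
```

Each output line is the join field, followed by the remaining fields of File1.txt and then those of File2.txt.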

Note that if the files have Windows line endings, they appear under Unix systems to have an extra carriage return (CR) character at the end of each line. The CR is usually invisible, but as far as join and other text tools are concerned, it's a character like any other. It means that the fields of File1.txt all end with a CR whereas the ones in File2.txt don't, so they don't match. You need to strip the CR, at least in File1.txt.

<File1.txt tr -d '\r' | join -t : - File2.txt
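The effect of the stray CR can be reproduced with a small sketch. The file contents below are made up, and File1.txt is assumed to hold only the key field, written with Windows line endings:

```shell
#!/bin/sh
# C locale for predictable ordering (assumption).
LC_ALL=C; export LC_ALL

workdir=$(mktemp -d)
cd "$workdir" || exit 1

# File1.txt has CRLF line endings; File2.txt has plain LF.
printf 'alice\r\nbob\r\n' >File1.txt
printf 'alice:1\ncarol:3\n' >File2.txt

# Without stripping, the key "alice<CR>" never equals "alice".
join -t : File1.txt File2.txt >without_tr.txt

# With the CR removed, the keys compare equal.
<File1.txt tr -d '\r' | join -t : - File2.txt >with_tr.txt

echo "without tr: $(wc -l <without_tr.txt) line(s)"
echo "with tr:"; cat with_tr.txt
```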

The files do need to be sorted. If they aren't, then in ksh/bash/zsh you can use process substitution. (Add tr -d '\r' | if needed.)

join -t : <(sort File1.txt) <(sort File2.txt)

In plain sh, if your Unix variant has /dev/fd (most do), you can use that instead to pipe the output of two programs through two file descriptors.

sort File2.txt | { sort File1.txt | join -t : /dev/fd/0 /dev/fd/3; } 3<&0

If you need to preserve the original order of File1.txt and it isn't sorted by the join field, then add line numbers to remember the original order, sort by the join field, join, sort by line number, and strip the line numbers. (You can do something similar if you want to preserve the order of the other file.)

<File1.txt nl -s : |
sort -t : -k 2 |
join -t : -1 2 - <(sort File2.txt) |
sort -t : -k 2,2n |
cut -d : -f 1,3
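A worked run of this pipeline on made-up data may help. The sketch assumes File1.txt holds only the key field and File2.txt holds key:value pairs. It writes the sorted copy of File2.txt to a temporary file (File2.sorted, a hypothetical name) instead of using process substitution, so that it also runs in plain sh:

```shell
#!/bin/sh
# C locale so sort and join agree on ordering (assumption).
LC_ALL=C; export LC_ALL

workdir=$(mktemp -d)
cd "$workdir" || exit 1

# File1.txt is deliberately unsorted; its order should survive.
printf '%s\n' carol alice bob >File1.txt
printf '%s\n' alice:1 bob:2 carol:3 dave:4 >File2.txt
sort File2.txt >File2.sorted

<File1.txt nl -s : |                # prefix each line with "N:"
sort -t : -k 2 |                    # sort by the key (field 2)
join -t : -1 2 - File2.sorted |     # output: key:N:value
sort -t : -k 2,2n |                 # restore the original order by N
cut -d : -f 1,3 >result.txt         # drop the line number

cat result.txt
```

The result lists carol, alice, bob in the order they appear in File1.txt, each followed by its value from File2.txt.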
