Tac command’s option creates strange output [comprehension question]

coreutils, text processing, utilities

Say I have this file, containing nothing but

a
b
c
b
a

Using tac --separator=a file in BASH [on a Debian based Linux], I get this:

                  # empty line
                  # empty line
b
c
b
aacommand@prompt  # two a just before the prompt

Question: As far as I understood, --separator=a defines that a marks the break inside the string, instead of newline. Is this right?

I have tried this with other strings and much more input, and ended up with quite a mess. The other options all seem to work as expected, I presume: if I use tac --before, I first get about five empty lines and then one, but that is about what should happen, right?

Best Answer

tac is easier to understand in the case it's primarily designed for, which is when the separator is a record terminator, i.e. the separator appears after the last record. It prints the records (including each terminator) in reverse order.

$ echo -n fooabara | tac -s a; echo
rabafooa

The input consists of three records (foo, b and r), each followed by the separator a; the output consists of three records (r, b and foo), each followed by the separator a.

If the last record doesn't end with a record terminator, it's still printed first, with no record separator.

$ echo -n fooabar | tac -s a; echo
rbafooa

The last record r ends up concatenated with the next-to-last record b with no separator in between, since there was no separator at the end of the last record.

Your input looks a bit more confusing because of the newlines. Let's see it with commas instead of newlines:

$ echo -n a,b,c,b,a, | tac -s a; echo
,,b,c,b,aa

There are three input records: an empty one (with a terminator a), the bulky one ,b,c,b, (again with a terminator), and an unterminated , at the end. These records (each with its terminator, except for the last record, which doesn't have one) are printed in reverse order.
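
To tie this back to the original file, here is a small sketch that feeds the same five lines through tac and makes the newlines visible with od -c. The function name question_file is only illustrative, and the file is assumed to end with a newline, as the comma version above implies.

```shell
#!/bin/sh
# The file from the question: five lines, each ending in a newline (assumed).
question_file() { printf 'a\nb\nc\nb\na\n'; }

# With -s a, the records are "a", "\nb\nc\nb\na" and an unterminated "\n".
# od -c shows every byte, so the two leading newlines and the final "aa"
# are visible instead of being rendered by the terminal.
question_file | tac -s a | od -c
```

The two leading newlines are the "empty lines" from the question, and the trailing aa has no newline after it, which is why the prompt follows it directly.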

Your confusion probably comes from expecting the “separator” to be a separator — but that's a misnomer: it's really a record terminator. --before makes it an initiator instead.
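
A minimal sketch of the difference, using an input string chosen here for illustration (it is not from the question) and assuming GNU tac:

```shell
#!/bin/sh
# Default: "a" terminates records, so the records are "a", "fooa", "ba", "r".
printf 'afooabar' | tac -s a; echo
# prints: rbafooaa

# --before (-b): "a" starts records, so the records are "afoo", "ab", "ar".
printf 'afooabar' | tac -b -s a; echo
# prints: arabafoo
```

In both cases the bytes are the same; only the point at which tac cuts the input into records changes.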

Related Solutions

Bash – Remove nearly duplicate lines

How about joining adjacent pairs of lines, and then using a backreference to find the non-unique prefix?

$ sed '$!N; /\(.*\)\n\1:FOO/D; P;D' file
red.7
green.2:FOO
blue.6
yellow.9:FOO

Explanation:

$!N - if we are not already at the last line, append the next line to the pattern space, separated by a newline
/\(.*\)\n - match everything up to the newline (i.e. the first of each pair of lines) and save it into a capture group
\1:FOO now matches whatever was captured from the first line, followed by :FOO (\1 is a backreference to the first capture group)
/\(.*\)\n\1:FOO/D - if the second line of each pair is the same as the first followed by :FOO, then Delete the first
P;D - Print and Delete the remaining line, ready to start the next cycle

or neater (thanks @don_crissti)

$ sed '$!N; /\(.*\)\n\1:FOO/!P;D' file
N means there are always two consecutive lines in the pattern space and sed Prints the first one of them only if the second line isn't the same as the first one plus the suffix :FOO. Then D removes the first line from the pattern space and restarts the cycle.
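
The input file is not shown above, so the following sketch uses a hypothetical one, reconstructed to be consistent with the output listed earlier. The function names sample and drop_unsuffixed are illustrative only.

```shell
#!/bin/sh
# Hypothetical input: some lines are followed by a copy carrying the :FOO suffix.
sample() {
  printf '%s\n' red.7 green.2 green.2:FOO blue.6 yellow.9 yellow.9:FOO
}

# Keep two lines in the pattern space; print the first one unless the second
# is "first line + :FOO", then drop the first and restart the cycle.
drop_unsuffixed() {
  sed '$!N; /\(.*\)\n\1:FOO/!P;D'
}

sample | drop_unsuffixed
```

With this assumed input, green.2 and yellow.9 are suppressed because each is immediately followed by its :FOO variant, while red.7 and blue.6 pass through unchanged.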

How to find all files containing various strings from a long list of string combinations

Since agrep does not seem to be present on your system, have a look at this alternative based on sed and awk, which applies a grep-like AND operation using patterns read from a local file.

PS: Since you use OS X, I'm not sure whether the awk version you have supports the usage below.

awk can emulate grep with an AND of multiple patterns like this:
awk '/pattern1/ && /pattern2/ && /pattern3/'
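
As a quick illustration with made-up input lines (the function name and_demo is hypothetical), only a line containing every pattern is printed:

```shell
#!/bin/sh
# Three sample lines; only the second contains both patterns.
and_demo() {
  printf '%s\n' \
    'surveillance data only' \
    'surveillance data from a cctv camera' \
    'a cctv camera' |
    awk '/surveillance data/ && /cctv camera/'
}

and_demo
# prints: surveillance data from a cctv camera
```

Each /.../ is a regular expression tested against the whole line, so the order in which the phrases appear on the line does not matter.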

So you could transform your pattern file from this:

$ cat ./tmp/d1.txt
"surveillance data" "surveillance technology" "cctv camera"
"social media" "surveillance techniques" "enforcement agencies"
"social control" "surveillance camera" "social security"
"surveillance data" "security guards" "social networking"
"surveillance mechanisms" "cctv surveillance" "contemporary surveillance"

To this:

$ sed 's/" "/\/ \&\& \//g; s/^"/\//g; s/"$/\//g' ./tmp/d1.txt
/surveillance data/ && /surveillance technology/ && /cctv camera/
/social media/ && /surveillance techniques/ && /enforcement agencies/
/social control/ && /surveillance camera/ && /social security/
/surveillance data/ && /security guards/ && /social networking/
/surveillance mechanisms/ && /cctv surveillance/ && /contemporary surveillance/

PS: You can redirect the output to another file by appending >anotherfile to the command, or you can use the sed -i option to change the pattern file in place.

Then you just need to feed awk the awk-formatted patterns from the transformed pattern file (d1.txt is assumed here to have been converted in place):

$ while IFS= read -r line;do awk "$line" *.txt;done<./tmp/d1.txt #d1.txt = my test pattern file

Alternatively, you can leave the original pattern file untouched and apply sed to each line as it is read:

while IFS= read -r line;do 
  line=$(sed 's/" "/\/ \&\& \//g; s/^"/\//g; s/"$/\//g' <<<"$line")
  awk "$line" *.txt
done <./tmp/d1.txt

Or as one-liner:

$ while IFS= read -r line;do line=$(sed 's/" "/\/ \&\& \//g; s/^"/\//g; s/"$/\//g' <<<"$line"); awk "$line" *.txt;done <./tmp/d1.txt

The commands above return the correct AND results on my test files, which look like this:

$ cat d2.txt
This guys over there have the required surveillance technology to do the job.
The other guys not only have efficient surveillance technology, but they also gather surveillance data by one cctv camera.

$ cat d3.txt
All surveillance data are locked.
All surveillance data are locked and guarded by security guards.
There are several surveillance mechanisms (i.e cctv surveillance, contemporary surveillance, etv)

Results:

$ while IFS= read -r line;do awk "$line" *.txt;done<./tmp/d1.txt
#or while IFS= read -r line;do line=$(sed 's/" "/\/ \&\& \//g; s/^"/\//g; s/"$/\//g' <<<"$line"); awk "$line" *.txt;done <./tmp/d1.txt
The other guys not only have efficient surveillance technology, but they also gather surveillance data by one cctv camera.
There are several surveillance mechanisms (i.e cctv surveillance, contemporary surveillance, etv)

Update:
The awk solution above prints the matching lines from the .txt files.
If you want to display the filenames instead of the matching lines, use the following awk invocation where necessary:

awk "$line""{print FILENAME}" *.txt
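
Because FILENAME is printed once per matching line, a file with several matching lines is listed several times. One possible way to list each file once is to pipe through sort -u, as in this sketch. The helper name matching_files is hypothetical, and the sketch assumes the patterns contain no slashes and are meant to be read as regular expressions.

```shell
#!/bin/sh
# matching_files PATTERNS FILE...
# PATTERNS holds one group of double-quoted phrases per line, as in d1.txt.
matching_files() {
  patterns=$1
  shift
  while IFS= read -r line; do
    # Turn '"a" "b"' into the awk condition '/a/ && /b/'.
    cond=$(printf '%s\n' "$line" |
      sed 's/" "/\/ \&\& \//g; s/^"/\//; s/"$/\//')
    # Print the name of every file with a matching line, once per file.
    awk "$cond"'{ print FILENAME }' "$@" | sort -u
  done < "$patterns"
}
```

It could be called as matching_files ./tmp/d1.txt *.txt, using the original, untransformed pattern file.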
