File file1.txt contains lines like:
/api/purchase/<hash>/index.html
For example:
/api/purchase/12ab09f46/index.html
File file2.csv contains lines like:
<hash>,timestamp,ip_address
For example:
12ab09f46,20150812235200,22.231.113.64
a77b3ff22,20150812235959,194.66.82.11
I want to filter file2.csv, removing every line whose hash also appears in file1.txt. That is to say:
cat file1.txt | extract <hash> | sed '/<hash>/d' file2.csv
or something like this.
It should be straightforward, but I seem unable to make it work.
Can anyone please provide a working pipeline for this task?
Best Answer
cut -d / -f 4 file1.txt | paste -sd '|' | xargs -I{} grep -v -E {} file2.csv
Explanation:
cut -d / -f 4 file1.txt
will select the hashes from the first file.
paste -sd '|'
will join all the hashes into a regular expression, e.g. H1|H2|H3.
xargs -I{} grep -v -E {} file2.csv
will invoke grep with the previous pattern as an argument; xargs replaces {} with the content of STDIN.
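For reference, here is the pipeline run end-to-end on the sample data from the question (using a scratch directory so no real files are touched):

```shell
#!/bin/sh
# Recreate the question's sample files in a temporary directory.
cd "$(mktemp -d)" || exit 1
printf '%s\n' '/api/purchase/12ab09f46/index.html' > file1.txt
printf '%s\n' '12ab09f46,20150812235200,22.231.113.64' \
              'a77b3ff22,20150812235959,194.66.82.11' > file2.csv

# The answer's pipeline: extract hashes, join into a regex, grep them out.
cut -d / -f 4 file1.txt | paste -sd '|' | xargs -I{} grep -v -E {} file2.csv
```

This prints only the a77b3ff22 line, since 12ab09f46 appears in file1.txt.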
If you don't have
paste
you can replace it with tr '\n' '|' | sed 's/|$//'
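One caveat with the regex approach: grep matches the joined pattern anywhere in the line, so a hash that happened to occur in another field (or as a substring) would also be filtered. A sketch of a stricter alternative using awk, which compares each hash as a literal string anchored to the first CSV field (sample filenames as in the question):

```shell
#!/bin/sh
# Recreate the question's sample files in a temporary directory.
cd "$(mktemp -d)" || exit 1
printf '%s\n' '/api/purchase/12ab09f46/index.html' > file1.txt
printf '%s\n' '12ab09f46,20150812235200,22.231.113.64' \
              'a77b3ff22,20150812235959,194.66.82.11' > file2.csv

# Pass 1 (NR==FNR, i.e. file1.txt): split each path on "/" and remember
# the 4th component (the hash) as a key in the "seen" array.
# Pass 2 (file2.csv, split on ","): print only lines whose first field
# is not a remembered hash.
awk -F, 'NR==FNR { split($0, p, "/"); seen[p[4]]; next }
         !($1 in seen)' file1.txt file2.csv
```

This avoids regex metacharacter surprises entirely, since no pattern is built at all.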