Linux Command Line – How to Delete Lines in 1st File Matching String in 2nd File

Tags: command line, linux, sed

Suppose I have two text files.

The first file, "Emails.txt", contains the following data:

00iiiiiiii_l@hotmail.com
00rrrrrrrr@hotmail.com
00zzzzz@gmail.com
00eeeeee@gotmail.com
00gggggg@uor.edu
00uuuuuuuu@yahoo.com
00e21_ss@cmail.com
00gggggggg@cmail.com
00zzzzzzzz48@hotmail.com
00aaaaaaa_2020@gotmail.com
jjjjjjjj@gmail.com

The second file, "Banned.txt", contains the following strings:

@gotmail.com
@cmail.com
@uor.edu

How can I delete every line in the first file, "Emails.txt", that matches the string on any line of the second file, "Banned.txt"?

The desired contents of the new file are:

00iiiiiiii_l@hotmail.com
00rrrrrrrr@hotmail.com
00zzzzz@gmail.com
00uuuuuuuu@yahoo.com
00zzzzzzzz48@hotmail.com
jjjjjjjj@gmail.com

Can this be done using sed or awk on Linux? If so, how?

Best Answer

grep is enough. The -v flag inverts the match, so matching lines are dropped, and the -f flag reads the patterns from a file, which is exactly what you want:

grep -vf Banned.txt Emails.txt
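
Because -f treats every line of Banned.txt as a regular expression, the dots in entries such as @uor.edu match any character. If that matters, adding -F makes grep treat the patterns as fixed strings instead. The following is a minimal sketch that recreates the sample files from the question in a scratch directory; the scratch directory and the Filtered.txt output name are assumptions made for illustration:

```shell
#!/bin/sh
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample data copied from the question.
cat > Emails.txt <<'EOF'
00iiiiiiii_l@hotmail.com
00rrrrrrrr@hotmail.com
00zzzzz@gmail.com
00eeeeee@gotmail.com
00gggggg@uor.edu
00uuuuuuuu@yahoo.com
00e21_ss@cmail.com
00gggggggg@cmail.com
00zzzzzzzz48@hotmail.com
00aaaaaaa_2020@gotmail.com
jjjjjjjj@gmail.com
EOF

cat > Banned.txt <<'EOF'
@gotmail.com
@cmail.com
@uor.edu
EOF

# -v: print only non-matching lines
# -F: treat each pattern as a literal string, not a regex
# -f: read one pattern per line from Banned.txt
grep -vFf Banned.txt Emails.txt > Filtered.txt
cat Filtered.txt
```

Redirecting to a new file leaves Emails.txt untouched, so the result can be checked before the original is replaced.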

If you want to do something more elaborate with the list of banned addresses, e.g. require that each entry match the whole domain at the end of the address, you'll need to generate a regex from your Banned file:

cat Banned.txt | tr "\n" "|" | sed -e 's,|,$\\|,g' | sed -e 's,\\|$,,'

gives the desired regex:

@gotmail.com$\|@cmail.com$\|@uor.edu$

Then:

cat Banned.txt | tr "\n" "|" | sed -e 's,|,$\\\\|,g' | sed -e 's,\\\\|$,,' | xargs -I{} grep -v '{}' Emails.txt

(The backslashes are doubled because xargs interprets one level of escaping before passing the argument on to grep.) This will match and remove me@uor.edu but not, e.g., me@uor.education.gov.
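
The same end-of-address anchoring can also be sketched without xargs, by turning each banned entry into its own anchored pattern and handing the result to grep -f. This is an alternative to the pipeline above rather than the answer's method. The sample addresses and the Patterns.txt file name are hypothetical, and the sed expression additionally escapes regex metacharacters so that the dot in @uor.edu only matches a literal dot:

```shell
#!/bin/sh
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > Banned.txt <<'EOF'
@gotmail.com
@cmail.com
@uor.edu
EOF

# Hypothetical addresses chosen to exercise the anchoring and escaping.
cat > Emails.txt <<'EOF'
me@uor.edu
me@uor.education.gov
me@uorxedu
you@cmail.com
you@gmail.com
EOF

# Build one anchored pattern per banned entry:
#   1) drop empty lines (an empty pattern would match every address),
#   2) backslash-escape basic-regex metacharacters such as ".",
#   3) append "$" so the entry must sit at the end of the line.
sed -e '/^$/d' -e 's/[][\.*^$]/\\&/g' -e 's/$/$/' Banned.txt > Patterns.txt

# Keep only the addresses that match none of the anchored patterns.
grep -vf Patterns.txt Emails.txt > Filtered.txt
cat Filtered.txt
```

Since the patterns live in a file, no quoting passes through xargs and no backslashes need doubling. In this sketch me@uor.education.gov and me@uorxedu are kept, while me@uor.edu and you@cmail.com are removed.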
