The simplest way to remove lines from one file matched with lines from another file

bash, bash-scripting, shell

What's the simplest way to remove lines from one file matched with lines from another file? For example, if I have the following files:

file1.csv:

u2@domain.com

file2.csv:

1,u1@domain.com,somehash1
2,u2@domain.com,somehash2
3,u3@domain.com,somehash3

As a result I'd like to have file3.csv:

1,u1@domain.com,somehash1
3,u3@domain.com,somehash3

What's the fastest way to solve this task? These files are a few GB in size.

Best Answer

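grep -v -F -f file1.csv file2.csv > file3.csv seems the simplest. Here -f reads the patterns from file1.csv, -F treats them as fixed strings rather than regular expressions, and -v prints only the lines that do not match. You should run performance tests with smaller files first, though. (I agree with soandos' comment that files this large might need a dedicated solution.)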
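Note that grep -F matches a pattern anywhere in the line, so u2@domain.com would also remove a row containing xu2@domain.com. If that matters, one alternative (not part of the answer above) is an awk lookup on the exact second field. The sketch below is a minimal example under stated assumptions: the address is always field 2, fields are separated by plain commas, and the list in file1.csv fits in memory. The temporary directory and the array name drop are made up for illustration.

```shell
# Sample data mirroring the question, written to a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'u2@domain.com' > file1.csv
printf '%s\n' \
  '1,u1@domain.com,somehash1' \
  '2,u2@domain.com,somehash2' \
  '3,u3@domain.com,somehash3' > file2.csv

# While reading the first file (NR==FNR), remember each address as an array key.
# While reading the second file, print only rows whose 2nd field is not a key.
awk -F, 'NR==FNR { drop[$0]; next } !($2 in drop)' file1.csv file2.csv > file3.csv

cat file3.csv
```

The NR==FNR idiom assumes file1.csv is not empty; with an empty first file, awk would treat file2.csv as the list and print nothing.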

Related Solutions

Diff Command – Ignore Lines Missing in One File

You might also take a look at comm, if you have it available:

comm [-1] [-2] [-3] file1 file2
-1 Suppress the output column of lines unique to file1.
-2 Suppress the output column of lines unique to file2.
-3 Suppress the output column of lines duplicated in file1 and file2.

The input files should be sorted. However, you can change the default behavior with the --nocheck-order option, if your version of comm supports it.

In your case you would want comm --nocheck-order -23 file filter_file
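As far as I know, --nocheck-order only turns off the sortedness check and does not make comm handle unsorted input correctly, so sorting first is the safer route. The sketch below shows that variant on made-up sample data. The names file and filter_file come from the command above; the temporary directory, the .sorted files, and result are assumptions for illustration. Keep in mind that comm compares whole lines.

```shell
# Throwaway sample files; the contents are invented for this example.
workdir=$(mktemp -d)
printf '%s\n' b a c > "$workdir/file"
printf '%s\n' b     > "$workdir/filter_file"

# Sort both inputs in the same locale that comm will use.
LC_ALL=C sort "$workdir/file"        > "$workdir/file.sorted"
LC_ALL=C sort "$workdir/filter_file" > "$workdir/filter.sorted"

# -2 hides lines unique to the second file, -3 hides lines common to both,
# leaving only the lines that appear in "file" but not in "filter_file".
LC_ALL=C comm -23 "$workdir/file.sorted" "$workdir/filter.sorted" > "$workdir/result"

cat "$workdir/result"
```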

Bash Script – Replace Text Between Markers with Another File

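# Keep both marker lines, but replace everything between them
# with the contents of insert_file.
lead='^### BEGIN GENERATED CONTENT$'
tail='^### END GENERATED CONTENT$'
# The newline right after "r insert_file" is required: it ends the file name.
sed -e "/$lead/,/$tail/{ /$lead/{p; r insert_file
        }; /$tail/p; d }"  existing_file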
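To see what the sed command above does, it can be run on two small throwaway files. This is only a sketch: the file contents, the temporary directory, and the output name result are invented for illustration, and the one-line brace syntax is assumed to be run with GNU sed (other sed implementations may want extra semicolons or newlines before the closing braces).

```shell
# Throwaway working directory with made-up sample files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' \
  'before' \
  '### BEGIN GENERATED CONTENT' \
  'old line' \
  '### END GENERATED CONTENT' \
  'after' > existing_file
printf '%s\n' 'new line' > insert_file

lead='^### BEGIN GENERATED CONTENT$'
tail='^### END GENERATED CONTENT$'

# Inside the marker range: print the BEGIN line and queue insert_file after it,
# print the END line, and delete every line of the range from normal output.
sed -e "/$lead/,/$tail/{ /$lead/{p; r insert_file
        }; /$tail/p; d }" existing_file > result

cat result
```

The output goes to a new file here; existing_file itself is left unchanged.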
