Ubuntu – Remove lines from a file whose names are listed in another file

awk, command line, grep, text processing

I have the following list:

NM_000014 3
NM_000015 0
NM_000016 0
NM_000017 0
NM_000018 0
NM_000019 28
NM_000020 0
NM_000021 0
NM_000022 0
NM_000023 0
NM_000024 8
NM_000025 0
NM_000026 0

And I have another file with just the first column:

NM_000031
NM_000032
NM_000033
NM_000034
NM_000022
NM_000035
NM_000036
NM_000037
NM_000023
NM_000038
NM_000039
NM_000040
NM_000041
NM_000042

I want to remove from the first file every line whose name appears in the second file. In this case the output file would be:

NM_000014 3
NM_000015 0
NM_000016 0
NM_000017 0
NM_000018 0
NM_000019 28
NM_000020 0
NM_000021 0
NM_000024 8
NM_000025 0
NM_000026 0

(removing NM_000022 and NM_000023 along with their corresponding values)

Thanks!!

Best Answer

With awk:

awk 'NR==FNR {a[$0]; next}; {if ($1 in a) next}; 1' f1.txt f2.txt

Pass the single-column file as the first argument, and the file whose first column is to be checked for membership as the second argument.

  • NR==FNR {a[$0]; next}: NR==FNR is true only while awk reads the first file. Each line of that file is stored as a key of the array a, so that the first field of each line in the second file can later be tested for membership. next then moves on to the next line, so the remaining rules never run on lines of the first file.

  • {if ($1 in a) next}; 1 runs only for the second file, the one being checked. It tests whether the first whitespace-separated field is a key of the array a; if it is, next skips the line, otherwise the trailing 1 (an always-true pattern with the default print action) prints the whole line.
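The two rules above can also be written out as a commented, multi-line awk program. This is only a sketch that should behave like the one-liner on data shaped like the sample. The file names names.txt and counts.txt, the array name seen, and the scratch directory are assumptions made for illustration. It keys on $1 instead of $0, a small variation that tolerates trailing whitespace in the names file.

```shell
# Work in a scratch directory so no existing files are touched.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# names.txt: IDs to remove (hypothetical file name, shortened sample).
printf '%s\n' NM_000022 NM_000035 NM_000023 > names.txt

# counts.txt: "ID value" pairs to filter (hypothetical file name).
printf '%s\n' 'NM_000021 0' 'NM_000022 0' 'NM_000023 0' 'NM_000024 8' > counts.txt

result=$(awk '
    # First file only: NR (global line number) equals FNR (per-file line number).
    NR == FNR { seen[$1]; next }   # remember the ID, skip the remaining rules
    # Second file: print the line unless its first field was remembered.
    !($1 in seen)
' names.txt counts.txt)

printf '%s\n' "$result"
```

One caveat of the NR==FNR idiom: if the first file is empty, the condition is also true for every line of the second file, so nothing is printed.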

Example:

$ cat f1.txt 
NM_000031
NM_000032
NM_000033
NM_000034
NM_000022
NM_000035
NM_000036
NM_000037
NM_000023
NM_000038
NM_000039
NM_000040
NM_000041
NM_000042

$ cat f2.txt 
NM_000014 3
NM_000015 0
NM_000016 0
NM_000017 0
NM_000018 0
NM_000019 28
NM_000020 0
NM_000021 0
NM_000022 0
NM_000023 0
NM_000024 8
NM_000025 0
NM_000026 0

$ awk 'NR==FNR {a[$0]; next}; {if ($1 in a) next}; 1' f1.txt f2.txt
NM_000014 3
NM_000015 0
NM_000016 0
NM_000017 0
NM_000018 0
NM_000019 28
NM_000020 0
NM_000021 0
NM_000024 8
NM_000025 0
NM_000026 0
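
Since the question is also tagged grep, a similar result can be sketched with grep, here writing the result to a new file. This is an alternative to the awk answer, not an equivalent of it: -w matches a listed name as a whole word anywhere on the line, not only in the first column, which happens to be fine for data shaped like the sample. The output name filtered.txt and the scratch directory are assumptions, and -w is assumed to be available (it is in GNU grep).

```shell
# Work in a scratch directory so no existing files are touched.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# Shortened versions of the sample files from the example above.
printf '%s\n' NM_000022 NM_000035 NM_000023 > f1.txt
printf '%s\n' 'NM_000021 0' 'NM_000022 0' 'NM_000023 0' 'NM_000024 8' > f2.txt

# -F: treat patterns as fixed strings, -f f1.txt: read the patterns from f1.txt,
# -w: match whole words only, -v: print the lines that do NOT match.
grep -v -w -F -f f1.txt f2.txt > filtered.txt

cat filtered.txt
```

The output goes to a separate file on purpose: redirecting onto f2.txt itself would truncate it before it is read.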