Shell – Compare two .csv files

csv-simpleshell-scripttext processing

I have two files: 'file1' has employee ID numbers. 'file2' has the complete database of the employees. Here is what they look like:

file1

123123
222333

file2

111222 Jones Sally
111333 Johnson Roger
123123 Doe John
444555 Richardson George
222333 Smith Jane
223456 Alexander Philip

I want to compare the two files and eliminate the entries from 'file2' that have ID numbers in 'file1'.

I found this 'awk' command which works perfectly:

awk 'FNR==NR{a[$1];next};!($1 in a)' file1 file2 > file3

The result:

file3

111222 Jones Sally
111333 Johnson Roger
444555 Richardson George
223456 Alexander Philip

WONDERFUL!

My problem is that the files are actually simplified .csv files, and I must use a comma as a separator rather than a space. I have tried everything I can think of to make this work (i.e -F, -F',' -F"," everywhere in the command) and no success.

How do I get this to work with .csv files?

By the way, I am on MacBook Pro, OSX Lion!

Best Answer

I'm assuming your csv files are something like:

File1

123123,,
222333,,

File2

111222,Jones,Sally
111333,Johnson,Roger
123123,Doe,John
444555,Richardson,George
222333,Smith,Jane
223456,Alexander,Philip

You could try using the join command, like so:

# join -t, -v 2 <(sort file1) <(sort file2)
111222,Jones,Sally
111333,Johnson,Roger
223456,Alexander,Philip
444555,Richardson,George

More information about the command can be found here: man join

join [OPTION]... FILE1 FILE2

-t CHAR
    use CHAR as input and output field separator 
-v FILENUM
    like -a FILENUM, but suppress joined output lines 
Related Question