Join two files based on a column

awkjoin;merge

file1:

a, 1    
b, 5    
c, 2    
f, 7

file2:

a, 2    
f, 9    
g, 3

I want to join file 1 and file 2 based on column 1 and get file 3 as below.

file3:

a, 1, 2    
b, 5, -    
c, 2, -    
f, 7, 9    
g, -, 3

merge the matching values and also keep the specific ones from each file

Best Answer

Using join:

$ join -t, -a 1 -a 2 -o0,1.2,2.2 -e ' -' file1 file2
a, 1, 2
b, 5, -
c, 2, -
f, 7, 9
g, -, 3

The standard join utility will perform a relational JOIN operation on the two sorted input files.

The flags used here tells the utility to expect comma-delimited input (-t,) and to produce output for all entries in both files (-a 1 -a 2, otherwise it would only produce output for lines with matching first field). We then ask for the join field along with the second column of both files to be outputted (-o0,1.2,2.2) and say that any missing field should be replaced by the string ␣- (space-dash, with -e ' -').

If the input is not sorted, it has to be pre-sorted. In shells that understands process substitution with <( ... ), this my be done through

join -t, -a 1 -a 2 -o0,1.2,2.2 -e ' -' <( sort file1 ) <( sort file2 )
Related Question