Shell – Join two files, matching on a column, with repetitions

join;shell-scripttext processing

How can I get two files A and B, and out put a result like this:

File A:

001 Apple, CA
020 Banana, CN
023 Apple, LA
045 Orange, TT
101 Orange, OS
200 Kiwi, AA

File B:

01-Dec-2013 01.664  001     AAA CAC 1083
01-Dec-2013 01.664  020     AAA CAC 0513
01-Dec-2013 01.668  023     AAA CAC 1091
01-Dec-2013 01.668  101     AAA CAC 0183
01-Dec-2013 01.674  200     AAA CAC 0918
01-Dec-2013 01.674  045     AAA CAC 0918
01-Dec-2013 01.664  001     AAA CAC 2573
01-Dec-2013 01.668  101     AAA CAC 1091
01-Dec-2013 01.668  020     AAA CAC 6571
01-Dec-2013 01.668  023     AAA CAC 2148
01-Dec-2013 01.674  200     AAA CAC 0918
01-Dec-2013 01.668  045     AAA CAC 5135

Result:

01-Dec-2013 01.664  001     AAA CAC 1083    Apple, CA
01-Dec-2013 01.664  020     AAA CAC 0513    Banana, CN
01-Dec-2013 01.668  023     AAA CAC 1091    Apple, LA
01-Dec-2013 01.668  101     AAA CAC 0183    Orange, OS
01-Dec-2013 01.674  200     AAA CAC 0918    Kiwi, AA
01-Dec-2013 01.674  045     AAA CAC 0918    Orange, TT
01-Dec-2013 01.664  001     AAA CAC 2573    Apple, CA
01-Dec-2013 01.668  101     AAA CAC 1091    Orange, OS
01-Dec-2013 01.668  020     AAA CAC 6571    Banana, CN
01-Dec-2013 01.668  023     AAA CAC 2148    Apple, LA
01-Dec-2013 01.674  200     AAA CAC 0918    Kiwi, AA
01-Dec-2013 01.668  045     AAA CAC 5135    Orange, TT

(file A: the number should match to middle number from file B)

Is there any possible way to doing this?

Best Answer

A simple solution with awk:

awk -v FILE_A="file-A" -v OFS="\t" 'BEGIN { while ( ( getline < FILE_A ) > 0 ) { VAL = $0 ; sub( /^[^ ]+ /, "", VAL ) ; DICT[ $1 ] = VAL } } { print $0, DICT[ $3 ] }' file-B

Here is a commented version:

awk -v FILE_A="file-A" -v OFS="\t" '
BEGIN {

  # Loop on the content of file-A
  # to put the values in a table

  while ( ( getline < FILE_A ) > 0 ){

     # Remove the index from the value
     VAL = $0
     sub( /^[^ ]+ /, "", VAL )

     # Fill the table
     DICT[ $1 ] = VAL
  }
}
{

  # Print the line followed by the
  # corresponding value
  print $0, DICT[ $3 ]

}' file-B
Related Question