Merging 2 files with based on field match

awkjoin;text processing

I want to create a file that contains columns from two input files. File1 is like:

aa 32
bb 15
cc 78

File2 is:

fa 19
bc 23
cc 50
de 28
aa 45
bb 31

The task is, read through File1, if the 1st field of a row exists among the 1st field of File2, then print that line of File2, with both columns and add the 2nd column entry of File1 containing the 1st field.

The output should be like:

aa 45 32
bb 31 15
cc 50 78

awk is preferred for the script.

Best Answer

$ awk 'FNR==NR{a[$1]=$2;next} ($1 in a) {print $1,a[$1],$2}' file2 file1
aa 45 32
bb 31 15
cc 50 78

Explanation:

awk implicitly loops through each file, one line at a time. Since we gave it file2 as the first argument, it is read first. file1 is read second.

  • FNR==NR{a[$1]=$2;next}

    NR is the number of lines that awk has read so far and FNR is the number of lines that awk has read so far from the current file. Thus, if FNR==NR, we are still reading the first named file: file2. For every line in file2, we assign a[$1]=$2.

    Here, a is an associative array and a[$1]=$2 means saving file2's second column, denoted $2, as a value in array a using file2's first column, $1, as the key.

    next tells awk to skip the rest of the commands and start over with the next line.

  • ($1 in a) {print $1,a[$1],$2}

    If we get here, that means that we are reading the second file: file1. If we saw the first field of the line in file2, as determined by the contents of array a, then we print out a line with the values of field 2 from both files.

Related Question