Lum – Compare 2 tab delimited files and output differences with column header

awk, columns, scripting, text-processing

I would like to compare 2 similar files on a common column. The files will have identical headers.

file1.txt

mem_id      Date    Time    Building    
aa          bb      cc      dd
ee          ff      gg      hh
ii          jj      kk      ll

file2.txt

mem_id      Date    Time    Building    
aa          bb      cc      dd
ee          ff      2g      hh
ii          jj      kk      2l

Command

awk 'NR==FNR { for (i=1; i<=NF; i++) A[i,NR] = $i; next }
{ for (i=1; i<=NF; i++) if (A[i,FNR] != $i)
    print "ID#-" $1 ": Column", i "- File1.txt value=", A[i,FNR] " / File2.txt value= " $i }' file1.txt file2.txt

Current Output

ID#-ee: Column 3- File1.txt value= gg / File2.txt value= 2g
ID#-ii: Column 4- File1.txt value= ll / File2.txt value= 2l

Desired Output

mem_id#-ee: Time- file1.txt value= gg / file2.txt value= 2g
mem_id#-ii: Building- file1.txt value= ll / file2.txt value= 2l 

I am very close, but I would like help with a few improvements.

1- I would like to replace “Column 3” and “Column 4” with the actual column header (Time, Building, or whatever the header is).

2- I would like the file names in the output to be picked up dynamically rather than hard-coded in the command, so that it works for any pair of files.

3- I would like to be able to run this as a script.

Any help would be appreciated.

Best Answer

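Using awk, saving the header names from the first line and taking the file names from ARGV:

awk '
# First line of the first file: remember each column header by position
NR==1 { 
  for (i=1; i<=NF; i++)
    header[i] = $i
}
# While reading the first file: store every field by column and line number
NR==FNR {
  for (i=1; i<=NF; i++) {
    A[i,NR] = $i
  }
  next
}
# Second file: compare each field with the same column and line of the first file
{
  for (i=1; i<=NF; i++)
    if (A[i,FNR] != $i)
      # ARGV[1] and ARGV[2] hold the file names given on the command line
      print "ID#-" $1 ": " header[i] "- " ARGV[1] " value= ", A[i,FNR]" / " ARGV[2] " value= "$i
}' file1.txt file2.txt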

Output:

ID#-ee: Time- file1.txt value=  gg / file2.txt value= 2g
ID#-ii: Building- file1.txt value=  ll / file2.txt value= 2l
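
For point 3 of the question (making it scriptable) and the exact label in the desired output, one possible sketch is to wrap an awk program in a small shell script. This is not part of the answer above. The function name compare_tsv is hypothetical, and the sketch assumes the first column (mem_id here) is a unique key in both files, so rows are matched on that value instead of on line number. It takes the row label from the first header field and the file names from awk's built-in FILENAME variable.

```shell
#!/bin/sh
# compare_tsv is a hypothetical helper name, not from the original answer.
# Assumption: column 1 is a unique key present in both files.
# Usage: compare_tsv FILE1 FILE2
compare_tsv() {
    awk '
    FNR == 1 {                          # header row of either file
        if (NR == 1) {                  # only keep the headers of the first file
            for (i = 1; i <= NF; i++) header[i] = $i
            first = FILENAME            # name of the first file, as awk sees it
        }
        next
    }
    NR == FNR {                         # data rows of the first file
        for (i = 1; i <= NF; i++) A[$1, i] = $i
        next
    }
    {                                   # data rows of the second file
        for (i = 2; i <= NF; i++)
            # rows whose key is missing from the first file are skipped
            if ((($1, i) in A) && A[$1, i] != $i)
                printf "%s#-%s: %s- %s value= %s / %s value= %s\n",
                       header[1], $1, header[i], first, A[$1, i], FILENAME, $i
    }' "$1" "$2"
}

# When saved as a script and called with two file names, run the comparison.
if [ "$#" -eq 2 ]; then
    compare_tsv "$1" "$2"
fi
```

Saved under a name of your choice and made executable, it would be called with the two files as arguments. Fields are split on whitespace, as in the answer above; if a field can contain spaces, adding -F '\t' to the awk call is one option, assuming the files really are tab-delimited.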