Linux – Compare Two Tab-Delimited Files by First Column

awkbashlinux

I have two tab-delimited files(fileA.txt and fileB.txt), I have to compare the first column of fileA.txt with the first column of fileB.txt and also I want to print the values present in the second column of fileB.txt in the output file.
Below is my

fileA.txt

id
chr1_45796849_A_T
chr1_45796854_C_T
chr1_45797174_T_A
chr1_45796852_G_C
chr19_9018540_A_G
chr19_9002576_T_C
chr1_45797487_A_G
chr1_45797153_A_T
chr1_45797750_C_T

FileB.txt

chr_pos             freq.var
chr1_45796849_A_T   0.028399811
chr1_45796852_G_C   0.019154034
chr1_45796854_C_T   0.015872901
chr1_45797153_A_T   0.010129176
chr1_45797487_A_G   0.012981216
chr1_45797750_C_T   0.024949931

following is expected outcome

id                  freq.var
chr1_45796849_A_T   0.028399811
chr1_45796854_C_T   0.015872901
chr1_45797174_T_A   
chr1_45796852_G_C   0.019154034
chr19_9018540_A_G   
chr19_9002576_T_C   
chr1_45797487_A_G   0.012981216
chr1_45797153_A_T   0.010129176
chr1_45797750_C_T   0.024949931

I have referred to awk – comparing 2 columns of 2 files and print common lines but it gives only matching entries

Best Answer

Read fileB.txt first, make the 1st field a key and the 2nd field its value in an array, skipping the header line with FNR>1 (What are NR and FNR and what does "NR==FNR" imply?).

Then read fileA.txt, print its header for the first line and then print its 1st field followed by the corresponding element in the array, if any.

awk '
    FNR==NR && FNR>1{a[$1]=$2}
    NR!=FNR{
        if(FNR>1){print $1,a[$1]}
        else{print "id", "freq.var"}
    }
' OFS="\t" fileB.txt fileA.txt

OFS="\t" sets the output field separator to tab. Since your file is tab delimited, I assume the output file should be tab delimited too.

You can pipe that into column -t for alignment.

Related Question