AWK – Joining Multiple Columns from Different Files

I have 4 tsv (tab separated) files that look like this:

file_1:

abc 1
def 2
ghi 3

file_2:

abc 2
ghi 3

file_3:

def 1
ghi 2
jkl 4

file_4:

ghi 3
jkl 4

I want to join those file to get 1 tsv file like this:

dataset file_1 file_2 file_3 file_4
abc     1      2             
def     2      4            
ghi     3      3      2      3
jkl                   4      4

I have try using awk

$ awk '
    BEGIN{OFS=FS="\t"} 
    FNR==1{f = f "\t" FILENAME} 
    NR==FNR{a[$1] = $2} 
    NR!=FNR{a[$1] = a[$1] "\t" $2} 
    END{printf "dataset%s\n", f; for(i in a) print i, a[i]}
  ' file_{1..4}

This command is work, but I got shifted value. Let say, if first and second column have empty value and third and fourth column have value 4 and 4, the output that I got from that command is for first and second column have value 4, but for third and fourth column have empty value.
So I try to join my tsv file separately using awk that I mentioned. First only for file_1 and file_2 to get output_1, then join file_3 and file_4 to get output_2. After that I use $ join output_1 output_2 to merge output_1 and output_2 but I only get value that exist in 4 file. I lost data that only exist in one file.

I'll very appreciate if you can give me an advice.

Thank you

$ cat tst.awk BEGIN { FS=OFS="\t" } { datasets[$1]; fnames[FILENAME]; vals[$1,FILENAME] = $2 } END { printf "%s", "dataset" for (fname in fnames) { printf "%s%s", OFS, fname } print "" for (dataset in datasets) { printf "%s", dataset for (fname in fnames) { printf "%s%s", OFS, vals[dataset,fname] } print "" } } $ tail -n +1 file? ==> file1 <== a 1 b 2 c 3 ==> file2 <== a 2 c 3 $ awk -f tst.awk file1 file2 dataset file1 file2 a 1 2 b 2 c 3 3

AWK – Joining Multiple Columns from Different Files

Best Answer

Related Question

Best Answer

Related Solutions

Unix: replace one entire column in one file with a single value from another file

Joining entries based off of column using awk/join

Related Question