I have 4 tsv (tab separated) files that look like this:
file_1:
abc 1
def 2
ghi 3
file_2:
abc 2
ghi 3
file_3:
def 1
ghi 2
jkl 4
file_4:
ghi 3
jkl 4
I want to join those file to get 1 tsv file like this:
dataset file_1 file_2 file_3 file_4
abc 1 2
def 2 4
ghi 3 3 2 3
jkl 4 4
I have try using awk
$ awk '
BEGIN{OFS=FS="\t"}
FNR==1{f = f "\t" FILENAME}
NR==FNR{a[$1] = $2}
NR!=FNR{a[$1] = a[$1] "\t" $2}
END{printf "dataset%s\n", f; for(i in a) print i, a[i]}
' file_{1..4}
This command is work, but I got shifted value. Let say, if first and second column have empty value and third and fourth column have value 4 and 4, the output that I got from that command is for first and second column have value 4, but for third and fourth column have empty value.
So I try to join my tsv file separately using awk
that I mentioned. First only for file_1
and file_2
to get output_1
, then join file_3
and file_4
to get output_2
. After that I use $ join output_1 output_2
to merge output_1 and output_2 but I only get value that exist in 4 file. I lost data that only exist in one file.
I'll very appreciate if you can give me an advice.
Thank you
Best Answer
Add as many files to the list as you like.