I have multiple .txt files and would like to merge them together base on their first column (numeric), and fill in "NULL" if there's any missing data.
File_1:
1 a
2 b
3 c
File_2:
3 c
4 d
5 e
File_3:
4 d
5 e
6 f
Expected_Output:
1 a NULL NULL
2 b NULL NULL
3 c c NULL
4 NULL d d
5 NULL e e
6 NULL NULL f
join -t $'\t' -a 1 -a 2 -1 1 -2 1 -e NULL -o 0,1.2,2.2 file_1 file_2 | join -t $'\t' -a 1 -a 2 -1 1 -2 1 -e NULL - file_3 > expected_output
This command give me correct output of column 1, 2 and 3 ;however, "NULL" is missing in column 4, any idea how to fix it?
Also is there a better way to merge multiple files instead writing a super long pipe command?
Best Answer
You are almost there. Using your command, we get:
Lines just don't have the same number of columns because we are not setting a format for the right-hand
join
in the pipeline.If we add it as
-o 0,1.2,1.3,2.2
(the join field + the second and third columns from the first join + the second column offile_3
):Finally, if we can assume the GNU implementation of
join
, we can let it do the job of inferring the right format and use-o auto
instead of-o 0,1.2,2.2
and-o 0,1.2,1.3,2.2
, provided that, for each file, all lines have at most the same number of fields as the first one. Quotinginfo join
: