Use join:
$ join -t'|' file_1 file_2
14595|Age 35|Salary xx|Position ax|2013|Info 1|Info 2|Info 3|Info 4|Info 5|Address xx|Info 6|Info 7|Info 8
14649|Age 30|Salary xx|Position az|2015|Info 1|Info 2|Info 3|Info 4|Info 5|Address xxxz|Info 6|Info 7|Info 8
-t indicates the field separator.
For join to work, both files must be sorted on the join field. You can use sort for that.
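As a minimal sketch of the sorting step, assuming '|'-separated files joined on their first field: the helper name join_sorted is hypothetical, and LC_ALL=C is set only so that sort and join agree on the collation order.

```shell
# Hypothetical helper: join two '|'-separated files on their first field,
# sorting each one on that field first.
join_sorted() {
    js_a=$(mktemp) && js_b=$(mktemp) || return 1
    # Sort both inputs on field 1 only, with the same locale join will use.
    LC_ALL=C sort -t'|' -k1,1 "$1" > "$js_a"
    LC_ALL=C sort -t'|' -k1,1 "$2" > "$js_b"
    LC_ALL=C join -t'|' "$js_a" "$js_b"
    rm -f "$js_a" "$js_b"
}
```

It would be called as join_sorted file_1 file_2. In bash, process substitution can avoid the temporary files.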
You are almost there. Using your command, we get:
$ join -t $'\t' -a 1 -a 2 -1 1 -2 1 -e NULL -o 0,1.2,2.2 file_1 file_2 | join -t $'\t' -a 1 -a 2 -1 1 -2 1 -e NULL - file_3
1 a NULL
2 b NULL
3 c c
4 NULL d d
5 NULL e e
6 f
The lines don't all have the same number of columns because we are not setting an output format for the right-hand join in the pipeline.
If we add one as -o 0,1.2,1.3,2.2 (the join field, the second and third columns from the first join, and the second column of file_3):
$ join -t $'\t' -a 1 -a 2 -1 1 -2 1 -e NULL -o 0,1.2,2.2 file_1 file_2 | join -t $'\t' -a 1 -a 2 -1 1 -2 1 -e NULL -o 0,1.2,1.3,2.2 - file_3
1 a NULL NULL
2 b NULL NULL
3 c c NULL
4 NULL d d
5 NULL e e
6 NULL NULL f
Finally, if we can assume the GNU implementation of join, we can let it infer the right format and use -o auto instead of -o 0,1.2,2.2 and -o 0,1.2,1.3,2.2, provided that, for each file, all lines have at most the same number of fields as the first one. Quoting info join:
-o auto
If the keyword auto is specified, infer the output format from the first line in each file. This is the same as the default output format but also ensures the same number of fields are output for each line. Missing fields are replaced with the -e option and extra fields are discarded.
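A sketch of the same pipeline with -o auto might look like this. The sample files are an assumption, reconstructed from the output shown earlier rather than taken from the question, and the script requires GNU join. The -1 1 -2 1 options are omitted because the first field is the default join field.

```shell
# Assumed sample data, reconstructed from the output shown earlier.
dir=$(mktemp -d)
tab=$(printf '\t')    # portable alternative to bash's $'\t'
printf '1\ta\n2\tb\n3\tc\n' > "$dir/file_1"
printf '3\tc\n4\td\n5\te\n' > "$dir/file_2"
printf '4\td\n5\te\n6\tf\n' > "$dir/file_3"

# GNU join only: -o auto pads every output line to the field count
# found on the first line of each input, filling gaps with the -e string.
result=$(join -t "$tab" -a 1 -a 2 -e NULL -o auto "$dir/file_1" "$dir/file_2" |
    join -t "$tab" -a 1 -a 2 -e NULL -o auto - "$dir/file_3")
printf '%s\n' "$result"
rm -rf "$dir"
```

With this data, every output line should have four tab-separated columns, as in the explicit -o 0,1.2,1.3,2.2 version.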
in awk you could do
There is of course a terser way to do this, and I abuse "{ ... }", but I hope it's clear:
We first fill a filenames[] array indexed by $NF (the last field of the line when "/" is the separator, i.e. the basename).
We also count how many times each $NF was seen, using the occurence[] array (if a basename appears more than once, filenames[$NF] holds only the latest entry and occurence[$NF] > 1).
Then we print only the entries whose occurence count is 1.
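The steps above can be sketched roughly as follows. This is a reconstruction from the description, not the answer's original code: the wrapper name unique_basenames is hypothetical, and storing the whole line ($0) in filenames[] is an assumption. Note that awk's for-in loop visits keys in an unspecified order, so the output order is not guaranteed.

```shell
# Hypothetical helper: print only the paths whose basename occurs exactly once.
# Reads paths (one per line) from the given files, or from stdin if none.
unique_basenames() {
    awk -F/ '
        {
            filenames[$NF] = $0    # assumed: keep the full path, keyed by basename
            occurence[$NF]++       # count how many times this basename was seen
        }
        END {
            for (name in filenames)
                if (occurence[name] == 1)
                    print filenames[name]
        }
    ' "$@"
}
```

For example, find . -type f | unique_basenames would list the files whose name is not repeated anywhere else in the tree; pipe the result through sort if a stable order matters.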