Join two text files on the 1st column, keeping order and unpairable lines from the 1st file

join; merge; text processing

I need to merge two files, but lines should only be merged where they match.
My first file contains, let's say, 1 million lines:

abcde
fghi
jklmn
opqrs
123456
0000

The second file contains 3 million lines, and only some of them start with a string that appears in the first file:

543123:fdfdss
dfskld:533fg
abcde:1234
fdskls:fkdfs
gfdkls:flfds
0000:5432
fdsk:saakl

Desired output:

abcde:1234
fghi
jklmn
opqrs
123456
0000:5432

I want each output line to be file1:file2, but only where the first column of file2 matches the string in file1.
I don't want the order to change: the output should follow the order of file1 and keep every file1 string, with the matching data from file2 appended where there is a match.

Best Answer

The easy way is with awk: read the 2nd file first and save each line in an array indexed by $1. Then, while reading the 1st file, check whether the line is an index in that array and, if so, replace it with the stored value:

awk -F: 'NR==FNR{z[$1]=$0;next}
($0 in z) {$0=z[$1]};1' file2 file1
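
To try this on the sample data from the question, something like the following sketch should work. The scratch directory, the `merged` file name and the expanded awk layout are my own choices for illustration; the logic is the same as the one-liner above.

```shell
# Recreate the question's sample data in a scratch directory
# (the directory and the "merged" output name are illustrative).
tmp=$(mktemp -d)
printf '%s\n' abcde fghi jklmn opqrs 123456 0000 > "$tmp/file1"
printf '%s\n' 543123:fdfdss dfskld:533fg abcde:1234 fdskls:fkdfs \
    gfdkls:flfds 0000:5432 fdsk:saakl > "$tmp/file2"

# First pass (file2): remember each full line under its first field.
# Second pass (file1): swap in the remembered line when the key is known.
awk -F: '
    NR == FNR { z[$1] = $0; next }   # still reading file2
    $1 in z   { $0 = z[$1] }         # this file1 key also occurs in file2
    { print }                        # print the matched or untouched line
' "$tmp/file2" "$tmp/file1" > "$tmp/merged"

cat "$tmp/merged"
```

On the sample data this prints the six lines listed in the question, in file1's order. Two things to keep in mind: the whole of file2 is held in memory, and if a key occurs more than once in file2 the last occurrence wins.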

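You can do this with join too, but it takes more work: join needs both inputs sorted on the join field, so you have to number the lines of the 1st file first, then sort the output of join by those numbers to restore the original order and cut the numbers off again. The <(...) process substitution used here needs a shell that supports it, such as bash, zsh or ksh: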

join -a 1 -t: -1 2 -2 1 <(nl -s: -ba -nrz file1 | sort -t: -k2,2) \
<(sort -t: -k1,1 file2) | sort -t: -k2,2 | cut -d: -f1,3-
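
If the pipeline is hard to follow, here is the same idea split into separate steps with temporary files instead of process substitution. The intermediate file names are made up for illustration, and `LC_ALL=C` is my addition (not part of the command above) so that sort and join are sure to agree on collation.

```shell
# Sketch only: the scratch directory and file names are illustrative.
export LC_ALL=C
tmp=$(mktemp -d)
printf '%s\n' abcde fghi jklmn opqrs 123456 0000 > "$tmp/file1"
printf '%s\n' 543123:fdfdss dfskld:533fg abcde:1234 fdskls:fkdfs \
    gfdkls:flfds 0000:5432 fdsk:saakl > "$tmp/file2"

# 1. Prefix every file1 line with a zero-padded line number
#    (000001:abcde ...), then sort on the key, which is now field 2.
nl -s: -ba -nrz "$tmp/file1" | sort -t: -k2,2 > "$tmp/f1.sorted"

# 2. Sort file2 on its key (field 1).
sort -t: -k1,1 "$tmp/file2" > "$tmp/f2.sorted"

# 3. Join on the key; -a 1 also keeps file1 lines that have no match.
#    Output fields are key:number[:rest of the file2 line].
join -a 1 -t: -1 2 -2 1 "$tmp/f1.sorted" "$tmp/f2.sorted" > "$tmp/joined"

# 4. Sort by the line number to restore file1's order, then drop it.
sort -t: -k2,2 "$tmp/joined" | cut -d: -f1,3- > "$tmp/merged"

cat "$tmp/merged"
```

After step 3 a matched line looks like abcde:000001:1234 and an unmatched one like fghi:000002, which is why the last step keeps fields 1 and 3 onward.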