I want to merge two files based on the header lines they have in common.
Here is an example:
File 1
>Feature scaffold1
1 100 g
101 200 g
201 300 g
>Feature scaffold2
1 100 g
01 500 g
>Feature scaffold3
10 500 g
>Feature scaffold4
10 300 g
File 2
>Feature scaffold1
500 500 r
900 1000 r
>Feature scaffold2
200 300 r
>Feature scaffold3
100 200 r
>Feature scaffold4
500 600 r
>Feature scaffold5
1 1000 r
And here's the kind of output I want:
>Feature scaffold1
1 100 g
101 200 g
201 300 g
500 500 r
900 1000 r
>Feature scaffold2
1 100 g
01 500 g
200 300 r
>Feature scaffold3
10 500 g
100 200 r
>Feature scaffold4
10 300 g
500 600 r
>Feature scaffold5
1 1000 r
I have tried some awk and sed, but clearly without success. How can I do this?
Best Answer
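The full command is not present in this copy of the answer, only the fragments explained below. What follows is a minimal sketch reassembled from those fragments, not the answerer's confirmed text. The wrapper name `merge_by_header`, the trailing `next` in the third rule, and the final `1` pattern (which prints the remaining lines of the second file) are assumptions added to produce the output requested in the question.

```shell
# merge_by_header is a hypothetical wrapper name; $1 is the first file, $2 the second.
merge_by_header() {
  awk '
    # Header line: build the key from the 1st and 2nd fields.
    /^>/ { k = $1 FS $2 }

    # First file: collect non-header lines under the current key, then skip to the next record.
    NR == FNR { if (!/^>/) a[k] = (a[k] != "") ? a[k] ORS $0 : $0; next }

    # Second file, header also seen in the first file:
    # print the header followed by the stored lines, then drop the entry.
    # The "next" here is an assumption; it prevents the header from being printed twice.
    k in a { print $0 ORS a[k]; delete a[k]; next }

    # Assumption: print every other line of the second file unchanged.
    1
  ' "$1" "$2"
}
```

It would be called as `merge_by_header file1 file2`. Note that in this sketch a header that appears only in the first file is never printed, because output is driven by the second file.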
Awk solution:

- `/^>/{ k=$1 FS $2 }` - on encountering a header line (i.e. `>Feature ...`), compose a key `k` from the 1st (`$1`) and 2nd (`$2`) fields
- `NR==FNR{ ... }` - processing the 1st input file (`file1`):
  - `if (!/^>/) a[k]=(a[k]!="")? a[k] ORS $0: $0` - accumulate non-header lines into array `a` using the current key `k`
  - `next` - jump to the next record
- `k in a` - if the current key, taken from a `file2` record, is in array `a` (built from `file1` records):
  - `print $0 ORS a[k]` - print related records
  - `delete a[k]` - delete processed item(s)

The output: