I have multiple files with the same header and different vectors below it. I need to concatenate all of them, but I want to keep only the header of the first file; the headers of the other files should be dropped, since they are all the same.
For example:
file1.txt
<header>INFO=<ID=DP,Number=1,Type=Integer>
<header>INFO=<ID=DP4,Number=4,Type=Integer>
A
B
C
file2.txt
<header>INFO=<ID=DP,Number=1,Type=Integer>
<header>INFO=<ID=DP4,Number=4,Type=Integer>
D
E
F
I need the output to be:
<header>INFO=<ID=DP,Number=1,Type=Integer>
<header>INFO=<ID=DP4,Number=4,Type=Integer>
A
B
C
D
E
F
I could write a script in R, but I need to do it in shell.
Best Answer
If you know how to do it in R, then by all means do it in R. With classical unix tools, this is most naturally done in awk.
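A minimal sketch of such an awk script, reconstructed from the explanation that follows (the exact original may differ). The file names file1.txt, file2.txt and all.txt are assumptions taken from the question's example; the sketch recreates the two sample files in the current directory so it can be run as is. The check on getline's return value is an added guard, not part of the explanation, so that the loop stops if the input ends while still inside a header.

```shell
# Recreate the two sample files from the question (assumed names).
cat > file1.txt <<'EOF'
<header>INFO=<ID=DP,Number=1,Type=Integer>
<header>INFO=<ID=DP4,Number=4,Type=Integer>
A
B
C
EOF

cat > file2.txt <<'EOF'
<header>INFO=<ID=DP,Number=1,Type=Integer>
<header>INFO=<ID=DP4,Number=4,Type=Integer>
D
E
F
EOF

awk '
    # On the first line of every file except the first one,
    # keep reading lines while the current line is a header line.
    # Added guard: stop if getline hits end of input or an error.
    FNR == 1 && NR != 1 { while (/^<header>/) if ((getline) <= 0) exit }
    # Print every line that was not skipped above.
    { print }
' file1.txt file2.txt > all.txt

cat all.txt
```

With real data the sample files already exist, so only the awk command is needed, with something like file*.txt as the argument list. The shell expands the glob in sorted order, which decides whose header is kept.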
The first line of the awk script matches the first line of a file (FNR==1) except if it's also the first line across all files (NR==1). When these conditions are met, the expression while (/^<header>/) getline; is executed, which causes awk to keep reading another line (skipping the current one) as long as the current one matches the regexp ^<header>. The second line of the awk script prints everything except for the lines that were previously skipped.