I have text files containing many lines, some of which start with ">" (they are so-called *.fasta files, and each ">" marks the beginning of a new information container):
>header_name1
sequence_info
>header_name2
sequence_info
I want to add a label derived from the file name to every header in that file. For example, if the file is named "1_nc.fasta", all the lines inside the file starting with > should have the label "001" added:
>001-header_name1
sequence_info
>001-header_name2
sequence_info
Someone nice provided me with this line:
sed 's/^>/>001-/g' 1_nc.fasta > 001_tagged.fasta
Accordingly, all headers in 2_nc.fasta should start with "002-", 3_nc.fasta -> "003-", and so on.
I know how to write parallel job scripts, but each job finishes so quickly that I think a script that processes all files serially in a loop would be much better. Unfortunately, I can't write this on my own.
Added twist: 11_nc.fasta and 149_nc.fasta do not exist.
How can I loop this over all 500 files in my directory?
Best Answer
This should do the trick. I break the filename at the underscore to get the numerical prefix, and then use a printf to zero-pad it out to a three-digit string.
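The answer's own code is not reproduced here, so the following is only a minimal bash sketch of the approach it describes: split the file name at the underscore, zero-pad the number with printf, and run the sed command from the question. The function name tag_fasta_files is a hypothetical helper, and the sketch assumes that every input is named <number>_nc.fasta and that the output should be written next to it as <padded number>_tagged.fasta.

```shell
#!/usr/bin/env bash
# Sketch only: tag_fasta_files is a hypothetical helper name, not from the answer.
# Assumes inputs are named <number>_nc.fasta inside the given directory.
tag_fasta_files() {
    local dir=${1:-.} f base num tag
    for f in "$dir"/*_nc.fasta; do
        [ -e "$f" ] || continue                  # the glob matched nothing
        base=${f##*/}                            # strip the directory part
        num=${base%%_*}                          # text before the first underscore, e.g. "1"
        case $num in
            ''|*[!0-9]*) continue ;;             # skip names without a purely numeric prefix
        esac
        tag=$(printf '%03d' "$((10#$num))")      # force base 10, zero-pad to three digits
        sed "s/^>/>${tag}-/" "$f" > "$dir/${tag}_tagged.fasta"
    done
}

# Process the current directory.
tag_fasta_files .
```

Because the glob only expands to files that actually exist, the missing 11_nc.fasta and 149_nc.fasta are skipped without any special handling. The outputs end in _tagged.fasta, so they do not match *_nc.fasta and are not picked up again if the loop is rerun.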