Is there a way to edit a matched pattern and then replace another pattern with the edited pattern?
Input:
a11.t
some text here
a06.t
some text here
Output:
a11.t 11
some text here
a06.t 06
some text here
The above example shows the first two digits (matched by first pattern) extracted and placed at the end of the line (second pattern).
In a programming language, I would load the file into a data structure, edit, replace, and write to a new file. But is there a one-line equivalent?
Trial:
sed 's/\(a[0-9][0-9].*\)/& \1/I' stack.fa | sed -e 's#a##g2' -e 's#\.\w##g2'
Trial output:
a11.t 11
some text here
a06.t 06
some text here
Obviously the trial works, but is there a more robust way? Further, is there another text processing language this could be done in more easily?
Best Answer
sed is the perfect tool for the task here. However, note that you almost never need to pipe several sed invocations together, as a sed script can be made of several commands.

If you wanted to extract the first sequence of 2 decimal digits and append it, following a space, to the end of the line if found, you'd do:
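A minimal sketch of such a command, assuming a POSIX sed and basic regular expression syntax. The function name append_first_two_digits is hypothetical and only there so the command can be reused; it reads from standard input.

```sh
# Append the first run of two decimal digits, preceded by a space,
# to the end of each line that contains one. Other lines pass through unchanged.
append_first_two_digits() {
  # \([0-9]\{2\}\) captures the first two consecutive digits,
  # .*$ extends the match to the end of the line,
  # & re-emits the whole match and \1 the captured digits.
  sed 's/\([0-9]\{2\}\).*$/& \1/'
}
```

With the sample input from the question, `append_first_two_digits < stack.fa` should turn a11.t into a11.t 11 and leave the "some text here" lines untouched.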
If you wanted to do that only if it's found in second position on the line, following an "a":

And if you don't want to do it if that sequence of 2 digits is followed by more digits:
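Both refinements can be sketched as follows, again assuming a POSIX sed with basic regular expressions. The function names are hypothetical, and each function reads from standard input.

```sh
# Only act when the two digits directly follow an "a" at the start of the line.
append_after_leading_a() {
  sed 's/^a\([0-9]\{2\}\).*$/& \1/'
}

# Same, but skip lines where a third digit follows the first two:
# after the two digits, the rest of the line is either empty or
# starts with a non-digit.
append_exactly_two_digits() {
  sed 's/^a\([0-9]\{2\}\)\([^0-9].*\)\{0,1\}$/& \1/'
}
```

Under these assumptions, a line such as xa11.t is left alone by both functions, and a123.t is left alone by the second one, while a11.t becomes a11.t 11 in both cases.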
In terms of robustness, it all boils down to answering the question: what should be matched, and what should not be? That's why it's important to specify your requirements clearly, and also to understand what the input may look like (can there be digits in the lines where you don't want to find a match? can there be non-ASCII characters in the input? is the input encoded in the locale's charset? etc.).
Above, depending on the sed implementation, the input will be decoded into text based on the locale's charmap (see the output of locale charmap), or interpreted as if each byte corresponded to a character, with bytes 0 to 127 interpreted as per the ASCII charmap (assuming you're not on an EBCDIC-based system).

For sed implementations in the first category, it may not work properly if the file is not encoded in the right charset. For those in the second category, it could fail if there are characters in the input whose encoding contains the encoding of decimal digits.
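One way to make the behaviour predictable is to force the second interpretation explicitly. The sketch below assumes a sed that honours LC_ALL and treats each byte as one character in the C locale; the function name is hypothetical. The caveat about encodings that embed digit bytes inside multi-byte characters still applies in this mode.

```sh
# To check which charmap the current locale uses, run:
#   locale charmap

# Run the substitution in the C locale so that input bytes are not
# decoded according to the user's locale charmap.
append_first_two_digits_bytewise() {
  LC_ALL=C sed 's/\([0-9]\{2\}\).*$/& \1/'
}
```

Usage would be `append_first_two_digits_bytewise < your-file`, where your-file stands for whatever input you are processing.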