Anyone know of a non-line-based tool to "binary" search/replace strings in a somewhat memory-efficient way? See this question too.
I have a 2+ GB text file that I would like to process similarly to what this appears to do:

sed -e 's/>\n/>/g'

That is, I want to remove all newlines that occur after a >, but not anywhere else, which rules out tr -d '\n'.
This command (which I got from the answer to a similar question) fails with "couldn't re-allocate memory", since it slurps the entire file into memory before substituting:

sed --unbuffered ':a;N;$!ba;s/>\n/>/g'
So, are there any other methods that don't resort to C? I hate perl, but am willing to make an exception in this case 🙂

I don't know for sure of any character that does not occur in the data, so temporarily replacing \n with another character is something I'd like to avoid if possible.

Any good ideas, anyone?
Best Answer
This really is trivial in Perl, you shouldn't hate it!

perl -i.bak -pe 's/>\n/>/' file

Explanation

-i.bak : edit the file in place, and create a backup of the original called file.bak. If you don't want a backup, just use perl -i -pe instead.
-pe : read the input file line by line and print each line after applying the script given as -e.
s/>\n/>/ : the substitution, just like in sed. It works line by line here because, unlike sed, Perl keeps the trailing newline in the line it reads, so the pattern can match it.
And here's an awk approach:
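One way such an awk approach could look (a sketch, not necessarily the original answer's exact script; infile is a placeholder name): print any line ending in > without its newline, and everything else as-is.

```shell
# Join any line ending in '>' with the line that follows it.
# Output goes to stdout; redirect to a new file as needed.
awk '/>$/ { printf "%s", $0; next } { print }' infile
```

Like the Perl version, this streams the input one line at a time, so the 2 GB file poses no memory problem.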