Command line tool for easy multiline regex search and replace

command line, regular expression, software-rec

I use PCRE regular expressions for search and replace very often when working in a text editor, and I was quite unhappy to find that powerful Unix command line tools like perl, awk or sed make even a slightly advanced multiline regex fairly complicated to use, requiring different hard-to-remember syntax for different situations.

Is there a command line tool for Linux in which search and replace (for all occurrences in the whole file) using a more complex multiline regex is as simple as:

magicregextool 's/.* > (.*) joined the channel\.\n(((?!.* \1 (was kicked from channel\.|was banned from channel\.)\n).*\n)+?.*\1 disconnected)/\2/' file.txt

i.e. the regex to match is the same as the one I would enter in the "search for" field of a text editor, the replacement string can handle multiline matches as well, and there's no need for any convoluted syntax?

EDIT:

Per request I'm attaching an input which I'd use the example regex above for and explaining what I want it to actually do.

An input like this:

2016-05-16 06:17:00 > foobar joined the channel.
2016-05-16 06:17:13 <foobar> hi
2016-05-16 06:18:30 > foobar was kicked from channel.
2016-05-16 06:18:30 > foobar disconnected
2016-05-16 06:20:13 > user joined the channel.
2016-05-16 06:20:38 <user> bye
2016-05-16 06:21:57 > user disconnected

should produce this output:

2016-05-16 06:17:00 > foobar joined the channel.
2016-05-16 06:17:13 <foobar> hi
2016-05-16 06:18:30 > foobar was kicked from channel.
2016-05-16 06:18:30 > foobar disconnected
2016-05-16 06:20:38 <user> bye
2016-05-16 06:21:57 > user disconnected

The regex matches any line that contains [username] joined the channel and looks for a line below it that contains [username] disconnected unless there is a [username] was kicked from channel. or [username] was banned from channel. between those 2 lines.

The replacement string then replaces the matched pattern with every line following the line with [username] joined the channel effectively deleting the line 2016-05-16 06:20:13 > user joined the channel. from the input above.

This most likely doesn't make any sense to you, but it is just an example regex similar to one I've dealt with recently. Please keep in mind I'm NOT looking for a solution to this particular problem or to similar problems using the Unix tools I listed above. I'm looking for a command line tool that can use the unmodified "search for" and replacement strings I use in a text editor (Geany in particular, but that shouldn't really matter), without complicated syntax or added programming logic to deal with the multiline "search for" and replacement strings.

Best Answer

I'm not sure why Perl isn't acceptable here. On the inputs you provided, this line gives the output you asked for:

perl -0777p -e 's/.* > (.*) joined the channel\.\n(((?!.* \1 (was kicked from channel\.|was banned from channel\.)\n).*\n)+?.*\1 disconnected)/\2/mg' irc.txt

The -e argument is exactly the first argument to your magicregextool except that I added the /mg regex modifier. This may not be "unmodified" but it doesn't seem unreasonable either. If you don't want to type in the whole line, how about this script as magicregextool:

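#!/usr/bin/perl -0777p
# The first argument is the expression; any remaining arguments are files.
BEGIN { $::arg = shift @ARGV; }
# Run the expression against the slurped file in $_ and report syntax errors.
eval $::arg;
die $@ if $@;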

Or even:

#!/bin/sh
perl -0777pe "$@"

Then you just type:

magicregextool 's/.* > (.*) joined the channel\.\n(((?!.* \1 (was kicked from channel\.|was banned from channel\.)\n).*\n)+?.*\1 disconnected)/\2/mg' irc.txt

Which is the same as your sample (again other than adding the /mg modifier).

An additional benefit to this is that if you are running multiple related search/replace operations on each file, you can put them together in the same script:

#!/usr/bin/perl -0777p
s/.* > (.*) joined the channel\.\n(((?!.* \1 (was kicked from channel\.|was banned from channel\.)\n).*\n)+?.*\1 disconnected)/\2/mg;
s/(some other\n)matched text/\1/mg;