Linux – Multi-Line Sed Replace Guide

find and replace, linux, sed

Consider the following text (incidentally, part of a MySQL dump):

CREATE TABLE `table` (
  `id` int(10) NOT NULL auto_increment,
  `name` varchar(100) NOT NULL default '',
  `description` text NOT NULL,
  PRIMARY KEY  (`id`),
  FULLTEXT KEY `full_index` (`name`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
/*!40101 SET character_set_client = @saved_cs_client */;

I would like to remove the FULLTEXT key, and I also want to remove the trailing comma on the line above so that the SQL remains valid.

Can anyone come up with (and explain) a sed recipe to do this?

Best Answer

AWK answer

With your sample text in a file named sql, the following AWK script (with line breaks and indentation for clarity):

awk -v skip=1 '{
    if (skip) { skip=0 }
    else {
        if (/FULLTEXT KEY/) { skip=1; sub(/,$/, "", prevline) }
        print prevline
    }
    prevline=$0
}
END { print prevline }' sql

produces:

CREATE TABLE `table` (
  `id` int(10) NOT NULL auto_increment,
  `name` varchar(100) NOT NULL default '',
  `description` text NOT NULL,
  PRIMARY KEY  (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
/*!40101 SET character_set_client = @saved_cs_client */;

Explanation:

We implement "lookahead" by only printing the previously encountered line at every iteration, after inspecting the current line.
If the current line contains the FULLTEXT KEY marker, we set a flag to skip printing this line during the next iteration. We also remove the trailing comma on the previous line that is about to be printed.
We skip printing an empty initial line (before prevline has been set) by initially setting skip to 1 ("true").
We make sure to print the last line by ending the script with an extra prevline print. Note that the current implementation assumes that this last line is not a line at risk of being skipped, i.e. that it does not contain the FULLTEXT KEY marker.
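
The caveat in the last point can be lifted with a small change. The following is only a sketch and not part of the original answer: it reuses the same script but prints the final line in END only when the skip flag is clear, so a FULLTEXT KEY line at the very end of the input is dropped as well. The shortened sample input and the variable name result are made up for illustration.

```shell
# Hypothetical input whose last line is the FULLTEXT KEY line.
result=$(printf '%s\n' \
    '  `description` text NOT NULL,' \
    '  PRIMARY KEY  (`id`),' \
    '  FULLTEXT KEY `full_index` (`name`)' |
awk -v skip=1 '{
    if (skip) { skip=0 }
    else {
        if (/FULLTEXT KEY/) { skip=1; sub(/,$/, "", prevline) }
        print prevline
    }
    prevline=$0
}
# Assumption: a set skip flag here means prevline is the line to drop
# (or that the input was empty), so nothing more is printed.
END { if (!skip) print prevline }')
printf '%s\n' "$result"
```

With this input the FULLTEXT KEY line is not printed and the PRIMARY KEY line loses its trailing comma. An empty input file also produces no output, because skip is still 1 when END runs.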

Original (incomplete) `sed` answer

This answer is incomplete and in most cases incorrect. N joins input lines in fixed pairs (1 and 2, 3 and 4, and so on), so sed consumes the input stream too quickly for the intended result: as pointed out in the comments, the substitution only works when the FULLTEXT line falls on an even-numbered row. sed does not have "true" lookahead functionality, so we would be better off using Python/Perl/etc., or indeed AWK as above.

With your sample text in a file named sql, the following command:

$ sed 'N; s/,\n  FULLTEXT.*//' sql

produces:

CREATE TABLE `table` (
  `id` int(10) NOT NULL auto_increment,
  `name` varchar(100) NOT NULL default '',
  `description` text NOT NULL,
  PRIMARY KEY  (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
/*!40101 SET character_set_client = @saved_cs_client */;

Explanation:

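N appends the next input line to the pattern space, separated by a newline, which lets a single pattern span two lines.
\n represents a line break.
s/pattern/replacement/ is the standard replacement syntax.
.* will match anything up to the end of the pattern space, which here is the rest of the FULLTEXT line.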
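
The row-parity problem can be avoided by emulating lookahead with a two-line sliding window built from N, P and D. The following is a sketch under assumptions, not the original author's method: the temporary file and the variables sql, broken and fixed are invented for illustration, and the sample is shortened so that the FULLTEXT line lands on an odd-numbered row (5), where the one-liner above fails.

```shell
# Hypothetical sample file; the FULLTEXT line is on row 5 (odd).
sql=$(mktemp)
cat > "$sql" <<'EOF'
CREATE TABLE `table` (
  `id` int(10) NOT NULL auto_increment,
  `name` varchar(100) NOT NULL default '',
  PRIMARY KEY  (`id`),
  FULLTEXT KEY `full_index` (`name`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
EOF

# Pairwise N: rows are joined as (1,2), (3,4), (5,6), so the comma and the
# FULLTEXT line never share a pattern space and nothing is removed.
broken=$(sed 'N; s/,\n  FULLTEXT.*//' "$sql")

# Sliding window:
#   $!N  append the next line unless we are already on the last one
#   s    drop the trailing comma plus the whole FULLTEXT KEY line
#   P    print the first line of the window
#   D    delete that first line and restart without reading new input
fixed=$(sed '$!N; s/,\n.*FULLTEXT KEY.*//; P; D' "$sql")

printf '%s\n' "$fixed"
rm -f "$sql"
```

Because D keeps the second line of the window for the next cycle, every pair of adjacent lines is inspected, regardless of which row the FULLTEXT line is on.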

Related Solutions

Using sed to replace two patterns within a larger pattern

This command converts your input example to your output example:

sed 's|\\|/|g;s|^[^/]*|/enlistments|'

If some of your input contains backslashes in the portion after the /^.*([[:digit:]]+):/ part, then you will have to divide and conquer to prevent those latter backslashes from being replaced.

sed 'h;s/^.*([[:digit:]]\+)://;x;s/^\(.*([[:digit:]]\+):\).*/\1/;s|\\|/|g;s|^[^/]*|/enlistments|;G;s/\n//'

Explanation (steps marked with an asterisk ([*]) apply to both commands):

h - copy the line to hold space
s/^.*([[:digit:]]\+):// - delete the first part of the line from the original in pattern space
x - swap pattern space and hold space
s/^\(.*([[:digit:]]\+):\).*/\1/ - keep the first part of the line from the copy (discard the last part)
s|\\|/|g - [*] change all the backslashes to slashes (in the divide-and-conquer version, only the portion in pattern space - the first part of the line - is affected)
s|^[^/]*|/enlistments| - [*] change whatever appears before the first slash into "/enlistments" - this could be made more selective if needed
G - append a newline and the contents of hold space onto the end of pattern space
s/\n// - remove the interior newline
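
To make the steps above concrete, here is a sketch that runs the divide-and-conquer command on one line. The input line is hypothetical (the original question's example is not reproduced here); it was chosen so that backslashes appear both before and after the "(digits):" part. The \+ in the command is assumed to be supported, as it is in GNU sed.

```shell
# Hypothetical input: a Windows-style path, a line number, then a message
# that itself contains backslashes which must stay untouched.
line='d:\proj\src\main.c(12): error in d:\tmp\x'

out=$(printf '%s\n' "$line" |
    sed 'h;s/^.*([[:digit:]]\+)://;x;s/^\(.*([[:digit:]]\+):\).*/\1/;s|\\|/|g;s|^[^/]*|/enlistments|;G;s/\n//')

printf '%s\n' "$out"
# prints: /enlistments/proj/src/main.c(12): error in d:\tmp\x
```

Only the part up to "(12):" has its backslashes converted; the message after it passes through the hold space unchanged.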

Linux – Replace a block of numbers in sed

You don't need the +. Just use the following:

echo "fdsafdsa 32432 dsafdas" | sed 's/[0-9]/#/g'

[0-9] matches a single digit, and the g flag applies the substitution to every match, so each digit is replaced with its own #.

Since + is extended regular expression syntax, you could also enable it with -E:

echo "fdsafdsa 32432 dsafdas" | sed -E 's/[0-9]+/#/g'

to replace the whole block of digits with one #.
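
If -E is not available, the same effect can be had in basic regular expression syntax by spelling "one or more digits" as a digit followed by zero or more digits. This is a small sketch; the variable name out is only there for illustration.

```shell
# Basic-regex equivalent of [0-9]+ : one digit, then any number of digits.
out=$(echo "fdsafdsa 32432 dsafdas" | sed 's/[0-9][0-9]*/#/g')
echo "$out"
# prints: fdsafdsa # dsafdas
```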