Delete multi-line strings

awkperlsedsqltext processing

There have been several questions here regarding replacing multi-line strings using the Unix shell, but I haven't found one that will work in this situation.

I am trying to remove keys and constraints from some MySQL DDL, which looks like this (one example):

CREATE TABLE `access_group` (
  `GROUP_ID` int(10) NOT NULL AUTO_INCREMENT,
  `PARENT_GROUP_ID` int(10) DEFAULT NULL,
  `GROUP_NAME` varchar(45) NOT NULL,
  `GROUP_DESC` varchar(45) NOT NULL DEFAULT '',
  PRIMARY KEY (`GROUP_ID`),
  KEY `testkey` (`PARENT_GROUP_ID`)
) ENGINE=InnoDB AUTO_INCREMENT=66 DEFAULT CHARSET=latin1;

I want to remove everything from the comma ending the line before 'PRIMARY KEY' up to, but not including ') ENGINE=' (there can be zero or multiple lines between these, and they won't always start with KEY or have the parenthesis, but the ') ENGINE=' is consistent). The result should look like this:

CREATE TABLE `access_group` (
  `GROUP_ID` int(10) NOT NULL AUTO_INCREMENT,
  `PARENT_GROUP_ID` int(10) DEFAULT NULL,
  `GROUP_NAME` varchar(45) NOT NULL,
  `GROUP_DESC` varchar(45) NOT NULL DEFAULT ''
) ENGINE=InnoDB AUTO_INCREMENT=66 DEFAULT CHARSET=latin1;

I'm open to using any standard command-line utility (e.g. sed, perl, awk), but since these files can be fairly large (some are on the order of tens or hundreds of GB) they need to be efficient. Since the files are usually stored gzipped (or sometimes I process the output of the mysql dump utility directly rather than first writing to disk) I need something that can be piped into and out of.

Best Answer

Keep state on whether to print the previous line, edit said to remove the comma when necessary. This method only keeps one or two lines of the file in memory.

#!/usr/bin/env perl
use strict;
use warnings;

my $printing = 1;
my $previous;

# reads from standard input (optionally with the conventional -) or from
# the named files
shift @ARGV if @ARGV == 1 and $ARGV[0] eq '-';
while ( my $line = readline ) {
    if ( $line =~ m/^\s+PRIMARY KEY/ ) {
        $previous =~ s/,[ \t]*$//;
        $printing = 0;
    } elsif ( $line =~ m/^\) ENGINE/ ) {
        $printing = 1;
    } elsif ( !$printing ) {
        undef $previous;
    }
    print $previous if defined $previous;
    $previous = $line if $printing;
}
# don't forget last line after fall off the end of input (eof)
print $previous if defined $previous;
Related Question