Text Processing – How to Replace Text in the Last Non-Blank Line

I want to find the last line of text in a file and delete the comma at the end of it. I asked about this already, but after I got an answer I realized my question was not specific enough.

This sed command will go to the last line of a file and take action on it. In my case, I want to remove the trailing comma:

sed -i '$ s/,$//' file.txt

So this:

blah blah blah,
blah blah blah,
blah blah blah,

… becomes this:

blah blah blah,
blah blah blah,
blah blah blah

However, this won't work if there are blank lines after the last line of text in the file.

I've been searching for ways to get the last line of text but haven't come up with anything that I can understand and apply. I've also looked for ways to remove all trailing blank lines, and found this command:

sed -e :a -e '/^\n*$/{$d;N;ba' -e '}' *.txt

But it doesn't work for me (it just seems to output the contents of my files on the command line). In any case, it's inelegant. I'd rather not delete the trailing blank lines; it would be much better to just identify the last line with text in it and act on that.

How do I remove the comma from the last line of text in multiple files in a directory?

Best Answer

perl -0777 -p -i -e 's/,(\n*)\Z/\1/m' *.txt

This removes the last ',' in every file ending in .txt, provided the ',' is followed only by zero or more newline characters and then the end of the file.

From your example:

reedm@www:~/tmp $ cat > test.txt
blah blah blah,
blah blah blah,
blah blah blah,


reedm@www:~/tmp $ perl -0777 -p -i -e 's/,(\n*)\Z/\1/m' *.txt
reedm@www:~/tmp $ cat test.txt
blah blah blah,
blah blah blah,
blah blah blah


reedm@www:~/tmp $ 

Wat?

Perl is an esoteric beast at the best of times, and perl one-liners can be particularly cryptic.

The -e flag allows us to pass a perl program on the command line. In this case, the 's/regex/replace/flags' is the program.

The -p flag causes perl to apply your supplied program in a loop over each "line" (see -0) of each file named on the command line, and to print each "line" after the program has run on it.

The -i flag causes perl to replace the file with the output of the program, rather than printing the output to standard out.

The -0 flag changes what delimiter perl uses to break a file into "lines". 0777 is a special value, used by convention to make perl read the entire file into a single "line".

The regular expression is somewhat complicated by the use of a few perl-specific tricks:

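  • First, the m flag at the end makes ^ and $ match at the start and end of every line inside the string. This pattern uses neither anchor, so the flag is harmless here and could be dropped.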
  • , is simple, and matches a single, literal comma.
  • (\n*) matches 0-or-more newlines in a row, and stores them as a subpattern (the ( and ) characters denote a subpattern). As this is the first subpattern, we can use \1 in the replacement section to mean "whatever this subpattern matched".
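  • \Z is a perl anchor that matches at the very end of the string, or just before a single newline at the very end -- in this case, the string is the entire file. Because (\n*) has already consumed every trailing newline, it matches at the true end here.
  • In the replacement part, we use \1 to put back only the series of newlines, which removes the comma. Perl accepts \1 in the replacement, but $1 is the preferred spelling.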

For more information on perl regular expressions and perl command line flags, check out the man pages for perlre and perlrun respectively.