Using sed, how can I remove duplicate letters from HEADERS within a text file?
NNAAMMEE
nice - run a program with modified scheduling priority
SSYYNNOOPPSSIISS
nice [-n adjustment] [-adjustment] [--adjustment=adjustment] [command [a$
Above is an example. I want the output after parsing with sed to be:
NAME
nice - run a program with modified scheduling priority
SYNOPSIS
nice [-n adjustment] [-adjustment] [--adjustment=adjustment] [command [a$
Best Answer
Method #1
You can use this sed command to do it:
Example
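A minimal sketch of such a command, written to match the description in the details section further down rather than copied from anywhere. It assumes the sample input is saved as sample.txt and uses only POSIX basic regular expression syntax:

```shell
# Recreate the sample input from the question (its last line is truncated there too).
cat > sample.txt <<'EOF'
NNAAMMEE
nice - run a program with modified scheduling priority
SSYYNNOOPPSSIISS
nice [-n adjustment] [-adjustment] [--adjustment=adjustment] [command [a$
EOF

# \([A-Za-z]\) captures one letter, \1 requires that same letter again,
# and the pair is replaced by a single copy. g repeats this along each line.
sed 's/\([A-Za-z]\)\1/\1/g' sample.txt
```

With this input the headers come out as NAME and SYNOPSIS, but the doubled "mm" in "command" on the last line is collapsed to "comand" as well, because lowercase letters are in the set.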
Using your above sample input I created a file, sample.txt.
Method #2
There is also this method which will remove all the duplicate characters:
Example
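A sketch of what this could look like, assuming the only change from method #1 is that the captured character may be anything at all. The function name squeeze_all is hypothetical, used here only to keep the example short:

```shell
# Hypothetical helper: \(.\) captures ANY single character, not just a letter,
# so doubled punctuation and doubled spaces are collapsed too.
squeeze_all() { sed 's/\(.\)\1/\1/g' "$@"; }

printf '%s\n' \
  'SSYYNNOOPPSSIISS' \
  'nice [-n adjustment] [-adjustment] [--adjustment=adjustment] [command [a$' |
  squeeze_all
```

Note that on the second line the "--" before "adjustment" becomes a single "-" and "command" becomes "comand", which is usually more than you want for a man page.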
Method #3 (just the upper case)
The OP asked whether it could be modified so that only duplicated upper case characters are removed. Here's how, using a modified method #1.
Example
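A sketch of the upper-case-only variant, assuming the change is simply narrowing the set to A-Z. The function name squeeze_upper is hypothetical, and LC_ALL=C is an added precaution, since in some locales the range A-Z can also match lowercase letters:

```shell
# Hypothetical helper: only a doubled upper case letter is collapsed.
# LC_ALL=C keeps [A-Z] limited to the 26 ASCII capitals.
squeeze_upper() { LC_ALL=C sed 's/\([A-Z]\)\1/\1/g' "$@"; }

printf '%s\n' \
  'SSYYNNOOPPSSIISS' \
  'nice [-n adjustment] [-adjustment] [--adjustment=adjustment] [command [a$' |
  squeeze_upper
```

Here the header becomes SYNOPSIS while the second line passes through unchanged, "command" and "--" included.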
Details of the above methods
All the examples use the same technique: when a character in the set A-Z or a-z is first encountered, its value is saved. Wrapping parens around characters (written \( and \) in sed's default syntax) tells sed to save them for later. That value is stored in a temporary variable that you can access either immediately or later on. These variables are named \1, \2, and so on, up to \9.
So the trick we're using is that we match the first letter and save it.
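To see the saving step on its own, here is a small sketch that captures the first letter of a made-up input word and reuses it in the replacement, wrapped in angle brackets so it is easy to spot:

```shell
# \([A-Za-z]\) saves the first letter it matches; \1 in the replacement
# refers back to that saved letter. No g flag, so only the first match changes.
printf 'hello\n' | sed 's/\([A-Za-z]\)/<\1>/'
# prints: <h>ello
```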
Then we turn around and use the value that we just saved as a second character that must occur right after the first one, hence the \1 placed immediately after the closing paren.
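A quick way to check what that combination matches is to use it as a line filter instead of a substitution. This sketch uses three made-up input lines:

```shell
# Print only the lines that contain a letter immediately followed by itself.
# The backreference must match the exact same character, so "Aa" does not count.
printf '%s\n' 'AB' 'AA' 'Aa' | sed -n '/\([A-Za-z]\)\1/p'
# prints: AA
```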
In sed we're also making use of the search and replace facility, s/../../g. The g means we're doing it globally, that is, for every match on a line rather than only the first.
So when we encounter a character followed by an identical one, we replace the pair with just one copy of that character.
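The effect of the g flag is easiest to see side by side. The inputs below are made up for illustration; the last one imitates a doubled header whose real text already contains a double letter:

```shell
# Without g only the first doubled pair on the line is collapsed.
printf 'AABB\n' | sed 's/\([A-Z]\)\1/\1/'
# prints: ABB

# With g every pair is collapsed, working left to right.
printf 'AABB\n' | sed 's/\([A-Z]\)\1/\1/g'
# prints: AB

# Matching exactly one pair at a time keeps genuine double letters:
# EEEE is treated as two pairs and becomes EE.
printf 'SSEEEE AALLSSOO\n' | sed 's/\([A-Z]\)\1/\1/g'
# prints: SEE ALSO
```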