Method #1
You can use this sed command to do it:
$ sed 's/\([A-Za-z]\)\1\+/\1/g' file.txt
Example
Using your sample input above, I created a file, sample.txt.
$ sed 's/\([A-Za-z]\)\1\+/\1/g' sample.txt
NAME
nice - run a program with modified scheduling priority
SYNOPSIS
nice [-n adjustment] [-adjustment] [--adjustment=adjustment] [command [a$
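A minimal sketch of Method #1 on a made-up two-line input (the `input` text and the variable names are assumptions for illustration, not the OP's file). It assumes GNU sed for `\+`, which is a GNU extension to basic regular expressions; the POSIX interval `\{1,\}` is shown alongside as a portable spelling of "one or more". Note that the letter class also collapses genuine double letters in ordinary words.

```shell
# Hypothetical input: a heading with doubled letters plus ordinary body text.
input='NNAAMMEE
nice - run a command'

# GNU spelling, as used in Method #1.
gnu_out=$(printf '%s\n' "$input" | sed 's/\([A-Za-z]\)\1\+/\1/g')

# POSIX spelling of "one or more": the interval \{1,\}.
posix_out=$(printf '%s\n' "$input" | sed 's/\([A-Za-z]\)\1\{1,\}/\1/g')

printf '%s\n' "$gnu_out"
# NAME
# nice - run a comand   (the genuine "mm" is collapsed too)
```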
Method #2
There is also this method, which collapses every pair of identical adjacent characters, not just letters:
$ sed 's/\(.\)\1/\1/g' file.txt
Example
$ sed 's/\(.\)\1/\1/g' sample.txt
NAME
nice - run a program with modified scheduling priority
SYNOPSIS
nice [-n adjustment] [-adjustment] [-adjustment=adjustment] [command [a$
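To make the difference from Method #1 concrete, here is a small sketch on a hypothetical line (the `line` value and variable names are assumptions). Method #2 matches exactly two identical characters of any kind, so punctuation and spaces are affected and a run of four becomes two; Method #1 touches letters only and reduces a whole run to one. GNU sed is assumed for `\+`.

```shell
# Hypothetical line with a doubled dash, a doubled space, and a run of four letters.
line='[--adjustment=adjustment]  aaaa'

# Method #2: any character followed by one copy of itself.
any_pair=$(printf '%s\n' "$line" | sed 's/\(.\)\1/\1/g')

# Method #1: a letter followed by one or more copies of itself.
letter_run=$(printf '%s\n' "$line" | sed 's/\([A-Za-z]\)\1\+/\1/g')

printf '%s\n%s\n' "$any_pair" "$letter_run"
# [-adjustment=adjustment] aa
# [--adjustment=adjustment]  a
```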
Method #3 (just the upper case)
The OP asked whether this could be modified so that only duplicated upper case characters are collapsed. Here's how, using a modified Method #1.
Example
$ sed 's/\([A-Z]\)\1\+/\1/g' sample.txt
NAME
nice - run a program with modified scheduling priority
SYNOPSIS
nice [-n adjustment] [-adjustment] [--adjustment=adjustment] [command [a$
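A short sketch of the upper-case-only variant on made-up input. Setting `LC_ALL=C` is my own precaution, not part of the original command: in some locales a range like `[A-Z]` can also match lower case letters. GNU sed is assumed for `\+`.

```shell
# Hypothetical input: doubled heading, normal body text with a real "mm".
upper_only=$(printf 'NNAAMMEE\nnice - run a command\n' |
  LC_ALL=C sed 's/\([A-Z]\)\1\+/\1/g')

printf '%s\n' "$upper_only"
# NAME
# nice - run a command
```

Because lower case letters are never matched, words such as "command" keep their double letters.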
Details of the above methods
All the examples make use of a technique where, when a character in the set A-Z or a-z is first encountered, its value is saved. Wrapping parens around characters tells sed to save them for later. That value is then stored in a temporary variable that you can access either immediately or later on. These variables are named \1, \2, and so on, up to \9.
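As a quick illustration of saved groups, this sketch captures two words and emits them in the opposite order. The input string is arbitrary and chosen only for the example.

```shell
# \1 holds "hello", \2 holds "world"; the replacement swaps them.
swapped=$(printf 'hello world\n' | sed 's/\(hello\) \(world\)/\2 \1/')

printf '%s\n' "$swapped"
# world hello
```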
So the trick we're using is we match the first letter.
\([A-Za-z]\)
Then we turn around and use the value that we just saved as a secondary character that must occur right after the first one above, hence:
\([A-Za-z]\)\1
In sed we're also making use of the search and replace facility, s/../../g. The g means we're doing it globally, that is, for every match on a line rather than only the first.
So when we encounter a character followed by one or more copies of itself (the \+ in Methods #1 and #3), we replace the whole run with just one of the same character.
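The effect of the g flag can be seen by running the same substitution with and without it. This is a sketch on a made-up heading; the variable names are mine.

```shell
# Without g: only the first doubled letter on the line is collapsed.
first_only=$(printf 'NNAAMMEE\n' | sed 's/\([A-Z]\)\1/\1/')

# With g: every doubled letter on the line is collapsed.
every_match=$(printf 'NNAAMMEE\n' | sed 's/\([A-Z]\)\1/\1/g')

printf '%s\n%s\n' "$first_only" "$every_match"
# NAAMMEE
# NAME
```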
You can do this with Awk by setting the "Record Separator" variable to be a regex matching at least two consecutive newline characters:
awk -v RS='\n\n+' '/1.*2.*3/' file.txt
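A small sketch of what that record separator does, using made-up input piped in with printf. It assumes an awk that accepts a regular expression in RS (gawk and mawk do; POSIX leaves a multi-character RS unspecified). Each blank-line-separated paragraph becomes one record, so the pattern can span lines.

```shell
# Two paragraphs; only the first contains 1, 2 and 3 in that order.
matched=$(printf '1 one\n2 two\n3 three\n\nunrelated\ntext\n' |
  awk -v RS='\n\n+' '/1.*2.*3/ { print "record " NR " matches" }')

printf '%s\n' "$matched"
# record 1 matches
```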
You can also set the "Field Separator" to be a single newline character:
awk -v RS='\n\n+' -F '\n' '$1 == "LINE OF TEXT 1" && $2 == "LINE OF TEXT 2" && $3 == "LINE OF TEXT 3"' file.txt
Broken up for readability:
awk -v RS='\n\n+' -F '\n' '
$1 == "LINE OF TEXT 1" &&
$2 == "LINE OF TEXT 2" &&
$3 == "LINE OF TEXT 3"
' file.txt
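To see how the two separators interact, this sketch prints the first two fields of each record for a made-up input. With a newline as field separator, $1 is the first line of a paragraph, $2 the second, and so on. As above, a regex RS is assumed to be supported by your awk.

```shell
# Each paragraph is a record; each line within it is a field.
fields=$(printf 'alpha\nbeta\n\ngamma\ndelta\n' |
  awk -v RS='\n\n+' -F '\n' '{ print NR ": first=" $1 ", second=" $2 }')

printf '%s\n' "$fields"
# 1: first=alpha, second=beta
# 2: first=gamma, second=delta
```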
With your requirement of only printing the filename if a match is found, you can do this like so:
awk -v RS='\n\n+' -F '\n' '
$1 == "LINE OF TEXT 1" &&
$2 == "LINE OF TEXT 2" &&
$3 == "LINE OF TEXT 3" {
# "match" is a built-in awk function, so use a different variable name
found++
}
END {
if (found) {
print FILENAME
}
}
' file.txt
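A self-contained way to try the filename-printing idea is sketched below. The temporary file and the counter name `found` are assumptions for the demo; a counter name other than `match` is used because `match` is a built-in awk function.

```shell
# Create a throwaway file whose first paragraph is the three wanted lines.
tmp=$(mktemp)
printf 'LINE OF TEXT 1\nLINE OF TEXT 2\nLINE OF TEXT 3\n' > "$tmp"

# Count matching paragraphs, then print the file name once at the end.
printed=$(awk -v RS='\n\n+' -F '\n' '
$1 == "LINE OF TEXT 1" && $2 == "LINE OF TEXT 2" && $3 == "LINE OF TEXT 3" { found++ }
END { if (found) print FILENAME }
' "$tmp")

printf '%s\n' "$printed"   # prints the path held in $tmp
rm -f "$tmp"
```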
But considering you are talking about using find in combination with awk, I'd recommend just using Awk for the exit status and using find for the printing:
find . -type f -exec awk -v RS='\n\n+' -F '\n' '
$1 ~ /LINE OF TEXT 1/ &&
$2 ~ /LINE OF TEXT 2/ &&
$3 ~ /LINE OF TEXT 3/ {
# exit here still runs the END block, so record the match first
found = 1
exit
}
END { exit !found }
' {} \; -print
That way, if you want to do something else before printing (some other find primary), you're already set up to do so.
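The find approach can be tried in a scratch directory, as in the sketch below. The directory, file names and the `found` flag are assumptions for the demo. The flag is there because an `exit` in a main rule still runs the END block, so the exit status is decided in END from whether a match was recorded.

```shell
# Scratch directory with one matching file and one non-matching file.
dir=$(mktemp -d)
printf 'LINE OF TEXT 1\nLINE OF TEXT 2\nLINE OF TEXT 3\n' > "$dir/match.txt"
printf 'something else\n' > "$dir/other.txt"

# awk exits 0 only for files with a matching paragraph; find prints those.
matches=$(find "$dir" -type f -exec awk -v RS='\n\n+' -F '\n' '
$1 ~ /LINE OF TEXT 1/ && $2 ~ /LINE OF TEXT 2/ && $3 ~ /LINE OF TEXT 3/ { found = 1; exit }
END { exit !found }
' {} \; -print)

printf '%s\n' "$matches"   # prints only the path of match.txt
rm -rf "$dir"
```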
Sed
Grep
Awk
As pointed out in the comments, -o isn't POSIX; however both GNU and BSD have it, so it should work for most people.
Also, \s/\S may not be on all systems. If yours doesn't recognize it you can use a literal space, or if you want space and tab, those in a bracket expression ([...]), or the [[:blank:]] character class (note that strictly speaking \s is equivalent to [[:space:]] and includes vertical spacing characters as well like CR, LF or VT which you probably don't care about).
The awk one assumes the lines don't start with a blank character.
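As a hedged illustration of the portable character classes mentioned above, this sketch uses `[[:blank:]]` where `\s` might be written and `[^[:blank:]]` where `\S` might be written. The input and the substitutions are made up for the demo and are not the commands from this answer.

```shell
# The input holds a space and a tab between "a" and "b".
no_blanks=$(printf 'a \tb\n' | sed 's/[[:blank:]]//g')

# The negated class stands in for \S: every non-blank becomes "x".
marked=$(printf 'a \tb\n' | sed 's/[^[:blank:]]/x/g')

printf '%s\n' "$no_blanks"
# ab
```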