Using sed/awk to remove anything after first space

awk, grep, sed

aaaaaaaa 09  
bbbbbbbb 90   
ccccccccccccccc  89  
ddddd 09

Using sed/awk/replace, in the above text I want to remove anything that comes after the first space in each line. For example the output will be:

aaaaaaaa  
bbbbbbbb    
ccccccccccccccc  
ddddd

Any help will be appreciated.

Best Answer

Sed

sed 's/\s.*$//'

Grep

grep -o '^\S*'

Awk

awk '{print $1}'

As pointed out in the comments, -o isn't POSIX; however both GNU and BSD have it, so it should work for most people.

Also, \s and \S may not be available on all systems. If yours doesn't recognize them, you can use a literal space. If you want to match both space and tab, put them in a bracket expression ([...]) or use the [[:blank:]] character class. (Strictly speaking, \s is equivalent to [[:space:]] and also includes vertical spacing characters such as CR, LF or VT, which you probably don't care about.)
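As a rough sketch of the portable spelling described above, here are the three commands run against the sample input from the question, with [[:blank:]] standing in for \s and [^[:blank:]] for \S. The temporary file and the variable names (input, sed_out, grep_out, awk_out) are only illustrative.

```shell
# Sample input from the question, written to a temporary file.
input=$(mktemp)
printf '%s\n' 'aaaaaaaa 09' 'bbbbbbbb 90' 'ccccccccccccccc  89' 'ddddd 09' > "$input"

# sed: delete everything from the first space or tab to the end of the line.
sed_out=$(sed 's/[[:blank:]].*$//' "$input")

# grep: print only the leading run of non-blank characters
# (-o is not POSIX, but GNU and BSD grep both have it).
grep_out=$(grep -o '^[^[:blank:]]*' "$input")

# awk: print the first whitespace-delimited field.
awk_out=$(awk '{print $1}' "$input")

printf '%s\n' "$sed_out"
rm -f "$input"
```

All three variables should hold the same four lines: aaaaaaaa, bbbbbbbb, ccccccccccccccc and ddddd.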

The awk one assumes the lines don't start with a blank character.
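To make that caveat concrete, here is a small sketch with a made-up line that starts with two spaces (the input and variable names are hypothetical, not from the question):

```shell
line='  foo 42'   # hypothetical input with two leading spaces

# sed cuts at the first blank, which is the very first character,
# so nothing is left of the line.
sed_result=$(printf '%s\n' "$line" | sed 's/[[:blank:]].*$//')

# awk skips leading blanks when splitting fields, so $1 is "foo".
awk_result=$(printf '%s\n' "$line" | awk '{print $1}')

printf 'sed: [%s]\nawk: [%s]\n' "$sed_result" "$awk_result"
```

This prints "sed: []" and "awk: [foo]", so the two commands only agree when lines do not begin with a blank.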

How to remove duplicate letters using sed

Method #1

You can use this sed command to do it:

$ sed 's/\([A-Za-z]\)\1\+/\1/g' file.txt

Example

Using your above sample input I created a file, sample.txt.

$ sed 's/\([A-Za-z]\)\1\+/\1/g' sample.txt 
NAME
       nice - run a program with modified scheduling priority

       SYNOPSIS
              nice     [-n    adjustment]    [-adjustment] [--adjustment=adjustment] [command [a$
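Note that \+ in a basic regular expression is a GNU extension. If your sed does not accept it, the interval \{1,\} should be an equivalent POSIX spelling. A minimal sketch, where the function name collapse_letters and the test string are made up for illustration:

```shell
collapse_letters() {
  # \1\{1,\} means "one or more further copies of the saved letter",
  # the same thing GNU sed's \1\+ expresses.
  sed 's/\([A-Za-z]\)\1\{1,\}/\1/g'
}

collapsed=$(printf '%s\n' 'NNAAAMMEE  nniiccee' | collapse_letters)
printf '%s\n' "$collapsed"
```

Only letters are collapsed, so the two spaces in the middle are left alone and the result is "NAME  nice".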

Method #2

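There is also this method, which works on any character rather than just letters. It replaces each pair of identical adjacent characters with a single one, so a longer run is roughly halved rather than reduced to one character (note how the indentation shrinks in the example output):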

$ sed 's/\(.\)\1/\1/g' file.txt

Example

$ sed 's/\(.\)\1/\1/g' sample.txt 
NAME
    nice - run a program with modified scheduling priority

    SYNOPSIS
       nice   [-n  adjustment]  [-adjustment] [-adjustment=adjustment] [command [a$
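The pair-by-pair behaviour is easier to see on a short string. This is just an illustrative sketch; halve_pairs and the input are invented for the demonstration:

```shell
halve_pairs() {
  # Each pair of identical adjacent characters becomes one character.
  sed 's/\(.\)\1/\1/g'
}

halved=$(printf '%s\n' 'aaaa--bbb' | halve_pairs)
printf '%s\n' "$halved"
```

Here "aaaa" is two pairs and becomes "aa", "--" becomes "-", and "bbb" is one pair plus a leftover b, giving "bb". The output is "aa-bb".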

Method #3 (just the upper case)

The OP asked whether this could be modified so that only duplicated upper case characters are collapsed. Here's how, using a modified method #1.

Example

$ sed 's/\([A-Z]\)\1\+/\1/g' sample.txt 
NAME
       nice - run a program with modified scheduling priority

       SYNOPSIS
              nice     [-n    adjustment]    [-adjustment] [--adjustment=adjustment] [command [a$
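A quick way to check that lower case letters are left alone is to feed the command a line containing both. The sketch below uses the POSIX interval \{1,\} in place of GNU's \+, and the name collapse_upper and the input are hypothetical:

```shell
collapse_upper() {
  # Only A-Z is captured, so repeated lower case letters never match.
  sed 's/\([A-Z]\)\1\{1,\}/\1/g'
}

upper_only=$(printf '%s\n' 'NNAAMMEE nniiccee' | collapse_upper)
printf '%s\n' "$upper_only"
```

The upper case word is collapsed to "NAME" while "nniiccee" passes through unchanged.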

Details of the above methods

All the examples use the same technique: when a character in the set being matched (A-Z or a-z in method #1) is first encountered, its value is saved. Wrapping escaped parentheses, \( and \), around part of a pattern tells sed to save whatever that part matched. The saved value can then be referenced either later in the same pattern or in the replacement. These references are named \1, \2, and so on, numbered in the order the opening parentheses appear.
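As a small illustration of saved groups that is separate from the duplicate-letter problem, the following sketch captures two words and swaps them in the replacement (the input string is made up):

```shell
# \1 holds what the first \(...\) matched, \2 what the second matched.
swapped=$(printf '%s\n' 'hello world' | sed 's/\([a-z]*\) \([a-z]*\)/\2 \1/')
printf '%s\n' "$swapped"
```

This prints "world hello".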

So the trick we're using is we match the first letter.

\([A-Za-z]\)

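Then we turn around and use the value that we just saved as a second character that must occur right after the first one above, hence:

\([A-Za-z]\)\1

In methods #1 and #3 the \1 is followed by \+, so one or more repeats of the saved character are matched.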

In sed we're also making use of the search and replace facility, s/../../g. The g flag means the substitution is applied to every match on a line, not just the first.

So when we encounter a character, followed by another one, we substitute it out, and replace it with just one of the same character.
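The effect of the g flag can be seen by running the same substitution with and without it. A minimal sketch on an invented input:

```shell
# Without g, only the first doubled character on the line is collapsed.
first_only=$(printf '%s\n' 'aabbcc' | sed 's/\(.\)\1/\1/')

# With g, every doubled character on the line is collapsed.
every_match=$(printf '%s\n' 'aabbcc' | sed 's/\(.\)\1/\1/g')

printf '%s\n%s\n' "$first_only" "$every_match"
```

The first command prints "abbcc" and the second prints "abc".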

awk – Multiline Regexp with Grep, Sed, Awk, and Perl

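You can do this with Awk by setting the "Record Separator" variable (RS) to a regex matching at least two consecutive newline characters. Treating RS as a regex is an extension found in implementations such as gawk and mawk; POSIX leaves the behavior unspecified when RS is longer than one character: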

awk -v RS='\n\n+' '/1.*2.*3/' file.txt

You can also set the "Field Separator" to be a single newline character:

awk -v RS='\n\n+' -F '\n' '$1 == "LINE OF TEXT 1" && $2 == "LINE OF TEXT 2" && $3 == "LINE OF TEXT 3"' file.txt

Broken up for readability:

awk -v RS='\n\n+' -F '\n' '
  $1 == "LINE OF TEXT 1" &&
  $2 == "LINE OF TEXT 2" &&
  $3 == "LINE OF TEXT 3"
' file.txt
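If your awk does not treat RS as a regex, a possible alternative is POSIX "paragraph mode": setting RS to the empty string makes blank lines separate records. This is a swapped-in technique rather than the command above, and it differs slightly (for example, leading blank lines in the file are ignored). A sketch on a made-up file, with sample and count as illustrative names:

```shell
sample=$(mktemp)
printf '%s\n' 'LINE OF TEXT 1' 'LINE OF TEXT 2' 'LINE OF TEXT 3' \
              '' 'something else' 'entirely' > "$sample"

# RS='' splits the input into blank-line-separated records;
# -F '\n' makes each line of a record its own field.
count=$(awk -v RS='' -F '\n' '
  $1 == "LINE OF TEXT 1" &&
  $2 == "LINE OF TEXT 2" &&
  $3 == "LINE OF TEXT 3" { n++ }
  END { print n + 0 }
' "$sample")

printf '%s\n' "$count"
rm -f "$sample"
```

Only the first of the two records matches, so this prints 1.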

With your requirement of only printing the filename if a match is found, you can do this like so:

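awk -v RS='\n\n+' -F '\n' '
  # found counts matching records; "match" is a built-in awk function
  # name, so it cannot be used as a variable.
  $1 == "LINE OF TEXT 1" &&
  $2 == "LINE OF TEXT 2" &&
  $3 == "LINE OF TEXT 3" {
    found++
  }
  END {
    if (found) {
      print FILENAME
    }
  }
' file.txt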

But considering you are talking about using find in combination with awk, I'd recommend just using Awk for the exit status and using find for the printing:

find . -type f -exec awk -v RS='\n\n+' -F '\n' '
  $1 ~ /LINE OF TEXT 1/ &&
  $2 ~ /LINE OF TEXT 2/ &&
  $3 ~ /LINE OF TEXT 3/ {
    found = 1   # remember the match; exit still runs the END rule
    exit
  }
  END { exit !found }   # status 0 on a match, 1 otherwise
' {} \; -print

That way, if you want to do something else before printing (some other find primary), you're already set up to do so.
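Here is a self-contained sketch of how awk's exit status drives find's -print. It builds a throwaway directory with one matching and one non-matching file. The directory, file names and the found flag are all invented for the illustration, and RS='' (paragraph mode) is used instead of a regex RS only to keep the sketch portable. The flag matters because exit in a main rule still runs the END rule, so an unconditional exit 1 there would override the status.

```shell
dir=$(mktemp -d)
printf '%s\n' 'LINE OF TEXT 1' 'LINE OF TEXT 2' 'LINE OF TEXT 3' > "$dir/match.txt"
printf '%s\n' 'unrelated' 'content' > "$dir/other.txt"

# -print only runs for files where awk exited with status 0.
matches=$(find "$dir" -type f -exec awk -v RS='' -F '\n' '
  $1 ~ /LINE OF TEXT 1/ &&
  $2 ~ /LINE OF TEXT 2/ &&
  $3 ~ /LINE OF TEXT 3/ { found = 1; exit }
  END { exit !found }
' {} \; -print)

printf '%s\n' "$matches"
rm -rf "$dir"
```

Only the path of match.txt should be printed.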
