How to grep for a value after an = sign

grep, regular expression

How do I write a grep pattern that matches all possible values?

For example, with a file like this (the 9701 could be any value):

9701=1?? 
9701=10.Pp 
9701=1a 8a 
9701=3.a_tt 
9701=1/a -00
9701=Bg1998pps

I could try

egrep -Eo '9701=[A-Z]+[a-z]+[0-9]{1,50}' test.log

This only matches values made of uppercase letters, then lowercase letters, then digits. How do I include values with special characters in the grep pattern, e.g. spaces, dots, hyphens, underscores, etc.?

Best Answer

To include all other characters in your grep you could use this:

grep -Eo '9701=.{1,50}' test.log

The dot represents ANY character.
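
As a quick check of the difference between the two patterns, here is a minimal sketch that writes the sample lines from the question into a temporary file (the mktemp path merely stands in for test.log) and runs both expressions against it:

```shell
# Sample data copied from the question; the temp file stands in for test.log.
log=$(mktemp)
cat > "$log" <<'EOF'
9701=1??
9701=10.Pp
9701=1a 8a
9701=3.a_tt
9701=1/a -00
9701=Bg1998pps
EOF

# The question's pattern: upper-case, then lower-case, then digits only.
narrow=$(grep -Eo '9701=[A-Z]+[a-z]+[0-9]{1,50}' "$log")

# The dot matches any character, so every sample line is matched in full.
wide=$(grep -Eo '9701=.{1,50}' "$log")

printf 'narrow pattern:\n%s\n\nwide pattern:\n%s\n' "$narrow" "$wide"
rm -f "$log"
```

Note that -o prints only the matched part, so the narrow pattern reports just the portion of the last line up to the final digit.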

But that won't cut off the "9701=" part of each line. To achieve this you could use cut:

cut -d "=" -f 2- test.log

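Because of -f 2-, this keeps everything after the first =, so a value that itself contains = survives intact (plain -f 2 would truncate it). However, cut strips the key from every line, whatever that key is.

sed lets you name the key explicitly and is ultimately the better solution for your problem: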

sed -r 's/^9701=(.*)$/\1/' test.log

or, in basic regular expression syntax (no -r needed),

sed 's/^9701=\(.*\)$/\1/' test.log

or even

sed 's/^9701=//' test.log
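
One thing worth keeping in mind: without -n, sed prints lines it did not change as well. A minimal sketch, assuming a hypothetical file that also contains another key (1234) and a value with an embedded =, shows how -n together with the p flag restricts the output to the 9701 values, with cut -f 2- shown for comparison:

```shell
# Hypothetical sample: two 9701 lines (one value contains "=") and one other key.
log=$(mktemp)
cat > "$log" <<'EOF'
9701=1a 8a
1234=other key
9701=x=y
EOF

# -n suppresses automatic printing; the p flag prints only the lines on
# which the substitution succeeded, so the 1234= line is dropped.
values=$(sed -n 's/^9701=//p' "$log")

# For comparison: cut keeps everything after the first "=" on every line.
cut_values=$(cut -d '=' -f 2- "$log")

printf 'sed -n ... p:\n%s\n\ncut -f 2-:\n%s\n' "$values" "$cut_values"
rm -f "$log"
```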

Related Solutions

Grep and Escaping a Dollar Sign

There are two separate issues here.

1. grep uses Basic Regular Expressions (BRE), and $ is a special character in BREs only at the end of an expression. The consequence of this is that the 2 instances of $ in $Id$ are not equal. The first one is a normal character and the second is an anchor that matches the end of the line. To make the second $ match a literal $ you'll have to backslash escape it, i.e. $Id\$ . Escaping the first $ also works: \$Id\$, and I prefer this since it looks more consistent.¹
2. There are two completely unrelated escaping/quoting mechanisms at work here: shell quoting and regex backslash quoting. The problem is many characters that regular expressions use are special to the shell as well, and on top of that the regex escape character, the backslash, is also a shell quoting character. This is why you often see messes involving double backslashes, but I do not recommend using backslashes for shell quoting regular expressions because it is not very readable.

Instead, the simplest way to do this is to first put your entire regex inside single quotes as in 'regex'. The single quote is the strongest form of quoting the shell has, so as long as your regex does not contain single quotes, you no longer have to worry about shell quoting and can focus on pure BRE syntax.

So, applying this back to your original example, let's throw the correct regex (\$Id\$) inside single quotes. The following should do what you want:

grep '\$Id\$' my_dir/my_file

The reason an unquoted \$Id\$ does not work is that after shell quote removal (the more correct name for shell quoting) is applied, the regex that grep sees is $Id$ . As explained in (1.), this regex matches a literal $Id only at the end of a line, because the first $ is literal while the second is a special anchor character.
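
The effect of quote removal can be observed by letting printf show what the shell actually hands over. The sketch below uses a hypothetical two-line file; the variable names are only for illustration:

```shell
f=$(mktemp)
# Line 1 contains the keyword $Id$, line 2 merely ends in $Id.
printf '%s\n' '# $Id$' 'version $Id' > "$f"

# Unquoted, the shell strips the backslashes before the command sees them.
seen_unquoted=$(printf '%s' \$Id\$)
# Inside single quotes the backslashes survive and reach the regex engine.
seen_quoted=$(printf '%s' '\$Id\$')

# BRE $Id$: literal "$Id" anchored at end of line, so only line 2 matches.
anchored=$(grep -n '$Id$' "$f")
# BRE \$Id\$: both dollars are literal, so only line 1 matches.
literal=$(grep -n '\$Id\$' "$f")

printf 'unquoted argument: %s\nquoted argument:   %s\n' "$seen_unquoted" "$seen_quoted"
printf 'anchored match: %s\nliteral match:  %s\n' "$anchored" "$literal"
rm -f "$f"
```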

¹ Note also that if you ever switch to Extended Regular Expressions (ERE), e.g. if you decide to use egrep (or grep -E), the $ character is always special. In EREs, $Id$ would never match anything because you can't have characters after the end of a line, so \$Id\$ would be the only way to go.
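
The ERE behaviour from the footnote can be checked the same way. This is a sketch against a hypothetical two-line file, and the counts are what GNU grep is expected to report:

```shell
f=$(mktemp)
printf '%s\n' '# $Id$' 'version $Id' > "$f"

# In ERE every unescaped $ is an anchor, so $Id$ cannot match any line.
# grep -c prints 0 and exits non-zero in that case, hence the "|| true".
ere_unescaped=$(grep -Ec '$Id$' "$f" || true)

# Escaping both dollars matches the keyword line.
ere_escaped=$(grep -Ec '\$Id\$' "$f")

printf 'unescaped: %s matching line(s)\nescaped:   %s matching line(s)\n' \
  "$ere_unescaped" "$ere_escaped"
rm -f "$f"
```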

Grab text out of vtt file

Since your file appears to consist of a sequence of records separated by one or more blank lines, I'd suggest trying something based on the paragraph modes of either awk or perl.

For example, if you always need to strip off the first two lines, like

1
00:00:00.096 --> 00:00:05.047

you could split into newline-delimited fields within blank-separated paragraphs and skip the first two fields using either

awk -vRS= -vORS= -F'\n' '{for(j=3;j<=NF;j++) print $j; print " "}' file.vtt

perl -F'\n' -00ane 'print join("", @F[2..$#F]), " "' file.vtt
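
To make the paragraph-mode behaviour concrete, here is a small sketch run against a hypothetical two-cue file (the second cue and its timestamps are invented for illustration). It uses the spaced form -v RS=, which should be equivalent to -vRS= above:

```shell
# Hypothetical sample file with a header record and two single-line cues.
vtt=$(mktemp)
cat > "$vtt" <<'EOF'
WEBVTT

1
00:00:00.096 --> 00:00:05.047
hello there

2
00:00:05.047 --> 00:00:08.000
general kenobi
EOF

# RS= switches awk to paragraph mode (records separated by blank lines);
# -F'\n' makes each line a field; fields 3..NF are the caption text.
text=$(awk -v RS= -v ORS= -F'\n' \
  '{for(j=3;j<=NF;j++) print $j; print " "}' "$vtt")

# Brackets make the leading and trailing spaces visible.
printf '[%s]\n' "$text"
rm -f "$vtt"
```

The WEBVTT header forms a one-line record of its own, so it contributes only the separator space at the start. Because ORS is empty, a cue spanning several lines would have its lines joined without a space between them.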

If you can't rely on there being a fixed number of fields (lines) to be removed, then it's fairly easy to add a regular expression test - a little easier in perl since it allows us to grep directly on arrays rather than writing an explicit loop. For example, to split into blank-separated records and then print only those fields (lines) having at least one sequence of at least 3 alphabetic characters, you could use

perl -F'\n' -00ane '
  print join("", grep { /[[:alpha:]]{3}/ } @F), " "
' file.vtt

If you want to exclude the WEBVTT string you can simply skip the first record, i.e.

perl -F'\n' -00ane '
  print join("", grep { /[[:alpha:]]{3}/ } @F), " " if $. > 1
  ' file.vtt

It will be down to you to choose a suitable regex that captures the wanted lines and excludes the unwanted ones. You can add an END block in either awk or perl if you want to add a final newline to the concatenated output.
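
For the awk variant, an END block along these lines might do. It is a sketch against a hypothetical one-cue file, and printf is used in the END block because ORS is empty:

```shell
# Hypothetical sample file: header record plus one cue.
vtt=$(mktemp)
printf '%s\n' 'WEBVTT' '' '1' '00:00:00.096 --> 00:00:05.047' 'hello there' > "$vtt"

out=$(mktemp)
# ORS is empty, so the END block has to emit the final newline explicitly.
awk -v RS= -v ORS= -F'\n' '
  {for(j=3;j<=NF;j++) print $j; print " "}
  END {printf "\n"}
' "$vtt" > "$out"

# wc -l counts newline characters; exactly one is expected at the end.
newlines=$(wc -l < "$out")
printf 'newline count: %s\n' "$newlines"
cat "$out"
rm -f "$vtt" "$out"
```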

NOTE: since (based on the discussion in comments) your files appear to have DOS-style CRLF line endings, you will need to deal with those - either by modifying the field and record separators in the above commands accordingly, or by stripping out the CRs first e.g.

sed 's/\r$//' file.vtt | 
  perl -F'\n' -00ane '
    print join("", grep { /[[:alpha:]]{3}/ } @F), " " if $. > 1
  '

This prints the caption text on a single line with no trailing newline, so the shell prompt follows it directly:

you're the four functions if you would of management first of all you have the planning the planning stages basically you were choosing appropriate  organizational goals and courses action to best achieve those goals steeldriver@xenial-vm:~/test/$
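
The \r escape in the sed expression is understood by GNU sed but is not specified by POSIX. As a possible alternative, assuming no carriage returns need to be preserved inside the lines, tr can delete them all before the text reaches awk or perl:

```shell
# Build a two-line sample with DOS-style CRLF line endings.
crlf=$(mktemp)
printf 'hello there\r\ngeneral kenobi\r\n' > "$crlf"

# tr -d '\r' removes every carriage return, leaving plain LF endings.
clean=$(tr -d '\r' < "$crlf")

# Count the lines that contain a CR before and after the conversion.
cr=$(printf '\r')
before=$(grep -c "$cr" "$crlf")
after=$(printf '%s\n' "$clean" | grep -c "$cr" || true)

printf 'lines with CR before: %s, after: %s\n' "$before" "$after"
rm -f "$crlf"
```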
