There is a common core of regular expression syntax, but there are distinct flavors. Your expression appears to use features specific to the Perl flavor, in particular lookaround assertions that describe the start and end of the pattern to be matched. grep defaults to basic regular expression (BRE) syntax, which only supports a simpler set of these zero-length matches, such as line anchors (^, $) and word anchors (\<, \>).
You can enable Perl-compatible regular expression (PCRE) support in grep using the -P command line switch (although note that the man page currently describes it as "experimental"). In your case you probably want the -o switch as well, so that only the matching part is printed rather than the whole line, i.e.
cat /var/log/dpkg.log | grep 'remove' | grep -oP '(?<=remove)(.*?)(?=:)'
Be aware that this expression may fail if it encounters packages that do not have the :i386 suffix since it may read ahead to a matching colon in the next word, e.g.
echo "2013-09-07 08:31:44 remove cifs-utils 2:5.1-1ubuntu2 <none>" | grep -oP '(?<=remove)(.*?)(?=:)'
cifs-utils 2
You may wish to look at awk instead e.g.
cat /var/log/dpkg.log | awk '$3 ~ /remove/ {sub(":.*", "", $4); print $4}'
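As a quick check of the awk approach, here is a minimal sketch that runs it on two sample lines instead of the real log. The first line is the one from the example above; the second, with its libfoo:i386 package name, is made up for illustration, as are the variable names.

```shell
# Sample lines in dpkg.log format. The first comes from the example above;
# the second is a hypothetical entry with an architecture suffix.
sample_log='2013-09-07 08:31:44 remove cifs-utils 2:5.1-1ubuntu2 <none>
2013-09-07 08:31:45 remove libfoo:i386 1.0-1 <none>'

# Field 3 is the action and field 4 the package name. Strip everything from
# the first colon onwards, so "libfoo:i386" becomes "libfoo".
removed=$(printf '%s\n' "$sample_log" |
  awk '$3 ~ /remove/ {sub(":.*", "", $4); print $4}')

printf '%s\n' "$removed"
```

Because awk only looks at the fourth field, the colon in the version string of the first line is never considered.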
As well as BRE and PCRE, GNU grep has a further mode called extended regular expression (ERE), selected by the -E command line switch. The man page notes that
In GNU grep, there is no difference in available functionality
between basic and extended syntaxes.
However, "no difference in available functionality" does not mean that the syntax is the same. For example, in BRE the + character is normally treated as a literal, and only becomes a quantifier meaning 'one or more instances of the preceding regular expression' if escaped, i.e.
$ echo "123.456" | grep '[0-9]+\.[0-9]+'
$ echo "123.456" | grep '[0-9]\+\.[0-9]\+'
123.456
whereas for ERE it is exactly the opposite
$ echo "123.456" | grep -E '[0-9]+\.[0-9]+'
123.456
$ echo "123.456" | grep -E '[0-9]\+\.[0-9]\+'
A similar distinction applies to sed invoked without and with the -r switch.
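The same contrast can be sketched for sed. This assumes a sed that accepts -E for extended syntax (GNU sed treats -E and -r alike); the variable names are only for illustration.

```shell
# BRE (default): an unescaped + is a literal plus sign, so nothing is replaced.
bre_plain=$(echo "123.456" | sed 's/[0-9]+/N/')

# BRE: the interval \{1,\} is the POSIX way to say "one or more".
bre_interval=$(echo "123.456" | sed 's/[0-9]\{1,\}/N/')

# ERE (-E; GNU sed also accepts -r): an unescaped + means "one or more".
ere_plus=$(echo "123.456" | sed -E 's/[0-9]+/N/')

printf '%s\n' "$bre_plain" "$bre_interval" "$ere_plus"
```

The first command leaves the input unchanged, while the other two replace the leading run of digits.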
Yes, you can do that. Your command would be:
cat /boot/config-3.19.0-32-generic | grep CONFIG_ARCH_DEFCONFIG | awk -F'"' '{print $2}'
which would return only:
arch/x86/configs/x86_64_defconfig
The awk command sets its field separator with the -F option. To use a double quote as the separator, wrap it in single quotes: -F'"'. Then '{print $2}' tells awk to print the second field, which is the text between the first pair of double quotes.
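To see how the fields fall out, here is a small sketch on a single line. The exact shape of the config line is an assumption inferred from the output shown above, and the variable names are illustrative.

```shell
# Assumed shape of the matching config line (inferred from the output above).
line='CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig"'

# Splitting on " gives: $1 = CONFIG_ARCH_DEFCONFIG=, $2 = the path, $3 = empty.
value=$(printf '%s\n' "$line" | awk -F'"' '{print $2}')

# awk can also do the filtering itself, so cat and grep are not strictly needed.
value_one_step=$(printf '%s\n' "$line" |
  awk -F'"' '/CONFIG_ARCH_DEFCONFIG/ {print $2}')

printf '%s\n' "$value" "$value_one_step"
```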
Hope this helps!
Best Answer
The first part of your two commands (cat FILENAME) is always the same and just prints the specified file's content to the STDOUT stream. I will not explain it any further.
The point of our interest is the grep part.
Syntax of grep:
You can pass grep some options to tweak its behaviour (e.g. set the RegEx flavour used or control the output formatting), but those are not used in your case.
The next single argument must be the pattern to match, which is either a regular expression ("RegEx") or a fixed string (if grep is called with the -F option). In your example, this is the install or "\ install\ " part. I will explain it in the next paragraph.
After that, you specify the source of the data to match. This can either be a file name, or nothing. In the second case, grep will read from the STDIN stream (standard input: normally what you type with the keyboard), which is where the pipe (|) sends the output of the previous command.
How to correctly pass the "PATTERN" argument?
The pattern parameter must be a single argument. That means you can't just pass several words or anything containing spaces or shell special characters here, because spaces are treated as argument separators in Bash and probably every other shell as well, and shell special characters such as ; will break the command.
But you have two options to include spaces in the pattern anyway:
Put the entire string to match in single ('...') or double ("...") quotes. This way, the shell parses the entire quoted string as one argument and passes it on to grep.
Escape every space in the pattern with a backslash (\). That means you write a backslash before every space that you don't want the shell to treat as an argument separator. But note that if you want an actual backslash in the pattern, you must escape it as well by writing another backslash before it.
If we now analyse your two grep command examples, we see the difference in what they match:
This matches the pattern " install ". (Note the leading and trailing space!)
Here we see both methods: double quotes around the entire pattern and backslash-escaping the spaces inside it. To be honest, that is superfluous; one would have been enough. Inside double quotes the shell leaves these backslashes in place, so grep itself receives them, and GNU grep then treats each backslash-space as a plain space. Although it doesn't hurt in this case, you should not do this and should decide on one method instead. Usually I would recommend quotation marks, as they are easier to read.
This matches the pattern install. (No surrounding spaces.)
Here the pattern only consists of the word install, nothing else. No spaces.
Difference between your commands:
As I said, the first of your examples only matches the word install when it is surrounded by spaces. It would not match if, for example, there is a full stop or any other character directly before or after it instead. It would also not match the word directly at the beginning or end of a line.
The second example does not care about any spaces before or after the word install. It also matches at the beginning and end of a line, as well as when the word is surrounded by punctuation. It even matches any word that contains this letter sequence, e.g. "uninstall", "reinstall" or "installation"!
Example with correct/useful backslash escaping:
As the backslashes in the example you provided are superfluous, here is the same example without quotes, using only backslash escaping instead:
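A minimal sketch of backslash-only escaping, run against made-up sample lines rather than a real log file. The sample text and the variable names are assumptions for illustration; only the pattern comes from the discussion above.

```shell
# Hypothetical sample lines, loosely modelled on a package log.
sample='2015-11-01 10:00:00 install foo 1.0
2015-11-01 10:00:01 status installed foo 1.0
reinstall requested'

# Backslash-escaped spaces: the shell passes " install " as one argument.
escaped=$(printf '%s\n' "$sample" | grep \ install\ )

# The same pattern written with quotes instead of backslashes.
quoted=$(printf '%s\n' "$sample" | grep ' install ')

# Without the surrounding spaces, "installed" and "reinstall" match too.
count_bare=$(printf '%s\n' "$sample" | grep -c install)

printf '%s\n' "$escaped" "$quoted" "$count_bare"
```

Both the escaped and the quoted form select only the first sample line, while the bare pattern counts all three.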
Or if you want to match the string "I like Ubuntu" in a file /home/you/path with spaces/textfile without using quotes, you would do that like this:
You see that you must escape spaces in path or file names as well, or quote them. The line above is equal to the line below:
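Both spellings can be tried side by side with a sketch like the following. It assumes a throwaway directory created with mktemp, and uses the relative path "path with spaces/textfile" as a stand-in for the /home/you/... path in the text; the file contents are made up.

```shell
# Work in a throwaway directory; the relative path below stands in for
# /home/you/path with spaces/textfile from the text.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir 'path with spaces'
printf '%s\n' 'I like Ubuntu' 'I like tea' > 'path with spaces/textfile'

# Backslash-escaped spaces in both the pattern and the file name.
escaped=$(grep I\ like\ Ubuntu path\ with\ spaces/textfile)

# The same command using quotes.
quoted=$(grep 'I like Ubuntu' 'path with spaces/textfile')

printf '%s\n' "$escaped" "$quoted"

# Clean up the temporary directory.
cd / && rm -rf "$workdir"
```

In both cases grep receives exactly two arguments after its name: the pattern and the file name.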