Bash – difference between .* and * in regular expressions

bash, grep, regular expression

I have a file named "test" that contains:

linux
Unixlinux
Linuxunix
it's linux
l...x

Now when I use grep '\<l.*x\>', it matches:

linux
it's linux
l...x

But when I use grep '\<l*x\>', it only matches:

l...x

According to the reference guide, when using *, the preceding item will be matched zero or more times, i.e. it should match anything that starts with 'l' and ends with 'x'.

Can anyone explain why it's not showing the desired result, or whether I've misunderstood it?

Best Answer

notation (.*)

The * in the regular expressions .* and * refers to a count, not to characters per se; more exactly, it means 'zero or more of the preceding item'. The . means 'any single character'.

So when you put them together you get 'zero or more of any character'. For example, strings like these:

  • linux
  • linnnnnx
  • lnx
  • hi linux
  • lx

would all be matched by \<l.*x\>. The last one is important: it shows that .* can match nothing at all.
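As a quick check of the list above, here is a minimal sketch that feeds those strings to grep. It assumes a grep that supports the \< and \> word anchors (GNU grep does); the variable names and the extra "Unixlinux" line, borrowed from the question's file, are only for illustration.

```shell
# Strings from the list above, plus "Unixlinux" from the question's file.
# "Unixlinux" should be filtered out: its 'l' is not at the start of a word.
sample='linux
linnnnnx
lnx
hi linux
lx
Unixlinux'

# \< anchors at the start of a word, \> at the end of a word.
matches=$(printf '%s\n' "$sample" | grep '\<l.*x\>')
printf '%s\n' "$matches"
```

Every line except "Unixlinux" should be printed, including "lx", where .* matches the empty string.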

notation (*)

As I said, * on its own is a counter. So when you put it after a letter such as 'l', it means 'zero or more of l'.
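To see which strings 'zero or more l, then x' actually describes, one option is grep's -x flag, which accepts a line only when the whole line matches the pattern. This is just an illustrative sketch; the test strings and the variable name are made up for the example.

```shell
# -x: the entire line must match the pattern, not just a part of it.
# 'l*x' describes: x, lx, llx, ... but not "linux" (the 'i', 'n', 'u' are not l's).
only_l_then_x=$(printf '%s\n' x lx llx linux | grep -x 'l*x')
printf '%s\n' "$only_l_then_x"
```

Only x, lx and llx should come through; "linux" is rejected because l* can never stand in for "inu".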

Notice if we grep for l*x, this will match l...x, but probably not for the reason you'd think.

% echo "l...x" | grep "l*x"
l...x

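It's matching on the trailing 'x'. The 'l' has nothing to do with why this is getting matched, other than the fact that the 'x' is preceded by 'zero or more l's' (here, zero of them). With the word anchors from your original pattern, \<l*x\>, that lone 'x' must also be a whole word. In l...x it is, because it follows a dot and ends the line. In linux the 'x' sits in the middle of a word and the 'l' is not immediately followed by 'x', so the line does not match.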
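One way to confirm this is grep's -o option, which prints only the part of the line that matched. The sketch below assumes a grep that supports -o and the \< \> anchors (GNU grep does); the variable names are only for illustration.

```shell
# -o prints only the matched text instead of the whole line.
matched_part=$(echo "l...x" | grep -o 'l*x')
echo "$matched_part"

# With the word anchors from the question, the x must itself form a whole word.
# In "l...x" the x follows a dot, so it qualifies; in "linux" it follows "u".
anchored=$(printf '%s\n' 'l...x' 'linux' | grep '\<l*x\>')
echo "$anchored"
```

The first command should print just x, and the second should print only the l...x line.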
