GREP and REGEX – Troubleshooting Common Issues

arch linux, grep, regular expression

I'm trying to filter the output of ls /dev down to the 'tty' entries that end in a number from 1 to 4.

So from:

tty5
tty4
tty2
tty6
tty1

Should match:

tty4
tty2
tty1

The regexp

"\s([tty]+[0-4])\s"

works in RegExr.

I've tried using this with grep:

ls /dev | grep -E \s([tty]+[0-4])\s

ls /dev | grep -E \s([tty]\+\[0-4])\s

ls /dev | grep -Ex \s([tty]+[0-4])\s

ls /dev | grep -P \s([tty]+[0-4])\s

as suggested in other posts, but I still can't make it work.

Best Answer

It isn't matching because your regex requires whitespace (\s) before the string tty and again at the end of the match. That never happens here, since ls prints one entry per line when its output is piped. Note that ls is not the same as ls | command: when the output goes to a pipe rather than a terminal, ls behaves as if the -1 option had been given and prints only one entry per line. It will work as expected if you remove those \s and quote the pattern so that the shell doesn't interpret the backslashes and parentheses itself:

ls /dev | grep -E '([tty]+[0-4])'
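A quick way to check this without touching /dev is to feed grep the sample list from the question. This is only a sketch: it assumes GNU grep (where \s is supported as an extension), and the variable names are mine.

```shell
# Sample input: one entry per line, as ls produces when its output is piped.
entries=$(printf '%s\n' tty5 tty4 tty2 tty6 tty1)

# With \s on both sides nothing matches, because no line contains whitespace.
# (|| true only hides grep's non-zero exit status when there is no match.)
with_space=$(printf '%s\n' "$entries" | grep -E '\s([tty]+[0-4])\s' || true)

# Without \s, the entries ending in a digit from 0 to 4 match.
without_space=$(printf '%s\n' "$entries" | grep -E '([tty]+[0-4])' || true)

printf 'with \\s: [%s]\n' "$with_space"
printf 'without \\s:\n%s\n' "$without_space"
```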

However, that will also match all sorts of things you don't want. That regex isn't what you need at all. The [ ] make a character class, so the expression [tty]+ is equivalent to [ty]+ and matches one or more t or y characters. This means it will match t, or tttttttttttttttt, or tytytytytytytytytyt, or any other combination of one or both of those letters. Also, the parentheses are pointless here: they create a capture group, but you're not using it. What you want is this:

$ ls /dev | grep '^tty[0-4]$'
tty0
tty1
tty2
tty3
tty4

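Note how I added the ^ and $ anchors there. They make the expression match only when the whole line is tty followed by exactly one digit from 0 to 4. I kept the 0-4 range from your regex, which is why tty0 shows up; use [1-4] if you only want tty1 through tty4.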
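To see what each change buys you, here is a small sketch that runs the patterns against a made-up list of names rather than a real /dev. The names (ptty3 and y3 in particular) are hypothetical and chosen only to provoke false matches; it assumes GNU grep.

```shell
# Hypothetical names, one per line, standing in for ls /dev output.
names=$(printf '%s\n' tty0 tty1 tty10 ptty3 y3 ttyS0 tty7)

# Character class: any run of t/y followed by 0-4, anywhere in the line.
loose=$(printf '%s\n' "$names" | grep -E '[tty]+[0-4]' || true)

# Literal tty, but still unanchored, so it matches inside longer names.
literal=$(printf '%s\n' "$names" | grep 'tty[0-4]' || true)

# Anchored: the whole line must be tty plus one digit from 0 to 4.
anchored=$(printf '%s\n' "$names" | grep '^tty[0-4]$' || true)

# Same, restricted to 1-4 as in the question's stated goal.
one_to_four=$(printf '%s\n' "$names" | grep '^tty[1-4]$' || true)

printf 'loose:       %s\n' "$(echo $loose)"
printf 'literal:     %s\n' "$(echo $literal)"
printf 'anchored:    %s\n' "$(echo $anchored)"
printf 'one_to_four: %s\n' "$(echo $one_to_four)"
```

Only the anchored patterns drop tty10, ptty3 and y3.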

Of course, the safe way of doing this that avoids all of the dangers of parsing ls is to use globs instead:

$ ls /dev/tty[0-4]
/dev/tty0  /dev/tty1  /dev/tty2  /dev/tty3  /dev/tty4

or just

$ echo /dev/tty[0-4]
/dev/tty0 /dev/tty1 /dev/tty2 /dev/tty3 /dev/tty4
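
If you need to act on each match in a script, the same glob can drive a loop. The following is a minimal sketch that uses a scratch directory in place of /dev so it can be run anywhere; the directory and the file names in it are assumptions made for the demo.

```shell
# Scratch directory standing in for /dev (an assumption for this demo).
dir=$(mktemp -d)
touch "$dir/tty0" "$dir/tty1" "$dir/tty4" "$dir/tty5" "$dir/tty10"

matched=""
for f in "$dir"/tty[0-4]; do
    # If nothing matches, the pattern is left as a literal string; skip it.
    [ -e "$f" ] || continue
    # Strip the directory part and collect the bare name.
    matched="$matched${f##*/} "
done

printf '%s\n' "$matched"
rm -r "$dir"
```

A glob has to match the whole file name, so tty10 is excluded without any anchors.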