Grep exact number of digits and some other characters

grep, regular expression, text processing

I'd like to parse a file containing 5-digit numbers separated by commas or dashes, with lines like:
12345,23456,34567-45678,12345-23456,34567

My goal is to find lines with incorrect formatting, e.g. lines containing numbers that are not exactly 5 digits long, or numbers separated by characters other than a comma or a dash.

I tried to egrep the file with:

cat file.txt | egrep -v [-,]*[0-9]{5}[,-]*

  • but if I have a 6 digit number, it is matched, and the line is not displayed
  • and if I have a 4 digit number, it is not matched but other numbers from
    that same line are matched and the line is not displayed
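
Both symptoms are consistent with the pattern being unanchored: grep only needs it to match somewhere in the line. A minimal sketch that makes this visible, assuming a grep that supports the -o option (GNU and BSD grep do; it is not in POSIX), which prints only the matched part:

```shell
# The pattern from the attempt above, quoted so the shell leaves it alone.
pattern='[-,]*[0-9]{5}[,-]*'

# A 6-digit number: the first 5 digits are enough for a match.
six=$(printf '%s\n' '123456' | grep -oE "$pattern")

# A 4-digit number followed by a valid one: the valid one still matches.
four=$(printf '%s\n' '1234,12345' | grep -oE "$pattern")

printf '%s\n' "$six" "$four"
```

In both cases a substring matches, so with -v the whole line is treated as matching and is hidden.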

To clarify the line format:

  • a number must have exactly 5 digits
  • ranges are defined with dash, like 12345-12389
  • a line can contain anything from a single number to several numbers and ranges in any order

Any suggestions, please?

Best Answer

grep -vxE '([0-9]{5}[,-])*[0-9]{5}'

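This reports the incorrect lines: -E enables extended regular expressions, -x requires the pattern to match the whole line, and -v inverts the selection so that only the lines that do not fit the format are printed.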
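
As a quick check, here is a sketch that runs that command against a small made-up sample. The file contents are an assumption for illustration, not data from the question:

```shell
# Hypothetical sample: the first two lines are valid, the last three are not.
tmp=$(mktemp)
printf '%s\n' \
  '12345,23456,34567-45678' \
  '12345' \
  '123456,12345' \
  '1234,12345' \
  '12345;12345' > "$tmp"

# Collect the lines that do NOT consist solely of 5-digit numbers
# joined by commas or dashes.
bad=$(grep -vxE '([0-9]{5}[,-])*[0-9]{5}' "$tmp")
printf '%s\n' "$bad"

rm -f "$tmp"
```

Only the lines with the 6-digit number, the 4-digit number, and the semicolon separator are printed.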

Or, if you also want to forbid chained ranges such as 12345-12345-12345:

num='[0-9]{5}'
num_or_range="$num(-$num)?"
grep -vxE "($num_or_range,)*$num_or_range"
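
To see the difference between the two patterns, this sketch feeds the chained range to both. The variable names loose, strict, loose_out and strict_out are introduced here for illustration only:

```shell
num='[0-9]{5}'
num_or_range="$num(-$num)?"

loose='([0-9]{5}[,-])*[0-9]{5}'           # any mix of commas and dashes
strict="($num_or_range,)*$num_or_range"   # comma-separated numbers or single ranges

line='12345-12345-12345'

# grep -v exits non-zero when it prints nothing, hence the "|| true".
loose_out=$(printf '%s\n' "$line" | grep -vxE "$loose" || true)
strict_out=$(printf '%s\n' "$line" | grep -vxE "$strict" || true)

printf 'loose reports:  [%s]\n' "$loose_out"
printf 'strict reports: [%s]\n' "$strict_out"
```

The first pattern accepts the line, so nothing is reported for it; the stricter one reports it as incorrect.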