Why does grep output lines that seemingly don't match the expression?
As mentioned in my comment, this behaviour may be caused by a bug.
I am aware that different locales affect character order. I thought the -o output below confirmed that this was not the problem here, but I was wrong: adding LC_ALL=C gives the expected output.
I had this question after I saw locales affected the output.
[aa@bb grep-test]$ cat input.txt
aa bb
CC cc
dd ee
[aa@bb grep-test]$ LC_ALL=C grep -o [A-Z] input.txt
C
C
[aa@bb grep-test]$ grep -o [A-Z] input.txt
C
C
[aa@bb grep-test]$ LC_ALL=C grep [A-Z] input.txt
CC cc
[aa@bb grep-test]$ grep [A-Z] input.txt
aa bb
CC cc
dd ee
[aa@bb grep-test]$
[aa@bb tmp]$ cat test
aa bb
CC cc
dd ee
[aa@bb tmp]$ grep [A-Z] test
aa bb
CC cc
dd ee
[aa@bb tmp]$ grep -o [A-Z] test
C
C
[aa@bb tmp]$ grep -E [A-Z] test
aa bb
CC cc
dd ee
[aa@bb tmp]$ grep -n [A-Z] test
1:aa bb
2:CC cc
3:dd ee
[aa@bb tmp]$ echo [A-Z]
[A-Z]
[aa@bb tmp]$ grep -V
GNU grep 2.6.3
...
[aa@bb tmp]$ bash --version
GNU bash, version 4.1.2(1)-release (x86_64-redhat-linux-gnu)
...
[aa@bb grep-test]$ command -v grep
/bin/grep
[aa@bb grep-test]$ rpm -q -f $(command -v grep)
grep-2.6.3-6.el6.x86_64
[aa@bb grep-test]$ echo grep [A-Z] input.txt | xxd
0000000: 6772 6570 205b 412d 5a5d 2069 6e70 7574 grep [A-Z] input
0000010: 2e74 7874 0a .txt.
[aa@bb grep-test]$ cmd='grep [A-Z] input.txt'; echo $cmd | xxd; eval $cmd
0000000: 6772 6570 205b 412d 5a5d 2069 6e70 7574 grep [A-Z] input
0000010: 2e74 7874 0a .txt.
aa bb
CC cc
dd ee
[aa@bb grep-test]$ xxd input.txt
0000000: 6161 2062 620a 4343 2063 630a 6464 2065 aa bb.CC cc.dd e
0000010: 650a 0a e..
[aa@bb grep-test]$
Best Answer
This looks like your locale collation rules being very ... helpful.
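To make the collation idea concrete, here is a minimal sketch that uses sort to show the order a locale assigns to letters. The three sample letters are made up for illustration. In the C locale every uppercase ASCII letter sorts before every lowercase one, so a range like [A-Z] contains only uppercase letters. In many UTF-8 locales on systems of this vintage the cases are interleaved instead (roughly a A b B c C ...), which would put lowercase letters such as b inside [A-Z].

```shell
# Byte order (C locale): uppercase sorts before lowercase.
c_order=$(printf 'b\nC\na\n' | LC_ALL=C sort | paste -s -d ' ' -)
echo "C locale order:       $c_order"

# Order under whatever locale the shell currently uses. The result depends
# on the system, so no expected output is given for this line.
current_order=$(printf 'b\nC\na\n' | sort | paste -s -d ' ' -)
echo "current locale order: $current_order"
```

If the second line prints the letters in a different order than the first, the active collation rules are not plain byte order, and bracket ranges may behave accordingly.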
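Try it with the C locale forced for that one command (for example, by prefixing it with LC_ALL=C, as in the question) to test that idea.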
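A minimal sketch of such a test, using the sample data from the question fed through a pipe rather than a file. The [[:upper:]] variant is an addition of mine, not something the answer mentions: it is the POSIX character class for uppercase letters and does not depend on how the locale orders a range.

```shell
# Sample input copied from the question.
sample='aa bb
CC cc
dd ee'

# Force the C locale for this one command only: [A-Z] then covers just
# the bytes 0x41-0x5A, so only the line with uppercase letters matches.
c_locale_matches=$(printf '%s\n' "$sample" | LC_ALL=C grep '[A-Z]')
echo "$c_locale_matches"

# Alternative that avoids ranges altogether: the POSIX class [[:upper:]].
upper_class_matches=$(printf '%s\n' "$sample" | grep '[[:upper:]]')
echo "$upper_class_matches"
```

Quoting the pattern also keeps the shell from expanding [A-Z] as a glob if a single-letter file name happens to exist in the current directory.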
I have
in my shell startup to avoid this kind of trouble while still getting my unicode goodness.
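A setup of the kind described might look like the sketch below. The specific values are an assumption on my part, not necessarily what the answer used: LANG keeps a UTF-8 locale for character handling (en_US.UTF-8 is only an example and must be installed on the system), while LC_COLLATE=C switches collation alone back to byte order.

```shell
# Hypothetical shell startup lines (for example in ~/.bashrc).
export LANG=en_US.UTF-8   # example UTF-8 locale; assumes it is installed
export LC_COLLATE=C       # byte-order collation, so ranges like [A-Z] stay ASCII-only

# Quick check with the sample data from the question.
# Note: if LC_ALL is set in the environment, it overrides LC_COLLATE.
matches=$(printf 'aa bb\nCC cc\ndd ee\n' | grep '[A-Z]')
echo "$matches"
```

Setting only LC_COLLATE leaves the other locale categories (such as LC_CTYPE, which governs character encoding) under the control of LANG.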