When grepping log files, I often find that grep does not return matches near the end of the file (example file: 2018_12_22_13_04_adfags-modem.log).
For example, running
grep -n "demodulator_process" 2018_12_22_13_04_adfags-modem.log | less
shows matches only up to line 2962 of the file, even though the string also occurs further down in the file.
Running
grep -n "Finished" 2018_12_22_13_04_adfags-modem.log
which should match the last lines of the file, does not return anything.
Does anyone know what causes this behavior?
I'm using grep version
--> grep --version
grep (GNU grep) 3.1
Thanks.
Best Answer
While the answer @Crypteya provided is a good practical answer, it may also be useful to understand where the problem areas may be in the log file itself.
Finding the problem areas
First, create a printable ASCII version of the log file:
strings 2018_12_22_13_04_adfags-modem.log > log_modified.txt
Then look at the differences:
diff -u 2018_12_22_13_04_adfags-modem.log log_modified.txt
As can be seen in the unified diff output, the original log file contains some non-printable ASCII characters, which cause the file to be identified as binary rather than text. The strings output drops the unprintable characters and places the next text field on a new line.
What are the non-printable characters?
We can use hexdump to view the hexadecimal and the ASCII side by side. We already know that the problem areas occur just after the printable characters -PPI:
cat 2018_12_22_13_04_adfags-modem.log | hexdump -C | grep -A2 PPI
This reveals the location of the mysterious NULLs and several non-printable characters in your original log file.
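If all you need is a count of the NULLs and the numbers of the lines that contain them, a minimal sketch using only standard tools follows. It assumes GNU coreutils (`tr`, `cat -v`, `wc`); `sample.log` and the helper names `count_nuls` and `lines_with_nuls` are hypothetical stand-ins, not part of the original log or any tool.

```shell
#!/bin/sh
# Hypothetical sample log with one NUL byte on line 2.
printf 'line one\nbad\000line\nFinished\n' > sample.log

# Count NUL bytes: -c complements the set and -d deletes it,
# so every byte except NUL is removed and wc counts what is left.
count_nuls() {
    tr -cd '\000' < "$1" | wc -c
}

# Print "lineno:text" for each line containing a NUL.
# cat -v renders NUL as ^@ (a literal "^@" in the log would also match).
lines_with_nuls() {
    cat -v "$1" | grep -n '\^@'
}

count_nuls sample.log
lines_with_nuls sample.log
```

On the sample file this reports one NUL byte and the line `2:bad^@line`. On the real log, `lines_with_nuls` should point at the lines the hexdump output above flags.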
Comparing the problem bytes against an ASCII chart, we can write out the problem line:
or:
\0 e ^Z G ^B y ^U . @ ^X
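grep and NULL
To answer why grep doesn't work well with NULLs in an otherwise ASCII file, it helps to know that strings in the C language are stored as arrays of characters terminated by a NULL byte, so a NULL inside a file is a strong sign that the file is binary data rather than text. GNU grep acts on that: with its default binary-file handling, it suppresses output once it discovers NULL bytes in the input and only reports that a binary file matches, which is why the results stop partway through the log. Removing the NULLs from the input into grep therefore allows it to process each line as text.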
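A minimal sketch of the two usual workarounds, assuming GNU grep (for `-a`, also spelled `--text`) and a hypothetical `sample.log` with a NULL on line 2: delete the NULLs with `tr -d` before grep sees them, or tell grep to process the binary file as text. Line numbers stay correct either way, because deleting a NULL does not remove any newline.

```shell
#!/bin/sh
# Hypothetical log: the NUL on line 2 makes GNU grep classify the file as binary.
printf 'line one\nbad\000line\nFinished\n' > sample.log

# Plain grep: typically reports only that the binary file matches
# (exact wording and output stream vary between GNU grep versions).
grep -n Finished sample.log

# Workaround 1: strip NUL bytes, then grep the cleaned stream.
tr -d '\000' < sample.log | grep -n Finished    # 3:Finished

# Workaround 2: -a (--text) makes GNU grep process binary input as text.
grep -an Finished sample.log                    # 3:Finished
```

For the original log this would be `grep -an "Finished" 2018_12_22_13_04_adfags-modem.log`. Note that with `-a`, matching lines that contain NULLs or other control bytes are printed raw, which can garble a terminal; the `tr` variant avoids that for NULLs only.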