Strings enclosed in '' are ignored when using grep


When I tried searching for a string within a file, the results didn't include those that were enclosed in single quotes.

For example:

grep -rn text folder/

The results didn't include strings that looked like this:

'text'

Care to tell me what I'm doing wrong?

UPDATE: I just tested it with a new file and it worked! It looks like it only happens in one particular file (a Ruby file). Maybe it's related to encoding?

Best Answer

There's a good chance you're running into some character coding issue. The file you're trying to grep could be in a different character encoding than your system's default encoding. Unixy systems typically default to UTF-8 these days, which is compatible with 7-bit ASCII, but not with any of the 8-bit ASCII extensions. Common 8-bit encodings in the US are ISO 8859-1 and Windows CP-1252. There are dozens more used in the rest of the world.
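One quick way to test this theory, assuming your system encoding is UTF-8, is to ask iconv to decode the file as UTF-8 and see whether it complains. This is only a sketch: is_valid_utf8 is a name I made up, not a standard command, and a file that passes could still contain unexpected characters.

```shell
# is_valid_utf8 is a hypothetical helper, not a standard command.
# It succeeds (exit status 0) only if the file decodes cleanly as UTF-8.
is_valid_utf8() {
    # Convert UTF-8 to UTF-8 and discard the output; iconv exits non-zero
    # when it hits a byte sequence that is not valid UTF-8.
    iconv -f UTF-8 -t UTF-8 < "$1" > /dev/null 2>&1
}
```

For example, is_valid_utf8 myfile.txt || echo "not UTF-8" flags a file saved in an 8-bit encoding such as ISO 8859-1, provided it contains at least one non-ASCII character. Running locale charmap shows which encoding your current locale uses.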

grep assumes all input is in the encoding of your current locale, which is normally your system default. To grep a file in a different encoding, use iconv to convert it:

$ iconv -f ISO-8859-1 -t UTF-8 myfile.txt | grep something

I realize this is highly inconvenient for your recursive example. The broader lesson is that if the conversion fixes the problem, you should convert all the text files in that directory tree so they're compatible with your system character encoding. If you need Windows text editor compatibility, don't worry: most Windows text editors that focus on code editing cope with UTF-8, even though Windows uses UTF-16 natively these days.
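If you want to try the conversion across a whole tree before converting anything permanently, something like the following could work. It's a rough sketch under two assumptions: every file under the directory really is ISO 8859-1, and no file name contains a newline. grep_latin1 is a hypothetical helper name, not an existing tool.

```shell
# grep_latin1 is a hypothetical helper, not a standard command.
# Usage: grep_latin1 PATTERN DIRECTORY
grep_latin1() {
    pattern=$1
    dir=$2
    # Visit every regular file under the directory, as grep -r would.
    find "$dir" -type f | while IFS= read -r file; do
        # Decode the file as ISO 8859-1, then let grep number the matching lines.
        iconv -f ISO-8859-1 -t UTF-8 < "$file" |
            grep -n -e "$pattern" |
            while IFS= read -r match; do
                # Prefix each match with the file name, mimicking grep -rn output.
                printf '%s:%s\n' "$file" "$match"
            done
    done
}
```

Calling grep_latin1 text folder/ then stands in for grep -rn text folder/. Files that are already UTF-8 will have their non-ASCII characters mangled by the extra conversion, so treat this as a diagnostic rather than a replacement for converting the files.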

Another possibility is that your file uses curly quotes. The quotes you type on your keyboard are straight quotes (ASCII 39, hex 27), but some word processors and text editors replace them with curly quotes, which are U+2018 and U+2019 in the case of single quotes.
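To check for curly quotes directly, you can search for their UTF-8 byte sequences. The sketch below assumes the file is UTF-8 encoded; find_curly_quotes and the two variable names are made up for illustration.

```shell
# U+2018 and U+2019 as UTF-8 bytes, written as octal escapes for printf
# (e2 80 98 and e2 80 99 in hex).
lsquo=$(printf '\342\200\230')
rsquo=$(printf '\342\200\231')

# find_curly_quotes is a hypothetical helper: print the numbered lines of a
# file that contain either curly single quote.
find_curly_quotes() {
    grep -n -e "$lsquo" -e "$rsquo" -- "$1"
}
```

If find_curly_quotes myfile.txt prints any lines, those are the ones a search for a straight-quoted 'text' would not match literally.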

I like to use this command for poking through a file to investigate character coding issues:

$ od -t x1 < myfile.txt | less

There are various "hexdump" programs available, but they often do unhelpful things like displaying the data as 16-bit words in little-endian format. Unlike a decent hexdump program, though, od doesn't show a printable text column next to the hex by default (GNU od adds one if you append z to the type, as in -t x1z), so it works best for short files. I often cut the file down to a small example that is easy to test first.
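To make the byte values concrete, here is a small sketch that dumps the two kinds of quote discussed above. hex_bytes is a hypothetical helper name, and CP1252 is the encoding name accepted by glibc and GNU libiconv; it may be spelled differently on other systems.

```shell
# hex_bytes is a hypothetical helper: dump stdin as hex bytes,
# without the offset column or the spacing od normally adds.
hex_bytes() {
    od -A n -t x1 | tr -d ' \n'
    echo
}

# Straight single quote, ASCII 39.
printf "'" | hex_bytes                                        # prints 27
# Curly quote U+2019 encoded as UTF-8 (octal escapes for e2 80 99).
printf '\342\200\231' | hex_bytes                             # prints e28099
# The same character converted to Windows CP-1252.
printf '\342\200\231' | iconv -f UTF-8 -t CP1252 | hex_bytes  # prints 92
```

Seeing e2 80 99 or a lone 92 where you expected 27 is a strong hint that the file holds curly quotes rather than the straight quotes in your search pattern.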
