What could cause the file command in Linux to report a text file as binary data

bash, character-encoding, linux

I have a couple of C++ source files (one .cpp and one .h) that are being reported as type "data" by the file command in Linux. When I run file -bi against these files, I get the same output for each:

application/octet-stream; charset=binary

Each file is clearly plain text (I can view them in vi). What's causing file to misreport the type of these files? Could it be some sort of Unicode issue? Both of these files were created in Windows-land (using Visual Studio 2005), but they're being compiled on Linux (it's a cross-platform application).

Any ideas would be appreciated.

Update: I don't see any null characters in either file. I found some extended characters in the .cpp file (in a comment block) and removed them, but file still reports the same encoding. I've tried forcing the encoding in SlickEdit, but that didn't seem to have any effect. When I open the file in vim, I see a [converted] message as soon as it loads. Perhaps I can get vim to force the encoding?
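
For reference, this is the sort of shell-driven vim invocation I have in mind for forcing the encoding (cp1252 is only a guess based on the files' Windows origin, and file.cpp stands in for the real filename):

  # reopen the loaded file as Windows-1252, then write it back out as UTF-8
  vim -c 'e ++enc=cp1252' -c 'set fileencoding=utf-8' -c 'wq' file.cpp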

Best Answer

Vim tries very hard to make sense of whatever you throw at it without complaining, which makes it a relatively poor tool for diagnosing file's output.

Vim's "[converted]" notice indicates there was something in the file that vim wouldn't expect to see in the text encoding suggested by your locale settings (LANG etc).

Others have already suggested:

  • cat -v
  • xxd

You could try grepping for non-ASCII characters:

  • grep -P '[\x7f-\xff]' filename
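
If the plain grep turns up nothing, a byte-oriented variant may help (GNU grep assumed; forcing the C locale makes the range match raw bytes rather than characters):

  LC_ALL=C grep -naoP '[\x80-\xff]' file.cpp   # line numbers plus just the non-ASCII bytes
  cat -v file.cpp | grep -n 'M-'               # cat -v renders high bytes as M-x sequences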

The other possibility is non-standard line endings for the platform (i.e. CRLF or CR), but I'd expect file to cope with that and report "DOS text file" or similar.
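
A quick way to check for and strip stray carriage returns, assuming GNU grep and sed (back the file up before the in-place edit):

  grep -c $'\r' file.cpp       # number of lines containing a carriage return
  sed -i 's/\r$//' file.cpp    # remove trailing CRs in place (GNU sed)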
