I have a couple of C++ source files (one .cpp and one .h) that are being reported as type data by the file command in Linux. When I run file -bi against these files, I get this output (the same for each file):
application/octet-stream; charset=binary
Each file is clearly plain text (I can view them in vi). What's causing file to misreport the type of these files? Could it be some sort of Unicode thing? Both of these files were created in Windows-land (using Visual Studio 2005), but they're being compiled on Linux (it's a cross-platform application).
Any ideas would be appreciated.
Update: I don't see any null characters in either file. I found some extended characters in the .cpp file (in a comment block) and removed them, but file still reports the same encoding. I've tried forcing the encoding in SlickEdit, but that didn't seem to have any effect. When I open the file in vim, I see a [converted] message as soon as the file loads. Perhaps I can get vim to force the encoding?
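One way to check whether stray non-UTF-8 bytes are the trigger is to validate the file with iconv, which fails at the first byte that isn't legal in the source encoding. The sketch below uses a made-up filename and a deliberately planted 0xE9 byte for illustration; the exact file -bi verdict varies between versions of file, and bash's printf (which understands \xHH escapes) is assumed:

```shell
# Plant a single Latin-1 byte (0xE9, "é") in otherwise-ASCII C++ source.
# Depending on the version of file(1), one such byte can be enough to push
# the reported charset away from us-ascii/utf-8.
printf 'int main() { return 0; } // caf\xe9\n' > demo.cpp

file -bi demo.cpp

# iconv validates the encoding: it exits non-zero at the first byte that
# is not valid UTF-8, telling you whether such bytes exist at all.
if ! iconv -f UTF-8 -t UTF-8 demo.cpp -o /dev/null 2>/dev/null; then
    echo "demo.cpp contains bytes that are not valid UTF-8"
fi

rm demo.cpp
```

If iconv complains, re-encoding with something like iconv -f ISO-8859-1 -t UTF-8 (or fixing the offending bytes by hand) should also change what file reports.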
Best Answer
Vim tries very hard to make sense of whatever you throw at it without complaining, which makes it a relatively poor tool for diagnosing file's output. Vim's [converted] notice indicates there was something in the file that vim didn't expect to see in the text encoding suggested by your locale settings (LANG etc.).
Others have already suggested cat -v and xxd. You could also try grepping for non-ASCII characters:
grep -P '[\x7f-\xff]' filename
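For example, on a throwaway file with one planted 0xE9 byte (LC_ALL=C forces grep to match byte-wise, since in a UTF-8 locale the PCRE engine may refuse to match bytes that aren't valid UTF-8):

```shell
# Plant one Latin-1 byte (0xE9, "é") on the second line of a scratch file.
printf 'line one\ncaf\xe9 in a comment\nline three\n' > suspect.cpp

# -n prints the line number of each hit; LC_ALL=C makes the match byte-wise.
LC_ALL=C grep -nP '[\x7f-\xff]' suspect.cpp

# cat -v shows the byte as M-i (0xE9 is 'i' 0x69 with the high bit set).
cat -v suspect.cpp

# xxd gives the raw hex dump if you want the exact byte offset.
xxd suspect.cpp | grep -i e9

rm suspect.cpp
```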
The other possibility is non-standard line endings for the platform (i.e. CRLF or CR), but I'd expect file to cope with that and report "DOS text file" or similar.
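To rule line endings in or out, carriage returns are easy to count and strip. A small sketch (throwaway filenames; bash's $'\r' quoting assumed):

```shell
# Fake a Windows-style file with CRLF line endings.
printf 'int x = 1;\r\nint y = 2;\r\n' > dos.cpp

file dos.cpp            # GNU file typically notes "with CRLF line terminators"
grep -c $'\r' dos.cpp   # number of lines containing a carriage return

# Strip the CRs (dos2unix does the same job, if it's installed).
tr -d '\r' < dos.cpp > unix.cpp
grep -c $'\r' unix.cpp || true   # prints 0; grep -c exits 1 on no matches

rm dos.cpp unix.cpp
```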