How to scan for invalid characters in gedit


I can't decide whether it is safe to edit a JavaScript file. When I open it with gedit, it shows the following warning:

The file you opened has some invalid characters. If you continue
editing this file you could corrupt this document. You can also choose
another character encoding and try again.

The current encoding is UTF-8. As the file has over 100,000 lines of code, is there a quick way to scan for the invalid characters?

Best Answer

Since the file is UTF-8, you could run isutf8, which comes with the moreutils package. It reports the line, character, and byte offset of the bad byte.

Then use xxd, hexdump, or a similar tool to inspect the bytes around that offset.
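If xxd or hexdump isn't at hand, a few lines of C can show the bytes around a reported offset. The following is only a sketch: hex_dump is a hypothetical helper, not an existing library function, and it prints a plain hex-plus-ASCII view loosely similar to hexdump -C. The start offset and byte count are up to you, e.g. a few bytes before the offset isutf8 reported.

```c
#include <stdio.h>

/* Hypothetical helper: print `count` bytes of fp starting at byte `start`,
 * 16 bytes per row: the row's offset, the bytes in hex, then their ASCII
 * rendering ('.' for non-printable bytes). Returns the number of bytes
 * printed, or -1 if seeking fails. */
long hex_dump(FILE *fp, long start, long count, FILE *out)
{
    unsigned char row[16];
    long done = 0;

    if (fseek(fp, start, SEEK_SET) != 0)
        return -1;
    while (done < count) {
        size_t want = count - done < 16 ? (size_t)(count - done) : 16;
        size_t got = fread(row, 1, want, fp);
        if (got == 0)
            break;                              /* end of file reached */
        fprintf(out, "%08lx:", (unsigned long)(start + done));
        for (size_t k = 0; k < 16; k++) {
            if (k < got)
                fprintf(out, " %02x", row[k]);
            else
                fputs("   ", out);              /* pad a short last row */
        }
        fputs("  ", out);
        for (size_t k = 0; k < got; k++)
            fputc(row[k] >= 0x20 && row[k] < 0x7F ? row[k] : '.', out);
        fputc('\n', out);
        done += (long)got;
    }
    return done;
}
```

Open the file in binary mode ("rb") and call, for example, hex_dump(fp, offset - 8, 32, stdout) to see the suspicious byte with some context on each side.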

Unfortunately, it stops at the first invalid byte. Then again, that depends on the file: there may be only one bad byte ;)

I have some C code that does a similar analysis for the entire file, but it is on a long-forgotten disk somewhere. I could try to find it if needed.
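That program isn't shown here, so what follows is only a minimal sketch of the same idea, a hypothetical reimplementation rather than that code. It checks every byte against the well-formed UTF-8 ranges of RFC 3629 and, unlike isutf8, keeps going after the first error. The names utf8_seq_len, report_invalid_utf8, and the UTF8SCAN_MAIN macro are made up for this example. Line and character numbers are 1-based and the byte offset is 0-based, which may not match isutf8's conventions exactly.

```c
#include <stdio.h>
#include <stdlib.h>

/* Length (1-4) of the valid UTF-8 sequence starting at buf[i], or 0 if the
 * bytes there are not well-formed UTF-8 (byte ranges from RFC 3629). */
size_t utf8_seq_len(const unsigned char *buf, size_t i, size_t n)
{
    unsigned char b = buf[i];
    unsigned char lo = 0x80, hi = 0xBF;   /* allowed range of the 2nd byte */
    size_t len;

    if (b <= 0x7F)
        return 1;                         /* ASCII */
    if (b >= 0xC2 && b <= 0xDF) {
        len = 2;
    } else if (b >= 0xE0 && b <= 0xEF) {
        len = 3;
        if (b == 0xE0) lo = 0xA0;         /* reject overlong encodings */
        if (b == 0xED) hi = 0x9F;         /* reject UTF-16 surrogates */
    } else if (b >= 0xF0 && b <= 0xF4) {
        len = 4;
        if (b == 0xF0) lo = 0x90;         /* reject overlong encodings */
        if (b == 0xF4) hi = 0x8F;         /* reject code points above U+10FFFF */
    } else {
        return 0;                         /* 0x80-0xC1, 0xF5-0xFF never start a sequence */
    }
    if (len > n - i)
        return 0;                         /* sequence cut off by end of data */
    if (buf[i + 1] < lo || buf[i + 1] > hi)
        return 0;
    for (size_t k = 2; k < len; k++)
        if (buf[i + k] < 0x80 || buf[i + k] > 0xBF)
            return 0;                     /* continuation bytes must be 10xxxxxx */
    return len;
}

/* Report every invalid byte in buf[0..n) as "line L, char C, byte offset O"
 * (nothing is printed if out is NULL). Returns the number of bad bytes. */
size_t report_invalid_utf8(const unsigned char *buf, size_t n, FILE *out)
{
    size_t line = 1, col = 1, bad = 0;

    for (size_t i = 0; i < n;) {
        size_t len = utf8_seq_len(buf, i, n);
        if (len == 0) {
            if (out)
                fprintf(out, "line %zu, char %zu, byte offset %zu: invalid byte 0x%02X\n",
                        line, col, i, (unsigned)buf[i]);
            bad++;
            len = 1;                      /* skip one byte and resynchronize */
        }
        if (buf[i] == '\n') {
            line++;
            col = 1;
        } else {
            col++;
        }
        i += len;
    }
    return bad;
}

#ifdef UTF8SCAN_MAIN
/* Standalone tool: cc -std=c99 -DUTF8SCAN_MAIN utf8scan.c -o utf8scan */
int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s FILE\n", argv[0]);
        return 2;
    }
    FILE *fp = fopen(argv[1], "rb");
    if (!fp) {
        perror(argv[1]);
        return 2;
    }
    /* Read the whole file into memory; 100,000 lines fits easily. */
    size_t cap = 1 << 16, n = 0, got;
    unsigned char *buf = malloc(cap);
    while (buf && (got = fread(buf + n, 1, cap - n, fp)) > 0) {
        n += got;
        if (n == cap) {
            unsigned char *tmp = realloc(buf, cap * 2);
            if (!tmp) { free(buf); buf = NULL; break; }
            buf = tmp;
            cap *= 2;
        }
    }
    if (!buf || ferror(fp)) {
        fprintf(stderr, "%s: read error\n", argv[1]);
        free(buf);
        fclose(fp);
        return 2;
    }
    fclose(fp);
    size_t bad = report_invalid_utf8(buf, n, stdout);
    free(buf);
    return bad ? 1 : 0;                   /* exit status 1 if anything was found */
}
#endif
```

Run it as ./utf8scan file.js. Because each invalid byte is reported on its own, one broken multi-byte sequence can show up as several consecutive lines of output.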

Otherwise, yes, the quick and not-so-dirty way is to diff the original file against a copy saved with gedit, as proposed by the good Mr. @vonbrand.
