I have an installation of trac, which uses MySQL. It has been upgraded so many times, moved servers etc. and chances are that the MySQL character sets were not always set correctly or consistently over the years. Currently all of them are utf8.
When attempting to upgrade the data using "trac-admin wiki update", I'm getting an error message that a byte ("UnicodeDecodeError: 'utf8' codec can't decode byte 0xa0 in position 4274: invalid start byte") is not valid unicode. Unfortunately trac-admin gives me no hint where (table/row/column) to look for that byte sequence, or what I could do to fix it.
My question is not about trac/trac-admin, however, it's about the database. How would you go about finding, "somewhere" in the database, the offending bytes, and replacing them with something that is at least valid utf8. I have attempted to mysqldump the database and to re-import it, but MySQL gives no indication that anything might be wrong. The invalid bytes get re-imported.
Ideas?
Best Answer
You could try turning on the general query log, which logs every query the server receives, and then use that information to see what's being queried, and then issue those queries yourself and examine the data.
To see the current setting for the general log (location and whether it's enabled):
To turn it on:
Remember to turn if off again when you're done, to avoid filling the disk and potentially degrading performance with excessive I/O.
There are also options for logging to a table instead of a file.