Thesqldump collation issues

backupcollationMySQLutf-8

I have tried my hand at building a shell script to backup our wordpress database using tar balls and chron jobs on a Debian environment.
The problem I am facing now is that our wp database after exporting, using the script, and then importing, using phpmyadmin, loses some special characters. Most noteably the sign.
Phpmyadmin lists the database as latin1_swedish_ci and that is the collation setting I am selecting in the Import tab of phpmyadmin (iso-8859-1).
I am using the following script to export the database:

mysqldump –user=user –password=password –default-character-set=latin1 –skip-set-charset databasename -r wp-db.sql

tar -cpzf /home/backups/wp-backups/website-wp-$(date +"%d-%m-%Y–%H-%M").tar.gz wp-db.sql && rm wp-db.sql

My best guess is that there must be some issue with collation but I am not aware where some type of conversion happens in the steps that I am taking. At some point the must be dropped.
When inspecting the wp-db.sql using file it is described as iso-8859-1, hence I wonder where conversion takes place.
I have since tried to test some things and discovered that I am not capable of converting a UTF-8file to iso-8859-1if it includes a sign. At least not using

iconv -f UTF-8 -t ISO_8859-1 test

From what I gather mysql runs in UTF-8 natively, might that be the issue?

If I do not bother with any of this it will completely ruin things. Pretty much everything that isnt wp files but stored in the database will be displayed incorrectly or will be completely missing. By adding these two arguments to my script I have eliminated most of the issues apart from special character problems.

Best Answer

"displayed incorrectly" -- That does not necessarily mean that the text is stored incorrectly. How many steps do you have here?

You have data in a WP database? Please provide SHOW CREATE TABLE for one of the tables so we can verify whether it is latin1 or utf8. Also, do SELECT col, HEX(col) to see what you have in the table for . For utf8, the hex should be E282AC. As you pointed out, it cannot be stored in latin1. If you don't see that hex, stop; we need to debug things before this point. Else continue...

Then you are doing a dump? Let mysqldump specify the CHARACTER SET (not COLLATION).

Take a peek at the dump file. Well, this may be useless, because it depends on what tool you use to peek at it.

Then you are reloading the data? After reloading, what does the hex say? Again, stop if it is not that same hex.

Now you are running WP and trying to display stuff. On a web page? Look at the HTML; there should be a <meta... > tag very near the start. What does it say? In particular, does it say charset=UTF-8?

One more thing to check is the SET NAMES utf8 (or equivalent) -- phpmyadmin establishes the CHARACTER SET one place; find it. And find it for WP.