Mysql – Saving LZW encoded data into a theSQL database

blobMySQLPHP

I send LZW compressed data from the client side with JavaScript to the server.
The problem is that the data becomes corrupted after saving it to database.

So my question is, what collation should I use to accomplish that?
(My current one is latin1-default.)

I already checked if the problem arises during the data transfer from client to server and vice versa by sending encoded data to the HTTP server and sending it back (PHP-echo) immediately without processing it. I could decode LZW properly. So it should definitely be a problem with the database.

More information about the schema:
I only have a single table with three columns:

data BLOB -- I also tried TEXT
user_id INT 
type VARCHAR

This is how I save the data:

INSERT INTO `svg`.`saved_data` (`user_id`, `data`, `type`) 
VALUES ('{$user_id}', '{$data}', '{$type}');

Best Answer

You most likely have codepage issues here, where some parts of the process is interpreting the binary data as text and performing some sort of conversion instead of leaving it well alone. This may not be in your database, it could also be in the server-side code that is performing the inserts/updates and selects, or it could even by your browser that is interfering at one end of the process - so this question may be better suited to StackOverflow. Whether the question belongs here on over on SO, you need to give a lot more detail to get a good answer. Extra detail that would be useful includes:

  • What datatype(s) are you using to store the information? mySQL has specific types for binary data which have no collation so you should be using them not normal text types - if you are not then this is most likely the cause of the problem you are experiencing.
  • What language and database bindings are you using for the server-side code?
  • How are you getting the data into the database? Fully manufactured (ad-hoc) SQL? A parametrised statement? Stored procedure call?
  • How are you getting the data out again?
  • How is the data being corrupted? Extra characters added? Characters dropped? Changes to specific characters? Almost complete bejigglement of the data? Providing an example of the input data and the corrupt output may help if you can create a small example (adding massive amounts of data to your question won't be a popular action).

One option to remove the need to worry about such codepage issues is to keep your data in plain ASCII which should result in clean passage in, through & out of almost anything. You can do this by encoding your data using base64 immediately after compression and decoding it immediately before decompression. This has one obvious significant disadvantage though: you are going to increase the size of the stored data by 33%. If you are getting a large compression rate from the input data this may not be a massive issue (if you are getting compressed data at about 25% of the original then you are still getting ~33% after base64).