Sql-server – How to verify data dumps between two MS SQL and MySQL

MySQLmysql-5.5sql serversql-server-2008

I'm trying to verify data exported from MSSQL and imported into MySQL is 100% accurate using MD5 hash. On Linux, I'm using md5sum and WinMD5 on Windows. The encoding is the same: UTF8. I tried both UNICODE and ANSI, but the results were the same.

When I visually compare the data in the export files or the hashes of individual cell values, it's 100% accurate. However, comparing the two files results in different hashes. Anyone know why the hashes would be different?

Your comments are much appreciated.

UPDATE:
The data is exported from MSSQL using BCP into a text (.txt) file. MySQL data is exported using: SELECT * INTO OUTFILE…. into a text file. Both use "%^&" as the field delimiter,"!@#$" as the row delimter, and escape using "\".

Best Answer

There can be various reasons why file hashes different:

  • Line Endings differ;
  • Spacing might differ (tabs vs spaces);
  • Different layout for same data
  • Encoding (of the file that data is saved in)
  • Quotes (single vs double)
  • Case sensitivity

What I would do, is compare both files with diff tool. git diff can come in help here as it has ability to diff word by word and if you pass it --no-index option don't even have to place them in git repository. This command can help

$ git diff --no-index --color-words file1.txt file2.txt