Sql-server – SQL Server – Export large table without primary key

sql server

I need to sync a large table ~500 millions rows without a primary key between SQL Server and MySQL. The table has only a clustered composite non-unique index.

I do have a ODBC connection between the servers, but an import of ~8 million rows took around 45 minutes, so I believe a larger single import would be unreasonable as interruptions may occur at any point. I can't change the existing table structure, I can add other tables. After further reading, offset / fetch is not an option for large tables. "Select … where x between … and … " is not an option as I don't have an unique key.

How can I export the table in batches that are guaranteed to contain all rows?
My problem is that since the clustered key is not unique, ordering after it would not guarantee the physical rows have the same order between consecutive queries and ordering after all columns would take too long. And how would you recommend to migrate the batches, through ODBC or CSV files?

Best Answer

Assuming you don't have updates or deletes against source table you can try the following :
1. Make a copy of existing table using CTAS syntax (for SQLServer it's SELECT * into source_table_copy FROM source_table). Such operation is very fast even for huge tables.
2. Add after insert trigger on source_table that copies new record[s] to source_table_copy.
3. Now when all new records in source_table go to source_table_copy as well, and you can move data from copied table to Mysql in batches. For instance, if you have a link between 2 servers, everything can be done within the body of TSQL stored procedure.
E.g. a piece of code that move up to 20 records to new server might look like

 --declare table variable to keep deleted records until they delivered to target host 
  BEGIN TRANSACTION;
  DELETE TOP (20) FROM source_table_copy OUTPUT DELETED.* INTO @Table_Var;

  --insert data into linked server , or to csv file
  COMMIT; 

It's also possible to use CURSOR to read data and then delete with where current of clause.

**Ideally you need to prevent applications from inserting data into source_table during step 1. If it's absolutely impossible , I'll go with an after insert trigger which is added right before step 1 and removed right after it's done which copies data to some other table I can later merge with source_table_copy.