SQL Server bulk transfer help needed

bulk-insert, bulkcopy, sql-server

We've got this very badly designed logging table that we want to add functionality to. The problem is that it's already a scalability nightmare and we want to fix the design before adding to it, but we only have the nightly upgrade window to do it in.

I've seen a lot of articles about the various bulk copy options with SQL Server, claiming "We could move 80M rows in 10 minutes!" but so far my testing doesn't get anywhere near that, and I'd like suggestions on how to improve on what I'm seeing.

Before the upgrade, there's always a full backup. I'm only interested in the end result and don't want a huge transaction log. I also don't want it to take too long and I don't want to blow out the disk space with transaction logs or temp files.

The table's been out there a while, so in our bigger customer DBs it's already over 50 million rows. Each row is about 350-400 bytes. The columns are something like this:

IdentityColID int, [type] int, [subtype] int, 
created datetime, author nvarchar(100), Message nvarchar(max)

The problems with the design are:

  • The primary clustered key is (type, subtype, created, IdentityColID), so it's an insert nightmare: page splits all over the place (see the DDL sketch after this list). Even a SELECT COUNT(*) takes about 8 minutes.

  • There aren't good indexes to support the kinds of queries we need
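
For reference, here's roughly what the existing table looks like, reconstructed from the description above (table and constraint names are my guesses):

CREATE TABLE dbo.EventLog
(
    IdentityColID int IDENTITY(1,1) NOT NULL,
    [type]        int NOT NULL,
    [subtype]     int NOT NULL,
    created       datetime NOT NULL,
    author        nvarchar(100) NOT NULL,
    [Message]     nvarchar(max) NULL,
    -- The problem: new rows land all over this index rather than at the end
    CONSTRAINT PK_EventLog PRIMARY KEY CLUSTERED ([type], [subtype], created, IdentityColID)
);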

My plan was to create a new table clustered on IdentityColID, add the indexes needed to support the actual query patterns, copy the existing data over, and drop the old table.
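
Something like this, with placeholder names; clustering on the ever-increasing identity means every insert is an append at the tail of the index:

CREATE TABLE dbo.EventLogNew
(
    IdentityColID int NOT NULL,
    [type]        int NOT NULL,
    [subtype]     int NOT NULL,
    created       datetime NOT NULL,
    author        nvarchar(100) NOT NULL,
    [Message]     nvarchar(max) NULL,
    CONSTRAINT PK_EventLogNew PRIMARY KEY CLUSTERED (IdentityColID)
);

-- Plus nonclustered indexes shaped to the actual query patterns, e.g.:
CREATE NONCLUSTERED INDEX IX_EventLogNew_type_subtype_created
    ON dbo.EventLogNew ([type], [subtype], created);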

So far, I tried using bcp to get the data out, then importing with each of the following (rough sketches after the list):

  • BCP

  • BULK INSERT

  • INSERT INTO … FROM OPENROWSET
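
Roughly what I ran for each; server, path, and database names are placeholders:

-- Export (from a command prompt); -n keeps native format, -T is a trusted connection
bcp MyDb.dbo.EventLog out D:\staging\eventlog.dat -n -S myserver -T

-- Import option 1: bcp back in, batched, with a table lock
bcp MyDb.dbo.EventLogNew in D:\staging\eventlog.dat -n -S myserver -T -b 50000 -h "TABLOCK"

-- Import option 2: BULK INSERT
BULK INSERT dbo.EventLogNew
FROM 'D:\staging\eventlog.dat'
WITH (DATAFILETYPE = 'native', TABLOCK, BATCHSIZE = 50000);

-- Import option 3: INSERT ... SELECT via OPENROWSET(BULK ...)
-- (needs a format file, e.g. generated with:
--  bcp MyDb.dbo.EventLog format nul -n -f eventlog.fmt -S myserver -T)
INSERT INTO dbo.EventLogNew WITH (TABLOCK)
SELECT *
FROM OPENROWSET(BULK 'D:\staging\eventlog.dat', FORMATFILE = 'D:\staging\eventlog.fmt') AS src;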

The bcp export took about 25 minutes, and the imports each took about 1.3 to 1.5 hours. With Recovery Model Simple, the transaction log didn't grow, but CPU consumption was in the 60-65% range most of the time.

I tried just using T-SQL INSERT INTO NewTable SELECT * FROM OldTable, but even with Recovery Model Simple, the transaction log gets to 100 gig.
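
For what it's worth, a plain INSERT ... SELECT is fully logged even under Simple recovery, which is presumably why the log blew up. As I understand it, minimal logging for INSERT ... SELECT needs a TABLOCK hint on an empty target (and, before SQL Server 2016, trace flag 610 for targets with a clustered index; 2016+ does it by default). A sketch, with my placeholder names:

ALTER DATABASE MyDb SET RECOVERY SIMPLE;

-- TABLOCK on the empty target is what makes minimal logging possible
INSERT INTO dbo.EventLogNew WITH (TABLOCK)
SELECT IdentityColID, [type], [subtype], created, author, [Message]
FROM dbo.EventLog;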

I tried using SSIS data import packages with the from/to model, and the net time was about an hour and 20 minutes. With Recovery Model Simple, the transaction log stayed small.

Then I tried an SSIS Execute SQL Task package to effectively run the INSERT INTO NewTable... statement from within SSIS. That got the execution time down to about 1 hour 15 minutes, but no matter what the recovery model, the transaction log ended up around 100 gig, though CPU consumption stayed modest.
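
If log growth is the main blocker, one variation I could have tried (a sketch, assuming IdentityColID is reasonably dense) is copying in keyed batches, so each chunk commits separately and Simple recovery can recycle the log space between chunks:

DECLARE @lo int = 1, @batch int = 1000000, @maxid int;
SELECT @maxid = MAX(IdentityColID) FROM dbo.EventLog;

WHILE @lo <= @maxid
BEGIN
    INSERT INTO dbo.EventLogNew
    SELECT IdentityColID, [type], [subtype], created, author, [Message]
    FROM dbo.EventLog
    WHERE IdentityColID >= @lo
      AND IdentityColID <  @lo + @batch;

    SET @lo += @batch;
    CHECKPOINT;  -- under Simple recovery, lets the used log space be reused
END;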

I'd like the end result to be one new table, so the suggestion in some of the articles to parallelize into multiple result tables doesn't seem like a profitable path. But so far I can't come anywhere near the numbers those articles quote.

Anyone have any suggestions on how I can goose this a bit?

Best Answer

I can't give the credit to Sir Swears-A-lot because his response was in a comment, but he was right. My assumption that dropping the primary key and creating a new one would be as expensive as (or more expensive than) loading up an empty new table was incorrect.

Setting the recovery model to Simple, dropping the primary key, recreating it (forcing the reshuffle), and adding all my new indexes afterwards clocked in at about 40 minutes. That beat any bcp/SSIS method I'd tried by more than half an hour, so it was a big improvement.
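
For anyone following along, the winning sequence was roughly this (constraint and index names are mine):

ALTER DATABASE MyDb SET RECOVERY SIMPLE;

-- Drop the bad composite clustered PK (the table temporarily becomes a heap)
ALTER TABLE dbo.EventLog DROP CONSTRAINT PK_EventLog;

-- Recreate it clustered on the identity column; this is the one big reshuffle
ALTER TABLE dbo.EventLog
    ADD CONSTRAINT PK_EventLog PRIMARY KEY CLUSTERED (IdentityColID);

-- Then add the supporting indexes
CREATE NONCLUSTERED INDEX IX_EventLog_type_subtype_created
    ON dbo.EventLog ([type], [subtype], created);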

And the transaction log grew by 10 gig, but didn't go nuts like in some of my other experiments.