SQL Server – the fastest way to insert large numbers of rows

insert, performance, query-performance, sql-server, sql-server-2012

I have a database where I load files into a staging table. From this staging table I have 1-2 joins to resolve some foreign keys, and then I insert these rows into the final table (which has one partition per month). I have around 3.4 billion rows for three months of data.

What is the fastest way to get these rows from staging into the final table: an SSIS Data Flow Task (that uses a view as source and has fast load active), or an INSERT INTO ... SELECT command? I tried the Data Flow Task and got around 1 billion rows in around 5 hours (8 cores / 192 GB RAM on the server), which feels very slow to me.

Best Answer

One common approach:

  1. Disable / drop indexes / constraints on target table.
  2. INSERT dbo.[Target] WITH (TABLOCKX) SELECT ...
  3. With credit to JNK of course, you can do the above in batches of n rows, which can reduce the strain on the transaction log, and of course means that if some batch fails, you only have to re-start from that batch (see the sketch after this list). I blogged about this (while it is in reference to deletes, the same basic concepts apply) here: http://www.sqlperformance.com/2013/03/io-subsystem/chunk-deletes
  4. Re-enable / re-create indexes / constraints on target table (and perhaps you can defer some of those, if they are not necessary for all operations, and it is more important to get the base data online quickly).
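
A minimal sketch of steps 1-4, assuming a hypothetical staging table dbo.Staging with an ever-increasing StagingID column, a made-up nonclustered index IX_Target_Date on the target, and placeholder column names; tune the batch size to what your transaction log can absorb:

```sql
-- Hypothetical object and column names (dbo.Staging, StagingID, IX_Target_Date, etc.);
-- replace with your own schema.

-- Step 1: disable nonclustered indexes on the target before the load.
ALTER INDEX IX_Target_Date ON dbo.[Target] DISABLE;

DECLARE @BatchSize bigint = 5000000,
        @MinID     bigint,
        @MaxID     bigint;

SELECT @MinID = MIN(StagingID), @MaxID = MAX(StagingID) FROM dbo.Staging;

-- Steps 2 + 3: load in batches of @BatchSize rows so each transaction (and its
-- impact on the log) stays bounded; a failure only means re-starting that batch.
WHILE @MinID <= @MaxID
BEGIN
    INSERT dbo.[Target] WITH (TABLOCKX) (OrderID, OrderDate, Amount)
    SELECT s.OrderID, s.OrderDate, s.Amount
    FROM dbo.Staging AS s
    WHERE s.StagingID >= @MinID
      AND s.StagingID <  @MinID + @BatchSize;

    SET @MinID += @BatchSize;
END;

-- Step 4: rebuild the indexes once all the data is in.
ALTER INDEX IX_Target_Date ON dbo.[Target] REBUILD;
```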

If your partitions are physical and not just logical, you may gain some time by having different processes populate different partitions simultaneously (of course this means you can't use TABLOCK/TABLOCKX). This assumes that the source is also suitable for multiple processes selecting without overlapping or locking each other, and without making that side of the operation even slower (hint: create a clustered index on the source that suits the partitioning scheme on the destination).
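
For example (again with made-up names and dates), if dbo.[Target] is partitioned by month on OrderDate and dbo.Staging is clustered on the same column, each session could own one month's range and the sessions would not contend for a table lock:

```sql
-- Session 1: loads the January partition.
INSERT dbo.[Target] (OrderID, OrderDate, Amount)
SELECT s.OrderID, s.OrderDate, s.Amount
FROM dbo.Staging AS s
WHERE s.OrderDate >= '20140101' AND s.OrderDate < '20140201';

-- Session 2 (run concurrently from another connection): loads the February partition.
INSERT dbo.[Target] (OrderID, OrderDate, Amount)
SELECT s.OrderID, s.OrderDate, s.Amount
FROM dbo.Staging AS s
WHERE s.OrderDate >= '20140201' AND s.OrderDate < '20140301';
```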

You may also consider something a lot more primitive, like BCP OUT / BCP IN.
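
A rough sketch of what that could look like from the command line (server, database, and file names are placeholders; -n keeps the data in native format, -T uses Windows authentication, -b sets the batch size, and -h "TABLOCK" requests a table lock on the way back in):

```
REM Export the staging table in native format.
bcp StagingDB.dbo.Staging out C:\load\staging.dat -S MyServer -T -n

REM Re-import into the target in batches of 1,000,000 rows, with a table lock.
bcp TargetDB.dbo.Target in C:\load\staging.dat -S MyServer -T -n -b 1000000 -h "TABLOCK"
```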

I don't know that I would jump to SSIS to help with this. There are probably some efficiencies there, but I don't know that the effort justifies the savings.