SQL Server – Importing a large file while keeping the table available for other operations

bulk-import, sql-server-2008

We need to import millions of rows into a fairly large table.

Say, 5 million rows into a table with more than 1 billion rows.

After preprocessing the import file, we will have both update and insert operations.

I have a couple of questions:

  1. How will the table be affected? During this import (updates and inserts), what could happen to other processes trying to read or write data to this table?

  2. Definitely we should be using bulk operations, right? Both inserts and updates. I know bulk insert functionality is extremely fast compared to normal inserts. Does SQL Server 2008 provide something similar for UPDATEs?

  3. Which techniques or strategies should we consider to make this process as fast as possible, while keeping the underlying table available for other processes to read and write to during the import?

My ideas are:

All the rows to be imported will have the same UserId, so partitioning this table by UserId should improve performance, right? All the queries related to other users, which don't belong to the affected partition, would not be affected… Is this correct?

What about indexes? Should I be dropping them and recreating them on the affected partition?

I will continue reading and testing on this topic. If anyone has experience with any of the above questions, or could point me to some related information, that would be great!

Best Answer

Holy cow, you've got a lot of questions in here:

How will the table be affected? During this import (updates and inserts), what could happen to other processes trying to read or write data to this table?

With careful index design and partitioning, you can get away with minimal disruption to other queries. Discussing how to implement partitioning is a little beyond what we can talk through here - check out Louis Davidson's book on SQL 2008 database design. It covers concurrency pretty well.
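For reference, the bare bones of range partitioning on an integer key look something like this — a sketch only, with hypothetical boundary values, index names, and a single filegroup to keep it simple:

```sql
-- Partition function: maps UserId ranges to partition numbers.
CREATE PARTITION FUNCTION pfUserId (int)
AS RANGE RIGHT FOR VALUES (100000, 200000, 300000);

-- Partition scheme: maps those partitions to filegroups
-- (ALL TO ([PRIMARY]) is the simplest possible mapping).
CREATE PARTITION SCHEME psUserId
AS PARTITION pfUserId ALL TO ([PRIMARY]);

-- Rebuild the table on the scheme so it is physically
-- partitioned by UserId.
CREATE CLUSTERED INDEX CIX_BigTable
ON dbo.BigTable (UserId, ItemId)
ON psUserId (UserId);
```

In practice, choosing the boundary values and filegroup layout is where the real design work lives, which is exactly what the book covers.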

I know bulk insert functionality is extremely fast compared to normal inserts. Does SQL Server 2008 provide something similar for UPDATEs?

Not really. You can't update in bulk because you could be changing rows all over the table. This is where index design comes in (as explained in Louis's book).
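The usual workaround is to bulk-load the file into a staging table first, then apply the changes as set-based statements joined against it. A sketch, assuming hypothetical table and column names (`dbo.BigTable`, `dbo.Staging`, `UserId`, `ItemId`, `Value`) and a local file path:

```sql
-- Bulk-load the preprocessed file into an empty staging heap;
-- with TABLOCK this load can be minimally logged.
BULK INSERT dbo.Staging
FROM 'C:\import\userdata.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', TABLOCK);

-- Apply all updates as one set-based statement (or in batches
-- to keep lock escalation and log growth under control).
UPDATE b
SET    b.Value = s.Value
FROM   dbo.BigTable AS b
JOIN   dbo.Staging  AS s
  ON   s.UserId = b.UserId
 AND   s.ItemId = b.ItemId;

-- Rows that matched nothing in the big table are the inserts.
INSERT INTO dbo.BigTable (UserId, ItemId, Value)
SELECT s.UserId, s.ItemId, s.Value
FROM   dbo.Staging AS s
WHERE  NOT EXISTS (SELECT 1
                   FROM dbo.BigTable AS b
                   WHERE b.UserId = s.UserId
                     AND b.ItemId = s.ItemId);
```

Only the load step gets the bulk-insert speedup; the UPDATE itself is still a normal logged operation, which is why its cost depends so heavily on index design.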

All the rows to be imported will have the same UserId, so partitioning this table by UserId should improve performance, right? All the queries related to other users, which don't belong to the affected partition, would not be affected... Is this correct?

Only if your indexes are also partitioned by UserId, and that's rarely the case. Usually you'd want indexes arranged in a different manner - but again, that's covered in the book.
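The distinction is between "aligned" indexes (created on the same partition scheme as the table) and unaligned ones (created on a single filegroup, spanning every partition). A hedged sketch, reusing the hypothetical `psUserId` scheme and column names from above:

```sql
-- Aligned: partitioned by UserId like the base table, so
-- index maintenance during the import stays within the one
-- partition that holds the imported UserId.
CREATE NONCLUSTERED INDEX IX_BigTable_ItemId
ON dbo.BigTable (ItemId)
ON psUserId (UserId);

-- Unaligned: one structure covering all partitions. Updates
-- to any partition touch it, and its presence also prevents
-- partition SWITCH operations on the table.
CREATE NONCLUSTERED INDEX IX_BigTable_ItemId_Unaligned
ON dbo.BigTable (ItemId)
ON [PRIMARY];
```

Whether aligned indexes still serve your other queries well is the trade-off the answer is pointing at: aligning everything by UserId isolates the import, but may not be the best arrangement for reads keyed on other columns.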