SQL Server – Extremely large import taking over 24hrs

sql-server, sql-server-2016

We have a query that we run to import a Master Do Not Call List into our database with about 200 million records.

We first mark the existing numbers in the database table with an expiration date. This process takes about 19 hours or more and eats up the transaction log and the tempdb log.

The second part of the query is the import from a text file located on the C: drive. This part also takes quite some time, and last time SQL Server Management Studio crashed, so the query tanked and did not finish.

Below is the query that we run:

[screenshot of the query]

To give some background on the server setup:
Windows Server 2016 Standard
Intel Xeon, 8 cores @ 2.59 GHz
128 GB RAM
SQL Server 2016 Standard

Data volume located on 3PAR 8200 SAN spinning drives.

Log volume located on 3PAR 8200 SAN spinning drives (current size is now 1.46 TB because we ran out of space three times).

Tempdb data and log volumes located on the 3PAR SAN as well (one volume).

Is there a better way to run this query and make it more efficient? Should we create a job with two steps, one for the expiration update and a second for the bulk insert? Should we move the data and log volumes to SSDs on the 3PAR…currently they are located on spinning drives.

What do you think we can do to make the process more efficient?

Any information would be greatly appreciated!
Thanks!

Best Answer

The first problem I see is that you are performing a massive update (19+ hours in duration) in a single run, and that is why your server is running low on resources. You need to perform the update in batches. It will be faster, and your server setup should be able to cope with the resource load.

Try the following:

DECLARE @Rows INT,
        @BatchSize INT,
        @NewExpiration DATETIME;

-- Capture the target expiration date once, so the cutoff does not drift
-- between batches and already-updated rows are not picked up again.
SET @NewExpiration = DATEADD(DAY, 5, GETDATE());

SET @BatchSize = 2000;

SET @Rows = @BatchSize;

-- Loop until a batch touches fewer rows than the batch size,
-- i.e. there is nothing left to update.
WHILE (@Rows = @BatchSize)
  BEGIN
      UPDATE TOP (@BatchSize) tab
      SET    tab.expiration = @NewExpiration
      FROM   I3dialer.dbo.MasterDNC tab
      WHERE  tab.expiration < @NewExpiration;

      SET @Rows = @@ROWCOUNT;
  END;

Before testing this, make sure there is an index on the expiration column to speed up the query.
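If there isn't one already, something along these lines should do; this is a minimal sketch, and the index name is just an example (the table and column names are taken from the batch update above):

-- Hypothetical index name; table/column names as used in the batch update.
CREATE NONCLUSTERED INDEX IX_MasterDNC_expiration
    ON I3dialer.dbo.MasterDNC (expiration);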

For the second part of your problem, increasing the bulk insert batch size to, say, 10000 should help a bit. You can experiment with this to find your optimal batch size, for example testing against 1 million records to see whether there is any performance difference. There are also other things to consider: you should first load the data into an empty table, perform whatever operations you need to clean up your data, and finally insert it into your main table.
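As a rough sketch of that load step, assuming a hypothetical staging table, file path, and delimiter (adjust these to match your actual file):

-- Staging table name, file path, and terminators below are assumptions.
BULK INSERT I3dialer.dbo.MasterDNC_Stage
FROM 'C:\imports\MasterDNC.txt'
WITH (
    BATCHSIZE       = 10000,  -- commit every 10,000 rows instead of one giant transaction
    TABLOCK,                  -- allows minimal logging when loading an empty table
    FIELDTERMINATOR = ',',
    ROWTERMINATOR   = '\n'
);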

In summary, you should now have a three-part process:

  1. Perform the batched update on your main table
  2. Load the data into an empty temporary or staging table
  3. Insert the data from your staging table into your main table (a sketch of this step follows the list)
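A minimal sketch of step 3, under the same assumptions as above; the column name phone_number and the NOT EXISTS check are only examples of the kind of cleanup you might do, not a statement of what your actual logic should be:

-- Hypothetical column list; replace with the actual columns in your file.
INSERT INTO I3dialer.dbo.MasterDNC (phone_number)
SELECT s.phone_number
FROM   I3dialer.dbo.MasterDNC_Stage s
WHERE  NOT EXISTS (SELECT 1
                   FROM   I3dialer.dbo.MasterDNC m
                   WHERE  m.phone_number = s.phone_number);  -- skip numbers already present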

You should definitely see better performance after implementing this. Do share your results.