SQL Server – Can Two BULK INSERTs Execute in Parallel?

bulk-insertparallelismperformancequery-performancesql serversql-server-2016

I have two heavy queries to run, and both of them take about 1 minute to run. Both queries are using the Bulk Insert command to insert data from text files to tables in the database.

For the two tables importing data, there are no indexes/triggers/constraints, just two empty tables to load data into.

For the two text files to load, each of them has around 20M rows.

Query 1:

Bulk insert table1
FROM 'table1.txt'  WITH
        (FIELDTERMINATOR ='|',
         ROWTERMINATOR = '0x0a',
         TABLOCK)

Query 2:

Bulk insert table2
FROM 'table2.txt'  WITH
        (FIELDTERMINATOR ='|',
         ROWTERMINATOR = '0x0a',
         TABLOCK)

I want to compare the running time under different scenarios.

Scenario 1: two queries running serially in one editor:

  • Query 1: 1 m 18 s
  • Query 2: 1 m 2 s
  • The total running time is 2m 20s.

Scenario 2: each query in a separate editor of the same Studio instance, running concurrently:

  • Query 1: 2 m 36 s
  • Query 2: 2 m 09 d
  • The total running time is 2m 40s

Scenario 3: each query in a separate Studio instance, running concurrently:

  • Query 1: 2 m 29 s
  • Query 2: 2 m 19 s
  • The total running time is 2m 29s

While scenario 2 and 3 seem to run queries concurrently, why the running time hasn't change much? For all scenarios, the CPU usage is between 17% to 21%, the disk usage is 50MB to 70 MB per second. No significant differences between the usage of CPU and disk are observed.

I am wondering what's happening under the hood? How does SQL Server execute these queries? Why is the running time nearly the same for all three scenarios? Is there any way to speed it up?

I'm using SQL Server 2016 Developer Edition on Windows 10 64-bit. I have quad core i7 and SSD in my laptop.

After testing more scenarios with 4 queries running together, I think the possible cause of low usage of CPU and disk is the hyperthreading with my Core i7:

  • With two editors running two queries concurrently, the most CPU usage is 25%.

  • With 4 editors running 4 queries concurrently, the most CPU usage is 50%.

Are you guys aware of ways to let one query exclusively use one core? Turning off hyperthreading is not an option for me, since my ThinkPad T460p doesn't support such an option in the BIOS.

Best Answer

After a long and beneficial discussion with Dan Guzman in this MSDN forums thread, the underlying scheduling scheme is finally clear to me. Here is a brief answer borrowing from the thread above:

So what's happening here is that the 2 SSMS connections both initially use the same scheduler because it has the least load. The BULK INSERTs then both run on the same scheduler, which is the reason for the high SOS_SCHEDULER_YIELD waits.

The choice of which scheduler is used is not based on query cost. SQL Server (actually SOS) basically tries to balance the workload among schedulers by examining the load_factor column of sys.dm_os_schedulers.

When a new task is created for a query execution, SOS prefers to use the same scheduler it used for the last request on the session. However, it will use a different scheduler if the load factor of the preferred scheduler is greater than a certain percentage of the other scheduler's average load factor.

In depth discussion could be found in the link above.

Thanks all guys for your enthusiastic help!