Sql-server – SQL: What is slowing down INSERTs if not CPU or IO

performancesql serversql-server-2008

We have a database for a product that is write-heavy. We just bought a new server machine with a SSD to help. To our surprise, the insertions were not faster than on our old machine with much slower storage. During benchmarking we noticed that the IO rate exhibited by the SQL Server process was very low.

For example, I ran the script found on this page, except that I added a BEGIN TRAN and COMMIT around the loop. At best I could see the disk usage reach 7Mb/s, while CPU barely touched 5%. The server has 64Gb installed and is using 10. The total run time was 2 minutes 15 seconds for the first call down to around 1 minute for subsequent calls. The database is on simple recovery and was idle during the test. I dropped the table between each call.

Why is such a simple script so slow? The hardware is barely being used at all. Both dedicated disk benchmarking tools and SQLIO indicate that the SSD performs correctly with speeds upward of 500Mb/s for both reading and writing. I understand that random writes are slower than sequential writes, but I would expect a simple insert like this, to a table without clustered indexing, to be much faster.

Ultimately our scenario is much more complex, but I feel that I need to understand a simple case first. In a nutshell our application deletes old data, then uses SqlBulkCopy to copy new data to staging tables, performs some filtering, and finally uses MERGE and/or INSERT INTO depending on cases to copy the data to the final tables.

–> EDIT 1: I followed the procedure linked by Martin Smith, and I got the following result:

[Wait Type]  [Wait Count] [Total Wait (ms)] [T. Resource Wait (ms)] [T. Signal Wait (ms)]
NETWORK_IO          5008              46735                 46587        148
LOGBUFFER           901               5994                  5977         17
PAGELATCH_UP        40                866                   865          1
SOS_SCHEDULER_YIELD 53279             219                   121          98
WRITELOG            5                 145                   145          0
PAGEIOLATCH_UP      4                 58                    58           0
LATCH_SH            5                 0                     0            0

I find it weird NETWORK_IO takes most of the time, considering there are no result to display and no data to transfer anywhere other than to the SQL files. Does the NETWORK_IO type includes all IO?

–> EDIT 2: I created a 20Gb RAM disk and mounted a database from there. The best time I had on the SSD is 48s, with the RAM disk it went down to 37 seconds. NETWORK_IO is still the biggest wait. The maximum write speed to the RAM disk was about 250Mb/s while it's able to do multi gigabytes per second. It still wasn't using much CPU, so what's holding up SQL?

Best Answer

I know it's an old Question but this might still help searchers and it's a problem that pops up every now and then.

The main reason why you are hitting a performance ceiling without you seeing any resource bottleneck is because you've reached the limit of what is possible to process within one session single thread. The loop isn't processed in parallel, but all inserts are done serially.

In my case, it takes 36 seconds to insert 3 million rows. That means 36/30000000 = 0.000012 seconds per row. That's pretty fast. On my system, it simply takes 0.000012 to go through all the steps that are necessary.

The only way to get it done faster is start up a second session in parallel.

If I start 2 sessions in parallel both doing 15 million inserts. Both of them finish in 18 seconds. I could scale out more, but my current test setup is hitting 95% cpu with two parallel session, so doing 3 would skew the results since I would hit a CPU bottleneck.

If I start 2 parallel session both inserting 3 million rows, they both finish in 39 seconds. so that is now 6 million rows in 39 seconds.

Okay, that still leaves us with the NETWORK_IO wait showing up.

The NETWORK_IO waits are added by the fact that you are using extended events to trace them. In my case the insert takes 36 seconds (on avg). When using the extended event way (from the link above in the very first comment) this is what is registered:

Wait Type             Wait Count  Total Wait Time (ms) Total Resource Wait Time (ms) Total Signal Wait Time (ms)
NETWORK_IO            3455        68808                68802                         6
PAGEIOLATCH_SH        3           64                   64                            0
PAGEIOLATCH_UP        12          58                   58                            0
WRITE_COMPLETION      8           15                   15                            0
WRITELOG              3           9                    9                             0
PAGELATCH_UP          2           4                    4                             0
SOS_SCHEDULER_YIELD   32277       1                    0                             1
IO_COMPLETION         8           0                    0                             0
LATCH_SH              3           0                    0                             0
LOGBUFFER             1           0                    0                             0

You can see that 68 seconds of NETWORK_IO is registered. But since the insert loop is a single threaded action that took 36 seconds, this can't be. (Yes, multiple threads are used, but the operations are serial, never in parallel, so you can't acummulate more wait time than the total duration of the query)

If I don't use extended events but just the wait stats DMVs on a quiet instance (with just me running the insert) I get this:

Wait Type                   Wait Count  Total Wait Time (ms)  Total Resource Wait Time (ms) Signal Resource Wait Time (ms)
SOS_SCHEDULER_YIELD             8873                 0.21                                    0.01                                    0.20
PAGEIOLATCH_UP                  3                    0.02                                    0.02                                    0.00
PREEMPTIVE_OS_AUTHENTICATIONOPS 17                   0.02                                    0.02                                    0.00
PAGEIOLATCH_SH                  1                    0.00                                    0.00                                    0.00

So the NETWORK_IO you were seeing in the extended events log, wasn't related to your insert loop. (If you wouldn't turn nocount on, you would have massive async network IO waits, +1 Martin)

However I don't know why the NETWORK_IO show up in the extended event trace. Sure the writing out to a async file target of the events accumulates ASYNC_NETWORK_IO, but surely this is all done on a differenent SPID then the one we are filtering on. I might ask this as a new question myself)