I’m working on an application that exports tabular data into a specified database table. The app uses INSERT statements to export its data to a target database. The insert is done through batched INSERT statements, with 100 rows per SQL INSERT statement (for now I can’t use BULK INSERT or bcp).
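
For reference, each batched statement has this shape (a sketch with placeholder values; a real batch carries 100 row constructors instead of the three shown):

```sql
-- One batched INSERT: a single statement with many row value constructors.
-- Column names match the test table below; values are illustrative.
INSERT INTO [Test_Table] ([Column 1], [Column 2], [Column 3], [Column 4], [Column 5])
VALUES
    ('R6YZ..uWaQ', 'R6YZ..uWaQ', 'R6YZ..uWaQ', 'R6YZ..uWaQ', 'R6YZ..uWaQ'),
    ('DMNW..Kh0a', 'DMNW..Kh0a', 'DMNW..Kh0a', 'DMNW..Kh0a', 'DMNW..Kh0a'),
    ('GKbg..yuap', 'GKbg..yuap', 'GKbg..yuap', 'GKbg..yuap', 'GKbg..yuap');
```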
I’ve noticed that export time rises disproportionately when the number of columns in the source data exceeds some threshold (the threshold is not fixed and depends on the size of the values, the number of rows in each INSERT, and so on).
For example, exporting 50 000 rows (500 INSERT statements with 100 rows each) of random 100-character strings takes:

- 3 sec with 5 columns
- 6 sec with 10 columns
- 56 sec with 15 columns
- 77 sec with 20 columns
Note the difference in export time between 10 and 15 columns. I expected the 15-column export to take 9-10 sec, but it’s actually 5 times longer. I found similar performance degradation while testing exports of other datasets.
In order to be sure that the problem is not on my side, I ran the same set of INSERT statements through sqlcmd.exe. I got similar results.
Question: What can I do to make SQL Server work with a large number of columns as fast as with a small one? Or at least move the point of performance degradation to a larger number of columns?
Additional details:

- INSERT queries were executed on a local SQL Server Express 2014 (64-bit), version 12.0.5000.0;
- Database recovery model is set to Simple;
- All the INSERT statements were wrapped in a single transaction (I tried to call COMMIT after each INSERT, but the results were pretty much the same);
- Target table was created before each test. It was a simple table without any indexes, foreign keys, constraints, etc.;
- Hard drive performance doesn’t seem to be the source of the problem: during the first two tests (with 5 and 10 columns), the disk write speed of the sqlservr.exe process was 10 times greater than during the last two.
Tables are created like this:
CREATE TABLE [Test_Table]
(
[Column 1] VARCHAR(255),
[Column 2] VARCHAR(255),
[Column 3] VARCHAR(255),
[Column 4] VARCHAR(255),
[Column 5] VARCHAR(255)
)
The data looks like this (each cell actually contains a 100-character string; all the strings in the same row are equal):
+------------+------------+------------+------------+------------+
| [Column 1] | [Column 2] | [Column 3] | [Column 4] | [Column 5] |
+------------+------------+------------+------------+------------+
| R6YZ..uWaQ | R6YZ..uWaQ | R6YZ..uWaQ | R6YZ..uWaQ | R6YZ..uWaQ |
| DMNW..Kh0a | DMNW..Kh0a | DMNW..Kh0a | DMNW..Kh0a | DMNW..Kh0a |
| GKbg..yuap | GKbg..yuap | GKbg..yuap | GKbg..yuap | GKbg..yuap |
| pG+f..64bX | pG+f..64bX | pG+f..64bX | pG+f..64bX | pG+f..64bX |
| O2Q7..fTNF | O2Q7..fTNF | O2Q7..fTNF | O2Q7..fTNF | O2Q7..fTNF |
+------------+------------+------------+------------+------------+
Here are two examples which reproduce the issue:

http://rextester.com/OZI56670 (10 columns, ~0.09 sec)
http://rextester.com/HLAP4972 (11 columns, ~0.45 sec)
Best Answer
The difference between the repro you have posted with 10 columns and 100 rows and the one with 11 columns and 100 rows is that the execution plan for the first one uses simple parameterization.
The actual execution plan for the 10-column case lists parameters from @1 to @1000. 11 * 100 is 1100, and one thousand seems to be the maximum number of parameters an auto-parameterized query can reach. You are doing 10 inserts for each. In the 10-column case the plan can be compiled once and reused for the other 9 inserts. In the 11-column case each insert statement needs to be compiled individually.
Moreover, the process of compilation takes longer when SQL Server has the literal values to look at, as it spends time working out properties of the group (or at least this used to be the case; I'm not sure whether this has changed in more recent versions).
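
As a hedged sketch of a workaround (not part of the answer above, and assuming the client-side SQL can be changed): parameterizing explicitly with sp_executesql gives one cached plan that is reused for every execution, regardless of how many columns the table has, because the plan no longer depends on simple parameterization kicking in.

```sql
-- Explicitly parameterized insert: compiled once, reused on every execution.
-- (Illustrative: parameter names and the 5-column shape match the test table;
-- a real exporter would loop over its rows, or batch rows per statement.)
EXEC sp_executesql
    N'INSERT INTO [Test_Table]
          ([Column 1], [Column 2], [Column 3], [Column 4], [Column 5])
      VALUES (@p1, @p2, @p3, @p4, @p5);',
    N'@p1 VARCHAR(255), @p2 VARCHAR(255), @p3 VARCHAR(255),
      @p4 VARCHAR(255), @p5 VARCHAR(255)',
    @p1 = 'R6YZ..uWaQ', @p2 = 'R6YZ..uWaQ', @p3 = 'R6YZ..uWaQ',
    @p4 = 'R6YZ..uWaQ', @p5 = 'R6YZ..uWaQ';
```

This trades the 100-rows-per-statement batching for plan reuse, so it is worth benchmarking against the original approach; table-valued parameters are another option if batching must be kept.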