I’m working on an application that exports tabular data into a specified database table. The app uses INSERT statements to export its data to a target database. The insert is done through batched INSERT statements, with 100 rows per SQL INSERT statement (for now I can’t use BULK INSERT or bcp).
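
For reference, each batched statement has this shape (a sketch with placeholder values; a real batch carries 100 row constructors instead of the three shown):

```sql
-- One batched INSERT: a single statement with many row value constructors.
-- Column names match the test table below; values are illustrative.
INSERT INTO [Test_Table] ([Column 1], [Column 2], [Column 3], [Column 4], [Column 5])
VALUES
    ('R6YZ..uWaQ', 'R6YZ..uWaQ', 'R6YZ..uWaQ', 'R6YZ..uWaQ', 'R6YZ..uWaQ'),
    ('DMNW..Kh0a', 'DMNW..Kh0a', 'DMNW..Kh0a', 'DMNW..Kh0a', 'DMNW..Kh0a'),
    ('GKbg..yuap', 'GKbg..yuap', 'GKbg..yuap', 'GKbg..yuap', 'GKbg..yuap');
```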
I’ve noticed that export time rises disproportionately when the number of columns in the source data exceeds some threshold (the threshold is not fixed and depends on the size of the values, the number of rows in each INSERT, and so on).
For example, exporting 50 000 rows (500 INSERT statements with 100 rows each) of random 100-character strings takes:

- 3 sec with 5 columns
- 6 sec with 10 columns
- 56 sec with 15 columns
- 77 sec with 20 columns
Note the difference in export time between 10 and 15 columns. I expected the 15-column export to take 9-10 sec, but it’s actually 5 times longer. I found similar performance degradation while testing exports of other datasets.
In order to be sure that the problem is not on my side, I ran the same set of INSERT statements through sqlcmd.exe. I got similar results.
Question: What can I do to make SQL Server work with a large number of columns as fast as with a small one? Or at least move the point of performance degradation to a larger number of columns?
Additional details:

- INSERT queries were executed on a local SQL Server Express 2014 (64-bit), version 12.0.5000.0;
- Database recovery model is set to Simple;
- All the INSERT statements were wrapped in a single transaction (I tried to call COMMIT after each INSERT, but the results were pretty much the same);
- Target table was created before each test. It was a simple table without any indexes, foreign keys, constraints, etc.;
- Hard drive performance doesn’t seem to be the source of the problem: during the first two tests (with 5 and 10 columns), the disk write speed of the sqlservr.exe process was 10 times greater than during the last two.
Tables are created like this:
CREATE TABLE [Test_Table]
(
[Column 1] VARCHAR(255),
[Column 2] VARCHAR(255),
[Column 3] VARCHAR(255),
[Column 4] VARCHAR(255),
[Column 5] VARCHAR(255)
)
The data looks like this (each cell actually contains a 100-character string; all the strings in the same row are equal):
+------------+------------+------------+------------+------------+
| [Column 1] | [Column 2] | [Column 3] | [Column 4] | [Column 5] |
+------------+------------+------------+------------+------------+
| R6YZ..uWaQ | R6YZ..uWaQ | R6YZ..uWaQ | R6YZ..uWaQ | R6YZ..uWaQ |
| DMNW..Kh0a | DMNW..Kh0a | DMNW..Kh0a | DMNW..Kh0a | DMNW..Kh0a |
| GKbg..yuap | GKbg..yuap | GKbg..yuap | GKbg..yuap | GKbg..yuap |
| pG+f..64bX | pG+f..64bX | pG+f..64bX | pG+f..64bX | pG+f..64bX |
| O2Q7..fTNF | O2Q7..fTNF | O2Q7..fTNF | O2Q7..fTNF | O2Q7..fTNF |
+------------+------------+------------+------------+------------+
Here are two examples which reproduce the issue:

http://rextester.com/OZI56670 (10 columns, ~0.09 sec)
http://rextester.com/HLAP4972 (11 columns, ~0.45 sec)
Best Answer
The difference between the repro you have posted with 10 columns and 100 rows and the one with 11 columns and 100 rows is that the execution plan for the first one uses simple parameterization.
The actual execution plan for the 10-column case lists parameters from @1 to @1000. 11 * 100 is 1100, and one thousand seems to be the maximum number of parameters an auto-parameterized query can reach. You are doing 10 inserts for each. In the 10-column case the plan can be compiled once and reused for the other 9 inserts. In the 11-column case each insert statement needs to be compiled individually.
Moreover, the process of compilation takes longer when SQL Server has the literal values to look at, as it spends time working out properties of the group (or at least this used to be the case; I'm not sure whether this has changed in more recent versions).
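
As a hedged sketch of a workaround (not part of the answer above, and assuming the client-side SQL can be changed): parameterizing explicitly with sp_executesql gives one cached plan that is reused for every execution, regardless of how many columns the table has, because the plan no longer depends on simple parameterization kicking in.

```sql
-- Explicitly parameterized insert: compiled once, reused on every execution.
-- (Illustrative: parameter names and the 5-column shape match the test table;
-- a real exporter would loop over its rows, or batch rows per statement.)
EXEC sp_executesql
    N'INSERT INTO [Test_Table]
          ([Column 1], [Column 2], [Column 3], [Column 4], [Column 5])
      VALUES (@p1, @p2, @p3, @p4, @p5);',
    N'@p1 VARCHAR(255), @p2 VARCHAR(255), @p3 VARCHAR(255),
      @p4 VARCHAR(255), @p5 VARCHAR(255)',
    @p1 = 'R6YZ..uWaQ', @p2 = 'R6YZ..uWaQ', @p3 = 'R6YZ..uWaQ',
    @p4 = 'R6YZ..uWaQ', @p5 = 'R6YZ..uWaQ';
```

This trades the 100-rows-per-statement batching for plan reuse, so it is worth benchmarking against the original approach; table-valued parameters are another option if batching must be kept.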