I am mainly a .NET developer using the Entity Framework ORM. Because I don't want to fail at using the ORM, I am trying to understand what actually happens within the data layer (the database). Basically, during development I start the profiler and check what queries some parts of the code generate.
If I spot something utterly complicated (the ORM can generate awful queries even from rather simple LINQ statements if they are not carefully written) and/or heavy (duration, CPU, page reads), I take it into SSMS and check its execution plan.
It works fine for my level of database knowledge. However, BULK INSERT seems to be a special creature, as it does not seem to produce a SHOWPLAN.
I will try to illustrate with a very simple example:
Table definition
CREATE TABLE dbo.ImportingSystemFileLoadInfo
(
ImportingSystemFileLoadInfoId INT NOT NULL IDENTITY(1, 1) CONSTRAINT PK_ImportingSystemFileLoadInfo PRIMARY KEY CLUSTERED,
EnvironmentId INT NOT NULL CONSTRAINT FK_ImportingSystemFileLoadInfo REFERENCES dbo.Environment,
ImportingSystemId INT NOT NULL CONSTRAINT FK_ImportingSystemFileLoadInfo_ImportingSystem REFERENCES dbo.ImportingSystem,
FileName NVARCHAR(64) NOT NULL,
FileImportTime DATETIME2 NOT NULL,
CONSTRAINT UQ_ImportingSystemImportInfo_EnvXIs_TableName UNIQUE (EnvironmentId, ImportingSystemId, FileName, FileImportTime)
)
Note: no other indexes are defined on the table
The bulk insert
(what I capture in the profiler, one batch only)
insert bulk [dbo].[ImportingSystemFileLoadInfo] ([EnvironmentId] Int, [ImportingSystemId] Int, [FileName] NVarChar(64) COLLATE Latin1_General_CI_AS, [FileImportTime] DateTime2(7))
Metrics
- 695 items inserted
- CPU = 31
- Reads = 4271
- Writes = 24
- Duration = 154
- Total rows in the table = 11500
For my application that's OK, although the reads seem rather large (I know very little about SQL Server internals, so I'm comparing against the 8 KB page size and the small amount of data per record).
Question: how can I investigate whether this BULK INSERT can be optimized? Or does it not make sense to try, since it is arguably the fastest way to push large amounts of data from a client application to SQL Server?
Best Answer
As far as I can tell, you can optimize a bulk insert in much the same way that you'd optimize a regular insert. Typically, a query plan for a simple insert isn't very informative, so don't worry about not having the plan. I'll go over a few ways of optimizing an insert, but most of them probably don't apply to the insert you specified in the question. However, they could be helpful if you need to load larger amounts of data in the future.
1. Insert data in clustering key order
SQL Server will often sort data before inserting it into a table with a clustered index. For some tables and applications you can improve performance by sorting the data in the flat file and letting SQL Server know that the data is sorted through the ORDER argument of BULK INSERT.
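For illustration, here is a minimal sketch of the syntax. The table dbo.Sales (clustered on SaleDate) and the data file path are hypothetical stand-ins, and the file is assumed to be presorted by the clustering key:

BULK INSERT dbo.Sales
FROM 'C:\load\sales.dat' -- hypothetical data file, already sorted by SaleDate
WITH (
    ORDER (SaleDate ASC) -- declare the sort order so SQL Server can skip its own sort
);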
Since you are using an IDENTITY column as the clustering key, you don't need to worry about this.

2. Use TABLOCK if possible
If you are guaranteed to have only one session inserting data into your table, you can specify the TABLOCK argument for BULK INSERT. This can reduce lock contention and can lead to minimal logging in some scenarios. However, you are inserting into a table with a clustered index that already contains data, so you will not get minimal logging without trace flag 610, which is mentioned later in this answer.

If TABLOCK is not possible because you can't change the code, not all hope is lost. Consider using sp_tableoption.
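For example, here is a sketch of turning on the table lock on bulk load option for the question's table, which makes bulk loads take a table lock without the client having to ask for it:

EXEC sys.sp_tableoption
    @TableNamePattern = N'dbo.ImportingSystemFileLoadInfo',
    @OptionName = 'table lock on bulk load',
    @OptionValue = 'ON';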
Another option is to enable trace flag 715.
3. Use an appropriate batch size
Sometimes you will be able to tune inserts by changing the batch size.
The relevant knob for BULK INSERT is the ROWS_PER_BATCH argument. Per the documentation, it indicates the approximate number of rows of data in the data file; by default all of the data is sent to the server as a single transaction and the number of rows is unknown to the query optimizer, so supplying the value lets the server optimize the bulk-import operation.
Personally I would just insert all 695 rows in a single batch. Tuning the batch size can make a big difference when inserting lots of data though.
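As a sketch, reusing the hypothetical dbo.Sales table and data file from above:

BULK INSERT dbo.Sales
FROM 'C:\load\sales.dat'
WITH (
    ROWS_PER_BATCH = 695 -- approximate number of rows in the file, as a hint to the optimizer
);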
4. Make sure that you need the IDENTITY column

I don't know anything about your data model or requirements, but don't fall into the trap of adding an IDENTITY column to every table. Aaron Bertrand has an article about this called Bad habits to kick: putting an IDENTITY column on every table. To be clear, I'm not saying that you should remove the IDENTITY column from this table. However, if you determine that the IDENTITY column is not necessary and remove it, that could improve insert performance.

5. Disable indexes or constraints
If you're loading a large amount of data into a table relative to what you already have, then it may be faster to disable indexes or constraints before the load and to enable them after the load. For large amounts of data it is usually more efficient for SQL Server to build an index all at once rather than as data is loaded into the table. It looks like you inserted 695 rows into a table with 11500 rows, so I would not recommend this technique here; a sketch follows in case it is useful for larger loads.
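The sketch uses the nonclustered unique constraint from the question's table (note that uniqueness is not enforced while the index is disabled; the REBUILD revalidates it):

-- Disable the index backing the unique constraint before a large load
ALTER INDEX UQ_ImportingSystemImportInfo_EnvXIs_TableName
    ON dbo.ImportingSystemFileLoadInfo DISABLE;

-- ... perform the bulk load here ...

-- Rebuild to re-enable the index and revalidate uniqueness
ALTER INDEX UQ_ImportingSystemImportInfo_EnvXIs_TableName
    ON dbo.ImportingSystemFileLoadInfo REBUILD;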
6. Consider TF 610
Trace Flag 610 allows minimal logging in some additional scenarios. For your table with an IDENTITY clustering key, you would get minimal logging for any new data pages as long as your recovery model is simple or bulk-logged. I believe this feature is not on by default because it may degrade performance on some systems, so you would need to test carefully before enabling this trace flag. The recommended Microsoft reference still appears to be The Data Loading Performance Guide. The guide cautions that when a minimally logged bulk load commits, all of the loaded pages must be flushed to disk before the commit completes, which can create a great deal of random I/O on a slow I/O subsystem. As far as I can tell, that caveat doesn't have anything to do with trace flag 610 specifically, but rather with minimal logging itself; I believe the earlier point about ROWS_PER_BATCH tuning was getting at this same concept.
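If you do experiment with the flag, a minimal sketch (requires sysadmin; the -1 applies it instance-wide):

DBCC TRACEON (610, -1); -- enable the trace flag globally
-- ... run the bulk load and compare log generation and duration ...
DBCC TRACEOFF (610, -1); -- turn it back off after testing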
In conclusion, there probably isn't much that you can do to tune your BULK INSERT, and I wouldn't be concerned about the read count that you observed with your insert: SQL Server will report reads any time you insert data. Consider the following very simple INSERT.
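Here is a sketch of that kind of test; the table and the row generator are my stand-ins rather than the exact repro:

IF OBJECT_ID(N'dbo.X_TABLE') IS NOT NULL DROP TABLE dbo.X_TABLE;

CREATE TABLE dbo.X_TABLE (VAL VARCHAR(1000) NOT NULL);

SET STATISTICS IO, TIME ON;

-- Insert a batch of wide rows; the cross join is just a cheap row generator
INSERT INTO dbo.X_TABLE (VAL)
SELECT TOP (10000) REPLICATE('Z', 1000)
FROM sys.all_columns AS a
CROSS JOIN sys.all_columns AS b;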
In the output from SET STATISTICS IO, TIME ON, I have 11428 reported reads, but that is not actionable information. Sometimes the number of reported reads can be reduced by minimal logging, but of course the difference cannot be directly translated into a performance gain.