SQL Server – TSQL vs SSIS for Direct Insert Speed Performance

performanceperformance-tuningsql serversql-server-2017ssis

I am moving data from Table A into Table B. Tables have the exact same DDL, table structure, and are straight inserts. We are Not transforming or modifying data.

If TSQL Server and SSIS have the same hardware specs for both cpu core and memory, and the two tables reside in a Same database, which tool is better performance wise? Table is 50 million rows, around 30GB.

Our performance testing shows T-sql; I would like the confirm what Microsoft or vendor recommends.

Purpose of question is to understand speed difference between Tsql and ssis.

Best Answer

To briefly quote the documentation:

Microsoft Integration Services is a platform for building enterprise-level data integration and data transformations solutions. Use Integration Services to solve complex business problems by copying or downloading files, loading data warehouses, cleansing and mining data, and managing SQL Server objects and data.

Integration Services can extract and transform data from a wide variety of sources such as XML data files, flat files, and relational data sources, and then load the data into one or more destinations.

You say that you're loading data from one table to another in the same database without transformations. SSIS isn't really the right tool for the job. With a reasonable table structure, properly written T-SQL inserts will perform better than loading through SSIS because you avoid the overhead of sending data to another server, passing it through buffers, and sending it back to the destination server. Behind the scenes SSIS is just doing a BULK INSERT which isn't going to be faster than a T-SQL INSERT except in edge cases.

In the interest of giving a complete answer, I'll describe a few of the features of SSIS which can be helpful from a data loading performance perspective. Consider a table in SQL Server with a complex five column clustered index and many nonclustered indexes. Suppose you can't disable NCIs. You might get the best possible loading performance by splitting up the insert into batches because smaller sorts will be used to do NCI maintenance. It may be difficult to code an efficient batching solution in SQL Server with a five column clustered key. However, this is trivial to do in SSIS. SSIS can read the data in order and split the data up into batches for you just by changing a parameter. So in that case, you may get better performance with SSIS because the T-SQL code wasn't written to be as efficient as possible.

For another example, a BULK INSERT is able to take a BU lock on a target heap. BU locks are compatible with other BU locks. As a result, you can get multiple concurrent T-SQL inserts all loading minimally logged data into the same heap without blocking. That cannot be accomplished with T-SQL alone. The BU lock is not available in that context. SQL Server 2017 offers parallel inserts into heaps in T-SQL, so the benefit of a BU lock is diminished when loading data from a table with an identical structure. An example where a BU lock is better than anything in T-SQL would be loading from a table with a clustered index into a heap on SQL Server 2014 Standard Edition. In T-SQL you can only run a single insert at a time but in SSIS you could have many concurrent inserts running.

In summary, SSIS is almost certainly the wrong tool for the scenario that you described. You should expect to see better performance with properly written T-SQL code, with the exception of a few edge cases, because it avoids the overhead inherent to moving data between servers.