SSIS Data Flow data access mode – what’s the point of ‘table or view’ vs fast load

ssis

Using the SQL Server Business Intelligence Development Studio, I do a lot of flat file to OLE DB destination data flows to import data to my SQL Server tables. Under "Data access mode" in in the OLE DB destination editor, it defaults to "table or view" rather than "table or view – fast load". What is the difference; the only discernible difference I can perceive is that the fast load transfers the data much faster.

Best Answer

The OLE DB Destination Component's Data Access Modes comes in two flavors - fast and non-fast.

Fast, either "table or view - fast load" or "table or view name variable - fast load" means that data will be loaded in a set-based fashion.

Slow - either the "table or view" or "table or view name variable" will result in SSIS issuing singleton insert statements to the database. If you're loading 10, 100, maybe even 10000 rows, there's probably little appreciable performance difference between the two methods. However, at some point you're going to saturate your SQL Server instance with all these piddly little requests. Additionally, you're going to abuse the heck out of your transaction log.

Why would you ever want the non-fast methods? Bad data. If I sent in 10000 rows of data and the 9999th row had a date of 2015-02-29, you would have 10k atomic inserts and commits/rollbacks. If I was using the Fast method, that entire batch of 10k rows will either all save or none of them. And if you want to know which row(s) errored out, the lowest level of granularity you will have is 10k rows.

Now, there are approaches to getting as much data loaded as fast as possible and still handle dirty data. It's a cascading failure approach and it looks something like

The idea is that you find the right size to insert as much as possible in one shot but if you get bad data, you're going to try resaving the data in successively smaller batches to get to the bad rows. Here I started with a Maximum insert commit size (FastLoadMaxInsertCommit) of 10000. On the Error Row disposition, I change it to Redirect Row from Fail Component.

The next destination is the same as above but here I attempt a fast load and save it in batches of 100 rows. Again, test or make some pretense of coming up with a reasonable size. This will result in 100 batches of 100 rows sent because we know somewhere in there, there is at least one row that violated the integrity constraints for the table.

I then add a third component to the mix, this time I save in batches of 1. Or you can just change the table access mode away from the Fast Load version because it'll yield the same result. We will save each row individually and that will enable us to do "something" with the single bad row(s).

Finally, I have a failsafe destination. Maybe it's the "same" table as the intended destination but all the columns are declared as nvarchar(4000) NULL. Whatever ends up at that table needs to be researched and cleaned/discarded or whatever your bad data resolution process is. Others dump to a flat file but really, whatever makes sense for how you want to track bad data works.

Best Answer

Related Solutions

ETL – Extracting from 200 Tables: SSIS Data Flow vs Custom T-SQL

SSIS Design considerations

TSQL vs SSIS (my story)

Sql-server – SSIS package with BULK data load

Related Question