Sql-server – Partition Switch – What is the benefit if staging has to be same as destination table

partitioningsql server

I am exploring to load large datasets into destination tables using SSIS.
I came across "Partition Switching" as one of methods to load data faster into destination tables.

But since the staging table has to be of the almost same configuration as destination table partition in order for partition switching to work, what is the real benefit here?

My concern is loading of the data after the cleanup/transformations. If staging table is exactly same as destination partition, loading time should be same for both. Or is it that loading data into a huge partitioned clustered indexed table will take more time versus loading into exactly same empty un-partitioned clustered indexed table?

We still have to fill up the staging table (which is almost same as destination), it's going to take the same time as to load data directly in destination partition, right? Why not load directly into destination then?

What is real benefit in terms of time to load the data here?

Best Answer

Speed is the name of the game here. You can load large amounts of data into a staging table then switch it in, this reduces the amount of contention for locks/resource on the primary table (though note when you do switch back in, though very fast it does require a lock on the main table). It allows you to manipulate and sanitize your data whilst in staging so you're not hitting the main table unnecessarily post import. The initial load from source can be quicker too, so if that is a bottle neck for you, it could be a benefit.

Just as a rough example of time I used to work with a large database where flat files were loaded into a DB daily, about a million + rows in a load. Inserting directly into the table in batches took around 4 hours. With a table partitioning strategy loading to stating and switching in took 1 hr 30 mins. And only < 1 min of that was the switch that hit the primary table.

Every situation is different, experiment and play around, see if its worth it, even if its not its a good bit of experience.

Or is it that loading data into a huge partitioned clustered indexed table will take more time versus loading into exactly same empty un-partitioned clustered indexed table?

Yes and no, its to do with the indexes and the amount of data already in the table if your indexes are not partition aligned (partitioned indexes) they should be to get full benefit of partitions anyway, without them inserts would take a massive hit, also data being accessed from the primary table, if the queries do not eliminate partitions that's going to cause issue while inserting too. another benefit of using staging is if the load fails you just have to drop the staging table, not clean up the primary.