SQL Server – Understanding Script Execution Speed Improvement When Split

performance | sql-server | sql-server-2008-r2

I have an ETL job that merges data into about 20 columns of a fact table, all integer non-null (default 0) columns. There are 8 merge statements to completely update all the columns and they involve quite a lot of calculation and rows. I put all the merge statements into one long stored procedure and ran it, but it took an impossibly long time. I made some covering indexes and some other changes, which made barely any difference; I was still manually stopping the job after over an hour of running.

Acting on general intuition, I split the long script into seven separate stored procedures and called them one by one with an over-arching stored procedure. The whole lot consistently finishes within 3 minutes (no code changes were made to the scripts).

My question is this – is there a general set of conditions under which splitting a script into smaller steps is so dramatically faster? Is there some feature of the scripts, server memory, or another factor that makes splitting occasionally far more effective than at other times?

Alternatively, is it just that it is always better to split scripts up like this (assuming no problems with naming / SP proliferation etc), and the huge difference in speed is nothing unusual?

My original script was large and would be hard to sanitise sufficiently to show here, but I can give some details: it was about 550 lines, with many derived tables, CTEs and GROUP BY clauses, a couple of relatively complex WHERE clauses, and one temp table. There were 8 MERGE statements, all targeting the fact table but sourcing from 3 databases on the same server as the target. Maybe there is something in this that would have particularly slowed things down.
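For illustration, the split described above might look like the sketch below. All object names (`dbo.usp_LoadFact`, the `usp_MergeFactStep*` procedures, `@LoadDate`) are hypothetical stand-ins, not taken from the original script:

```sql
-- Over-arching procedure that calls each step in sequence.
-- Each step procedure contains one (or a few) of the MERGE statements.
CREATE PROCEDURE dbo.usp_LoadFact
    @LoadDate date
AS
BEGIN
    SET NOCOUNT ON;

    EXEC dbo.usp_MergeFactStep1 @LoadDate = @LoadDate;
    EXEC dbo.usp_MergeFactStep2 @LoadDate = @LoadDate;
    -- ... one call per step ...
    EXEC dbo.usp_MergeFactStep7 @LoadDate = @LoadDate;
END;
```

The key point, as the answer below explains, is that values such as `@LoadDate` are now procedure parameters rather than local variables in one long batch.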

Best Answer

Your comment confirms that the original long script was using variables, and that these have become parameters in the split procedures.

Probably you are benefitting from parameter sniffing here and better execution plans. The values of variables are not sniffed unless you use the `OPTION (RECOMPILE)` query hint, so you will get the same plan regardless of what the runtime values are.
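A minimal sketch of the difference (the table `dbo.FactSales` and variable `@CutOff` are illustrative assumptions):

```sql
DECLARE @CutOff date = '20120101';

-- @CutOff is a local variable: its value is not sniffed at compile
-- time, so the optimizer estimates rows from average density rather
-- than the statistics histogram. Same plan for any value.
SELECT COUNT(*)
FROM dbo.FactSales
WHERE SaleDate >= @CutOff;

-- With OPTION (RECOMPILE) the statement is compiled at execution
-- time, so the actual value of @CutOff can be used against the
-- column statistics.
SELECT COUNT(*)
FROM dbo.FactSales
WHERE SaleDate >= @CutOff
OPTION (RECOMPILE);
```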

When you parameterize it, SQL Server sniffs the parameter value at the time the plan for the statement is actually compiled and can look at the column statistics using those actual values. Based on that you can hopefully get better row estimates and a more suitable plan.
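The same predicate as a procedure parameter behaves differently (again using the hypothetical `dbo.FactSales`):

```sql
-- @CutOff is now a parameter: the value passed on first execution is
-- sniffed during compilation, and the cached plan is built for it.
CREATE PROCEDURE dbo.usp_CountSales
    @CutOff date
AS
BEGIN
    SET NOCOUNT ON;

    SELECT COUNT(*)
    FROM dbo.FactSales
    WHERE SaleDate >= @CutOff;
END;
GO

-- The plan compiled by this first call is tailored to '20120101'.
EXEC dbo.usp_CountSales @CutOff = '20120101';
```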

As a side note, parameter sniffing can have its own problems: if the plan is cached and the procedure is executed again with a different parameter value, the original plan might not be optimal for the new value.
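If that becomes an issue, the usual query hints apply. A sketch of the common options, assuming the same hypothetical `dbo.FactSales` and `@CutOff` parameter:

```sql
-- 1. Recompile each execution, trading compile cost for a plan
--    tailored to the current value:
SELECT COUNT(*)
FROM dbo.FactSales
WHERE SaleDate >= @CutOff
OPTION (RECOMPILE);

-- 2. Always optimize for a chosen representative value instead of
--    whatever happened to be sniffed first:
SELECT COUNT(*)
FROM dbo.FactSales
WHERE SaleDate >= @CutOff
OPTION (OPTIMIZE FOR (@CutOff = '20120101'));

-- 3. Optimize for an "average" value, ignoring the sniffed one
--    (available from SQL Server 2008 onwards):
SELECT COUNT(*)
FROM dbo.FactSales
WHERE SaleDate >= @CutOff
OPTION (OPTIMIZE FOR UNKNOWN);
```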