Am I correct in saying that statistics are only used when creating the execution plan for a stored procedure, and they are not used in the actual execution context?
No, what happens is that the execution plan for a stored procedure is cached. Assuming there is enough available memory to continue holding the plan, it won't change unless one of the following happens (from Execution Plan Caching and Reuse in the SQL Server documentation, emphasis added):
- Changes made to a table or view referenced by the query (ALTER TABLE and ALTER VIEW).
- Changes made to a single procedure, which would drop all plans for that procedure from the cache (ALTER PROCEDURE).
- Changes to any indexes used by the execution plan.
- Updates on statistics used by the execution plan, generated either explicitly from a statement, such as UPDATE STATISTICS, or generated automatically.
- Dropping an index used by the execution plan.
- An explicit call to sp_recompile.
- Large numbers of changes to keys (generated by INSERT or DELETE statements from other users that modify a table referenced by the query).
- For tables with triggers, if the number of rows in the inserted or deleted tables grows significantly.
- Executing a stored procedure using the WITH RECOMPILE option.
So if the statistics are updated, the cached plan is invalidated and recompiled on the next execution, taking the new statistics into account.
How do you prevent execution plans from growing stale when you have a hundred thousand rows being added a day?
One way this happens automatically is via the "large number of changes to keys" condition mentioned above; a few hundred thousand changed rows a day may well satisfy it. But if you want to be sure, or want more granular control, update your statistics yourself. You can let SQL Server auto create and auto update statistics, or manage them manually. You can find more info on either method at SQL Server Auto Update and Auto Create Statistics Options. When/if you do a weekly rebuild of indexes, this will also trigger the plans to be updated. Do some testing to see what is most beneficial to you, as updating statistics too often may not yield any real performance gain.
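For reference, checking the auto-stats settings and refreshing statistics manually might look something like this (the table name `dbo.part` is taken from your query and is illustrative):

```sql
-- Check whether auto create/update statistics are enabled for the current database
SELECT name, is_auto_create_stats_on, is_auto_update_stats_on
FROM sys.databases
WHERE name = DB_NAME();

-- Manually refresh statistics on a specific table, scanning all rows
UPDATE STATISTICS dbo.part WITH FULLSCAN;

-- Or refresh all out-of-date statistics in the database at once
EXEC sys.sp_updatestats;
```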
If we're updating statistics frequently to combat this issue, would it make sense to use the OPTION (RECOMPILE) hint on this stored procedure's query?
You don't need to use RECOMPILE, since based on the excerpt above you can see that the execution plan gets updated appropriately whenever new stats are available. You may be fine with an end-of-day statistics update (if you're truly concerned), but I don't think it's explicitly needed based on what you've said so far. Again, though, I'd test to see what impact this may have on your stored procedure's performance and plan accordingly.
What is up with FROM part JOIN model ON 1=1? This is the same as FROM part, model, which is a Cartesian join and will result in a very large number of rows. Is that join supposed to be like that?
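To illustrate the point, the two forms below are logically equivalent; every row of part is paired with every row of model, and only the WHERE clause keeps the result from exploding (column aliases are illustrative):

```sql
-- Join with an always-true predicate
SELECT part.id AS partid, model.id AS modelid
FROM part
INNER JOIN model ON 1 = 1;

-- Explicit Cartesian product; identical result set
SELECT part.id AS partid, model.id AS modelid
FROM part
CROSS JOIN model;
```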
You will likely help us help you if you provide details about the tables involved. Please "script" the definition of the tables, along with any indexes defined on those tables.
This sounds like a classic case of parameter sniffing resulting in good plan/bad plan choices for various scenarios in your data.
You may be able to get more reliable performance by making SQL Server cache different plans for different scenarios by using sp_executesql, as in the following example:
CREATE PROCEDURE [dbo].[create_grid_materials2]
(
    @partlistid bigint
    , @pid bigint
    , @masterid bigint
)
AS
BEGIN
    DECLARE @cmd NVARCHAR(MAX);
    SET @cmd = N'
INSERT INTO material (partid, personid, modelID)
SELECT
    partid = part.id
    , personid = @pid
    , modelid = model.id
FROM part
INNER JOIN model ON 1=1
WHERE (
    model.masterid = ' + CONVERT(NVARCHAR(50), @masterid) + N'
    AND model.modelSetID IS NULL
    AND part.partlistid = ' + CONVERT(NVARCHAR(50), @partlistid) + N'
    AND (
        part.partType = 100
        OR part.partType = 120
        OR part.partType = 130
    )
)
AND NOT EXISTS (
    SELECT 1
    FROM material AS a1
    WHERE a1.partid = part.id
    AND a1.personid = @pid
    AND a1.modelid = model.id
)';
    -- sp_executesql requires an NVARCHAR parameter definition string,
    -- and the type should match the procedure parameter (bigint, not int)
    DECLARE @Params NVARCHAR(200);
    SET @Params = N'@pid BIGINT';
    EXEC sys.sp_executesql @cmd
        , @Params
        , @pid = @pid;
END
The above code will cause a new plan to be generated for each combination of @partlistid and @masterid.
The presumption here is some combinations of those two variables lead to a very small number of rows, whereas some combinations lead to a very large number of rows.
Forcing a plan for each combination allows SQL Server to generate more efficient plans for each. I've explicitly not included @pid since you probably want to try it with a fairly small number of combinations first; adding a third variable to the mix will make for an exponentially larger number of possible plans.
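If you want to verify that separate plans really are being cached per combination, a query against the plan cache DMVs can show the entries and their reuse counts (this assumes VIEW SERVER STATE permission):

```sql
-- List cached plans whose text contains the dynamic INSERT statement
SELECT cp.usecounts, cp.cacheobjtype, st.text
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE st.text LIKE N'%INSERT INTO material%'
  AND st.text NOT LIKE N'%dm_exec_cached_plans%'; -- exclude this query itself
```

You should see one row per distinct @partlistid/@masterid combination, with usecounts climbing as each plan is reused.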
Best Answer
Michael Green is right: the devs are trying to thwart parameter sniffing, which happens when SQL Server compiles a plan that is great for one set of parameter values, but horrible for others.
You'll want to use OPTION (RECOMPILE) on the statement(s) with issues, not WITH RECOMPILE on the procedure. And I wouldn't recommend the local variables "trick" - it just makes the code messier; better to use OPTIMIZE FOR UNKNOWN on modern versions if that's the method that works best in your scenario. (For a whole lot more on this topic, see this great post by Paul White.)

Also, if many parameters are optional (so the query has things like WHERE col = @param OR @param IS NULL), this is what I call "the kitchen sink" - sometimes dynamic SQL can be a much more effective solution. You didn't show the rest of your code, only that you were already using the local variables trick. The idea is to build the query string by only adding clauses for the parameters that are actually supplied; this protects you from caching plans based on different sets of parameters (for example, if I supply @FirstName on the first execution, the seek plan on that column that gets cached isn't going to help when I ask for @LastName LIKE N'%s%'). The OPTION (RECOMPILE); at the end protects you from plans that can vary greatly based on the values of the same parameters from execution to execution (for example, WHERE name LIKE N'%s%' should yield a different plan shape than WHERE name LIKE N'Q%').

This typically works best with the server setting optimize for ad hoc workloads, which you can read about here and here. Essentially, this prevents your plan cache from filling up with all these slight plan variations unless they are used more than once. (Yes, with OPTION (RECOMPILE), the point is moot; however, the server setting can't hurt for the rest of your ad hoc query workload, and I've never come across a downside to having it on.)

This approach is pretty safe from SQL injection, since you don't have to worry about concatenating user input into SQL strings (all parameters are strongly typed), but it can't hurt to read these topics on dynamic SQL:

I have videos about my solution to "the kitchen sink" here and here, as well as a blog post about it.
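A minimal sketch of the "kitchen sink" pattern described above, with a hypothetical dbo.People table and @FirstName/@LastName parameters for illustration:

```sql
CREATE PROCEDURE dbo.SearchPeople  -- hypothetical proc and table names
    @FirstName NVARCHAR(100) = NULL,
    @LastName  NVARCHAR(100) = NULL
AS
BEGIN
    DECLARE @sql NVARCHAR(MAX) = N'SELECT id, FirstName, LastName
FROM dbo.People
WHERE 1 = 1';

    -- Only add a clause when the parameter is actually supplied,
    -- so each shape of the query gets its own plan
    IF @FirstName IS NOT NULL
        SET @sql += N' AND FirstName LIKE @FirstName';
    IF @LastName IS NOT NULL
        SET @sql += N' AND LastName LIKE @LastName';

    -- Recompile per execution so wildcard-heavy values don't reuse a bad plan
    SET @sql += N' OPTION (RECOMPILE);';

    -- Parameters stay strongly typed; no user input is concatenated
    EXEC sys.sp_executesql @sql,
        N'@FirstName NVARCHAR(100), @LastName NVARCHAR(100)',
        @FirstName, @LastName;
END
```

Unused parameters are still passed to sp_executesql; that's harmless, since clauses referencing them were never added to the string.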