I just fixed a production performance issue by dropping an index and recreating it.
I suspect dropping the index also dropped the execution plans that used it, and one of those plans happened to be bad.
Arguments in favor of a bad execution plan:
- Before dropping the index, I looked up the last update date for the statistics on the given table, and they were up to date.
- My DBA has put Ola Hallengren's index and statistics maintenance solution in place.
- The slow query was a select statement executed through `sp_executesql` with date parameters. Executing the same select statement without `sp_executesql` was fast, but it also didn't use the same execution plan.
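For context, here is roughly the shape of the two executions being compared (table and column names are hypothetical, a sketch only):

```sql
-- The slow path: a parameterized call, so the plan is compiled once and reused
EXEC sys.sp_executesql
    N'SELECT * FROM dbo.Orders WHERE OrderDate BETWEEN @from AND @to',
    N'@from date, @to date',
    @from = '2019-01-01', @to = '2019-01-31';

-- The fast path: the same statement run directly compiles a fresh plan
-- for these literal values, which is why it can behave differently
SELECT * FROM dbo.Orders
WHERE OrderDate BETWEEN '2019-01-01' AND '2019-01-31';
```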
Arguments against a bad execution plan:
- Before dropping the index, we went really wild and ran the forbidden `dbcc freeproccache` to clear any bad plan, but this didn't fix or change the performance issue.
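As an aside, a single plan can be evicted instead of flushing the whole instance cache. A sketch, with a hypothetical filter on the query text:

```sql
-- Find the plan handle for the suspect query
SELECT cp.plan_handle, st.text
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE st.text LIKE N'%OrderDate%';  -- filter is hypothetical

-- Then evict just that one plan rather than the entire cache:
-- DBCC FREEPROCCACHE (plan_handle);  -- pass the handle from the query above
```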
Notes:
- The slow query happens to use a table indexed by date. However, there are wide differences in the number of records per date; any given date holds anywhere from a few records to more than 100k, and the distribution is fairly random.
- The database is running under compatibility level 140 (SQL Server 2017).
Was the source of the problem a bad plan or stale statistics?
If it was a bad plan, then why didn't `dbcc freeproccache` get rid of it?
Edit:
From what I know, this looked very much like a bad plan, but somehow `dbcc freeproccache` didn't work. So I'm left in the dark, confused about the whole situation.
Best Answer
Here's a timeline of the events you described, as well as my commentary on what was the likely cause of your bad plan.
Bad times, I empathize!
Usually, clearing the cache fixes problems like this if the problem is parameter sniffing.
Another reason for bad plans is bad stats. If the problem is bad stats, then clearing the procedure cache will just result in the same bad execution plan being produced again (based on the bad stats).
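You can see both the freshness and the sample quality of each statistic from the stats DMV. A sketch (the table name `dbo.Orders` is hypothetical):

```sql
-- For each statistic on the table: when it was last updated, how many rows
-- were sampled, and the effective sample percentage
SELECT  s.name,
        sp.last_updated,
        sp.rows,
        sp.rows_sampled,
        CAST(sp.rows_sampled * 100.0 / NULLIF(sp.rows, 0) AS decimal(5, 2)) AS sample_pct,
        sp.modification_counter
FROM    sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE   s.object_id = OBJECT_ID(N'dbo.Orders');
```

Note that "up to date" (a recent `last_updated`) says nothing about `sample_pct` — stats can be fresh and still badly sampled.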
Creating an index on a column creates new statistics on that column `WITH FULLSCAN`: these are usually the best stats, because they include the most comprehensive view of the data.
Statistics can be automatically updated over time as rows in the table are changed, added, or deleted. SQL Server will use different sampling percentages based on the size of the table.
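The difference between the two can be seen side by side (index and table names are hypothetical):

```sql
-- Let SQL Server pick the sample rate automatically (what auto-update does):
UPDATE STATISTICS dbo.Orders IX_Orders_OrderDate;

-- Force a 100% sample, the same quality you get from creating the index:
UPDATE STATISTICS dbo.Orders IX_Orders_OrderDate WITH FULLSCAN;
```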
It sounds like what happened was: the statistics on the date column were automatically updated with a low sampling percentage, giving the optimizer a skewed picture of the data; every recompilation, including the one forced by `dbcc freeproccache`, produced the same bad plan from those same stats; and recreating the index rebuilt the statistics with a full scan, which finally produced a good plan.
You mentioned you're using Ola's index and stats maintenance solution. You may want to add a special case for this table / index to update stats with `FULLSCAN`, as it sounds like the table's data is skewed enough to warrant that. If you want to avoid scanning the whole table (I don't know how big it is), you can experiment with different sample percentages (starting at 100 and dropping by 5 or 10 percent until you get the bad plan).
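One way to carve out that special case is a separate job for just this table. A sketch using parameter names from Hallengren's documented `IndexOptimize` interface — verify them against the version you have installed; the database and table names are hypothetical:

```sql
-- Dedicated stats job for the skewed table, forcing a 100% sample
EXECUTE dbo.IndexOptimize
    @Databases = 'YourDatabase',           -- hypothetical database name
    @FragmentationLow = NULL,              -- skip index maintenance here,
    @FragmentationMedium = NULL,           -- stats only
    @FragmentationHigh = NULL,
    @UpdateStatistics = 'ALL',
    @Indexes = 'YourDatabase.dbo.Orders',  -- limit to the skewed table
    @StatisticsSample = 100;               -- equivalent to WITH FULLSCAN
```

Remember to exclude this table from the general-purpose stats job so the two don't fight over the sample rate.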
Looking at the execution plans you provided (thanks!), you can see evidence for my theory in the statistics behind each plan. The bad plan was compiled against stats with a sample percent of 23.4265, while the good plan's stats were sampled at 100% (because of the index recreation).