Sql-server – Query performance drastically depends on the database in use

execution-planperformancesql server 2014t-sql

I'm on SQL-Server 12.0.5203.0

There is a query covering multiple databases. I think It is not useful to post the actual query (as it is rather complex and confidential, just think of a query like

SELECT t1.Column1
      ,t2.Column2
      ,t3.Column3
FROM db1.schema.TableName1 t1 with (nolock)
JOIN db1.schema.TableName2 t2 with (nolock) ON SomeCriteria
JOIN db2.schema.TableName3 t3 with (nolock) ON SomeCriteria  
JOIN db3.schema.TableName4 t4 with (nolock) ON SomeCriteria
WHERE SomeCriteria  
  AND t4.ColumnA = (SELECT DISTINCT ColumnX
                      FROM db2.schema.TableName5 with (nolock) 
                      WHERE ColumnY = 2902)

Please: This is not my query… Do not discuss the usage of WITH(NOLOCK) or GROUP BY against DISTINCT, thx 😀

Users reported time-outs due to run durations of more than a minute. I could not reproduce this, as exactly the same query in exactly the same environment brought back exactly the same result in less than a second.

Then – by setting all possible configurations to the same values – I encountered something strange: The difference depends on the database in use.

This is reproducable: If there is USE master; it is 1 second, with USE db1; (or any other) it is horribly slow.

Some general observations

  • All used tables are fully qualified with database.schema.table and aliased
  • All tables are called with WITH(NOLOCK)
  • The profiler for the fast run shows CPU(30), Reads(20000), Duration(30)
  • slow (same SSMS, just two windows): CPU(11000), Reads(13M!!!), Duration(12000)
  • The execution plans are extremely different
    • slow: Starts with an index seek returning 2.6M rows
      Combines this with another index seek pushing this to 27M rows (estimnated 45!)
      Filters this down to 1626 rows, which is the count of the final result
    • fast: starting with tiny sets of some 100 rows, Never more than 8000 rows

My questions

  • What is going on here? Why is the database, where I'm coming from, so important?

Lessons learned and a solution

What I did not know: Query plans are stored with each database separately, hence the context database can be very important. I cannot really grasp the advantage of this concept… Why not better store the plans in a central place, one per action? But this is a different question…

Using a query against sys.dm_exec_cached_plans and dm_exec-query_plan() with query_plan.exist('//*:StmtSimple[contains(@*:StatementText[1],"Some specific part of the query")]')=1 I found a couple of stored plans. After removal everything was fast and fine.

But the next day the bad behavior was back.

The solution: After a thorough look into the best plan I re-organised all JOINs to this order of execution and use OPTION(FORCE ORDER). This seems to solve the issue.

Best Answer

At a high level, database context is one of the keys that make a cached plan unique (there are other things, like certain session settings, the query text obviously, etc).

Therefore, it is quite possible that the first time this query was run in one database, it compiled (and is now stuck with) a very different plan than when it was compiled for some other database.

There are many reasons why a plan could be different, depending what happened between the first compilation and any subsequent compilations, but now that you have two plans, it is irrelevant that you are issuing the USE commands in the same window. You're still going to use the plan that was compiled for that database. Quick repro:

USE AdventureWorks;
GO
SELECT TOP (1) Monkey = BusinessEntityID 
  FROM AdventureWorks2012.Person.Person;
GO
USE OtherDB;
GO
SELECT TOP (1) Monkey = BusinessEntityID 
  FROM AdventureWorks2012.Person.Person;
GO
SELECT * FROM sys.dm_exec_cached_plans AS p
CROSS APPLY sys.dm_exec_sql_text(p.plan_handle) AS t
WHERE t.[text] LIKE '%Monkey%';

You get three rows - ignore the first, because that is just the query I ran to verify (which is cached before execution, so adds itself to the resultset), and keep in mind if you run a zoo database or work at SurveyMonkey, you may have to filter further:

enter image description here

Clearly you can see we got a plan for AdventureWorks and a plan for OtherDB, even though this is the same instance of SQL Server. Again, I can't go back in history and tell you why or when you got two different plans, but it's obvious to me that this is a possible explanation. Easy for you to verify, of course.

So, one more time, please try issuing the query with OPTION (RECOMPILE) when you USE <slowdatabase>;, or otherwise evicting those specific plans from the cache and trying again:

DBCC FREEPROCCACHE(0x...); -- where 0x... is the plan_handle