Sql-server – Sql select slow on filter for key, but not for primary key, how can I hint the priority

sql server

I have 2 select, the first is superfast, and the second is 10x more slower. It seems to be first join everything from the table, than running the startdate filter, and the first case it is filtering for the id first and only than running the select.
And yes got it, first he is running the subselect, sees that it is only 400 record, and doing the rest. So how can I hint that without this workaround to run that start date filter first?

Tourneys table: 1M records (has index on StartDate)
TourneyPlayers table: 2M records (has index on TourneyId ofc)
Tourneys after the start date filter: 400 records

fast:

select *
from Tourneys t right join TourneyPlayers tp on t.Id = tp.TourneyId 
where 1=1 
  and t.id in (select id from Tourneys where StartDate > DATEADD(hour,-1,GETDATE()))

https://www.brentozar.com/pastetheplan/?id=S1t8EDQ4V

slow:

select * 
from Tourneys t, TourneyPlayers tp
where 1=1
  and t.StartDate > DATEADD(hour,-1,GETDATE())
  and t.Id = tp.TourneyId

https://www.brentozar.com/pastetheplan/?id=rJKzNvmEV

Best Answer

Your two queries are fundamentally different. The fast query is right-joining the TourneyPlayers table to Tourney, and then filtering. Based on the order of operations, it is likely seeking records on Tourney first, then seeking records on TourneyPlayers that match the TourneyId using the correct index.

In the slow query, you're inner joining the tables, so the engine is scanning the TourneyPlayers clustered index and seeking the Tourney records using the date field index. It then has to combine these record sets to produce the final result set.

You need to determine which join you require for the relationship between your tables and the result set you need, then from there you can construct your query correctly and assess your indexing strategy for these tables.

For example, do you want to return every Tourney record regardless of whether there is an associated TourneyPlayer record? Do you want every TourneyPlayer record, regardless of whether there is an associated Tourney record? Do you want only those Tourney records that have an associated TourneyPlayer record?

Once you've answered this question, you will have the answer for your join and can proceed from there with query design and index optimisation.

Related Solutions

SQL Server – Find Max Value for Each Month for Last 3 Months

For SQL Server, you could do something like this.

DECLARE @t TABLE (Id INT, DateVal DATETIME, ValueVal DECIMAL(18, 9));

INSERT @t ( Id, DateVal, ValueVal )
SELECT Id, DateVal, ValueVal
FROM (
    VALUES 
    (12410,   '01/03/2017 12:17',    0.000178),
    (12410,   '01/10/2017 11:36',    0.000186),
    (12410,   '01/17/2017 11:27',    0.000189),
    (12410,   '01/24/2017 13:09',    0.000182),
    (12410,   '01/31/2017 10:37',    0.000169),
    (12410,   '02/07/2017 11:03',    0.000214),
    (12410,   '02/14/2017 11:52',    0.000176),
    (12410,   '02/21/2017 10:51',    0.000200),
    (12410,   '02/28/2017 12:29',    0.000194),
    (12410,   '03/07/2017 08:39',    0.000206)
) x (Id, DateVal, ValueVal);

SELECT  DATEPART(MONTH, t.DateVal) AS [DateVal],
        MAX(t.ValueVal) AS MaxVal
FROM @t AS t
WHERE t.DateVal >= DATEADD(MONTH, -3, DATEADD(MONTH, DATEDIFF(MONTH, 0, GETDATE()), 0))
GROUP BY DATEPART(MONTH, t.DateVal);

Updated WHERE clause: Flattening dates is weird, and my previous query got you today's date minus three months. To get back to the first of three months ago, you have to turn some tricks.

SELECT DATEADD(MONTH, DATEDIFF(MONTH, 0, GETDATE()), 0) AS [First Of This Month],
       DATEADD(MONTH, -3, DATEADD(MONTH, DATEDIFF(MONTH, 0, GETDATE()), 0)) AS [First Of Three Months Ago]

You can either take some time trying to understand this, or keep a cheat sheet of how to do it handy so you don't have to remember awful date math ;)

SQL Server – How to Optimize Long Running Query

It looks like for your final result set you want one row for each distinct value of clientname in @NumOfLeads and @PurchasedLicenses. If the client is in both tables then populate the numoflicensesrequested and numlicensespurchased columns. If not then populate the one column that you have. To me, that business logic sounds straightforward enough that you don't need to split it up into multiple queries. Sample data:

CREATE TABLE #NumOfLeads
(
  clientname varchar(500)
, numoflicensesrequested int
, calldate date
, PRIMARY KEY (clientname)
);

CREATE TABLE #PurchasedLicenses
(
  clientname varchar(500)
, numoflicensespurchased int
, purchasedate date
, PRIMARY KEY (clientname)
)

Insert Into #NumOfLeads (clientname, numoflicensesrequested, calldate) Values
('Client A', 10, '2017-01-01'), ('Client B', 5, '2017-01-02')
,('Client C', 7, '2017-01-01'), ('Client D', 12, '2017-01-03');

Insert Into #PurchasedLicenses (clientname, numoflicensespurchased, purchasedate) Values
('Client A', 10, '2017-01-10'), ('Client C', 5, '2017-01-15'), ('Client E', 10, '2017-01-15')
,('Client F', 11, '2017-01-15'), ('Client G', 5, '2017-01-22');

This query returns the same results as yours:

SELECT
  nol.clientname
, nol.numoflicensesrequested
, COALESCE(pl.numoflicensespurchased, 0) AS numoflicensespurchased
FROM #NumOfLeads nol
LEFT OUTER JOIN #PurchasedLicenses pl ON nol.clientname = pl.clientname
    AND pl.purchasedate BETWEEN '2017-01-01' AND '2017-01-31'
WHERE nol.calldate BETWEEN '2017-01-01' AND '2017-01-31'

UNION ALL

SELECT 
  pl.clientname
, 0
, pl.numoflicensespurchased
FROM #PurchasedLicenses pl
WHERE pl.purchasedate BETWEEN '2017-01-01' AND '2017-01-31'
AND NOT EXISTS (
    SELECT 1
    FROM #NumOfLeads nol
    WHERE nol.clientname = pl.clientname
)
order by clientname asc;

I believe that the query could also be written with a FULL OUTER JOIN, but I'm not personally a fan of that syntax.

A few things worth mentioning:

Don't use table variables unless you need the functionality that they provide. Table variables don't have statistics so they can cause poor query performance if you use them in the wrong way. Personally, I only consider table variables if I need to persist data through a transaction rollback or if I have code that will run thousands of times per second (or more).

The fact that you're updating by clientname implies that it's the primary key of the data. Why not define it as the primary key?

For complex queries you can see performance gains by splitting them up into multiple steps by saving intermediate results to temp tables. I would only try that if I couldn't get good enough performance with a single query. For this data and business logic I think that you can just write a single SELECT query. For other queries which are too complex, try doing inserts into temp tables. That will give you minimal logging and other benefits. Avoid doing updates (when practical), and avoid using MERGE unless you really need its functionality.

Best Answer

Related Solutions

SQL Server – Find Max Value for Each Month for Last 3 Months

SQL Server – How to Optimize Long Running Query

Related Question