Sql-server – Why are those estimates way off

execution-plansql serversql-server-2016

I have problems understanding the following query plan: https://www.brentozar.com/pastetheplan/?id=SyoD_PosP

First there is the bottom right index seek. The estimation says over 22 million rows but the actual data is less than 2 million rows. My guess is, the estimation is based on just the rows matching the Tag column and a look at the index's histogram seems to corroborate that. Am I right in thinking that no assumption is made about the result of matching those rows to the predicate of the join and that this is what results in the differnce between the estimation and the actual result?

Second and more importantly, the Hash Match estimates that it will return 140k rows when in actuality it is about twelve times that. This leads to the sort operator spilling into tempdb a lot, if I understand what is happening correctly. Why is the estimate so low? And is there any way to improve it?

Update: Coincidentally, the values of Record.Tag are exactly [10399, 10438]. In principle there is no guarantee that it is a continous interval, tho. I tried J.D.'s suggestion and split the query into four queries and UNION ALL'ed them together. No improvement there.
Next I tried filling a temporary table with the Record rows of those Tags:

SELECT * INTO #tmprecord
FROM Record 
WHERE (Record.Tag>=10399 AND Record.Tag<=10438) 

SELECT DateAndTime, r.Reading, r.Tag 
FROM RecordSet 
INNER JOIN #tmprecord AS r 
ON RecordSet.RecordSetId = r.RecordSet 
WHERE RecordSet.Collector=10090
AND RecordSet.DateAndTime >= '20201101 00:00:00' 
AND RecordSet.DateAndTime <= '20210101 00:00:00'
ORDER BY r.RecordId

DROP TABLE #tmprecord

The estimation here seems better (four times instead of twelve times), but the spill takes about the same number of tempdb pages (76xx) and even more memory.

Best Answer

You should update your statistics like below, not just indexes use for the whole table. These might take time.

UPDATE STATISTICS tablename WITH FULLSCAN;

I think you need a better index. Look at the selectivity on your columns and create indexes if you don't have;

CREATE INDEX Tag_Reading_RecordSet_RecordId 
ON Record (Tag,Reading,RecordSet,RecordId)
WITH(ONLINE=ON,FILLFACTOR=95);

CREATE INDEX Collector_DateAndTime_RecordSetId
ON RecordSet (Collector,DateAndTime,RecordSetId)
WITH(ONLINE=ON,FILLFACTOR=95);

OR

CREATE INDEX DateAndTime_Collector_RecordSetId
ON RecordSet (DateAndTime,Collector,RecordSetId)
WITH(ONLINE=ON,FILLFACTOR=95);

You can use the below query but doesn't affect the performance. Just better to read.

SELECT
  DateAndTime,
  Reading,
  Record.Tag
FROM RecordSet
INNER JOIN Record
  ON RecordSet.RecordSetId = Record.RecordSet
WHERE Record.Tag IN (10400,10406,10401,10402,10404,10399,10405,10403,10408
,10414,10409,10410,10412,10407,10413,10411,10416
,10422,10417,10418,10420,10415,10421,10419,10424
,10430,10425,10426,10428,10423,10429,10427,10432
,10438,10433,10434,10436,10431,10437,10435)
AND RecordSet.Collector = 10090
AND RecordSet.DateAndTime >= '20201101 00:00:00'
AND RecordSet.DateAndTime <= '20201201 00:00:00'
AND Record.Reading IS NOT NULL
ORDER BY Record.RecordId