Sql-server – Join on primary keys to primary key vs primary key in range

sql server

Assume we have a #RelevantOrders table and an OrderRow table.

create table #RelevantOrders(Order_Id int primary key)

create table OrderRow (Order_Id int, OrderRow_Id int, [lots of data columns], primary key (Order_Id, OrderRow_Id))

What I instinctively did at first was a simple join:

select OrderRow.* from #RelevantOrders as ro join OrderRow on OrderRow.Order_Id = ro.Order_Id

Now, due to the nature of the data, the Order_Id values in #RelevantOrders form several contiguous ranges of IDs, i.e. 1,2,3,4,5, 100,101…120, 240..250 … I hope that is clear.

So what happens is that if there are 1000 relevant orders, the execution plan does 1000 index seeks on the OrderRow table, one per Order_Id. This leads to a fairly 'big' scan count and logical reads (~5000).
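For reference, the scan count and logical read figures can be captured with the standard session statistics options; this is just a minimal sketch using the tables defined above:

set statistics io on;
set statistics time on;

select OrderRow.*
from #RelevantOrders as ro
join OrderRow on OrderRow.Order_Id = ro.Order_Id;

set statistics io off;
set statistics time off;
-- The Messages tab then reports scan count, logical reads, CPU time and elapsed time.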

That got me thinking: I would expect that instead of doing 1000 index seeks, it could do ~80 index seeks (the number of contiguous ranges).

So I tried an alternative:

create table #RelevantOrderRanges(f int, t int)

select OrderRow.* from #RelevantOrderRanges
join OrderRow on OrderRow.Order_Id between f and t
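For completeness, one way to collapse #RelevantOrders into those (f, t) ranges is the classic gaps-and-islands pattern. This is only a sketch built on the tables defined above, not code from the original question:

-- Consecutive Order_Id values share the same (Order_Id - row_number) difference,
-- so grouping by that difference yields one row per contiguous range.
insert into #RelevantOrderRanges (f, t)
select min(Order_Id), max(Order_Id)
from (
    select Order_Id,
           Order_Id - row_number() over (order by Order_Id) as grp
    from #RelevantOrders
) as islands
group by grp;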

The good news is that the number of index seeks and logical reads drops to 80 index seeks (equal to the number of ranges) and 609 logical reads. Much better. The only issue I have here, based on past experience with this causing queries to gobble up memory, is the estimated number of rows and estimated data size, which goes up to about 3.5 GB in this case instead of 49 KB.

I also know that if the number of ranges approaches the number of orders, any gain is lost.

So this got me thinking, and I have two questions:
1) Am I overthinking the estimates? I know more testing is required, but I am a bit afraid the estimates will bite me later as a memory hog.

2) Is there any way to 'better explain' to SQL Server what to expect? Indexes, hints, statistics…? (A few generic options are sketched below.)
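A few generic ways to give the optimizer better information about the temp tables; these are standard T-SQL techniques sketched against the tables above, not a tested fix for this particular plan:

-- 1) Give the ranges table a clustered index so the optimizer has a sorted
--    input and its own statistics on f and t.
create clustered index IX_RelevantOrderRanges on #RelevantOrderRanges (f, t);

-- 2) Explicitly refresh statistics once the temp table is populated.
update statistics #RelevantOrderRanges;

-- 3) Ask for a recompile so the plan is costed with the actual row counts
--    present in the temp table at execution time.
select OrderRow.*
from #RelevantOrderRanges as r
join OrderRow on OrderRow.Order_Id between r.f and r.t
option (recompile);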

Thanks for the discussion.

PERFORMANCE UPDATE:
TC 1:
Order_Id count: 9,045
Range count: 5,499
Total rows produced:

Range based approach:
~800 ms CPU, ~800 ms elapsed time
Scan count: 5,500
Logical reads: 37,896

ID join approach:
~900 ms CPU, ~900 ms elapsed time
Scan count: 9,045
Logical reads: 56,774

TC 2:
Order_Id count: 19,684
Range count: 12,849
Total rows produced: 124,561

Range based approach:
~718 ms CPU, ~770 ms elapsed time
Scan count: 12,849
Logical reads: 74,127

ID join approach:
~950 ms CPU, ~950 ms elapsed time
Scan count: 19,684
Logical reads: 110,352

Best Answer

Edit: I misunderstood the question. I thought this was about app design rather than query performance.

For an app: personally, I would avoid using an intermediate table. It's just extra steps and complication.

The #table makes sense for queries.

Provided you have useful indexes and your stats are up to date, you are typically better off leaving SQL Server to decide the best execution plan.
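For example, the standard commands for keeping statistics fresh on the base table (generic maintenance, not anything specific to this schema):

update statistics OrderRow with fullscan;  -- rebuild statistics on the joined table
exec sp_updatestats;                       -- or refresh whatever the database considers stale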

Depending on the growth and distribution of the data, the best execution plan may change over time, so I would be reluctant to invest a lot of time and effort coding to squeeze out the last 1% of extra performance.