SQL Server – IN Clause Listing Values vs Query Performance

Tags: optimization, performance, query-performance, sql server

My app runs a lot of queries like this:

SELECT ... FROM Table
    WHERE CategoryID IN (SELECT CategoryID FROM OtherTable WHERE ..)

To improve performance, I decided to cache the result of the subquery in the application's memory (it rarely changes) and change the database query to:

SELECT ... FROM Table
    WHERE CategoryID IN (123,234,345,456,567,678,789,...)

Question: does this make any sense?

PROS: it caches the data, which is a good thing, right?

CONS: the second query cannot be parameterized, so SQL Server cannot reuse a cached execution plan when the list of values changes. Also, the IN clause is treated like a dozen ORs; can this degrade performance?

The CategoryID column in Table is an integer with an index.

PS. I tried testing, but I cannot get my tests to the scale of my app (which runs thousands of queries), so I'd appreciate any "generic" explanation from someone… Thanks!

Best Answer

A lot of this depends on what else your query is doing. If the IN predicate is one of many and isn't particularly useful for an index, you may see it in your plan as a residual predicate (in the Predicate property of an operator) rather than as a seek predicate.

Alternatively, if you have an index on CategoryID, do you really want it to do dozens of Seeks into Table?

I suspect that the cost of the sub-query, especially if it's run frequently enough that its data is already in memory, would be negligible compared with doing dozens of Seeks. But without an index on CategoryID, you might be better off with just a residual predicate, by using your IN clause.

If there is a maximum number of values that could be used, you could parameterise the list, like: (@p1, @p2, ...). It is important to pass the same number of parameters each time, so that the query text stays identical and the plan can be reused. If you always have between 10 and 15 values, then code for 15 and use -1 as the value for any extras (i.e., if you have 11 values, then parameters 12-15 would all be -1, which avoids recompilation). This assumes -1 is never a real CategoryID.
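
The padding idea can be sketched in application code. This is only a minimal illustration in Python, not something from the original answer: the function name build_in_query, the constants MAX_VALUES and PAD_VALUE, and the choice to return a name-to-value mapping are all hypothetical, and how the parameters are actually bound depends on your database driver. It also assumes -1 never occurs as a real CategoryID.

```python
from typing import Dict, Sequence, Tuple

MAX_VALUES = 15   # assumed upper bound on the number of cached CategoryIDs
PAD_VALUE = -1    # assumed to never be a real CategoryID


def build_in_query(category_ids: Sequence[int],
                   slots: int = MAX_VALUES) -> Tuple[str, Dict[str, int]]:
    """Return a query with a fixed number of IN parameters, plus their values.

    The SQL text is identical for any input of up to `slots` values, so the
    server sees the same statement every time; unused slots get PAD_VALUE.
    """
    if len(category_ids) > slots:
        raise ValueError(f"at most {slots} values are supported")

    # Pad the value list out to exactly `slots` entries.
    padded = list(category_ids) + [PAD_VALUE] * (slots - len(category_ids))

    # Always emit @p1..@pN, regardless of how many real values there are.
    names = [f"@p{i}" for i in range(1, slots + 1)]

    # The column list is elided ("...") exactly as in the question.
    sql = "SELECT ... FROM Table WHERE CategoryID IN (" + ", ".join(names) + ")"
    return sql, dict(zip(names, padded))
```

Calling build_in_query([123, 234, 345]) and build_in_query with eleven values would produce the same SQL string; only the parameter values differ, with the unused slots set to -1.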