Efficient SQL Query – Return Parents Satisfying Constraints on Children

execution-planoptimizationperformancequery-performancesql serversql-server-2016

I've got a schema something like having an Entity table, with a to-many relationship to Attribute, which has a to-many relationship to Value. Attribute has a column Name, and Value has a column Value:

Entity
------
Id
Name


Attribute
---------
Id
EntityId
Name


Value
-----
Id
AttributeId
Value

I need to be able to construct a query that will return only Entity rows that have certain combinations of attribute/value children. Currently I'm writing the query something like:

select
    *
    from Entity e   
    where exists(
        select * from Attribute a 
            join Value v on v.AttributeId = a.id 
            where a.EntityId = e.id and a.Name = 'color' and v.Value = 'red')
    and exists(
        select * from Attribute a 
            join Value v on v.AttributeId = a.id 
            where a.EntityId = e.id and a.Name = 'size' and v.Value = 'small')

This works, but every extra attribute filter adds another exists clause, and I'm nervous about performance. Is there a more efficient way to write queries of this type, or is it likely to be perfectly performant if I use correct indexes etc?

Best Answer

It's difficult to predict if your original query will have performance problems. We don't know how large your tables are, what indexes you have, how many filters you might add, how quickly the queries need to finish, or so on. You may want to consider writing the query in the most natural way to do and measuring performance. If performance is acceptable leave the code as is.

If you do need to improve performance, you can consider a rewrite similar to the following:

select *
from Entity e 
WHERE e.id IN (
    SELECT a.id
    FROM Attribute a 
    join [Value] v on v.AttributeId = a.id 
    WHERE a.Name IN ('color', 'size') AND v.Value IN ('red', 'small') -- redundant filter to improve performance
    GROUP BY a.id
    HAVING MAX(CASE WHEN a.Name = 'color' and v.Value = 'red' THEN 1 ELSE 0 END) = 1
    AND MAX(CASE WHEN a.Name = 'size' and v.Value = 'small' THEN 1 ELSE 0 END) = 1
);

That will give you a single scan of the attribute and value tables regardless of how many filters you add to the HAVING clause. One downside is that you're likely to get a poor cardinality estimate from the HAVING filter, but if the goal of the filters is to filter out most of the rows from the Entity table then that may not be a problem.