SQL Server Optimization – Select Entity Requiring Relations from All Related Entities

dynamic-sqloptimizationperformancequery-performancerelational-divisionsql server

I am trying to construct a where clause that requires an X number of children to have a relation to a specific parent with a N-to-N relationship. Something similar to "Get all receipts where these products were sold together."
An example scenario is easier to explain:

DECLARE @Nodes TABLE(
  Id INT
);

DECLARE @Arc TABLE(
  Id INT IDENTITY(1,1),
  Source INT, -- FK @Nodes.Id
  Dest INT    -- FK @Nodes.Id
);

INSERT INTO @Nodes (Id) VALUES (1),(2),(3),(4);

INSERT INTO @Arc (Source, Dest) VALUES
   (1, 2)
  ,(1, 3)
  ,(1, 3)
  ,(2, 1)
  ,(2, 3)
  ,(2, 4);

In the SP, the input is preprocessed into a table like this:

DECLARE @InputConnectedNodes TABLE ( Id INT );
INSERT INTO @InputConnectedNodes (Id) VALUES (3),(4); -- not a fixed number of nodes

In this example we want all nodes that have an arc to nodes 3 and 4 (only node 2 in the above example).
Currently we have this:

SELECT DISTINCT Source 
    FROM @Arc
    WHERE Dest IN (SELECT Id FROM @InputConnectedNodes);

Then we have some ugly procedural code that loops through the input and (nested) output checking for all relationships. I'd like to replace it with something declarative. The best I came up with so far is this:

SELECT Source 
    FROM @Arc 
    WHERE Dest IN (SELECT Id FROM @InputConnectedNodes) 
    GROUP BY Source
    HAVING COUNT(Source) = (SELECT COUNT(*) FROM @InputConnectedNodes)

But it will fail when nodes have multiple (identical) arcs like (1->3) in my example.

I could also solve this by dynamically generating a where clause, but I'd prefer a fully declarative solution.

Best Answer

I suggest loosening your resolve for a fully declarative solution. There may not be a declarative way to solve your problem, and there are many cases where dynamic SQL is a much better solution in terms of performance - even if it lacks readability.

Dynamic SQL gives the optimizer the opportunity to use different plans optimized for different parameters instead of having to come up with a single plan that solves all permutations well. The latter is quite difficult to achieve.

Your problem is called relational-division. See questions in that tag and:

Divided We Stand: The SQL of Relational Division by Joe Celko; and
How to filter SQL results in a has-many-through relation on Stack Overflow.

Most of the Stack Overflow solutions would work as they are, or with minor modifications for SQL Server. Note how the more complicated queries, with multiple joins or EXISTS subqueries (that would need dynamically produced code for the arbitrary cases) are more efficient than the GROUP BY query.

Best Answer

Related Solutions

Sql-server – Oracle GoldenGate add trandata errors

SQL Server – Fastest Way to Insert Large Numbers of Rows

Related Question