I am trying to construct a where clause that requires an X number of children to have a relation to a specific parent with a N-to-N relationship. Something similar to "Get all receipts where these products were sold together."
An example scenario is easier to explain:
DECLARE @Nodes TABLE(
Id INT
);
DECLARE @Arc TABLE(
Id INT IDENTITY(1,1),
Source INT, -- FK @Nodes.Id
Dest INT -- FK @Nodes.Id
);
INSERT INTO @Nodes (Id) VALUES (1),(2),(3),(4);
INSERT INTO @Arc (Source, Dest) VALUES
(1, 2)
,(1, 3)
,(1, 3)
,(2, 1)
,(2, 3)
,(2, 4);
In the SP, the input is preprocessed into a table like this:
DECLARE @InputConnectedNodes TABLE ( Id INT );
INSERT INTO @InputConnectedNodes (Id) VALUES (3),(4); -- not a fixed number of nodes
In this example we want all nodes that have an arc to nodes 3 and 4 (only node 2 in the above example).
Currently we have this:
SELECT DISTINCT Source
FROM @Arc
WHERE Dest IN (SELECT Id FROM @InputConnectedNodes);
Then we have some ugly procedural code that loops through the input and (nested) output checking for all relationships. I'd like to replace it with something declarative. The best I came up with so far is this:
SELECT Source
FROM @Arc
WHERE Dest IN (SELECT Id FROM @InputConnectedNodes)
GROUP BY Source
HAVING COUNT(Source) = (SELECT COUNT(*) FROM @InputConnectedNodes)
But it will fail when nodes have multiple (identical) arcs like (1->3) in my example.
I could also solve this by dynamically generating a where clause, but I'd prefer a fully declarative solution.
Best Answer
I suggest loosening your resolve for a fully declarative solution. There may not be a declarative way to solve your problem, and there are many cases where dynamic SQL is a much better solution in terms of performance - even if it lacks readability.
Dynamic SQL gives the optimizer the opportunity to use different plans optimized for different parameters instead of having to come up with a single plan that solves all permutations well. The latter is quite difficult to achieve.
Your problem is called relational-division. See questions in that tag and:
Most of the Stack Overflow solutions would work as they are, or with minor modifications for SQL Server. Note how the more complicated queries, with multiple joins or
EXISTS
subqueries (that would need dynamically produced code for the arbitrary cases) are more efficient than theGROUP BY
query.