If you don't really need zero-second actuality, you could just run your query time to time and cache the results.
If you still need to have real-time data on this (sacrificing insert performance), I would do this:
Since self-joins are not allowed in indexed views, you need to create two copies of each table:
CREATE TABLE personBrother
(
personId INT NOT NULL,
brotherName INT NOT NULL
)
CREATE TABLE personBrother2
(
personId INT NOT NULL,
brotherName INT NOT NULL
)
Create an indexed view on their join:
CREATE VIEW
commonBrothers
WITH SCHEMABINDING
AS
SELECT p1.personId AS p1,
p2.personId AS p2,
COUNT_BIG(*) AS cnt
FROM dbo.personBrother p1
JOIN dbo.personBrother2 p2
ON p1.brotherName = p2.brotherName
WHERE p1.personId < p2.personId
GROUP BY
p1.personId, p2.personId
CREATE UNIQUE CLUSTERED INDEX
ux_commonBrothers_p1_p2
ON commonBrothers (p1, p2)
CREATE INDEX
ix_commonBrothers_cnt
ON commonBrothers (cnt)
Same for sisters.
You should manually maintain these tables to have same data (write a trigger, insert/update/delete both etc).
Now we can easily get pairs with the most brothers and most sisters:
SELECT TOP 1 WITH TIES
*
FROM commonBrothers
ORDER BY
cnt DESC
All we need now is to fetch a greatest sum. Unfortunately, we cannot index a join of these views (it's a pure implementation flaw, there's no theoretical limitation for this).
So we need to do the following: the top pair cannot have less brothers than the top sis pair. Same holds for the sisters. So we have this query:
SELECT TOP 1 WITH TIES
cb.p1, cb.p2, cb.cnt + cs.cnt AS totalCnt
FROM commonBrothers cb
JOIN commonSisters cs
ON cs.p1 = cb.p1
AND cs.p2 = cb.p2
WHERE cs.cnt >=
(
SELECT MAX(cst.cnt)
FROM (
SELECT TOP 1 WITH TIES
p1, p2
FROM commonBrothers
ORDER BY
cnt DESC
) cbt
JOIN commonSisters cst
ON cst.p1 = cbt.p1
AND cst.p2 = cbt.p2
)
AND cb.cnt >=
(
SELECT MAX(cbt.cnt)
FROM (
SELECT TOP 1 WITH TIES
p1, p2
FROM commonSisters
ORDER BY
cnt DESC
) cst
JOIN commonBrothers cbt
ON cbt.p1 = cst.p1
AND cbt.p2 = cst.p2
)
ORDER BY
totalCnt DESC
If the numbers of common brothers and sisters are correlated, this query will be very fast.
This solution has two drawbacks:
DML
performance: if you insert or delete a record for a name shared by million brothers, the indexed view will get 2M inserts or delete. This is the price you pay for real-time query: the kind of data you are asking for cannot be easily indexed.
Persons with 0 brothers or 0 sisters will not be indexed. If there's a chance that top pair will not have brothers or sisters, you should amend the last query a little.
I'd suggest you have a think about this system's use cases. Work out in your own mind how the queries will look for the given design and for your alternative. Include scenarios where people move between slots. What do updates look like? Can business constraints be enforced in DRI? Can the desired values still be found? Having actual examples to talk through will highlight your concerns and allow your boss to explain the model more fully.
Writing horror queries against a ghastly schema is only one part of the problem. Someone has to fix them, at 3am, with no documentation, when they break. More intelligible queries at this stage will give you a better system, sooner.
If it calms your nerves any, I've had success with EAV systems. That said, we were a team of very experienced DBAs, it really was a blue-sky, "what if" project and, no, we didn't bet the company on it. Any dev project that will be heading to production we discourage most strongly from using an EAV approach.
If you go the 'flexible' route do not try to write generic queries just because you have a generic schema. Each query should only deal with one of Company, Location, Region or whatever your slot types are.
My two cents' worth.
Best Answer
You don't need a join, since you have no columns from
table_B
in the output. You could use a semi-join, i.e anEXISTS
subquery:If you did want some other column from
table_B
(say theSpecialty
) and still only 1 row, you could use either a derived table with a window function, orCROSS APPLY
withTOP 1
.CROSS APPLY
is a special join where we can reference previous tables. Note that it's similar toINNER JOIN
(you hadLEFT JOIN
while you needINNER JOIN
):