If you don't really need zero-second actuality, you could just run your query time to time and cache the results.
If you still need to have real-time data on this (sacrificing insert performance), I would do this:
Since self-joins are not allowed in indexed views, you need to create two copies of each table:
CREATE TABLE personBrother
(
personId INT NOT NULL,
brotherName INT NOT NULL
)
CREATE TABLE personBrother2
(
personId INT NOT NULL,
brotherName INT NOT NULL
)
Create an indexed view on their join:
CREATE VIEW
commonBrothers
WITH SCHEMABINDING
AS
SELECT p1.personId AS p1,
p2.personId AS p2,
COUNT_BIG(*) AS cnt
FROM dbo.personBrother p1
JOIN dbo.personBrother2 p2
ON p1.brotherName = p2.brotherName
WHERE p1.personId < p2.personId
GROUP BY
p1.personId, p2.personId
CREATE UNIQUE CLUSTERED INDEX
ux_commonBrothers_p1_p2
ON commonBrothers (p1, p2)
CREATE INDEX
ix_commonBrothers_cnt
ON commonBrothers (cnt)
Same for sisters.
You should manually maintain these tables to have same data (write a trigger, insert/update/delete both etc).
Now we can easily get pairs with the most brothers and most sisters:
SELECT TOP 1 WITH TIES
*
FROM commonBrothers
ORDER BY
cnt DESC
All we need now is to fetch a greatest sum. Unfortunately, we cannot index a join of these views (it's a pure implementation flaw, there's no theoretical limitation for this).
So we need to do the following: the top pair cannot have less brothers than the top sis pair. Same holds for the sisters. So we have this query:
SELECT TOP 1 WITH TIES
cb.p1, cb.p2, cb.cnt + cs.cnt AS totalCnt
FROM commonBrothers cb
JOIN commonSisters cs
ON cs.p1 = cb.p1
AND cs.p2 = cb.p2
WHERE cs.cnt >=
(
SELECT MAX(cst.cnt)
FROM (
SELECT TOP 1 WITH TIES
p1, p2
FROM commonBrothers
ORDER BY
cnt DESC
) cbt
JOIN commonSisters cst
ON cst.p1 = cbt.p1
AND cst.p2 = cbt.p2
)
AND cb.cnt >=
(
SELECT MAX(cbt.cnt)
FROM (
SELECT TOP 1 WITH TIES
p1, p2
FROM commonSisters
ORDER BY
cnt DESC
) cst
JOIN commonBrothers cbt
ON cbt.p1 = cst.p1
AND cbt.p2 = cst.p2
)
ORDER BY
totalCnt DESC
If the numbers of common brothers and sisters are correlated, this query will be very fast.
This solution has two drawbacks:
DML
performance: if you insert or delete a record for a name shared by million brothers, the indexed view will get 2M inserts or delete. This is the price you pay for real-time query: the kind of data you are asking for cannot be easily indexed.
Persons with 0 brothers or 0 sisters will not be indexed. If there's a chance that top pair will not have brothers or sisters, you should amend the last query a little.
These two queries may help you. The first will list all of the tables and indexes on those tables in your database. If the table does not appear in the list is does not have any indexes defined on it. These queries assume SQL Server version 2005 or newer.
SELECT
IndexName = QUOTENAME(I.name),
TableName =
QUOTENAME(SCHEMA_NAME(T.[schema_id])) +
N'.' + QUOTENAME(T.name),
IsPrimaryKey = I.is_primary_key
FROM sys.indexes AS I
INNER JOIN sys.tables AS T
ON I.[object_id] = T.[object_id]
WHERE
I.type_desc <> N'HEAP'
ORDER BY
TableName ASC,
IndexName ASC;
The second query will report for each table the identity column, if any on each table in your database.
SELECT
TableName =
QUOTENAME(SCHEMA_NAME(T.[schema_id])) +
N'.' + QUOTENAME(T.name),
IdentityColumn = COALESCE(QUOTENAME(C.name), N'No identity column')
FROM sys.tables AS T
LEFT OUTER JOIN sys.columns AS C
ON T.[object_id] = C.[object_id]
AND C.is_identity = 1
ORDER BY
TableName ASC;
To limit the queries to a specific table add a WHERE
clause similar to:
WHERE T.name = N'NAME-OF-TABLE'
Best Answer
From a high level, you'd simply do this:
The
INNER JOIN
will take rows fromdbo.member
where theUID
column values match values contained in theTaskID
column from thetasklist_data
table. If you wanted ALL the rows fromdbo.member
, and only matching rows fromdbo.tasklist_data
, you'd rewrite that as:Steve Stedman has an excellent resource for understanding how the different types of
JOIN
statements work.If you need to do this on more than two tables, you simply add
JOIN
clauses as necessary:I have broken a (arguably) cardinal sin above by using
SELECT *
to return all columns from all of the tables mentioned in theFROM
clause of the query. It is generally considered good practice to explicitly specify the columns required, for instance: