Sql-server – Groups relationships – Detailed or Direct

relational-theory sql-server

I have a Person model:

PersonId    Name            
----------- ---------------- 
1           Jessica         
2           Jennifer

Then I have a Share model, which has the Person and an Item associated.

ShareId     ItemId            
----------- ----------------
1           1         
2           2

And finally I have a SharePeople model, which lists all the people who have access to that Item.

ShareId     PersonId            
----------- ----------------
1           1         
1           2

Now a new feature must be added: Groups. I thought of two approaches, and I would like to know which one is more correct.

Two things are clear to me. One is to add a new IsGroup property to the Person model:

PersonId    Name             IsGroup
----------- ---------------- -----------
1           Jessica          False
2           Jennifer         False
3           Blondes          True

The other is to have a GroupPeople table that lists the members of each group:

GroupPersonId    ChildPersonId          
-------------    ---------------- 
3                1    
3                2

OPTION 1: Should I create the association in SharePeople with the IsGroup rows directly?

ShareId     PersonId            
----------- ----------------
2           3

OPTION 2: Should I always create the association with the persons who are part of the group, and add another column that identifies which Group the association came from?

ShareId     PersonId            GroupPersonId
----------- ----------------    -------------
2           1                   3
2           2                   3

The second option is simpler when reading data from the tables: it will be fast and have less impact. However, it creates a lot of rows in the database.

The first option is cleaner, but I will need a lot of conditional logic in my code to handle whether a row is a Group or a single Person.

The bottom line is that I really don't know which is better for performance, and I also don't want to write tons of odd code.

Thank you very much,

Best Answer

First, it's hard to ascertain what you need as I don't know what your data means. What is ShareID and what are the groups for? How do ItemIDs relate to ShareID? Is the data one-to-one or one-to-many? etc...

With that said, I do know for sure that Option 1 does not follow good practice. You don't store different types of data in the same column. You have to think not only about how complicated the logic in your queries will be, but also about data changes. It looks like you have each person in a group in their own row, followed by the group in a different row. Data changes will be painfully difficult on top of slow, because you're going to have to use very complicated logic.
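To make that extra logic concrete, here is a rough sketch of what "who can access this item?" might look like under Option 1. The table and column names are taken from the question; that the Share model is a table named Share, that IsGroup is a BIT column, and the @ItemId variable are all assumptions for illustration.

```sql
-- Option 1 sketch: SharePeople.PersonId may point at a person OR a group,
-- so every read has to handle both cases.
DECLARE @ItemId INT = 2;  -- hypothetical item to check

-- Case 1: people the item was shared with directly
SELECT  sp.PersonId
FROM    Share s
JOIN    SharePeople sp
ON      sp.ShareId = s.ShareId
JOIN    Person p
ON      p.PersonId = sp.PersonId
WHERE   s.ItemId = @ItemId
        AND p.IsGroup = 0

UNION   -- UNION also removes people reachable both ways

-- Case 2: people who get access through a group row
SELECT  gp.ChildPersonId
FROM    Share s
JOIN    SharePeople sp
ON      sp.ShareId = s.ShareId
JOIN    Person g
ON      g.PersonId = sp.PersonId
JOIN    GroupPeople gp
ON      gp.GroupPersonId = g.PersonId
WHERE   s.ItemId = @ItemId
        AND g.IsGroup = 1;
```

Every query that touches SharePeople would need some version of this two-branch shape, and so would any code that inserts, updates or deletes shares.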

So Option 2 seems better to me. It doesn't have any glaring issues. Also remember that the number of rows does not impact performance in the way you think it does. If you have more rows in a good structure, so that you are using SQL the way it was optimized (dealing with sets, not Row By Agonizing Row (RBAR)), then it will actually perform better than fewer rows in a bad structure.
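As a sketch of what that looks like in practice, assuming the Option 2 layout from the question (SharePeople with ShareId, PersonId and a GroupPersonId that is NULL for direct shares), a table named Share for the Share model, and hypothetical variable values:

```sql
-- Option 2 sketch: every row in SharePeople is a real person,
-- so the access check is a single set-based join with no branching.
DECLARE @ItemId INT = 2;  -- hypothetical item to check

SELECT  DISTINCT sp.PersonId   -- DISTINCT: a person may be shared directly and via a group
FROM    Share s
JOIN    SharePeople sp
ON      sp.ShareId = s.ShareId
WHERE   s.ItemId = @ItemId;

-- Group membership changes stay set-based too: when a (hypothetical)
-- person 4 joins group 3, give them every share that group already has.
DECLARE @GroupPersonId INT = 3, @NewPersonId INT = 4;

INSERT INTO SharePeople (ShareId, PersonId, GroupPersonId)
SELECT  DISTINCT sp.ShareId, @NewPersonId, @GroupPersonId
FROM    SharePeople sp
WHERE   sp.GroupPersonId = @GroupPersonId;
```

The trade-off is that membership changes have to be propagated to SharePeople like this, but each propagation is one statement rather than a loop.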

Just remember to keep data types narrow, use indexes, and follow good SQL coding practices: always set-based (no cursors or loops), select only the columns you need, and don't apply functions to columns in the predicates of the WHERE clause.
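For example, using the Person table from the question (the index name and the search value are made up for illustration), the last two points could look roughly like this:

```sql
-- Hypothetical index so lookups by name can seek instead of scan
CREATE INDEX ix_Person_Name
ON      Person (Name);

-- Avoid: SELECT * fetches columns you may not need, and wrapping the
-- column in a function generally prevents an index seek on Name
SELECT  *
FROM    Person
WHERE   LEFT(Name, 3) = 'Jes';

-- Prefer: name only the columns you need and compare the bare column,
-- so the optimizer can use ix_Person_Name
SELECT  PersonId, Name
FROM    Person
WHERE   Name LIKE 'Jes%';
```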

I do suggest you read up on data normalization (1NF, 2NF, 3NF). I'm sure there are plenty of helpful articles and examples online on how to structure your tables.

Related Solutions

Sql-server – How to speed up a query that orders by a calculated field

If you don't really need up-to-the-second results, you could just run your query from time to time and cache the results.

If you still need to have real-time data on this (sacrificing insert performance), I would do this:

Since self-joins are not allowed in indexed views, you need to create two copies of each table:

CREATE TABLE personBrother
        (
        personId INT NOT NULL,
        brotherName INT NOT NULL
        )

CREATE TABLE personBrother2
        (
        personId INT NOT NULL,
        brotherName INT NOT NULL
        )

Create an indexed view on their join:

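-- CREATE VIEW must run in its own batch (hence the GO below,
-- which is the batch separator in SSMS and sqlcmd).
CREATE VIEW
        commonBrothers
WITH SCHEMABINDING
AS
        -- One row per pair of persons (p1 < p2) with at least one brother name in common
        SELECT  p1.personId AS p1,
                p2.personId AS p2,
                COUNT_BIG(*) AS cnt     -- required in an indexed view that uses GROUP BY
        FROM    dbo.personBrother p1
        JOIN    dbo.personBrother2 p2
        ON      p1.brotherName = p2.brotherName
        WHERE   p1.personId < p2.personId
        GROUP BY
                p1.personId, p2.personId
GO

-- The unique clustered index is what materializes the view
CREATE UNIQUE CLUSTERED INDEX
        ux_commonBrothers_p1_p2
ON      commonBrothers (p1, p2)

-- Secondary index for finding the highest counts quickly
CREATE INDEX
        ix_commonBrothers_cnt
ON      commonBrothers (cnt)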

Same for sisters.

You have to keep the two copies of each table in sync yourself (write a trigger, or insert/update/delete into both, etc.).
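A minimal sketch of the trigger route, assuming that (personId, brotherName) pairs are unique in personBrother and that all writes go to personBrother only. The trigger name is made up:

```sql
-- Sketch: mirror every change on personBrother into personBrother2.
-- Assumes (personId, brotherName) is unique; with duplicate rows the
-- DELETE below would remove more copies than intended.
CREATE TRIGGER trg_personBrother_sync
ON      dbo.personBrother
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
        SET NOCOUNT ON;

        -- Remove the old versions of the affected rows from the copy
        DELETE  pb2
        FROM    dbo.personBrother2 pb2
        JOIN    deleted d
        ON      d.personId = pb2.personId
                AND d.brotherName = pb2.brotherName;

        -- Add the new versions of the affected rows to the copy
        INSERT INTO dbo.personBrother2 (personId, brotherName)
        SELECT  personId, brotherName
        FROM    inserted;
END
```

An UPDATE fills both the deleted and inserted pseudo-tables, so it is handled as a delete followed by an insert. The sister tables would need an equivalent trigger.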

Now we can easily get pairs with the most brothers and most sisters:

SELECT  TOP 1 WITH TIES
        *
FROM    commonBrothers
ORDER BY
        cnt DESC

All we need now is to fetch the greatest sum. Unfortunately, we cannot index a join of these views (it's a pure implementation flaw; there's no theoretical limitation here).

So we do the following: the pair with the greatest sum cannot have fewer brothers than the pair with the most sisters, and likewise it cannot have fewer sisters than the pair with the most brothers. This gives us the following query:

SELECT  TOP 1 WITH TIES
        cb.p1, cb.p2, cb.cnt + cs.cnt AS totalCnt
FROM    commonBrothers cb
JOIN    commonSisters cs
ON      cs.p1 = cb.p1
        AND cs.p2 = cb.p2
WHERE   cs.cnt >=
        (
        SELECT  MAX(cst.cnt)
        FROM    (
                SELECT  TOP 1 WITH TIES
                        p1, p2
                FROM    commonBrothers 
                ORDER BY
                        cnt DESC
                ) cbt
        JOIN    commonSisters cst
        ON      cst.p1 = cbt.p1
                AND cst.p2 = cbt.p2
        )
        AND cb.cnt >=
        (
        SELECT  MAX(cbt.cnt)
        FROM    (
                SELECT  TOP 1 WITH TIES
                        p1, p2
                FROM    commonSisters
                ORDER BY
                        cnt DESC
                ) cst
        JOIN    commonBrothers cbt
        ON      cbt.p1 = cst.p1
                AND cbt.p2 = cst.p2
        )
ORDER BY
        totalCnt DESC

If the numbers of common brothers and sisters are correlated, this query will be very fast.

This solution has two drawbacks:

DML performance: if you insert or delete a record for a name shared by a million brothers, the indexed view will get 2M inserts or deletes. This is the price you pay for a real-time query: the kind of data you are asking for cannot be easily indexed.
Persons with 0 brothers or 0 sisters will not be indexed. If there's a chance that the top pair will have no brothers or no sisters, you should amend the last query a little.

Sql-server – pros/cons of different ways to store whether a record is one of two options

At the end of the day, #3 is still the BEST option. We go with what we think is simple at that point but more often than not, business will come up with another reason to add one more type of address.

Design it correctly from the get-go! Good luck!
