Database design for bidirectional relationship

database-design

I am modeling a bidirectional relationship between users with a relationship table. Each relationship deals with two and only two users. Each direction of the relationship allows for 3 different statuses (Pending, Blocked, Allowed).

To reduce memory overhead, I have decided on the following system to keep the entire relationship represented by a single row instead of one row per direction.

The smaller user_id is always stored as user_a and the larger user_id is stored as user_b so there is no risk of errors.

I am looking for feedback on whether I should represent both directions with a single column of (3×3 choices) or two horizontal columns of 3 choices each.

i.e.

... | status_a_b

A_Pending_B_Pending
A_Pending_B_Blocked
A_Pending_B_Allowed
A_Blocked_B_Pending
A_Blocked_B_Blocked
A_Blocked_B_Allowed
A_Allowed_B_Pending
A_Allowed_B_Blocked
A_Allowed_B_Allowed

vs

... | status_a | status_b

Pending | Pending
Blocked | Blocked
Allowed | Allowed

Thank you!

Best Answer

To reduce memory overhead,

That should never be factor in database design. Let the engine worry about that; just create your design based on how you will use your data.

I have decided on the following system to keep the entire relationship represented by a single row instead of one row per direction.

That's fine, but in both cases you'll have to do two separate queries (or two criteria) to find all relevant relationships for a user:

SELECT * 
FROM RELATIONSHIPS
WHERE From_User = @UserID OR To_User = @UserID

Note that this might be a little different if there is a meaningful difference between the two types of users. Say, between VENDORS and CLIENTS or something. But it sounds like the two are interchangeable.

I am looking for feedback on whether I should represent both directions with a single column of (3x3 choices) or two horizontal columns of 3 choices each.

I wouldn't cram them into a single column, you'll just have to parse them back out when you need to use them, and that's a violation of First Rule of Normalization.

EDIT: Here's a (contrived) example to show what I mean:

Let's say you're designing a database for a social network, one more like Twitter (user_a can follow user_b independent of whether user_b follows user_a), and less like Facebook (user_a and user_b are either "friends" or not).

So if Marsha follows Joe, but Joe doesn't follow Marsha, we have two ways to create a "follows" table:

Followed     Follower    Status
---------    --------    ------
Joe          Marsha      T
Marsha       Joe         F

or

User1     User2    RtL_Follow_Status   LtR_Follow_Status
-----     ------   -----------------   -----------------
Joe       Marsha   T                   F              

Sure, the second might save you a byte or two (or not, since I might not even need a row indicating that Joe doesn't follow Marsha), but the first is vastly easier to query, especially for the most likely application needs:

--Show all users following Joe
SELECT [Follower]
FROM [Follow_v1]
WHERE [Followed] = 'Joe'
AND [Status] = TRUE

vs

SELECT [User2] as [Follower]
FROM [Follow_v2]
WHERE [User1] = 'Joe'
AND [RtL_Follow_Status] = TRUE

UNION ALL

SELECT [User1] as [Follower]
FROM [Follow_v2]
WHERE [User2] = 'Joe'
AND [LtR_Follow_Status] = TRUE

Now your application might be nothing like this, or there might be reasons that I always need to know the reverse-follow-status when pulling up a relationship, but all I'm saying is that "saving a few bytes" is a low priority compared to making a database easy to understand and easy to query.

Note also the first example is likely to perform better as well, which might be more important as the application grows than saving a few bytes.