I am modeling a bidirectional relationship between users with a relationship table. Each relationship deals with two and only two users. Each direction of the relationship allows for 3 different statuses (Pending, Blocked, Allowed).
To reduce memory overhead, I have decided on the following system to keep the entire relationship represented by a single row instead of one row per direction.
The smaller user_id is always stored as user_a and the larger user_id is stored as user_b so there is no risk of errors.
I am looking for feedback on whether I should represent both directions with a single column of (3×3 choices) or two horizontal columns of 3 choices each.
i.e.
... | status_a_b
A_Pending_B_Pending
A_Pending_B_Blocked
A_Pending_B_Allowed
A_Blocked_B_Pending
A_Blocked_B_Blocked
A_Blocked_B_Allowed
A_Allowed_B_Pending
A_Allowed_B_Blocked
A_Allowed_B_Allowed
vs
... | status_a | status_b
Pending | Pending
Blocked | Blocked
Allowed | Allowed
Thank you!
Best Answer
That should never be factor in database design. Let the engine worry about that; just create your design based on how you will use your data.
That's fine, but in both cases you'll have to do two separate queries (or two criteria) to find all relevant relationships for a user:
Note that this might be a little different if there is a meaningful difference between the two types of users. Say, between
VENDORS
andCLIENTS
or something. But it sounds like the two are interchangeable.I wouldn't cram them into a single column, you'll just have to parse them back out when you need to use them, and that's a violation of First Rule of Normalization.
EDIT: Here's a (contrived) example to show what I mean:
Let's say you're designing a database for a social network, one more like Twitter (user_a can follow user_b independent of whether user_b follows user_a), and less like Facebook (user_a and user_b are either "friends" or not).
So if Marsha follows Joe, but Joe doesn't follow Marsha, we have two ways to create a "follows" table:
or
Sure, the second might save you a byte or two (or not, since I might not even need a row indicating that Joe doesn't follow Marsha), but the first is vastly easier to query, especially for the most likely application needs:
vs
Now your application might be nothing like this, or there might be reasons that I always need to know the reverse-follow-status when pulling up a relationship, but all I'm saying is that "saving a few bytes" is a low priority compared to making a database easy to understand and easy to query.
Note also the first example is likely to perform better as well, which might be more important as the application grows than saving a few bytes.