MySQL Database Design – Implementing User Privacy Settings

database-designMySQLschema

I am actually working on a social website. Not big, aimed at a very small group of folks. And I am curious, to how I can best implement Privacy settings for my users in a MySQL driven database.
For example, on websites, like, Facebook, Google+, and probably other social websites, you find privacy settings and users can set these for their objects (e.g. content they share).

My issue isn’t writing the code in the language I am doing it in, but my issues are actually structuring up a DB schema that allows me to manage users’ settings. I have never done something like this, and would appreciate any help; also it must be scalable to some extent. Thanks a lot.

This is a JSON output I found from some website, and it looks like each part: allow, deny, description, friends, and value are all columns. However, in MySQL, you cannot really store large amount of comma-separated values, there's simply no space in a single column to do so.

{
  "privacy": {
    "allow": "",
    "deny": "299716926803603",
    "description": "Contacts; Except: Restricted",
    "friends": "ALL_Contacts",
    "value": "CUSTOM"
  },
  "created_time": "2015-09-04T13:01:40+0000",
  "id": "100002958951118_777860722322552"
}

I did something like this, but it simply doesn't work.

enter image description here

Privacy table:

allow       string
deny        string
description string
friends     string
id          int
object_id   int
owner_id    int
value       string

Not sure about this Privacy Table sample.

I'd appreciate a good solution, that is applicable, and most of the users in this community will benefit from this post.

Best Answer

There is a spectrum of options open to you, depending on how much normalisation you desire, how much denormalisation you're prepared to live with, the use cases you have currently and what future functionality you reasonably expect to support.

The most trivial implementation would be to treat your JSON as a blob and store it accordingly, in a VARCHAR column. The application takes care of constructing the JSON and parsing it. MySQL acts as a glorified key-value store. This will be easy to implement. It is likely to scale well. There will be no flexibility of usage, no built-in data integrity and ad hoc queries will be awful.

Next from this is using a column of type JSON. The DBMS has insight into the content of the JSON and can optimise with indexes, more compact storage and validation etc. It is not what a relational database is optimised for, however, so is unlikely to be your best bet.

I include these two for completeness. Neither of them use the relational nature of MySQL. If you were to go this route you would be better adopting one of the many fine open source NoSQL solutions available.

To use the relational nature of your chosen product you will need to normalise your data. The given examples and business rules are sparse so I'll infer and extrapolate somewhat, but I believe my model will cover the essential aspects.

Obviously there will be an entity type called "User", with some large integer type as the primary key. I'll call this UserID.

All the different things you want to control visibility of will likely have their own representation. I'd suggest you use the supertype/ subtype pattern. Amongst other benefits this will abstract the security from other concerns and allow addition of new types of interaction without changing this model. I'll call the root type for all these types of things a Posting and give it a surrogate key PostingID.

Your example image has five categories of visibility, one of which is Custom. I'd suggest an entity type to cover this list. Let's call it Visibility. There is a question of how the system is to reflect changing friendships. If the Posting is visible only to Friends, and a new friendship is created, should that new friend see the old posting or should it remain shown only to those who were friends at the point of posting? Similarly when unfriended does the miscreant loose rights to see items? I'm assuming the dynamic option is what is preferred.

It is tempting to form an intersection of these three to hold rights to specific postings. This would get very large very quickly, however, and would have substantial write activity reflecting changing friendships.

Better would be to hold the Visibility as an attribute in Posting (a foreign key column) and the corresponding UserFriend list for each User. It may be possible to combine the "Friends" and "Friends except Acquaintances" lists with a flag to distinguish the latter, depending on your specific rules. I'm assuming that friendship is defined as mutual so if A is a friend of B then B is by definition a friend of A. This will double the number of rows in UserFriend but will simplify the semantics.

A separate entity type will be reqired when Visibility has a value of "Custom". This will list the specific UserID values for that PostingID.

From this the tables will be

User
  UserID

UserFriend
  UserID    -- FK to UserID; the owner of this list
  FriendID  -- FK to UserID; the friend's UserID

Visibility
  VisibilityID

Posting
  PostingID
  PostedBy      -- FK to UserID
  VisibilityID

PostingUser
  PostingID
  VisibleTo    -- FK to UserID; the friend who can see this posting

To construct a list of a user's friends' recent activitis one would read the given user's UserFriend rows, join to Posting on the friends' UserIDs, where the posting visibility was "public", "friends" or "custom". For those that have a custom list, join to that on PostingID and the given UserID.

Assuming the symantics of "deny" are "all my friends except this one" I'd implement this by adding a flag to PostingUser to differentiate allow and deny.

You can take a step back along the spectrum of options by denormalising the above. For example, PostingUser could be stored as a comma list in Posting, as UserFriend could be in Friend. This would reduce the IO and, likely, the elapsed time. The trade-off is that more of the work has to be done in the application instead of the DBMS.

Although Facebook famously uses MySQL the above is assuredly not how they do this. They most certainly have deployed many man-years of custom coding, caching, sharding and other tricks to make this work at their scale.

As a final point I would mention solving these sorts of problems are exactly what graph databases are for. Dispense with MySQL and implement this in, say, Neo4J.