MongoDB Database Design – Best Practices for Feed Restructuring in a Social App

I have a social app where users post to the school they chose at registration. Each post has a school and, optionally, a hashtag. Both are stored as pointers.

Users see posts based on school or hashtag. Now we want to introduce a new feature, Polls. Polls are another type of post… Users will be seeing them in their feed among other posts! Besides other properties, they too have a hashtag and a school.

Right now, when a user asks for posts, I just query the Post collection, comparing only the school or hashtag field (depending on the screen they're on). So if they're on the Home screen, I'm going to have something like:

find all documents in Post, where school equals <some_value>

And if they are on the Hashtag screen, we have something like:

find all documents in Post, where hashtag equals <some_value>

Since we are going to have the new Poll collection, I'm wondering what implementation would be best… Here's what I have so far:

1st implementation

We keep the posts query as is and add a second, nearly identical query for the polls, then combine the results. Keep in mind that our app fetches data with pagination, 20 documents at a time. I mention this because if both queries return 20 documents each for a page, we'll have to compare dates to decide which 20 actually belong in the response, e.g. 16 posts and 4 polls.
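To make the date comparison concrete, here is a minimal sketch of the merge step in Python. It assumes each query already returns its documents newest-first and that every document carries a createdAt timestamp. The field names, the merge_page helper, and the sample data are hypothetical, not part of the app.

```python
from datetime import datetime, timedelta
from heapq import merge
from itertools import islice

PAGE_SIZE = 20  # page size mentioned in the question


def merge_page(posts, polls, page_size=PAGE_SIZE):
    """Combine two newest-first result lists into one newest-first page."""
    # heapq.merge needs both inputs already sorted in the output order;
    # reverse=True means "largest createdAt first", i.e. newest first.
    merged = merge(posts, polls, key=lambda doc: doc["createdAt"], reverse=True)
    return list(islice(merged, page_size))


# Hypothetical sample data: 20 posts one minute apart, 20 polls five minutes apart.
now = datetime(2024, 1, 1, 12, 0)
posts = [{"kind": "post", "createdAt": now - timedelta(minutes=i)} for i in range(20)]
polls = [
    {"kind": "poll", "createdAt": now - timedelta(minutes=5 * i, seconds=30)}
    for i in range(20)
]

page = merge_page(posts, polls)
poll_count = sum(1 for doc in page if doc["kind"] == "poll")
print(len(page) - poll_count, "posts and", poll_count, "polls")
# prints: 16 posts and 4 polls
```

Note that 20 of the 40 fetched documents are thrown away on every page, and the next page has to start from the createdAt of the last document actually shown rather than from a fixed offset into each collection.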

2nd (and last) implementation

We create a new collection, Feed, which would combine both collections and it would look like this:

Feed

post: <Pointer to Post>
poll: <Pointer to Poll>
creator: <Pointer to User> // we need this for block checking or if you open your profile
school: <Pointer to School>
hashtag: <Pointer to Hashtag>

Every time a post or poll is created, its trigger will create a Feed document with all the necessary values.
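Roughly, the trigger would build something like the following. This is only a sketch in Python: build_feed_entry is a hypothetical helper, plain ids stand in for the pointers, and the createdAt field is an assumption added so the feed can be sorted (it is not in the schema above).

```python
from datetime import datetime, timezone


def build_feed_entry(source, kind):
    """Build the Feed document for a newly created post or poll."""
    if kind not in ("post", "poll"):
        raise ValueError("kind must be 'post' or 'poll'")
    if source.get("school") is None:
        raise ValueError("school is required")
    return {
        # Exactly one of post/poll is set; the other stays empty.
        "post": source["_id"] if kind == "post" else None,
        "poll": source["_id"] if kind == "poll" else None,
        "creator": source["creator"],
        "school": source["school"],
        "hashtag": source.get("hashtag"),  # optional, may be None
        # Assumed field: copy the creation time so the feed can be ordered.
        "createdAt": source.get("createdAt") or datetime.now(timezone.utc),
    }


# Hypothetical poll document as the trigger might receive it.
poll = {"_id": "poll1", "creator": "user1", "school": "school1", "hashtag": "tag1"}
entry = build_feed_entry(poll, "poll")
```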

I believe the second solution is the best, as it is much cleaner than the first. I also think they're pretty much the same complexity-wise.

Any ideas on which one is better? If you have any other implementation to suggest, please do so!

NOTE: All collections are indexed.

Best Answer

Real MongoDB design centers on the queries, and only the queries. From the database's perspective, there is no need to separate the document types. From the query perspective, I get the impression that you basically need a single collection that contains all the documents. If you need to distinguish between posts and polls, do it with a field in the document.
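As a sketch of what that could look like, the helper below builds the filter document for either screen. The "type" field name and the feed_filter function are assumptions for illustration; "$in" is the standard MongoDB operator for matching any value in a list.

```python
def feed_filter(school=None, hashtag=None, kinds=None):
    """Build a MongoDB filter document for the Home or Hashtag screen."""
    if (school is None) == (hashtag is None):
        raise ValueError("pass exactly one of school or hashtag")
    query = {"school": school} if school is not None else {"hashtag": hashtag}
    if kinds is not None:
        # Only needed when a screen should show a subset, e.g. polls only.
        query["type"] = {"$in": list(kinds)}
    return query


# Home screen: posts and polls together, no type restriction.
home = feed_filter(school="school1")
# Hashtag screen restricted to polls.
hashtag_polls = feed_filter(hashtag="tag1", kinds=["poll"])

# With a driver such as PyMongo, a filter like this would typically be used as
#   collection.find(home).sort("createdAt", -1).limit(20)
# assuming the documents carry a createdAt field.
```

The feed query stays exactly as it is today. The type field only matters when rendering a document or when a screen wants one kind only.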

Also get rid of this "pointer to user" stuff. Put a copy of the necessary user fields into the respective document, and use this to display whatever you need.

The ultimate goal is to display your complete page with a single query, without using the aggregation framework (which is slow and only a mediocre workaround for bad database design).

And get these relational concepts out of your head. Use redundancy and optimize for the reading query. If a user changes, well yes, you'll have to update a thousand documents, but this is not the normal case, so you optimize the read and do more work on the write.
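To illustrate the trade-off, here is a small in-memory sketch in Python. The field names (author, name, avatar) and both helpers are hypothetical; in MongoDB the second function would correspond to a single multi-document update (for example update_many with $set in PyMongo) filtered on the embedded user id.

```python
def author_snapshot(user):
    """Pick only the user fields the feed needs to render a document."""
    return {"id": user["id"], "name": user["name"], "avatar": user["avatar"]}


def embed_author(doc, user):
    """Return a copy of doc that carries its own copy of the author fields."""
    return {**doc, "author": author_snapshot(user)}


def propagate_user_change(documents, user):
    """Refresh the embedded copy in every document written by this user."""
    updated = 0
    for doc in documents:
        if doc["author"]["id"] == user["id"]:
            doc["author"] = author_snapshot(user)
            updated += 1
    return updated


alice = {"id": "u1", "name": "Alice", "avatar": "a.png", "email": "alice@example.com"}
bob = {"id": "u2", "name": "Bob", "avatar": "b.png", "email": "bob@example.com"}
feed = [
    embed_author({"type": "post", "text": "hello"}, alice),
    embed_author({"type": "poll", "question": "lunch?"}, alice),
    embed_author({"type": "post", "text": "hi"}, bob),
]

# The rare write path: Alice renames herself, so her documents are rewritten.
alice["name"] = "Alice B."
changed = propagate_user_change(feed, alice)
```

Reading the feed never touches the user collection. The cost shows up only in propagate_user_change, which runs when a profile changes.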

In a comment, you asked:

What if I still created a new collection Feed, but instead of pointers, I'll use copies of documents? (without every field, only selected ones) Could that somehow slow things down or not be as fast/good as just integrating the polls inside the Post collection?

There's nothing wrong with duplicating the necessary data for a given query in a separate collection like your "Feed" collection. But "having too many fields" is once more a relational thought.

You don't have any fields in a collection. You have documents, and these documents have a number of fields individually.

There is no negative effect when you have 100 document types which sum up to 10000 different fields. There may be a technical drawback if you start to create many indexes, but apart from that you may well dump everything in a single collection.
