I have a social app where users can post to the school they chose on registration. Each post can have a hashtag and a school (both are pointers; hashtag is optional, school is not). Users see posts based on school or hashtag. Now we want to introduce a new feature, Polls. Polls are another type of post… Users will be seeing them in their feed among other posts! Besides other properties, they too have a hashtag and a school.
Right now when a user asks for posts, I'm just making a query to the Post collection, comparing only the school or hashtag field (depending on the screen they're at). So if they're on the Home screen, I'm going to have something like:
find all documents in Post, where school equals <some_value>
And if they are on the Hashtag screen, we have something like:
find all documents in Post, where hashtag equals <some_value>
Since we are going to have the new Poll collection, I'm wondering what implementation would be best… Here's what I have so far:
1st implementation
We keep the posts query as is and add another one, pretty much the same, for the polls, then combine the results. Keep in mind that our app fetches data with pagination, so we bring in posts 20 at a time. I mention this because if the two queries each return 20 documents for a page, we'll have to compare dates to decide what the response should contain, e.g. 16 posts and 4 polls.
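To make the date comparison concrete, here is a minimal sketch of the merge step in Python. It assumes both queries return their documents newest-first and that every document carries a createdAt value; the type and createdAt field names and the merge_page helper are made up for illustration, not part of the app.

```python
from heapq import merge
from itertools import islice

PAGE_SIZE = 20  # page size mentioned in the question


def merge_page(posts, polls, page_size=PAGE_SIZE):
    """Combine two newest-first result lists into one page.

    Both inputs must already be sorted by createdAt, newest first.
    Returns the page and the createdAt of its last item, which the
    next request can use as a cursor for BOTH queries.
    """
    # heapq.merge walks both sorted inputs lazily; reverse=True means
    # the inputs are ordered from largest to smallest key.
    merged = merge(posts, polls, key=lambda doc: doc["createdAt"], reverse=True)
    page = list(islice(merged, page_size))
    next_cursor = page[-1]["createdAt"] if page else None
    return page, next_cursor
```

Note that the next page cannot use a fixed skip per collection, because each collection contributed a different number of items. Both queries would have to ask for documents older than the returned cursor, and equal timestamps would need an extra tie-breaker.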
2nd (and last) implementation
We create a new collection, Feed, which would combine both collections and it would look like this:
Feed
post: <Pointer to Post>
poll: <Pointer to Poll>
creator: <Pointer to User> // we need this for block checking or if you open your profile
school: <Pointer to School>
hashtag: <Pointer to Hashtag>
Every time a post or poll is created, its trigger will create a Feed document with all the necessary values.
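As a rough sketch of what such a trigger could compute, the Python function below builds the Feed document from a freshly created post or poll. The build_feed_entry name, the id field and the createdAt field are assumptions added for illustration (createdAt is not in the schema above, but the feed needs something to sort on); how the trigger is actually registered depends on the backend.

```python
def build_feed_entry(source, kind):
    """Build the Feed document for a newly created post or poll.

    `source` is the created document as a dict, `kind` is "post" or "poll".
    """
    if kind not in ("post", "poll"):
        raise ValueError(f"unknown kind: {kind!r}")
    return {
        # exactly one of the two pointers is set
        "post": source["id"] if kind == "post" else None,
        "poll": source["id"] if kind == "poll" else None,
        "creator": source["creator"],
        "school": source["school"],        # required
        "hashtag": source.get("hashtag"),  # optional, may be None
        "createdAt": source["createdAt"],  # assumed field, used for sorting
    }
```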
I believe the second solution is the best, as it is much cleaner than the first. I also think they're pretty much the same complexity-wise.
Any ideas on which one is better? If you have any other implementation to suggest, please do so!
NOTE: All collections are indexed.
Best Answer
Real MongoDB design centers on the queries and only the queries. From the database perspective, there is no need to separate the document types. From the query perspective, I get the impression that you basically need a single collection that contains all documents. If you need to distinguish between posts and polls, do this via a field in the document.
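A minimal sketch of what the two screens' queries could look like against such a single collection, written as plain Python dicts in MongoDB filter syntax. The type field, the feed_filter helper and the collection name in the comment are assumptions for illustration.

```python
def feed_filter(school=None, hashtag=None, kind=None):
    """Build the filter document for the Home or Hashtag screen.

    Pass exactly one of `school` / `hashtag`. `kind` ("post" or "poll")
    is optional and only needed when one document type is wanted.
    """
    if (school is None) == (hashtag is None):
        raise ValueError("pass exactly one of school or hashtag")
    query = {"school": school} if school is not None else {"hashtag": hashtag}
    if kind is not None:
        query["type"] = kind  # assumed discriminator field
    return query


# With PyMongo this could be used roughly as (collection name assumed):
#   db.feed.find(feed_filter(school=school_id)).sort("createdAt", -1).limit(20)
```

Posts and polls then come back from one query, already interleaved by date, so no merging of pages is needed.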
Also get rid of this "pointer to user" stuff. Put a copy of the necessary user fields into the respective document, and use this to display whatever you need.
The ultimate goal is to display your complete page with a single query, without using the aggregation framework (which is slow and only a mediocre workaround for bad database design).
And get these relational concepts out of your head. Use redundancy, optimize for the reading query. If a user changes, well yes, you'll have to update a thousand documents - but this is not the normal case, so you optimize the read and do more work on the write.
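To illustrate the trade-off, here is a small Python sketch of the redundant copy and of the write that keeps it in sync. The field names (id, name, avatar) and both helpers are hypothetical; which user fields to copy depends on what the feed actually displays.

```python
def creator_snapshot(user):
    """Copy only the user fields the feed needs into the document."""
    return {"id": user["id"], "name": user["name"], "avatar": user["avatar"]}


def creator_update(user):
    """Filter and update documents that refresh every embedded copy.

    Intended for a bulk update such as PyMongo's
    collection.update_many(filter_doc, update_doc).
    """
    filter_doc = {"creator.id": user["id"]}
    update_doc = {"$set": {
        "creator.name": user["name"],
        "creator.avatar": user["avatar"],
    }}
    return filter_doc, update_doc
```

The read path never touches the user collection. The rare profile change pays for that with one bulk update over all of that user's documents.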
In a comment, you asked:
There's nothing wrong with duplicating the necessary data for a given query in a separate collection like your "Feed" collection. But "having too many fields" is once more a relational thought.
You don't have any fields in a collection. You have documents, and these documents have a number of fields individually.
There is no negative effect when you have 100 document types which sum up to 10000 different fields. There may be a technical drawback if you start to create many indexes, but apart from that you may well dump everything in a single collection.