MongoDB schema for a news service


I would like to create a MongoDB database that stores and displays its data much like the various news services do. The database would be queried by an Android phone.

Let's have a look at the BBC Android app:

[Screenshot: BBC Android app]

You can see that the news stories are grouped by continent (Africa, Asia, and so on), and each story is listed with a small picture and a short text description next to it.

I would like to find the best schema for this.

I can think of two possible schemas:

[Diagram: both schema scenarios]

Scenario 1

I would store all the information needed to render the view in one document in a collection called "NewsGroup". The phone would need only one database call to get everything, which is quick and efficient.
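For illustration, a Scenario 1 "NewsGroup" document might look like the sketch below. It is plain JavaScript with no MongoDB connection. Apart from storyId, title and picUrl (named in the question), the field names (continent, stories, summary), the one-document-per-continent layout and the listRows helper are assumptions.

```javascript
// Hypothetical shape of one "NewsGroup" document under Scenario 1.
// Only storyId, title and picUrl come from the question; the rest is assumed.
const asiaGroup = {
  _id: "asia",              // assumption: one document per continent
  continent: "Asia",
  stories: [
    // Each entry copies just enough of the Story to render one list row.
    { storyId: "s1", title: "Story one", picUrl: "https://example.com/1.jpg", summary: "Short description of story one" },
    { storyId: "s2", title: "Story two", picUrl: "https://example.com/2.jpg", summary: "Short description of story two" }
  ]
};

// Rendering the list needs nothing beyond this single document,
// which is why Scenario 1 costs one round trip.
function listRows(group) {
  return group.stories.map(s => `${group.continent}: ${s.title} [${s.picUrl}]`);
}

// Tapping a row only needs the storyId to fetch the full Story, e.g.
// db.Story.findOne({ _id: storyId }) in the mongo shell (assuming the
// storyId is stored as the Story's _id).
console.log(listRows(asiaGroup));
```

Because every row's title and picture travel inside the group document, the list renders from that one read; the trade-off is that those fields now live in two places.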

If the user clicks on a story, the storyId could be used to pull the story out of the "Story" collection to get further details, such as the full text of the story, readers' comments, etc.

The disadvantage of this scenario is that some information is duplicated (title and picUrl appear in both the "Story" and "NewsGroup" collections), and I'm worried this redundancy could lead to inconsistencies if one copy is updated and the other is not.
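The cost of that duplication shows up on writes. The sketch below models both collections as in-memory JavaScript structures (hypothetical names stories, newsGroups and renameStory) and updates every copy of a changed title. The shell commands in the comment are a rough mapping, assuming Story documents use the storyId as their _id.

```javascript
// Hypothetical in-memory stand-ins for the two Scenario 1 collections.
const stories = new Map([
  ["s1", { _id: "s1", title: "Old title", picUrl: "https://example.com/1.jpg", body: "Full text" }]
]);
const newsGroups = [
  { continent: "Asia", stories: [{ storyId: "s1", title: "Old title", picUrl: "https://example.com/1.jpg" }] }
];

// Changing a duplicated field means writing it everywhere it is copied.
// In the mongo shell this is roughly two separate writes:
//   db.Story.updateOne({ _id: id }, { $set: { title: newTitle } })
//   db.NewsGroup.updateMany({ "stories.storyId": id },
//                           { $set: { "stories.$.title": newTitle } })
// Returns how many NewsGroup copies were updated (0 if the story is unknown).
function renameStory(id, newTitle) {
  const story = stories.get(id);
  if (!story) return 0;
  story.title = newTitle;
  let copiesUpdated = 0;
  for (const group of newsGroups) {
    for (const entry of group.stories) {
      if (entry.storyId === id) {
        entry.title = newTitle;
        copiesUpdated += 1;
      }
    }
  }
  return copiesUpdated;
}

renameStory("s1", "Corrected title");
```

Unless both shell writes run inside one multi-document transaction, they are not atomic together, so a reader may briefly see the old title in one collection and the new one in the other.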

Scenario 2

I would only store the storyId in the "NewsGroup" collection. The phone then needs two database calls: one to the "NewsGroup" collection to get the list of storyIds per continent, and one to the "Story" collection to get each story's title and picUrl so that the list can be displayed. This might add delay to the user experience, but it would improve the integrity of the database, as there is no redundant information.
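A minimal in-memory sketch of the Scenario 2 read path, assuming storyIds are stored as the Story _id and that the list only needs title and picUrl. Names such as loadList and storyCollection are hypothetical. The point it shows is that the second call can be a single $in query rather than one query per story.

```javascript
// Hypothetical in-memory stand-ins for the Scenario 2 collections:
// NewsGroup holds only story ids; Story holds everything else.
const newsGroup = { continent: "Africa", storyIds: ["s2", "s1"] };
const storyCollection = [
  { _id: "s1", title: "First", picUrl: "https://example.com/1.jpg", body: "Full text one" },
  { _id: "s2", title: "Second", picUrl: "https://example.com/2.jpg", body: "Full text two" },
  { _id: "s3", title: "Unrelated", picUrl: "https://example.com/3.jpg", body: "Full text three" }
];

// Call 1 (already done here): db.NewsGroup.findOne({ continent: "Africa" })
// Call 2: db.Story.find({ _id: { $in: ids } }, { title: 1, picUrl: 1 })
// $in does not return documents in the order of the ids array,
// so the client restores the NewsGroup order itself.
function loadList(group, collection) {
  const ids = group.storyIds;
  const wanted = new Set(ids);
  const found = collection
    .filter(s => wanted.has(s._id))                             // $in filter
    .map(s => ({ _id: s._id, title: s.title, picUrl: s.picUrl })); // projection
  const byId = new Map(found.map(s => [s._id, s]));
  return ids.map(id => byId.get(id)).filter(Boolean);           // NewsGroup order
}
```

So Scenario 2 costs two round trips however many stories a continent has; the extra latency is one additional request, not one per story.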

General Comment:

It would be easier if MongoDB had a join operation like SQL to combine both collections when the phone queries the database, so that the server would handle it in a single operation, but as far as I know MongoDB cannot perform joins.

I could also use MapReduce to simulate a join, but I'm concerned about the slow performance of MapReduce in MongoDB: https://stackoverflow.com/questions/3947889/mongodb-terrible-mapreduce-performance

Best Answer

Actually, you can do this with a single collection (news items) and a simple query. No need for references (continent names aren't going to change any time soon) or strange data models here:

db.news.insertMany([
  {
    _id: new ObjectId(),
    continent: "Asia",
    title: "whatever",
    date: new Date(Date.now() - 24 * 60 * 60 * 1000)  // yesterday
  },
  {
    _id: new ObjectId(),
    continent: "Africa",
    title: "other title",
    date: new Date()                                   // today
  }
])

The corresponding query would be:

db.news.find({}).sort({ "continent": 1, "date": -1 })
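To see why one sorted query is enough for the sectioned list, here is a plain-JavaScript sketch (no MongoDB connection; the sample documents and the helper names sortLikeQuery and toSections are hypothetical) that applies the same ordering as the query and then splits the result into one section per continent. For plain ASCII strings such as continent names, JavaScript's `<` comparison orders the same way as MongoDB's default binary string comparison.

```javascript
// Hypothetical sample documents from the single "news" collection.
const news = [
  { continent: "Asia", title: "whatever", date: new Date("2024-05-01") },
  { continent: "Africa", title: "other title", date: new Date("2024-05-02") },
  { continent: "Asia", title: "newer Asia story", date: new Date("2024-05-03") }
];

// Same ordering as .sort({ continent: 1, date: -1 }):
// continent ascending, then newest first within each continent.
function sortLikeQuery(items) {
  return [...items].sort((a, b) =>
    a.continent < b.continent ? -1 :
    a.continent > b.continent ? 1 :
    b.date - a.date);
}

// Each continent's items arrive contiguous, so one pass builds the sections.
function toSections(sorted) {
  const sections = [];
  for (const item of sorted) {
    const last = sections[sections.length - 1];
    if (last && last.continent === item.continent) {
      last.items.push(item);
    } else {
      sections.push({ continent: item.continent, items: [item] });
    }
  }
  return sections;
}

const sections = toSections(sortLikeQuery(news));
```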

To speed this query up, you should create a matching compound index:

db.news.createIndex({ "continent": 1, "date": -1 })

With this approach, you can easily answer questions like "For each continent, how many news items were published in the last week?":

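// Cutoff date: seven days ago. setDate() mutates the Date and returns a
// timestamp number, so keep a reference to the Date object itself.
var last_week = new Date();
last_week.setDate(last_week.getDate() - 7);
var result = db.news.aggregate([
  { $match: { "date": { $gte: last_week } } },
  { $group: { "_id": "$continent", "newsitems": { $sum: 1 } } }
])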
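The aggregation can be mirrored in plain JavaScript to check the logic without a database. This sketch (hypothetical helpers oneWeekBefore and countPerContinent, made-up sample dates) builds the cutoff with setDate(), which mutates the Date and returns a millisecond timestamp rather than the Date itself, so the helper returns the mutated Date object. Passing that number to $match instead would compare a BSON date field against a number, which does not match any documents.

```javascript
// Returns a new Date seven calendar days before `now`.
// setDate() returns a number, so we return the mutated Date, not its result.
function oneWeekBefore(now) {
  const cutoff = new Date(now.getTime());
  cutoff.setDate(cutoff.getDate() - 7);
  return cutoff;
}

// In-memory equivalent of:
//   [{ $match: { date: { $gte: cutoff } } },
//    { $group: { _id: "$continent", newsitems: { $sum: 1 } } }]
function countPerContinent(items, cutoff) {
  const counts = new Map();
  for (const item of items) {
    if (item.date >= cutoff) {                                        // $match
      counts.set(item.continent, (counts.get(item.continent) || 0) + 1); // $group, $sum: 1
    }
  }
  return [...counts].map(([continent, n]) => ({ _id: continent, newsitems: n }));
}

// Made-up sample data for illustration only.
const now = new Date("2024-05-10T12:00:00Z");
const items = [
  { continent: "Asia", date: new Date("2024-05-09T00:00:00Z") },
  { continent: "Asia", date: new Date("2024-05-04T00:00:00Z") },
  { continent: "Africa", date: new Date("2024-05-08T00:00:00Z") },
  { continent: "Africa", date: new Date("2024-04-20T00:00:00Z") }
];
const result = countPerContinent(items, oneWeekBefore(now));
```

As with the real $group stage, the output order is not something to rely on; look results up by _id.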