MongoDB Database Design – How to Layout Data

database-design mongodb

I have 2TB of data, currently spread across many CSV files, in which each record looks like:

{
 'rec_date': "2010-01-29", 
 'rec_time': "09:15:00", 
 'site_no': '46',
 'data_owner': '1',
 'flow_period': '5',
 'vehicle_count': '60',
 'detector': '8'
}

I would like to insert this into MongoDB so that I can quickly run queries that typically get the vehicle volume for a site and its surrounding sites for the past n Mondays of the past m years. (I wish to analyse the history of a site to determine whether there is abnormal traffic flow for this time of day at this time of year.)

Initially I wanted to store a document for each site, that contained its entire history, like so:

{
    site_no: 46,
    history: [{
            $date: 1264756500000,
            readings: [{sensor:1,vehicle_count:60},
                       {sensor:2,vehicle_count:32},
                       ... 
                      ]
            },{
            $date: 1264756800000,
            readings: [....]
            }
    ]
}

But this would make each document very large: the entire history for each site will be more than 16MB, which is MongoDB's maximum document size. More traffic volume data comes in every 5 minutes from every site, so I would need to append to the history array very often.

So my question is: how should I format my data so that I can perform the queries I want while minimising data redundancy? Should I just make a new record for every sensor reading?

Best Answer

Yes, create a new record for every reading.

MongoDB and other document stores are designed for efficient retrieval of de-normalized or irregularly structured data. A MongoDB collection is essentially a set of key-value pairs with a rich value type. An update to a document is atomic, which means that constantly appending to an ever-growing list inside a single document is not going to be efficient and, as you mentioned, you will eventually hit the document size limit.
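
As a rough illustration of the one-record-per-reading layout, here is a minimal Python sketch that turns one CSV row into one document. The field names come from the sample record in the question. The CSV column order, the helper names reading_to_document and site_history_filter, the combined recorded_at field, and the choice to treat the timestamps as UTC are all assumptions made for the example, not something the question specifies.

```python
import csv
import io
from datetime import datetime, timezone


def reading_to_document(row):
    """Convert one CSV row (all values are strings) into one document per reading."""
    # Assumption: rec_date/rec_time are treated as UTC; adjust if the feed is local time.
    recorded_at = datetime.strptime(
        row["rec_date"] + " " + row["rec_time"], "%Y-%m-%d %H:%M:%S"
    ).replace(tzinfo=timezone.utc)
    return {
        "site_no": int(row["site_no"]),
        "detector": int(row["detector"]),
        "recorded_at": recorded_at,  # hypothetical field name for the merged date and time
        "flow_period": int(row["flow_period"]),
        "vehicle_count": int(row["vehicle_count"]),
        "data_owner": int(row["data_owner"]),
    }


def site_history_filter(site_numbers, start, end):
    """Build a query filter for several sites over the half-open range [start, end)."""
    return {
        "site_no": {"$in": list(site_numbers)},
        "recorded_at": {"$gte": start, "$lt": end},
    }


# Assumed CSV layout: a header row followed by one reading per line.
sample_csv = io.StringIO(
    "rec_date,rec_time,site_no,data_owner,flow_period,vehicle_count,detector\n"
    "2010-01-29,09:15:00,46,1,5,60,8\n"
)
documents = [reading_to_document(row) for row in csv.DictReader(sample_csv)]
print(documents[0])

# With a running server and the pymongo driver, the documents could then be stored
# and indexed along these lines (collection is a pymongo Collection object):
#   collection.insert_many(documents)
#   collection.create_index([("site_no", 1), ("recorded_at", 1)])
```

Each reading becomes a small, fixed-size document, so nothing ever grows towards the size limit. A compound index on site_no and recorded_at would probably suit the "past n Mondays" query, since each Monday becomes one date-range filter over the site and its neighbours.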