Mongodb – How to time-stamp data in MongoDB

mongodbtimestamp

I'm responsible for writing an app to juggle a huge amount of client and business intelligence information. This information involves records for various businesses and metrics like revenue, their web platform, etc. For reporting purposes, it has been decided that the data now must be time-stamped so that we may construct reports like "Foo Inc.'s revenue increased by 200% when they switched to digital television in 2010."

Or something to that effect. We don't really deal with clients that are that far behind the times.

As it stands, the data is stored in MongoDB and its schema (or lack thereof) exactly represents how it's displayed. A document has a name, maybe embedded documents for address information or operations-related stats, etc. I somehow need to add the concept of time-stamping, or versioning to this. The most obvious way to model this would be like so:

key: [{ value: "bar", time: "2012" }, { value: "baz", time: "2011" }]

Where every property is timestamped. Obviously, this breaks the schema, and adds maintenance issues. I've also toyed with using conventions (naming property keys like "Foo (2012-09-31)", and keeping copies of entire collections with a timestamp.

Obviously, this is a very big feature to implement and I was wondering what the best way to implement this would be. What is the most effective and performance-friendly way to add timestamps to data in MongoDB?

Edit:
Yes, I'm looking at versioning, although with all likelihood new data sets will come in large batches and not as often (for instance, every quarter). This is why I was considering using separate collections for each set and timestamping each collection, because the data is lots and far between, but I'd love to know if there's a better, more maintainable way.

Best Answer

Assuming you are using ObjectIDs for your _id column, then you will already have an insert based timestamp. You can even extract it from the shell directly if you wish. Of course, you may want to leave this as-is to permanently store your original insertion timestamp, but adding another would be relatively trivial. Because most drivers have the ability to generate their own ObjectID and you have the published spec you can easily insert your own with a little research, and in the language of your choice.

Since ObjecIDs are the default _id value, they are also going to be a good choice for indexing, well supported in the drivers etc.

Alternatively, in terms of implementing pure timestamps in MongoDB, there is also a BSON Timestamp data type.

Storing this (or the aforementioned ObjectID) in a standard field (say "ts" for example) across documents/values would add overhead to a field/doc as you mention, but you should be able to standardize it easily (field.ts would always have a timestamp value) as well as predict the overhead for each field with a timestamp (8 bytes for a timestamp, or 12 bytes for an ObjectID - for more info see bsonspec.org).

Given that you have two native, known datatypes that provide timestamp functionality, they would be my recommendation as a way forward here.

In terms of adding them to existing data, you can choose to "lazily" add them whenever the data is next touched or issue a batch update, depending on your needs - the benefit of a flexible schema.