Assuming you are using ObjectIDs for your _id field, then you already have an insert-based timestamp embedded in each document. You can even extract it from the shell directly if you wish. Of course, you may want to leave this as-is to permanently store your original insertion timestamp, but adding another would be relatively trivial. Because most drivers can generate their own ObjectIDs, and the spec is published, with a little research you can easily construct and insert your own in the language of your choice.
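For example, a minimal sketch with the Python driver (the database and collection names are placeholders; the shell equivalent is ObjectId.getTimestamp()):

```python
from pymongo import MongoClient

client = MongoClient()                  # assumes a local mongod on the default port
doc = client.mydb.mycoll.find_one()     # "mydb"/"mycoll" are placeholder names

# The first 4 bytes of an ObjectId are a Unix timestamp (in seconds), so the
# original insertion time can be recovered without storing a separate field.
print(doc["_id"].generation_time)       # timezone-aware datetime in UTC
```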
Since ObjectIDs are the default _id value, they are also a good choice for indexing and are well supported across the drivers.
Alternatively, in terms of implementing pure timestamps in MongoDB, there is also a BSON Timestamp data type.
Storing this (or the aforementioned ObjectID) in a standard field (say "ts", for example) across documents would add overhead per field/document, as you mention, but you should be able to standardize it easily (field.ts would always hold a timestamp value) and to predict the overhead for each field carrying a timestamp (8 bytes for a Timestamp, or 12 bytes for an ObjectID - for more detail see bsonspec.org).
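As a rough illustration of predicting that overhead, here is a sketch using PyMongo's bson module (the field name and values are arbitrary, and bson.encode assumes a reasonably recent driver):

```python
from bson import ObjectId, encode
from bson.timestamp import Timestamp

# Encode two single-field documents to compare the per-document cost of each
# choice for a "ts" field: an 8-byte BSON Timestamp vs a 12-byte ObjectId
# (both share the same overhead for the type byte and the field name).
with_timestamp = encode({"ts": Timestamp(1700000000, 1)})
with_objectid = encode({"ts": ObjectId()})

print(len(with_timestamp), len(with_objectid))   # 17 vs 21 bytes
```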
Given that you have two native, known datatypes that provide timestamp functionality, they would be my recommendation as a way forward here.
In terms of adding them to existing data, you can choose to "lazily" add them whenever the data is next touched or issue a batch update, depending on your needs - the benefit of a flexible schema.
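A batch backfill could look something like the following sketch (hypothetical database/collection names, and it assumes you want to reuse the timestamp already embedded in each _id):

```python
from pymongo import MongoClient

client = MongoClient()
coll = client.mydb.mycoll            # placeholder database/collection names

# Backfill a "ts" field on documents that do not have one yet, reusing the
# timestamp already embedded in each document's ObjectId _id.
for doc in coll.find({"ts": {"$exists": False}}, {"_id": 1}):
    coll.update_one(
        {"_id": doc["_id"]},
        {"$set": {"ts": doc["_id"].generation_time}},
    )
```

For a large collection you would want to batch those updates (for example with bulk_write) rather than issuing them one at a time.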
There are a couple of separate points here, but I don't think how MongoDB stores data in RAM is really relevant to this issue - MongoDB just uses the mmap() call and lets the kernel take care of memory management (by default the Linux kernel uses a Least Recently Used (LRU) policy to decide what to page out and what to keep - there are more specifics to it than that, but they are not terribly relevant here).
In terms of your issues, it sounds like you might have had a corrupt index, though the evidence is somewhat circumstantial. Now that you have done a repair (running the validate() command beforehand would have confirmed or denied this), there won't be any evidence in the current data, but you may find more in the logs, particularly from when you were attempting to recreate the index or using it in queries.
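For reference, a quick sketch of running that check from the Python driver (the collection name is a placeholder; from the shell it is simply db.collection.validate()):

```python
from pymongo import MongoClient

client = MongoClient()

# validate() walks the collection and its indexes and reports any
# inconsistencies; full=True is more thorough but slower on large collections.
result = client.mydb.validate_collection("mycoll", full=True)
print(result)
```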
As for the spikes in the page faults, btree stats, journal, lock percentage, and average flush time, that has all the hallmarks of a bulk delete, which causes a lot of index updates and a large amount of I/O. The fact that mapped memory drops off later in the graphs suggests that once you ran the repair the storage size was significantly reduced, which usually indicates significant fragmentation (bulk deletes, along with updates that grow documents, are the leading causes of fragmentation).
Therefore, I would look for a large delete operation logged as slow in the logs - it will only be logged once complete, so look for it to appear after the end of the events in MMS. One of the quirks of not running in a replica set is that a bulk operation like this is relatively non-obvious - it shows up as a single delete operation in the MMS graphs (usually lost in the noise).
These bulk delete operations tend to be run on older data that has not been recently used and has hence been paged out of active memory by the kernel (LRU again). To delete that data you must page it all back in, then flush the changes to disk, and of course deletes require the write lock, hence the spikes in faults, lock percentage etc.
To make room for the data being deleted, your current working set is paged out, which will hit the performance of your normal usage until the deletes complete and the memory pressure eases.
FYI - when you run a replica set, bulk ops are serialized in the oplog and hence replicated one operation at a time, so you can track such operations by their footprint in the replicated op stats on the secondaries. This is not possible with a standalone instance, short of looking in the logs for the completed ops and other secondary indications.
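To illustrate, a hedged sketch of inspecting the oplog on a replica set member from the Python driver (the namespace is a placeholder):

```python
from pymongo import MongoClient

client = MongoClient()   # assumes you are connected to a replica set member

# Each deleted document is replicated as its own "d" (delete) entry in the
# oplog, so a bulk delete leaves an obvious, countable trail on secondaries.
oplog = client.local["oplog.rs"]
print(oplog.count_documents({"op": "d", "ns": "mydb.mycoll"}))   # placeholder namespace
```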
As for managing large deletes in the future, it is generally far more efficient to partition your data into separate databases (if possible) and then drop the old data when it is no longer needed by simply dropping those databases. This requires some extra management on the application side, but it negates the need for bulk deletes, is far quicker to complete, limits fragmentation, and dropping a database also removes its files on disk, preventing excessive storage use. Definitely recommended if your use case allows it.
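A rough sketch of that pattern, assuming (purely for illustration) that data is partitioned into one database per month:

```python
from datetime import datetime, timezone
from pymongo import MongoClient

client = MongoClient()
now = datetime.now(timezone.utc)

# Write each record into a per-month database, e.g. "events_2024_05".
db_name = "events_{:%Y_%m}".format(now)
client[db_name].readings.insert_one({"ts": now, "value": 42})

# Retiring old data is then a single drop of the whole database rather than a
# bulk delete - it removes the files on disk and causes no fragmentation.
client.drop_database("events_2023_05")
```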
As at MongoDB 3.2, there is no feedback on the reason document validation failed: the overall validation expression currently evaluates as either True ("OK") or False ("Document failed validation"). Validation behaviour can be adjusted with the validationAction (error/warn) and validationLevel (strict/moderate/off) configuration options, but these do not provide any further context for validation failures.
If you want more detailed feedback, the recommended approach would be to add validation logic into your application rather than relying solely on server-side checks. Even with server-side validation, many checks are best done in application business logic to minimize round trips to the database server and provide more responsive feedback to the end user.
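To illustrate the current server-side behaviour, here is a small sketch with the Python driver (the collection name and validation rule are made up for the example); all the server reports on failure is the generic message:

```python
from pymongo import MongoClient
from pymongo.errors import WriteError

client = MongoClient()
db = client.mydb     # placeholder database name

# Create a collection with a simple query-style validator (MongoDB 3.2 syntax).
db.create_collection("contacts", validator={"email": {"$exists": True}})

try:
    db.contacts.insert_one({"name": "no email here"})
except WriteError as exc:
    # As at 3.2 there is no detail about which rule failed, only this message.
    print(exc.details["errmsg"])    # "Document failed validation"
```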
On the client side, for example, user input for a web app (required fields, field formats, ...) should be validated in the browser before it is submitted to your application or any insert/update is attempted against the database.
However, it does make sense to validate at multiple levels to ensure data quality and some context to diagnose validation failures would be very useful. There is a relevant open feature request you can watch/up-vote in the MongoDB issue tracker: SERVER-20547: Expose the reason an operation fails document validation.
For more information you may also be interested in Document Validation - Part 1: Adding Just the Right Amount of Control Over Your Documents. This highlights some of the general pros & cons of document validation as at MongoDB 3.2, and includes a reference table for the outcome based on the validationAction and validationLevel configuration options.