I have an application that (mostly) uses only the very latest data available for a certain identifier (which may consist of multiple columns/keys). However, this data may be bad or implausible. In those cases, the application has to access the entry's history to verify the value or calculate an alternative.
For the second use case I created a sorted index on `identifier..., time`.
However, to my mind this index does not really fit the main use case, because the query still has to consider n values (with different times) for each identifier.
Is there a way in MongoDB to create an index on something like `identifier..., $max(time)`? Or is that unnecessary because MongoDB's internal query optimization is smart enough to only look at the first/last value of the index?
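For reference, a compound index on the identifier plus descending time generally lets the server answer a sort-plus-limit(1) query from the first matching index entry. A minimal sketch of the query shape, using an in-memory stand-in so it runs without a server (collection and field names follow the sensor example in this question and are assumptions about the real schema):

```javascript
// In-memory stand-in for a collection so the query shape is runnable here.
// With a real server the equivalent mongo shell calls would be:
//   db.readings.createIndex({ sensor_id: 1, time: -1 })
//   db.readings.find({ sensor_id: 1 }).sort({ time: -1 }).limit(1)
const readings = [
  { sensor_id: 1, time: "12:15", value: 5.12 },
  { sensor_id: 1, time: "12:16", value: 5.15 },
  { sensor_id: 3, time: "12:16", value: 1.23 },
];

// Latest reading for one sensor: filter, sort by time descending, take first.
function latest(sensorId) {
  return readings
    .filter(d => d.sensor_id === sensorId)
    .sort((a, b) => (a.time < b.time ? 1 : -1))[0];
}

console.log(latest(1).value); // 5.15 — the newest value for sensor 1
```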
Environment:
- MongoDB 4.0.1
- Java Applications
- I have plenty of (big) collections that might need this logic (many TB of data in total)
- New data are added often; roughly one entry per identifier per minute, sometimes multiple entries per second
- Reads occur at roughly fixed intervals (15 s) for fixed subsets of ids (only the reader knows which identifiers it uses); only the latest data are relevant
- Readers might do a cold (re-)start and need to fetch the history to better predict the future / understand the current state (this also happens when a reader considers the input data logically implausible).
- Network/processing sometimes introduces delays or shuffles the order of input values.
Alternatives considered:
- Use separate collections: one for live data, one for historic data
- Decreases write performance as I have to write in two collections
- Requires some kind of transactional behavior, as I have to make sure that entries will always be added to both collections.
- Requires additional logic that makes sure that only the latest values are in the live table. Entries might change order or might arrive multiple times due to network/processing latency.
- Use separate tables with some kind of insert hook
- Decreases write performance as I have to write in two collections
- Introduces additional delay due to the hook processing time
- Requires a new tool and thus increases complexity
- Requires additional logic that makes sure that only the latest values are in the live table. Entries might change order or might arrive multiple times due to network/processing latency.
- Sync/hook failures are hard to detect
- Use of some kind of cache (redis)
- Requires a new tool and thus increases complexity
- Requires some kind of transactional behavior, as I have to make sure that entries will always be added to both tools.
- Requires additional logic that makes sure that only the latest values are in the live table. Entries might change order or might arrive multiple times due to network/processing latency.
EDIT
Example:
sensor_id | time    | value
1         | 12:15   | 5.12     <-- I don't need this value or any of its predecessors
1         | 12:16   | 5.15     <-- I need this value
3         | 12:16   | 1.23     <-- and I need this value
4         | 1998 AD | 4200     <-- not needed
4         | 1999 AD | 5000     <-- but I need this value (even though it is years old)
Best Answer
As per the MongoDB blog documentation: by default, MongoDB generates a unique ObjectID identifier that is assigned to the _id field of a new document before writing that document to the database. In many cases the default unique identifiers assigned by MongoDB will meet application requirements. However, in some cases an application may need to create custom unique identifiers, such as:
Use ObjectID as a unique identifier
MongoDB database drivers by default generate an ObjectID identifier that is assigned to the _id field of each document. In many cases the ObjectID may be used as a unique identifier in an application.
ObjectID is a 96-bit (12-byte) number which is composed as follows: a 4-byte timestamp (seconds since the Unix epoch), a 3-byte machine identifier, a 2-byte process id, and a 3-byte counter.
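Since the leading four bytes of an ObjectID are a big-endian Unix timestamp, the creation time can be recovered directly from its hex string. A small sketch (the example ObjectID below is made up for illustration):

```javascript
// Sketch: decode the leading 4-byte timestamp from an ObjectID hex string.
// The first 8 hex characters encode seconds since the Unix epoch.
function objectIdSeconds(hex) {
  return parseInt(hex.slice(0, 8), 16); // first 8 hex chars = 4 bytes
}

const seconds = objectIdSeconds("5b7d9f5e2a8c4d10e4f00001"); // made-up id
console.log(seconds);                                   // 1534959454
console.log(new Date(seconds * 1000).getUTCFullYear()); // 2018
```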
Benefits
Use a single counter document to generate unique identifiers one at a time
A counter document for unique identifier generation could look as follows:
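For instance, it could look roughly like this (the _id string and NOTES text are assumptions for illustration; COUNT is the field the application increments to allocate identifiers):

```javascript
// Sketch of a counter document; one such document serves the whole app.
const counterDoc = {
  _id:   "UNIQUE COUNT DOCUMENT IDENTIFIER",
  COUNT: 0, // incremented atomically via findAndModify on each allocation
  NOTES: "increment COUNT via findAndModify to allocate the next id",
};
```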
And the counter document could be atomically requested and incremented as follows. Note that by default the document returned from findAndModify is the pre-modification document:
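A sketch of that fetch-and-increment, using an in-memory stand-in for the collection so it is runnable here without a server; in the mongo shell, `db.counters.findAndModify` takes a document of the same shape (names other than COUNT are assumptions):

```javascript
// In-memory stand-in mimicking findAndModify's default semantics:
// apply $inc, return the document as it was BEFORE the modification.
const counters = [{ _id: "UNIQUE COUNT DOCUMENT IDENTIFIER", COUNT: 0 }];

function findAndModify({ query, update }) {
  const doc = counters.find(d => d._id === query._id);
  const before = { ...doc };                        // snapshot before the update
  for (const [field, n] of Object.entries(update.$inc)) doc[field] += n;
  return before;          // default behaviour: return the pre-modification doc
}

// Fetch-and-increment (atomic on a real server):
const result = findAndModify({
  query:  { _id: "UNIQUE COUNT DOCUMENT IDENTIFIER" },
  update: { $inc: { COUNT: 1 } },
});
console.log(result.COUNT); // 0 — the id allocated to this caller; the next caller gets 1
```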
Use a single counter document that allocates batches of unique identifiers
This approach is similar to the previous one, except that instead of incrementing the COUNT value by 1, we increment it by a larger number that represents a batch of unique identifiers allocated by the database to the application.
For example, if the application knows that it needs 1000 new unique identifiers, it would use findAndModify() to atomically read the current COUNT and increment it by 1000. The document returned from the findAndModify command contains the starting value for the batch of unique identifiers, and the application then loops over the 1000 values from that starting point.
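A sketch of the batch allocation, with the same kind of in-memory stand-in for findAndModify (names other than COUNT are assumptions):

```javascript
// In-memory stand-in mimicking findAndModify: apply $inc, return the
// pre-modification document, whose COUNT is the start of the reserved batch.
const counters = [{ _id: "UNIQUE COUNT DOCUMENT IDENTIFIER", COUNT: 0 }];

function findAndModify({ query, update }) {
  const doc = counters.find(d => d._id === query._id);
  const before = { ...doc };
  for (const [field, n] of Object.entries(update.$inc)) doc[field] += n;
  return before;
}

// Reserve 1000 identifiers in one round trip:
const batchSize = 1000;
const start = findAndModify({
  query:  { _id: "UNIQUE COUNT DOCUMENT IDENTIFIER" },
  update: { $inc: { COUNT: batchSize } },
}).COUNT;

// The application can now hand out start, start+1, ..., start+batchSize-1
// locally without further database round trips.
console.log(start); // 0
```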