MongoDB – How does indexing a date field that is constantly updated and/or omitted affect performance?

index mongodb

In MongoDB, I have a collection with a date field called last_updated that I would like to index. The reason is that I often query on a date range of that field and also sort by it, and I feel an index could speed this up. As a side note, this collection sees a lot of writes and updates.

However, I would like to know the implications of doing so. My understanding is that updates are the heaviest operation in a B-tree index structure and could hurt performance. Would anyone mind explaining whether they would or would not, and if not, why indexing this field would be no different from indexing another date field (one that is not updated)?

Best Answer

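Hi. It will affect performance, but not greatly. Whenever a write changes last_updated, the index entry for that document has to be updated as part of the same write, so each update does slightly more work. The more disruptive step is building the index on an existing collection, which on older MongoDB versions blocks reads and writes while it runs. To avoid that, you can build the index in the background: db.collection.createIndex( { last_updated: 1}, {background: true} )

This lets you keep using the collection while the index is being built, so you should not see any disruption from creating it. Since MongoDB 4.2 the background option is ignored, because every index build holds an exclusive lock only briefly at the start and end.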

Since you mention that you sort on last_updated, I strongly recommend indexing it. You gain a lot of read performance for range queries and sorts at a modest cost per write.
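To make the write cost concrete, here is a minimal sketch in plain JavaScript (it does not talk to MongoDB) that models an index on last_updated as a sorted array of [key, docId] pairs. The class SortedIndex and its methods are hypothetical names invented for this illustration, and a real MongoDB index is a B-tree rather than an array. The logical work is the same, though: changing an indexed value means removing the old entry and inserting a new one, and in return range queries come back already sorted.

```javascript
// Toy model of a single-field index: a sorted array of [key, docId] pairs.
// Hypothetical illustration only; MongoDB's real index is a B-tree.
class SortedIndex {
  constructor() {
    this.entries = [];
  }

  // Binary search for the first position whose entry is >= (key, id).
  _lowerBound(key, id = -Infinity) {
    let lo = 0;
    let hi = this.entries.length;
    while (lo < hi) {
      const mid = (lo + hi) >>> 1;
      const [k, d] = this.entries[mid];
      if (k < key || (k === key && d < id)) lo = mid + 1;
      else hi = mid;
    }
    return lo;
  }

  insert(key, id) {
    this.entries.splice(this._lowerBound(key, id), 0, [key, id]);
  }

  remove(key, id) {
    const pos = this._lowerBound(key, id);
    const entry = this.entries[pos];
    if (entry && entry[0] === key && entry[1] === id) this.entries.splice(pos, 1);
  }

  // Updating an indexed field = remove the old entry + insert the new one.
  update(oldKey, newKey, id) {
    this.remove(oldKey, id);
    this.insert(newKey, id);
  }

  // A range query walks the entries in key order, so no separate sort is needed.
  range(from, to) {
    const ids = [];
    for (let i = this._lowerBound(from);
         i < this.entries.length && this.entries[i][0] <= to; i++) {
      ids.push(this.entries[i][1]);
    }
    return ids;
  }
}

// Usage: numeric timestamps stand in for dates, numeric ids for documents.
const idx = new SortedIndex();
idx.insert(100, 1);
idx.insert(200, 2);
idx.insert(300, 3);
idx.update(100, 250, 1);           // document 1 was just modified
console.log(idx.range(150, 300));  // ids in last_updated order
```

Each update costs one extra remove and insert on the index, which is small next to the cost of scanning and sorting the whole collection on every read.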

Related Solutions

Sql-server – Do I need separate indexes for each type of query, or will one multi-column index work

You are right in that your example query would not use that index.

The query planner will consider using an index if:

all the fields contained in it are referenced in the query
some of the fields starting from the beginning are referenced

It will not be able to make use of indexes that start with a field not used by the query.

So for your example:

SELECT [id], [name], [customerId], [dateCreated]
   FROM Representatives WHERE customerId=1 
   ORDER BY dateCreated

it would consider indexes such as:

[customerId]
[customerId], [dateCreated]
[customerId], [dateCreated], [name]

but not:

[name], [customerId], [dateCreated]

If it found both [customerId] and [customerId], [dateCreated], [name] its decision to prefer one over the other would depend on the index stats which depend on estimates of the balance of data in the fields. If [customerId], [dateCreated] were defined it should prefer that over the other two unless you give a specific index hint to the contrary.
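The leading-column rule described above can be sketched in a few lines of JavaScript. This is a deliberately simplified model, not how SQL Server's planner is implemented: the function usablePrefixLength is a hypothetical helper, and it only looks at the columns used for filtering and ordering while ignoring statistics, covering indexes, and index scans.

```javascript
// Hypothetical helper: counts how many leading columns of an index are
// referenced by the query. Zero means the index cannot be used to seek.
function usablePrefixLength(indexColumns, queryColumns) {
  const referenced = new Set(queryColumns);
  let n = 0;
  while (n < indexColumns.length && referenced.has(indexColumns[n])) n++;
  return n;
}

// Columns the example query filters and orders by.
const queryColumns = ["customerId", "dateCreated"];

const candidateIndexes = [
  ["customerId"],
  ["customerId", "dateCreated"],
  ["customerId", "dateCreated", "name"],
  ["name", "customerId", "dateCreated"],
];

for (const index of candidateIndexes) {
  const n = usablePrefixLength(index, queryColumns);
  console.log(`[${index.join("], [")}] usable leading columns: ${n}`);
}
```

Under this model the first three indexes have at least one usable leading column, while the one starting with name has none.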

In my experience it is not uncommon to see one index defined for every field, though this is rarely optimal: the extra work needed to update the indexes on insert/update, and the extra space needed to store them, is wasted when half of them may never get used. Unless your DB sees write-heavy loads, though, performance is not going to suffer badly even with the excess indexes.

Specific indexes for frequent queries that would otherwise be slow due to table or index scanning are generally a good idea, though don't overdo it, as you could be exchanging one performance issue for another. If you do define [customerId], [dateCreated] as an index, for example, remember that the query planner will be able to use it for queries that would use an index on just [customerId] if present. While using just [customerId] would be slightly more efficient than using the compound index, this may be offset by having two indexes competing for space in RAM instead of one (though if your entire normal working set fits easily into RAM, this extra memory competition may not be an issue).

MongoDB: Should I combine fields for performance in limited circumstances

Without combining the fields in your data set you could just create a compound index using both. No changes to your data necessary. To do so, just create the index like so:

db.collection.createIndex({"user" : 1, "domain" : 1})

Docs are here:

http://www.mongodb.org/display/DOCS/Indexes#Indexes-CompoundKeys

Once you have created such a compound key it essentially makes an index on the leftmost element (user in my example above) redundant, and so an index on user (if it exists) could be removed.
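As a rough sketch of that redundancy check, the plain JavaScript below takes a list of index key specifications (written the way they are passed when creating an index) and reports those that are a strict leading prefix of another. The function names isPrefixOf and findRedundantIndexes are hypothetical, and the check assumes plain indexes only: a prefix index that is unique, sparse, or otherwise has special options may still be needed.

```javascript
// Hypothetical helpers for spotting single-field or compound indexes
// that are a strict leading prefix of another index's key specification.
function isPrefixOf(shorter, longer) {
  const a = Object.entries(shorter);
  const b = Object.entries(longer);
  return a.length < b.length &&
    a.every(([field, direction], i) => b[i][0] === field && b[i][1] === direction);
}

function findRedundantIndexes(indexes) {
  return indexes.filter((candidate) =>
    indexes.some((other) => isPrefixOf(candidate, other)));
}

// Usage: { user: 1 } is covered by the compound index, { domain: 1 } is not,
// because domain is not the leftmost field of any other index.
const existing = [{ user: 1 }, { user: 1, domain: 1 }, { domain: 1 }];
console.log(findRedundantIndexes(existing));
```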

Don't forget that the query optimizer caches its chosen plan and only re-evaluates it periodically (about every 1,000 operations in older versions), so you will have to hint() the index to make sure it is used if you are testing it out.
