MongoDB – Using the index intersection feature in MongoDB 2.6 as a compound shard key

index, mongodb, sharding

I have three single-field indexes, on name, age, and salary.

With index intersection in MongoDB 2.6, queries can use them together as if they were a compound index (for example, name and age together). Can I use the same indexes as a compound shard key?

I also have a constraint that this will be a unique shard key.

Will it work?

Best Answer

To use (name, age) as the shard key in your case, you first need to create a compound index on (name, age).

The shard key can be unique, but I don't see how a unique constraint fits here. Isn't it quite likely that two records will share the same values (for example, name: Mary, age: 26)?

The database cannot enforce a unique constraint on (name, age) without a compound index on those fields.
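The requirement that the shard key be backed by an index starting with the same fields can be illustrated with a minimal sketch. The helper `indexSupportsShardKey` is hypothetical (it is not a MongoDB API), and the sketch assumes plain ascending keys only, ignoring hashed keys and sort direction.

```javascript
// Hypothetical helper, not part of MongoDB: checks whether an index key
// pattern starts with the shard key fields, in the same order.
function indexSupportsShardKey(indexKey, shardKey) {
  const indexFields = Object.keys(indexKey);
  const shardFields = Object.keys(shardKey);
  if (shardFields.length > indexFields.length) return false;
  return shardFields.every((field, i) => indexFields[i] === field);
}

const singleFieldIndexes = [{ name: 1 }, { age: 1 }, { salary: 1 }];
const compoundIndex = { name: 1, age: 1 };
const shardKey = { name: 1, age: 1 };

// None of the three single-field indexes can back the compound shard key...
const singleOk = singleFieldIndexes.some((ix) => indexSupportsShardKey(ix, shardKey));
// ...but the compound index on (name, age) can.
const compoundOk = indexSupportsShardKey(compoundIndex, shardKey);

console.log(singleOk, compoundOk); // false true

// In the mongo shell the corresponding steps would look roughly like this
// ("mydb" and "people" are placeholder names):
//   db.people.ensureIndex({ name: 1, age: 1 }, { unique: true })
//   sh.shardCollection("mydb.people", { name: 1, age: 1 }, true)
```

The last argument to `sh.shardCollection` requests a unique shard key, which is why the compound index is created as unique in the comment above.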

Index intersection combines two indexes at run time. For example, suppose you have one index on name, another on age, and a query such as { name: "somename", age: someage }. Prior to 2.6, you needed a compound index on (name, age) to use an index for both conditions. In 2.6, the optimizer can use the individual indexes together to execute that query.
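As a rough illustration of the idea (not of how the MongoDB optimizer is actually implemented), index intersection can be pictured as looking up each single-field index separately and keeping only the document ids found in both. The data and the function names `buildIndex` and `intersectLookup` are made up for this sketch.

```javascript
// Toy single-field index: maps a field value to the set of matching document ids.
function buildIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    if (!index.has(doc[field])) index.set(doc[field], new Set());
    index.get(doc[field]).add(doc._id);
  }
  return index;
}

// Look up each index on its own, then keep the ids present in both results.
function intersectLookup(nameIndex, ageIndex, name, age) {
  const byName = nameIndex.get(name) || new Set();
  const byAge = ageIndex.get(age) || new Set();
  return [...byName].filter((id) => byAge.has(id));
}

const people = [
  { _id: 1, name: "Mary", age: 26 },
  { _id: 2, name: "Mary", age: 31 },
  { _id: 3, name: "John", age: 26 },
];
const nameIndex = buildIndex(people, "name");
const ageIndex = buildIndex(people, "age");

const matches = intersectLookup(nameIndex, ageIndex, "Mary", 26);
console.log(matches); // [ 1 ]
```

Note that this only speeds up reads. Neither toy index knows about the combination (name, age), which is why separate indexes cannot enforce uniqueness on the pair.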

Related Solutions

Mongodb – Choosing shard key and friendly URL Ids for the MongoDB

Before you go further, you need to answer a few questions:

How do you represent files within folders in the database?
How do you represent folders?
Do you have relations between folders (parent -> child)?
How often do you expect to create folders and files?
How often do you update existing files in folders, and how many files do you update at a time?

Based on your answers you can have a write-optimized schema or a read-optimized schema. A write-optimized schema contains many very small entries, or is one where you can use built-in operators like $inc over a collection. A read-optimized schema is generally a larger collection like the one you described. In your scenario you could easily have something like this, assuming all folders are at the same level (the field name "folders" is a placeholder, since the array needs a key for the document to be valid):

{ "userid" : "email or id",
  "folders" : [
     { "folder1" : [ "file1", "file2" ] },
     { "folder2" : [ "file3", "file4" ] }
  ]
}

With this schema, however, it gets quite complicated if you need to link a folder to a parent folder ... Either way, it is clear that userid is the shard key.
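If parent links are needed, one possible alternative (an assumption for illustration, not something stated in the question) is to store one small document per folder with a `parent` field, keeping `userid` in every document so it can still serve as the shard key. The field names and the `folderPath` helper below are hypothetical, and the sketch assumes the parent links contain no cycles.

```javascript
// Hypothetical layout: one small document per folder, with a parent reference.
const folderDocs = [
  { userid: "user@example.com", folder: "folder1", parent: null, files: ["file1", "file2"] },
  { userid: "user@example.com", folder: "folder2", parent: "folder1", files: ["file3", "file4"] },
];

// Walk the parent links upward to build the full path of a folder for one user.
function folderPath(docs, userid, folder) {
  const parts = [];
  let current = docs.find((d) => d.userid === userid && d.folder === folder);
  while (current) {
    parts.unshift(current.folder);
    const parentName = current.parent;
    current = parentName === null
      ? undefined
      : docs.find((d) => d.userid === userid && d.folder === parentName);
  }
  return parts.join("/");
}

const nestedPath = folderPath(folderDocs, "user@example.com", "folder2");
console.log(nestedPath); // folder1/folder2
```

Because every lookup filters on `userid` first, all of a user's folder documents can be found on the same shard.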

Mysql – Is this compound index unnecessary

It might be worth posting the table definition from your other question for clarity.

The composite index is doing a few things for you:

As you know, enforcing uniqueness on (sensor_id, timestamp); I'm unsure whether this is an important data integrity constraint.
Allowing queries that filter on both columns to look up matching rows by using a single index. MySQL can answer some queries (equality conditions on multiple columns are the ones I know about) by merging two indexes, but this tends to be significantly slower compared to using a single composite index.
The index can also be used to search on a leftmost prefix of the composite index's columns, but not on a rightmost subset. So in this case it could help a query that filters on sensor_id values, or on sensor_id and timestamp values, but not on timestamp values alone.
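The leftmost-prefix behaviour described in the last point can be sketched as follows. This is a simplified model, not MySQL's optimizer: it assumes equality filters only, and the function name `usablePrefix` is made up for the example.

```javascript
// Returns the leading columns of a composite index that a filter can use.
// Simplification: only equality filters are considered.
function usablePrefix(indexColumns, filterColumns) {
  const filtered = new Set(filterColumns);
  const prefix = [];
  for (const column of indexColumns) {
    if (!filtered.has(column)) break; // a gap ends the usable prefix
    prefix.push(column);
  }
  return prefix;
}

const sensorIndex = ["sensor_id", "timestamp"];

const onSensor = usablePrefix(sensorIndex, ["sensor_id"]);            // ["sensor_id"]
const onBoth = usablePrefix(sensorIndex, ["timestamp", "sensor_id"]); // ["sensor_id", "timestamp"]
const onTimestamp = usablePrefix(sensorIndex, ["timestamp"]);         // [] (index not usable)

console.log(onSensor, onBoth, onTimestamp);
```

An empty result means the composite index cannot help that filter, which matches the case of filtering on timestamp alone.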

There are a number of caveats to this, so it's a good idea to look at the EXPLAIN output for your queries and verify which indexes they're using. Also keep in mind that indexes can support the read part of UPDATE and DELETE queries, as well as JOINs, GROUP BY, ORDER BY, and other operations I'm neglecting.

An example of a scenario where the composite index is unnecessary would be if you don't care about the uniqueness constraint and all your queries filter on timestamp or sensor_id, but not both.

The single-column index on sensor_id is actually redundant since the composite index on (sensor_id, timestamp) can be used by the same queries, but still you might find that some queries perform faster when doing scans on the single-column index compared to using a composite index with a wider key. The difference might not be enough to matter, though, and some testing will probably be required to find out.

In addition to looking at the EXPLAIN output for your queries, tools such as pt-index-usage from the Percona toolkit, or the INFORMATION_SCHEMA.INDEX_STATISTICS table if you're running Percona Server or MariaDB, can help you assess which indexes are actually being used.

References:

The Optimization That (Often) Isn’t: Index Merge Intersection

Practical MySQL indexing guidelines
