Before you go further, you need to answer a few questions:
- How do you represent files within folders in the database?
- How do you represent folders?
- Do you have relations between folders (parent -> child)?
- How often do you expect to create folders and files?
- How often do you update existing files in folders, and how many files do you update at a time?
Based on your answers, you can choose a write-optimized schema or a read-optimized schema. A write-optimized schema contains many very small entries, or relies on built-in update operators such as $inc over a collection. A read-optimized schema generally has fewer, larger documents, like the one you described. In your scenario you could easily have something like this (assuming all folders are at the same level):
{
  "userid": "email or id",
  "folders": [
    { "folder1": ["file1", "file2"] },
    { "folder2": ["file3", "file4"] }
  ]
}
With this schema, however, it gets quite complicated if you need to link a folder to a parent folder. Either way, userid is the obvious shard key.
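One way to make the parent link manageable is the write-optimized shape: one small entry per folder or file, each carrying a reference to its parent. The sketch below uses plain Python dicts rather than a real driver, and the field names (`folders`, `type`, `name`, `parent`), the helper `flatten_user_doc`, and the sample user id are all illustrative assumptions, not part of your schema.

```python
def flatten_user_doc(user_doc):
    """Turn one read-optimized user document into many small entries.

    Assumes (hypothetically) that the folder array is stored under a
    "folders" key and that every folder sits at the top level.
    """
    entries = []
    for folder in user_doc["folders"]:
        for folder_name, files in folder.items():
            # Top-level folders have no parent.
            entries.append({"userid": user_doc["userid"], "type": "folder",
                            "name": folder_name, "parent": None})
            for file_name in files:
                # Each file points at the folder that contains it.
                entries.append({"userid": user_doc["userid"], "type": "file",
                                "name": file_name, "parent": folder_name})
    return entries


read_optimized = {
    "userid": "user@example.com",  # made-up sample value
    "folders": [
        {"folder1": ["file1", "file2"]},
        {"folder2": ["file3", "file4"]},
    ],
}

write_optimized = flatten_user_doc(read_optimized)
```

In this shape, nesting a folder only means setting its parent to another folder's name, and every entry still carries userid, so it can keep serving as the shard key.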
It might be worth posting the table definition from your other question for clarity.
The composite index is doing a few things for you:
- As you know, enforcing uniqueness on (sensor_id, timestamp); I'm unsure whether this is an important data integrity constraint.
- Allowing queries that filter on both columns to look up matching rows by using a single index. MySQL can answer some queries (equality conditions on multiple columns are the ones I know about) by merging two indexes, but this tends to be significantly slower compared to using a single composite index.
- The index can also be used to search on a leftmost prefix of the composite index's columns, but not on a trailing subset. So in this case it could help a query that filters on sensor_id values, or on sensor_id and timestamp values, but not on timestamp values alone.
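The last point can be illustrated with a deliberately simplified model in Python. The function name `usable_prefix` is made up for this sketch, and it only considers equality filters; it ignores range conditions and the optimizer's cost estimates, so EXPLAIN remains the authority on what MySQL actually does.

```python
def usable_prefix(index_columns, equality_filters):
    """Return the leading index columns an equality filter can use.

    Simplified model: walk the index left to right and stop at the first
    column the query does not filter on.
    """
    used = []
    for column in index_columns:
        if column not in equality_filters:
            break
        used.append(column)
    return used


composite = ["sensor_id", "timestamp"]

print(usable_prefix(composite, {"sensor_id"}))               # ['sensor_id']
print(usable_prefix(composite, {"sensor_id", "timestamp"}))  # ['sensor_id', 'timestamp']
print(usable_prefix(composite, {"timestamp"}))               # []
```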
There are a number of caveats to this, so it's a good idea to look at the EXPLAIN output for your queries and verify which indexes they're using. Keep in mind also that indexes can support the read part of UPDATE and DELETE queries, as well as JOINs, GROUP BY, ORDER BY, and other operations I'm neglecting.
An example of a scenario where the composite index is unnecessary would be if you don't care about the uniqueness constraint and all your queries filter on timestamp or sensor_id, but not both.
The single-column index on sensor_id is actually redundant, since the composite index on (sensor_id, timestamp) can be used by the same queries. Still, you might find that some queries perform faster when scanning the single-column index than when using a composite index with a wider key. The difference might not be enough to matter, though, and some testing will probably be required to find out.
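As a rough sketch of that redundancy check, the Python below flags any index whose columns are a strict leftmost prefix of another index. The function `redundant_indexes` and the index names are hypothetical, and the check looks only at column order: it does not know about uniqueness constraints or key width, so treat a hit as a candidate for testing, not as a verdict.

```python
def redundant_indexes(indexes):
    """Map each prefix-redundant index name to a wider index covering it.

    `indexes` maps an index name to its ordered list of columns.
    """
    redundant = {}
    for name, cols in indexes.items():
        for other_name, other_cols in indexes.items():
            if name == other_name:
                continue
            # A strictly shorter index that matches the leading columns
            # of another index can be served by that other index.
            if len(cols) < len(other_cols) and other_cols[:len(cols)] == cols:
                redundant[name] = other_name
                break
    return redundant


# Hypothetical index names for the table discussed above.
indexes = {
    "idx_sensor": ["sensor_id"],
    "idx_timestamp": ["timestamp"],
    "uniq_sensor_time": ["sensor_id", "timestamp"],
}

print(redundant_indexes(indexes))  # {'idx_sensor': 'uniq_sensor_time'}
```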
In addition to looking at the EXPLAIN output for your queries, tools such as pt-index-usage from the Percona Toolkit, or the INFORMATION_SCHEMA.INDEX_STATISTICS table if you're running Percona Server or MariaDB, can help you assess which indexes are actually being used.
References:
The Optimization That (Often) Isn’t: Index Merge Intersection
Practical MySQL indexing guidelines
To use (name, age) as the shard key in your case, you first need to create an index on (name, age).
The shard key can be unique, but I don't see how a unique constraint fits here: isn't it quite likely that you will have two identical records (name: Mary, age: 26)?
You cannot have the database enforce uniqueness on that combination without a unique compound index.
Index intersection means combining two indexes at run time. For example, you have an index on name, another on age, and a query {name: 'somename', age: 'someage'}. Prior to 2.6, you needed a compound index on (name, age). As of 2.6, the optimizer can use the two individual indexes to execute the above query.
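Conceptually, intersection looks something like the Python sketch below: each single-field index yields a set of matching record ids, and the result is the overlap of the two sets. This is only an in-memory illustration with made-up sample records and a hypothetical helper `build_index`; it says nothing about how the server implements or costs the plan.

```python
from collections import defaultdict


def build_index(records, field):
    """Map each value of `field` to the set of record ids holding it."""
    index = defaultdict(set)
    for record_id, record in records.items():
        index[record[field]].add(record_id)
    return index


# Made-up sample data; note that two records share (Mary, 26).
records = {
    1: {"name": "Mary", "age": 26},
    2: {"name": "Mary", "age": 31},
    3: {"name": "John", "age": 26},
    4: {"name": "Mary", "age": 26},
}

name_index = build_index(records, "name")
age_index = build_index(records, "age")

# Keep only the ids that both single-field indexes return.
matches = name_index["Mary"] & age_index[26]
print(sorted(matches))  # [1, 4]
```

The two matching records also show why a unique constraint on (name, age) would reject realistic data.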