Since a query that would utilize the "date" index will implicitly contain a criteria for that "date" field (value(s) of certain type(s)), Mongo will decide what index/part of the index to use, making the fact of having multiple types of values in the same field moot (as far as indexing is concerned).
Here's an example:
I've added the following data to my Mongo 2.4.4 test db:
{ "_id" : ObjectId("52c679ac7a23e6225b62c51f"), "date" : ISODate("2014-01-06T03:48:30.737Z") }
{ "_id" : ObjectId("52c679b63cdecc6508c54fe5"), "date" : "2014-01-06T03:48:30.737Z" }
{ "_id" : ObjectId("52ca15677a23e6d3354fb7b4"), "date" : "2014-01-03 19:49:48" }
{ "_id" : ObjectId("42c679ac7a23e6225b62c51f"), "date" : ISODate("2014-02-06T03:48:30.737Z") }
Before index creation, any query on the date field (be it with a ISODate or a string criteria) will search through all the 4 documents, this can be confirmed by using explain()
:
> db.t1.find({date:ISODate("2014-02-06T03:48:30.737Z")}).explain();
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 1,
"nscannedObjects" : 4,
"nscanned" : 4,
"nscannedObjectsAllPlans" : 4,
"nscannedAllPlans" : 4,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
},
"server" : "P09104:27017"
}
> db.t1.find({date:"2014-01-03 19:49:48"}).explain();
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 1,
"nscannedObjects" : 4,
"nscanned" : 4,
"nscannedObjectsAllPlans" : 4,
"nscannedAllPlans" : 4,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
},
"server" : "P09104:27017"
}
Once I create the index via ensureIndex("date")
, doing either a query of type date:"some string", date:ISODAte(...) or date in some range of ISODate values will only look through the pertinent values (2 for each of these queries):
> db.t1.find({date: {$gte: ISODate("2014-01-05T03:48:30.737Z"), $lt: ISODate("2014-02-07T03:48:30.737Z")}}).explain();
{
"cursor" : "BtreeCursor date_1",
"isMultiKey" : false,
"n" : 2,
"nscannedObjects" : 2,
"nscanned" : 2,
"nscannedObjectsAllPlans" : 2,
"nscannedAllPlans" : 2,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"date" : [
[
ISODate("2014-01-05T03:48:30.737Z"),
ISODate("2014-02-07T03:48:30.737Z")
]
]
},
"server" : "P09104:27017"
}
This is made more clear if we add even a 3rd type of value for date (numerical) and create something like this:
{ "_id" : ObjectId("52c679ac7a23e6225b62c51f"), "date" : ISODate("2014-01-06T03:48:30.737Z") }
{ "_id" : ObjectId("52c679b63cdecc6508c54fe5"), "date" : "2014-01-06T03:48:30.737Z" }
{ "_id" : ObjectId("52ca15677a23e6d3354fb7b4"), "date" : "2014-01-03 19:49:48" }
{ "_id" : ObjectId("42c679ac7a23e6225b62c51f"), "date" : ISODate("2014-02-06T03:48:30.737Z") }
{ "_id" : ObjectId("52ca15677a23e6d2354fb7b4"), "date" : 100 }
{ "_id" : ObjectId("52ca15677a23e6d2254fb7b4"), "date" : 400 }
Doing an $or seach for (let's say) ISODate and numerical values (exact match) and then doing an explain on that search will show that only 2 documents are scanned (out of the 6 indexed), showing that having mixed types of values for a field does not lower the efficiency of the index:
> db.t1.find( { $or: [ {date: ISODate("2014-01-06T03:48:30.737Z")}, {date: 100} ] } ).explain();
{
"clauses" : [
{
"cursor" : "BtreeCursor date_1",
"isMultiKey" : false,
"n" : 1,
"nscannedObjects" : 1,
"nscanned" : 1,
"nscannedObjectsAllPlans" : 1,
"nscannedAllPlans" : 1,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"date" : [
[
ISODate("2014-01-06T03:48:30.737Z"),
ISODate("2014-01-06T03:48:30.737Z")
]
]
}
},
{
"cursor" : "BtreeCursor date_1",
"isMultiKey" : false,
"n" : 1,
"nscannedObjects" : 1,
"nscanned" : 1,
"nscannedObjectsAllPlans" : 1,
"nscannedAllPlans" : 1,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"date" : [
[
100,
100
]
]
}
}
],
"n" : 2,
"nscannedObjects" : 2,
"nscanned" : 2,
"nscannedObjectsAllPlans" : 2,
"nscannedAllPlans" : 2,
"millis" : 0,
"server" : "P09104:27017"
}
Doing a similar $or search with ranges (for both ISODate and numerical) will show that 4 results are obtained after only scanning 4 documents (the string values to not influence the utilization of the index):
> db.t1.find( { $or: [ {date: {$gte: ISODate("2014-01-05T03:48:30.737Z"), $lt: ISODate("2014-02-07T03:48:30.737Z")}}, {date: {$gte: 50, $lt: 500}} ] } ).explain();
{
"clauses" : [
{
"cursor" : "BtreeCursor date_1",
"isMultiKey" : false,
"n" : 2,
"nscannedObjects" : 2,
"nscanned" : 2,
"nscannedObjectsAllPlans" : 2,
"nscannedAllPlans" : 2,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"date" : [
[
ISODate("2014-01-05T03:48:30.737Z"),
ISODate("2014-02-07T03:48:30.737Z")
]
]
}
},
{
"cursor" : "BtreeCursor date_1",
"isMultiKey" : false,
"n" : 2,
"nscannedObjects" : 2,
"nscanned" : 2,
"nscannedObjectsAllPlans" : 2,
"nscannedAllPlans" : 2,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"date" : [
[
50,
500
]
]
}
}
],
"n" : 4,
"nscannedObjects" : 4,
"nscanned" : 4,
"nscannedObjectsAllPlans" : 4,
"nscannedAllPlans" : 4,
"millis" : 0,
"server" : "P09104:27017"
}
I think your first idea is better. The second way, as you state, is how you would model the data in an RDBMS. If you're going to use MongoDB for fun, you might as well explore the fact that it has a different data model, and structure your collections accordingly. While I'm sure performance is not going to be an issue for the scale of this project, keeping the data in a single document avoids joins, which can be expensive. It's commonly held that, within MongoDB environments, denormalization is faster (see here).
In the schema describe in your first idea, playlists could all be subdocuments in a kind of catch-all artist document called "various". I imagine the same would be done in a relational implementation, with the table artist
having a record for "various", to cover compilation albums and soundtracks, for example.
I have a JSFiddle here with a JSON representation of what a music collection might look like implemented as your first idea.
Of course, there is yet another way, and that is a collection albums
, with most documents having a key artist
, which could either be a reference to the _id from a document in a collection artists
, or just have the name of the artist itself. In this implementation, playlists wouldn't have this key; or, alternatively, it could be an array with all the artists with tracks in the playlist.
Best Answer
Import your user data (.bson) to other temporary collection and then use forEach to update documents created_at keys.