The linked info explains the basic issue (there are performance differences), but it's not simple enough to say that one is always better than the other (otherwise, there'd be no reason to have both). Also, in MyISAM, the 64k max size for VARCHAR isn't per field -- it's per record.
Basically, there are four ways to store strings in database records:
- fixed length
- C-style strings (marked with a NUL or similar terminator character at the end of the string)
- Pascal style strings (a few bytes to indicate length, then the string)
- Pointers (store the string somewhere else)
MyISAM uses something similar to #3 for VARCHAR, and a hybrid approach for TEXT where it stores the beginning of the string in the record, then the rest of the string somewhere else.
InnoDB is similar for VARCHAR, but stores the complete TEXT field outside of the record.
With #1 and #4, the stuff in the record is always the same length, so it's easier to skip over if you don't need the string but need stuff after it. Both #2 and #3 aren't too bad for short strings ... #2 has to keep looking for the marker, while #3 can skip ahead ... as the strings get longer, #2 gets worse for this particular use case.
If you actually need to read the string, #4 is slower, as you have to read the record, then read the string, which might be stored elsewhere on the disk, depending on just how the database handles it. #1 is always pretty straightforward, and again you run into similar issues: #2 gets worse the longer the string is, while #3 is a little worse than #2 for very small strings, but better as it gets longer.
Then there's storage requirements ... #1 is always a fixed length, so it might have bloat if most strings aren't the max length. #2 has 1 extra byte; #3 typically has 1 extra byte if the max length is up to 255, 2 extra bytes for a 64k max. #4 has the pointer length, plus typically the same length prefix as #3.
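To make the mapping to MySQL concrete, here's a hedged sketch (the table and column names are invented for illustration): CHAR behaves like #1, VARCHAR like #3, and TEXT like the hybrid/pointer approaches, with the exact on-disk layout depending on the storage engine. There's no direct equivalent of #2 in MySQL.

```sql
-- Hypothetical table showing how MySQL column types map onto the layouts above.
CREATE TABLE string_layouts (
    id      INT PRIMARY KEY,
    code    CHAR(8),         -- #1: fixed length, space-padded to 8 characters
    title   VARCHAR(100),    -- #3: length prefix (1 byte when max <= 255 bytes) + data
    summary VARCHAR(2000),   -- #3: length prefix (2 bytes when max > 255 bytes) + data
    body    TEXT             -- hybrid/#4: stored partly or fully outside the record
) ENGINE=InnoDB;
```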
For the specific implementations within MySQL 5.1, the docs for MyISAM state:
- Support for a true VARCHAR type; a VARCHAR column starts with a length stored in one or two bytes.
- Tables with VARCHAR columns may have fixed or dynamic row length.
- The sum of the lengths of the VARCHAR and CHAR columns in a table may be up to 64KB.
While for InnoDB:
- The variable-length part of the record header contains a bit vector for indicating NULL columns. If the number of columns in the index that can be NULL is N, the bit vector occupies CEILING(N/8) bytes. (For example, if there are anywhere from 9 to 15 columns that can be NULL, the bit vector uses two bytes.) Columns that are NULL do not occupy space other than the bit in this vector. The variable-length part of the header also contains the lengths of variable-length columns. Each length takes one or two bytes, depending on the maximum length of the column. If all columns in the index are NOT NULL and have a fixed length, the record header has no variable-length part.
- For each non-NULL variable-length field, the record header contains the length of the column in one or two bytes. Two bytes will only be needed if part of the column is stored externally in overflow pages or the maximum length exceeds 255 bytes and the actual length exceeds 127 bytes. For an externally stored column, the two-byte length indicates the length of the internally stored part plus the 20-byte pointer to the externally stored part. The internal part is 768 bytes, so the length is 768+20. The 20-byte pointer stores the true length of the column.
...
As with so many other things when dealing with databases, if you're not sure what's best for your needs, try benchmarking with similar data & usage, and see how the engines behave.
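If you do benchmark, a minimal sketch (all names here are made up; substitute your real schema, data, and queries) is to build the same table under both engines and compare timings with the profiler:

```sql
-- Hypothetical benchmark skeleton: identical schema, two engines.
CREATE TABLE bench_myisam (id INT PRIMARY KEY, val VARCHAR(1000)) ENGINE=MyISAM;
CREATE TABLE bench_innodb (id INT PRIMARY KEY, val VARCHAR(1000)) ENGINE=InnoDB;

-- Load the same representative data into both, then:
SET profiling = 1;
SELECT id  FROM bench_myisam WHERE id BETWEEN 1000 AND 2000;  -- skips over the strings
SELECT val FROM bench_myisam WHERE id BETWEEN 1000 AND 2000;  -- actually reads them
SELECT id  FROM bench_innodb WHERE id BETWEEN 1000 AND 2000;
SELECT val FROM bench_innodb WHERE id BETWEEN 1000 AND 2000;
SHOW PROFILES;  -- rough per-query durations
```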
Some thoughts....
Typically one does not want to store pieces of tightly interrelated information in different systems. The chances of things getting out of sync are significant, and then instead of one problem on your hands you have two. One thing you can do with Mongo, though, is use it to pipeline your data in or out. My preference is to keep everything in PostgreSQL to the extent this is possible. However, I would note that doing so really requires expert knowledge of PostgreSQL programming and is not for shops unwilling to commit to using advanced features. I see a somewhat different set of options than you do; since my preference is not something I see listed, I will give it to you.
You can probably separate your metadata into common data, data required for classes, and document data. In this approach you would have a general catalog table with the basic common information, plus one table per class. Each class table would have an hstore, json, or xml field storing the rest of the data, along with ordinary columns for data that must be significantly constrained. This reduces what you need to put in these tables per class, but still lets you leverage constraints however you like (a rough table sketch follows the three options below). The three options have different issues and are worth considering separately:
hstore is relatively limited but also used by a lot of people. It isn't extremely new, but it is only a flat key/value store, incapable of nested data structures, unlike json and xml.
json is quite new and doesn't really do a lot right now. This doesn't mean you can't do a lot with it, but you aren't going to do a lot out of the box; if you do, expect a significant amount of programming, probably in plv8js or, if you want to stick with older environments, plperlu or plpython. json is better supported in 9.3, at least in current development snapshots, so when that version is released things will get better.
xml is the best supported of the three, with the most features, and the longest support history. Then again, it is XML.....
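As a rough sketch of the layout described above (every name here is invented for illustration; the hstore variant is shown, and it assumes the hstore extension is available):

```sql
-- Hypothetical catalog table holding the common metadata.
CREATE EXTENSION IF NOT EXISTS hstore;

CREATE TABLE doc_catalog (
    doc_id     serial PRIMARY KEY,
    title      text NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now()
);

-- One table per class: strongly constrained columns plus a free-form hstore field.
CREATE TABLE invoice_docs (
    doc_id      int PRIMARY KEY REFERENCES doc_catalog (doc_id),
    customer_id int NOT NULL,                               -- data you want constrained
    total       numeric(12,2) NOT NULL CHECK (total >= 0),
    extra       hstore                                      -- the rest, as key/value pairs
);

-- Reaching into the loosely structured part:
SELECT doc_id, extra -> 'shipping_method' AS shipping_method
  FROM invoice_docs
 WHERE extra ? 'shipping_method';
```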
However, if you do decide to go with Mongo and PostgreSQL together, note that PostgreSQL supports two-phase commit, meaning you can run the write operations, then issue PREPARE TRANSACTION; if this succeeds, do your atomic writes in Mongo, and if that succeeds, issue COMMIT PREPARED in PostgreSQL.
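A minimal sketch of that flow (the transaction identifier and statements are made up; note that max_prepared_transactions must be set above zero for PREPARE TRANSACTION to work):

```sql
-- In PostgreSQL: do the relational writes, but don't commit yet.
BEGIN;
INSERT INTO doc_catalog (title) VALUES ('new document');
PREPARE TRANSACTION 'doc_sync_12345';  -- survives crashes from this point on

-- Now perform the corresponding writes in MongoDB (outside SQL).
-- If the Mongo writes succeed:
COMMIT PREPARED 'doc_sync_12345';
-- If they fail:
-- ROLLBACK PREPARED 'doc_sync_12345';
```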
Best Answer
Since a query that would utilize the "date" index will implicitly contain criteria for that "date" field (value(s) of certain type(s)), Mongo will decide which index, or which part of the index, to use, making the fact of having multiple types of values in the same field moot (as far as indexing is concerned).
Here's an example. I added four documents to my Mongo 2.4.4 test db: two with ISODate values and two with string values in the "date" field.
Before index creation, any query on the date field (be it with an ISODate or a string criterion) will search through all 4 documents; this can be confirmed with explain(). Once I create the index via ensureIndex("date"), a query of the form date: "some string" or date: ISODate(...), or a query for date in some range of ISODate values, will only look through the pertinent values (2 for each of these queries). This is made even clearer if we add a third type of value for date (numerical), giving a collection of 6 indexed documents.
Doing an $or search for (let's say) ISODate and numerical values (exact match), and then running explain() on that search, will show that only 2 documents are scanned (out of the 6 indexed), showing that having mixed types of values for a field does not lower the efficiency of the index.
Doing a similar $or search with ranges (for both ISODate and numerical) will show that 4 results are obtained after scanning only 4 documents (the string values do not influence the utilization of the index).
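Since the shell output isn't reproduced here, below is a hedged reconstruction of the kind of session described (the collection name and documents are invented, and explain() output fields differ across MongoDB versions):

```javascript
// Hypothetical mongo 2.4 shell session: three types of values in the "date" field.
db.dates.insert({ _id: 1, date: ISODate("2013-01-01") });
db.dates.insert({ _id: 2, date: ISODate("2013-06-01") });
db.dates.insert({ _id: 3, date: "2013-01-01" });
db.dates.insert({ _id: 4, date: "2013-06-01" });
db.dates.insert({ _id: 5, date: 20130101 });
db.dates.insert({ _id: 6, date: 20130601 });

db.dates.ensureIndex({ date: 1 });

// Exact-match $or across two of the three types: explain() should report only the
// matching index entries being scanned, not all six documents.
db.dates.find({ $or: [ { date: ISODate("2013-01-01") }, { date: 20130101 } ] }).explain();

// Range $or across ISODate and numerical values: the string values are not scanned.
db.dates.find({ $or: [
    { date: { $gte: ISODate("2013-01-01"), $lte: ISODate("2013-12-31") } },
    { date: { $gte: 20130101, $lte: 20131231 } }
] }).explain();
```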