I have a mongodb query that works but takes too long to execute and causes my CPU to spike to 100% while it is executing. It is this query here:
db.logs.aggregate([
  {
    $lookup: {
      from: 'graphs',
      let: { logId: '$_id' },
      as: 'matched_docs',
      pipeline: [
        {
          $match: {
            $expr: {
              $and: [
                { $eq: ['$$logId', '$lId'] },
                { $gte: ['$d', new Date('2020-12-21T00:00:00.000Z')] },
                { $lt: ['$d', new Date('2020-12-23T00:00:00.000Z')] }
              ]
            }
          }
        }
      ]
    }
  },
  {
    $match: {
      $expr: {
        $and: [
          { $eq: ['$matched_docs', []] },
          { $gte: ['$createDate', new Date('2020-12-21T00:00:00.000Z')] },
          { $lt: ['$createDate', new Date('2020-12-23T00:00:00.000Z')] }
        ]
      }
    }
  },
  { $limit: 5 }
]);
This query looks for all records in the db.logs collection that have not yet been transformed and loaded into db.graphs. It's analogous to this SQL approach:
WHERE db.logs._id NOT IN (
    SELECT lId FROM db.graphs
    WHERE db.graphs.d >= @startTime
      AND db.graphs.d < @endTime
)
AND db.logs.createDate >= @startTime
AND db.logs.createDate < @endTime
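For clarity, the same NOT IN logic can also be written as two plain mongo-shell queries (illustration only; it uses the same field names as above, and ISODate is the shell's date helper):

```javascript
// Step 1: collect the lIds of logs that already have a graph in the window.
var graphedIds = db.graphs.distinct('lId', {
  d: { $gte: ISODate('2020-12-21T00:00:00Z'), $lt: ISODate('2020-12-23T00:00:00Z') }
});

// Step 2: find logs in the same window whose _id is not among them.
db.logs.find({
  _id: { $nin: graphedIds },
  createDate: { $gte: ISODate('2020-12-21T00:00:00Z'), $lt: ISODate('2020-12-23T00:00:00Z') }
}).limit(5);
```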
The db.logs collection has over 1 million records, and here are its indexes:
db.logs.getIndexes();
[
  {
    "v" : 2,
    "key" : {
      "_id" : 1
    },
    "name" : "_id_"
  },
  {
    "v" : 2,
    "key" : {
      "createDate" : 1
    },
    "name" : "createDate_1"
  }
]
And db.graphs has fewer than 100 records, with an index on every property/column.
In my attempt to analyze why the query is so slow and CPU intensive, I suffixed it with .explain(), but mongo gave me an error saying db.logs.aggregate(...).explain() is not a function. I also tried adding , {$explain: 1} immediately after { $limit: 5 }, and got an error saying Unrecognized pipeline stage name '$explain'.
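(For anyone else who hits this: the forms that do work in the mongo shell, as far as I can tell, are calling explain() on the collection before aggregate(), or passing explain as an option:)

```javascript
// explain() goes before aggregate(), not after it:
db.logs.explain('executionStats').aggregate([ /* ...pipeline... */ ]);

// Alternatively, pass explain as an option to aggregate():
db.logs.aggregate([ /* ...pipeline... */ ], { explain: true });
```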
So I guess I have two questions:
- Can someone give feedback on why my mongo query is so slow, or suggest possible solutions?
- Is there a way to see the execution plan of my mongo query so that I can review where the performance bottlenecks are?
UPDATE
A possible solution I'm considering is to add a boolean property db.logs.isGraphed, then use a simple db.logs.find({isGraphed:false, createDate:{...date filter...}}).limit(5). I wasn't sure if this is the approach most people would have taken in the first place?
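A rough sketch of what I have in mind (the isGraphed field, the backfill step, and the partial index are all assumptions on my part, not an existing schema):

```javascript
// One-time backfill: default every existing log to "not yet graphed".
// The flag would then be kept in sync whenever a graphs document is written.
db.logs.updateMany({}, { $set: { isGraphed: false } });

// A partial index stays small because graphed logs drop out of it:
db.logs.createIndex(
  { createDate: 1 },
  { partialFilterExpression: { isGraphed: false } }
);

// Finding un-graphed logs then becomes a simple indexed query:
db.logs.find({
  isGraphed: false,
  createDate: { $gte: ISODate('2020-12-21T00:00:00Z'), $lt: ISODate('2020-12-23T00:00:00Z') }
}).limit(5);
```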
Best Answer
It is slow because it is not using an index: for each document in the logs collection, it is doing a full collection scan of the graphs collection. From the $expr documentation page: