I'm looking to reshape the documents in one of my collections and have found two ways to do it, but I need guidance. For simplicity, say I have a collection, "myColl", and I need to reshape documents that look like this:
{
x:"foo",
y:"bar"
}
To:
{
nest: {
x: "foo",
y: "bar"
}
}
This can be accomplished by using the aggregation framework to reshape the documents and then overwrite the entire collection with $out. Run against a test collection of about 150K records, the following takes roughly 5 seconds:
db.myColl.aggregate([{$project: {_id: "$_id", nest: {x: "$x", y: "$y"}}}, {$out: "myColl"}]);
If I try to do this using a cursor, it takes about 1.5 minutes:
db.myColl.find().snapshot().forEach(
function(elem) {
db.myColl.update(
{_id: elem._id},
{$set: {nest: {x: elem.x, y: elem.y}}}
);
}
);
I'm leaning towards the aggregation approach for performance reasons; however, someone mentioned here that it creates a "new" collection, and said so with a somewhat negative connotation, but it's not clear to me why. Are there causes for concern that I should be aware of other than the type safety mentioned in that comment?
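One concrete difference between the two snippets is worth checking: they do not produce the same documents. An inclusion $project keeps only _id and the fields it lists, so after $out any other top-level field is gone and x and y survive only inside nest, whereas the $set update adds nest and leaves the existing fields in place. The sketch below models that difference in plain JavaScript (it is not MongoDB code); the extra field z is hypothetical, and both functions assume x and y are present.

```javascript
// Plain-JavaScript model of the two reshapes in the question (not a MongoDB API).
// Assumes every document has x and y.

// Models {$project: {_id: "$_id", nest: {x: "$x", y: "$y"}}}:
// only _id and nest survive; every other top-level field is dropped.
function projectNest(doc) {
  return { _id: doc._id, nest: { x: doc.x, y: doc.y } };
}

// Models {$set: {nest: {x: elem.x, y: elem.y}}}:
// existing fields stay put and nest is appended as a new field.
function setNest(doc) {
  return { ...doc, nest: { x: doc.x, y: doc.y } };
}

// Hypothetical input; z stands in for any field the $project stage does not list.
const sample = { _id: 1, x: "foo", y: "bar", z: 42 };

console.log(JSON.stringify(projectNest(sample)));
// {"_id":1,"nest":{"x":"foo","y":"bar"}}
console.log(JSON.stringify(setNest(sample)));
// {"_id":1,"x":"foo","y":"bar","z":42,"nest":{"x":"foo","y":"bar"}}
```

If the rewritten collection should keep other fields, they have to be listed in the $project stage too; on MongoDB 3.4 or later, $addFields adds nest while keeping the existing fields, which matches the $set behaviour.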
Also, if the cursor approach is better, how might I speed up the execution? Setting the "w" parameter of the write concern to 0 doesn't help in my test: everything is hosted on the same box, so skipping the acknowledgement saves no time, and in any case it is unrelated to the fact that aggregation runs more than an order of magnitude faster.
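Write concern aside, a common way to speed up the cursor approach is to cut the number of round trips by sending the updates in batches with db.collection.bulkWrite() (MongoDB 3.2 and later) instead of one update() per document. The sketch below assumes the legacy mongo shell, where cursor.forEach() and bulkWrite() run synchronously; the helper name reshapeInBatches is hypothetical, and the cursor and collection are passed in so the batching logic can be exercised without a server.

```javascript
// Sketch: batch the per-document $set updates into bulkWrite() calls.
// Assumes the legacy mongo shell's synchronous cursor.forEach() and
// collection.bulkWrite(); reshapeInBatches is a hypothetical helper name.
function reshapeInBatches(cursor, coll, batchSize) {
  let ops = [];
  let written = 0;

  // Send the buffered operations in one unordered bulkWrite() call.
  function flush() {
    if (ops.length === 0) return;
    coll.bulkWrite(ops, { ordered: false });
    written += ops.length;
    ops = [];
  }

  cursor.forEach(function (doc) {
    ops.push({
      updateOne: {
        filter: { _id: doc._id },
        update: { $set: { nest: { x: doc.x, y: doc.y } } }
      }
    });
    if (ops.length >= batchSize) flush();
  });
  flush(); // send the final, possibly partial batch
  return written; // number of update operations sent
}
```

In the shell it would be called as reshapeInBatches(db.myColl.find(), db.myColl, 1000); the batch size is an arbitrary starting point, not a tuned value. It will probably still be slower than the $out pipeline, since every document still travels to the shell and back.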
Thanks for the input!
Best Answer
Aggregation vs Cursor
Let's start with aggregation. According to the MongoDB documentation, aggregation operations process data records and return computed results. Aggregation operations group values from multiple documents together, and can perform a variety of operations on the grouped data to return a single result. The aggregation pipeline can use indexes to improve its performance during some of its stages. In addition, the aggregation pipeline has an internal optimization phase. The most basic pipeline stages provide filters that operate like queries and document transformations that modify the form of the output document.
MongoDB provides three ways to perform aggregation: the aggregation pipeline, the map-reduce function, and single-purpose aggregation methods. To illustrate, I created an orders collection in MongoDB with 4 documents and then verified the inserted documents.
Aggregation Pipeline
MongoDB’s aggregation framework is modeled on the concept of data processing pipelines. Documents enter a multi-stage pipeline that transforms the documents into an aggregated result.
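To make the stage-by-stage idea concrete, here is a plain-JavaScript model (not MongoDB code) of a two-stage pipeline, { $match: { status: "A" } } followed by { $group: { _id: "$cust_id", total: { $sum: "$amount" } } }, over four hypothetical orders documents; the field names cust_id, amount and status are assumptions.

```javascript
// Plain-JavaScript model of a two-stage aggregation pipeline (not MongoDB code).
// Hypothetical orders; field names cust_id, amount, status are assumptions.
const orders = [
  { _id: 1, cust_id: "A123", amount: 500, status: "A" },
  { _id: 2, cust_id: "A123", amount: 250, status: "A" },
  { _id: 3, cust_id: "B212", amount: 200, status: "A" },
  { _id: 4, cust_id: "A123", amount: 300, status: "D" }
];

// Stage 1, like { $match: { status: "A" } }: filter documents as a query would.
const matchStage = (docs) => docs.filter((d) => d.status === "A");

// Stage 2, like { $group: { _id: "$cust_id", total: { $sum: "$amount" } } }:
// one output document per distinct cust_id with the amounts summed.
// (MongoDB does not guarantee $group output order; this keeps first-seen order.)
function groupStage(docs) {
  const totals = new Map();
  for (const d of docs) {
    totals.set(d.cust_id, (totals.get(d.cust_id) || 0) + d.amount);
  }
  return [...totals].map(([id, total]) => ({ _id: id, total }));
}

// Each stage consumes the previous stage's output, as in a pipeline.
const pipeline = [matchStage, groupStage];
const result = pipeline.reduce((docs, stage) => stage(docs), orders);
console.log(JSON.stringify(result));
// [{"_id":"A123","total":750},{"_id":"B212","total":200}]
```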
Map-Reduce
MongoDB also provides map-reduce operations to perform aggregation. In general, map-reduce operations have two phases: a map stage that processes each document and emits one or more objects for each input document, and a reduce phase that combines the output of the map operation. Optionally, map-reduce can have a finalize stage to make final modifications to the result. Like other aggregation operations, map-reduce can specify a query condition to select the input documents as well as sort and limit the results.
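The two phases can be modeled in plain JavaScript (this is not MongoDB's implementation) over the same kind of hypothetical orders documents; the field names are assumptions, and the shell call being modeled is shown in the opening comment. MongoDB does not call reduce for a key with a single emitted value, and it may re-apply reduce to partial results, which is why a real reduce function must be associative and commutative.

```javascript
// Plain-JavaScript model of the map and reduce phases (not MongoDB's implementation).
// Roughly models:
//   db.orders.mapReduce(
//     function () { emit(this.cust_id, this.amount); },
//     function (key, values) { return Array.sum(values); },
//     { query: { status: "A" }, out: "order_totals" })
const orders = [
  { _id: 1, cust_id: "A123", amount: 500, status: "A" },
  { _id: 2, cust_id: "A123", amount: 250, status: "A" },
  { _id: 3, cust_id: "B212", amount: 200, status: "A" },
  { _id: 4, cust_id: "A123", amount: 300, status: "D" }
];

function mapReduce(docs, query, map, reduce) {
  // Map phase: each selected document may emit key/value pairs.
  const emitted = new Map(); // key -> list of emitted values
  for (const doc of docs.filter(query)) {
    map(doc, (key, value) => {
      if (!emitted.has(key)) emitted.set(key, []);
      emitted.get(key).push(value);
    });
  }
  // Reduce phase: combine the values emitted for each key.
  // A key with a single value is passed through without calling reduce.
  return [...emitted].map(([key, values]) => ({
    _id: key,
    value: values.length === 1 ? values[0] : reduce(key, values)
  }));
}

const out = mapReduce(
  orders,
  (doc) => doc.status === "A",                        // query condition
  (doc, emit) => emit(doc.cust_id, doc.amount),       // map: one emit per document
  (key, values) => values.reduce((a, b) => a + b, 0)  // reduce: sum the amounts
);
console.log(JSON.stringify(out));
// [{"_id":"A123","value":750},{"_id":"B212","value":200}]
```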
Note: Starting in MongoDB 2.4, certain mongo shell functions and properties are inaccessible in map-reduce operations. MongoDB 2.4 also provides support for multiple JavaScript operations to run at the same time. Before MongoDB 2.4, JavaScript code executed in a single thread, raising concurrency issues for map-reduce.
Single Purpose Aggregation Operations
MongoDB also provides db.collection.count() and db.collection.distinct().
All of these operations aggregate documents from a single collection. While these operations provide simple access to common aggregation processes, they lack the flexibility and capabilities of the aggregation pipeline and map-reduce.
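For comparison, each single-purpose method computes one fixed kind of result. The sketch below models db.orders.count({ status: "A" }) and db.orders.distinct("cust_id") in plain JavaScript over hypothetical orders documents; the collection and field names are assumptions.

```javascript
// Plain-JavaScript model of two single-purpose aggregation methods (not MongoDB code).
// Hypothetical orders with assumed fields cust_id and status.
const orders = [
  { _id: 1, cust_id: "A123", status: "A" },
  { _id: 2, cust_id: "A123", status: "A" },
  { _id: 3, cust_id: "B212", status: "A" },
  { _id: 4, cust_id: "A123", status: "D" }
];

// Like db.orders.count({ status: "A" }): the number of matching documents.
const countA = orders.filter((d) => d.status === "A").length;

// Like db.orders.distinct("cust_id"): each distinct value of the field once
// (MongoDB does not promise this first-seen order).
const custIds = [...new Set(orders.map((d) => d.cust_id))];

console.log(countA, JSON.stringify(custIds)); // 3 ["A123","B212"]
```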
Cursor
As the MongoDB documentation ("Iterate a Cursor in the mongo Shell") explains, the db.collection.find() method returns a cursor. To access the documents, you need to iterate the cursor. However, in the mongo shell, if the returned cursor is not assigned to a variable using the var keyword, then the cursor is automatically iterated up to 20 times to print up to the first 20 documents in the results. The following examples describe ways to manually iterate the cursor to access the documents or to use the iterator index.
Manually Iterate the Cursor
You can use the cursor method forEach() to iterate the cursor and access the documents.
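In the shell this is a one-liner such as db.users.find({ type: 2 }).forEach(printjson), where users and the filter are placeholders. To show the iteration protocol outside the shell, the sketch below uses a hypothetical in-memory FakeCursor that mimics the shell cursor's hasNext(), next() and forEach(); it is not the real cursor class and has no batching or server round trips.

```javascript
// Hypothetical in-memory stand-in for a mongo shell cursor (not the real class).
// It mimics hasNext(), next() and forEach() over an array of documents.
class FakeCursor {
  constructor(docs) {
    this.docs = docs;
    this.pos = 0; // index of the next document to return
  }
  hasNext() {
    return this.pos < this.docs.length;
  }
  next() {
    if (!this.hasNext()) throw new Error("cursor exhausted");
    return this.docs[this.pos++];
  }
  forEach(fn) {
    while (this.hasNext()) fn(this.next());
  }
}

const seen = [];

// Style 1: forEach(), as with cursor.forEach(printjson) in the shell.
new FakeCursor([{ _id: 1 }, { _id: 2 }]).forEach((doc) => seen.push(doc._id));

// Style 2: an explicit hasNext()/next() loop.
const cursor = new FakeCursor([{ _id: 3 }, { _id: 4 }]);
while (cursor.hasNext()) {
  seen.push(cursor.next()._id);
}

console.log(seen); // [ 1, 2, 3, 4 ]
```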
Iterator Index
In the mongo shell, you can use the toArray() method to iterate the cursor and return the documents in an array. The toArray() method loads into RAM all documents returned by the cursor; the toArray() method exhausts the cursor.
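In the shell this is typically var documentArray = db.inventory.find({ type: 2 }).toArray(); followed by documentArray[3] to reach the fourth document (collection and filter are placeholders). The sketch below uses a hypothetical in-memory cursor, not the real class, to show why toArray() holds every document in memory and leaves the cursor exhausted.

```javascript
// Hypothetical in-memory stand-in for a mongo shell cursor (not the real class).
class FakeCursor {
  constructor(docs) {
    this.docs = docs;
    this.pos = 0; // index of the next document to return
  }
  hasNext() {
    return this.pos < this.docs.length;
  }
  next() {
    if (!this.hasNext()) throw new Error("cursor exhausted");
    return this.docs[this.pos++];
  }
  // Like the shell's toArray(): drains every remaining document into one
  // in-memory array, which leaves the cursor exhausted.
  toArray() {
    const all = [];
    while (this.hasNext()) all.push(this.next());
    return all;
  }
}

const cursor = new FakeCursor([{ _id: 1 }, { _id: 2 }, { _id: 3 }, { _id: 4 }]);
const documentArray = cursor.toArray(); // all four documents now held in memory
const fourth = documentArray[3];        // index into the array, not the cursor
console.log(fourth._id, cursor.hasNext()); // 4 false
```

The shell also accepts myCursor[3] directly as shorthand for myCursor.toArray()[3], so indexing a cursor has the same memory cost.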
Cursor Behaviors
Closure of Inactive Cursors
By default, the server will automatically close the cursor after 10 minutes of inactivity, or if the client has exhausted the cursor. To override this behavior in the mongo shell, you can use the cursor.noCursorTimeout() method. After setting the noCursorTimeout option, you must either close the cursor manually with cursor.close() or exhaust the cursor's results.
Cursor Information from the MongoDB Server
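The db.serverStatus() method returns a document that includes a metrics field. Its metrics.cursor field, which you can read directly with db.serverStatus().metrics.cursor, reports the number of cursors that have timed out since the last server restart and the number of open cursors, broken down into noTimeout, pinned and total.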
Finally, to sum up: aggregation operations group values from multiple documents together and can perform a variety of operations on the grouped data to return a single result. The pipeline provides efficient data aggregation using native operations within MongoDB, and is the preferred method for data aggregation in MongoDB. A cursor, by contrast, is something you iterate yourself: in the mongo shell, if the returned cursor is not assigned to a variable using the var keyword, it is automatically iterated up to 20 times to print up to the first 20 documents in the results, and the toArray() method loads all documents returned by the cursor into RAM and exhausts the cursor.