MongoDB – optimized mongo find query

mongodb

I want to fetch all the data from a collection. Since running a plain find({}) crashes the mongo container (the liveness probe seems to fail, so k8s restarts the container), I am wondering whether using skip/limit to split the fetch into multiple parts would work better.
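What I have in mind is roughly the following loop (just a sketch; batchSize is arbitrary and processDoc is a placeholder for whatever handles each document):

// Sketch of the fragmented fetch: page through the collection in fixed-size chunks.
var batchSize = 200000;
var total = db.getCollection('collectionName').count();
for (var offset = 0; offset < total; offset += batchSize) {
    db.getCollection('collectionName').find({}).skip(offset).limit(batchSize).forEach(processDoc);
}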

Is that query optimized?

db.getCollection('collectionName').find({}).skip(300000).limit(200000)

The Mongo logs show planSummary: COLLSCAN. Is that normal? Shouldn't the index on _id be used?

Here are some logs captured while the request runs:

2019-05-05T15:27:48.383+0000 I COMMAND [conn7] command off.TrunkOff command: getMore { getMore: 19694089771, collection: "TrunkOff", batchSize: 1000, lsid: { id: UUID("02770ee3-0b54-4b38-9b8e-23063b514434") }, $db: "off" } originatingCommand: { find: "TrunkOff", filter: {}, skip: 350000, limit: 100000, returnKey: false, showRecordId: false, lsid: { id: UUID("02770ee3-0b54-4b38-9b8e-23063b514434") }, $db: "off" } planSummary: COLLSCAN cursorid:19694089771 keysExamined:0 docsExamined:1000 numYields:15 nreturned:1000 reslen:10290608 locks:{ Global: { acquireCount: { r: 16 } }, Database: { acquireCount: { r: 16 } }, Collection: { acquireCount: { r: 16 } } } storage:{ data: { bytesRead: 10311365, timeReadingMicros: 300059 } } protocol:op_query 315ms
2019-05-05T15:28:21.281+0000 I COMMAND [conn6] command off.TrunkOff command: getMore { getMore: 19694089771, collection: "TrunkOff", batchSize: 1000, lsid: { id: UUID("02770ee3-0b54-4b38-9b8e-23063b514434") }, $db: "off" } originatingCommand: { find: "TrunkOff", filter: {}, skip: 350000, limit: 100000, returnKey: false, showRecordId: false, lsid: { id: UUID("02770ee3-0b54-4b38-9b8e-23063b514434") }, $db: "off" } planSummary: COLLSCAN cursorid:19694089771 keysExamined:0 docsExamined:1000 numYields:20 nreturned:1000 reslen:11985643 locks:{ Global: { acquireCount: { r: 21 } }, Database: { acquireCount: { r: 21 } }, Collection: { acquireCount: { r: 21 } } } storage:{ data: { bytesRead: 12016653, timeReadingMicros: 414440 } } protocol:op_query 427ms
2019-05-05T15:30:19.713+0000 I COMMAND [conn6] command off.TrunkOff command: getMore { getMore: 19694089771, collection: "TrunkOff", batchSize: 1000, lsid: { id: UUID("02770ee3-0b54-4b38-9b8e-23063b514434") }, $db: "off" } originatingCommand: { find: "TrunkOff", filter: {}, skip: 350000, limit: 100000, returnKey: false, showRecordId: false, lsid: { id: UUID("02770ee3-0b54-4b38-9b8e-23063b514434") }, $db: "off" } planSummary: COLLSCAN cursorid:19694089771 keysExamined:0 docsExamined:1000 numYields:13 nreturned:1000 reslen:9986640 locks:{ Global: { acquireCount: { r: 14 } }, Database: { acquireCount: { r: 14 } }, Collection: { acquireCount: { r: 14 } } } storage:{ data: { bytesRead: 10017227, timeReadingMicros: 270095 } } protocol:op_query 287ms
2019-05-05T15:30:44.023+0000 I COMMAND [conn7] command off.TrunkOff command: getMore { getMore: 19694089771, collection: "TrunkOff", batchSize: 1000, lsid: { id: UUID("02770ee3-0b54-4b38-9b8e-23063b514434") }, $db: "off" } originatingCommand: { find: "TrunkOff", filter: {}, skip: 350000, limit: 100000, returnKey: false, showRecordId: false, lsid: { id: UUID("02770ee3-0b54-4b38-9b8e-23063b514434") }, $db: "off" } planSummary: COLLSCAN cursorid:19694089771 keysExamined:0 docsExamined:1000 numYields:9 nreturned:1000 reslen:9228715 locks:{ Global: { acquireCount: { r: 10 } }, Database: { acquireCount: { r: 10 } }, Collection: { acquireCount: { r: 10 } } } storage:{ data: { bytesRead: 9247329, timeReadingMicros: 102849 } } protocol:op_query 110ms
2019-05-05T15:31:14.048+0000 I COMMAND [conn6] command off.TrunkOff command: getMore { getMore: 19694089771, collection: "TrunkOff", batchSize: 1000, lsid: { id: UUID("02770ee3-0b54-4b38-9b8e-23063b514434") }, $db: "off" } originatingCommand: { find: "TrunkOff", filter: {}, skip: 350000, limit: 100000, returnKey: false, showRecordId: false, lsid: { id: UUID("02770ee3-0b54-4b38-9b8e-23063b514434") }, $db: "off" } planSummary: COLLSCAN cursorid:19694089771 keysExamined:0 docsExamined:1000 numYields:21 nreturned:1000 reslen:11922134 locks:{ Global: { acquireCount: { r: 22 } }, Database: { acquireCount: { r: 22 } }, Collection: { acquireCount: { r: 22 } } } storage:{ data: { bytesRead: 11958959, timeReadingMicros: 410002 } } protocol:op_query 428ms

What is a good way to traverse a collection from end to end?

Here is the CPU/memory usage while the request runs: CPU goes to the max, and virtual memory is several GB higher than mapped memory. cacheSizeGB is 1.

Best Answer

As per the MongoDB documentation: for commonly issued queries, create indexes. If a query searches on multiple fields, create a compound index. Scanning an index is much faster than scanning a collection, because index structures are smaller than the documents they reference and they store those references in order.

For example:

If you have a posts collection containing blog posts, and if you regularly issue a query that sorts on the author_name field, then you can optimize the query by creating an index on the author_name field:

db.posts.createIndex( { author_name : 1 } )
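That index can then serve a sort on author_name, for example:

db.posts.find().sort( { author_name : 1 } )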

Indexes also improve efficiency on queries that routinely sort on a given field.

If you regularly issue a query that sorts on the timestamp field, then you can optimize the query by creating an index on the timestamp field:

Creating this index:

db.posts.createIndex( { timestamp : 1 } )

Optimizes this query:

db.posts.find().sort( { timestamp : -1 } )

Because MongoDB can read indexes in both ascending and descending order, the direction of a single-key index does not matter.
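For example, defining the same single-field index with the opposite direction would serve those sorts equally well:

db.posts.createIndex( { timestamp : -1 } )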

Indexes support queries, update operations, and some stages of the aggregation pipeline.
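For instance, $match and $sort stages at the beginning of an aggregation pipeline can take advantage of indexes much like a find() does. A small illustration on the posts collection above (the author name value here is made up):

db.posts.aggregate( [
   { $match : { author_name : "alice" } },
   { $sort : { timestamp : -1 } }
] )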

Use Projections to Return Only Necessary Data

When you need only a subset of fields from documents, you can achieve better performance by returning only the fields you need:

For example, if in your query to the posts collection you need only the timestamp, title, author, and abstract fields, you would issue the following command:

db.posts.find( {}, { timestamp : 1 , title : 1 , author : 1 , abstract : 1} ).sort( { timestamp : -1 } )

If you want to limit the number of documents to return, or to set the starting point of the result set, use the limit() or skip() method.

Limit the Number of Documents to Return

The limit() method limits the number of documents in the result set. The following operation returns at most 5 documents in the bios collection:

db.bios.find().limit( 5 )

Set the Starting Point of the Result Set

The skip() method controls the starting point of the result set. The following operation skips the first 5 documents in the bios collection and returns all remaining documents:

db.bios.find().skip( 5 )
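The two combine naturally for paging through a collection in chunks; for example, the following returns the second batch of 5 documents:

db.bios.find().skip( 5 ).limit( 5 )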

For more information on using projections, see Project Fields to Return from Query.