MongoDB – optimized mongo find query

mongodb

I want to fetch all the data from a collection. Since running a plain find({}) crashes the mongo container (the liveness probe seems to fail, so k8s restarts the container), I am wondering whether using skip/limit to split the fetch into multiple parts would work better.
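What I have in mind is roughly the following loop (just a sketch; batchSize is arbitrary and processDoc is a placeholder for whatever handles each document):

// Sketch of the fragmented fetch: page through the collection in fixed-size chunks.
var batchSize = 200000;
var total = db.getCollection('collectionName').count();
for (var offset = 0; offset < total; offset += batchSize) {
    db.getCollection('collectionName').find({}).skip(offset).limit(batchSize).forEach(processDoc);
}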

Is that query optimized?

db.getCollection('collectionName').find({}).skip(300000).limit(200000)

The Mongo logs show planSummary: COLLSCAN. Is that normal? Shouldn't the index on _id be used?

Here are some logs captured while the request runs:

2019-05-05T15:27:48.383+0000 I COMMAND [conn7] command off.TrunkOff command: getMore { getMore: 19694089771, collection: "TrunkOff", batchSize: 1000, lsid: { id: UUID("02770ee3-0b54-4b38-9b8e-23063b514434") }, $db: "off" } originatingCommand: { find: "TrunkOff", filter: {}, skip: 350000, limit: 100000, returnKey: false, showRecordId: false, lsid: { id: UUID("02770ee3-0b54-4b38-9b8e-23063b514434") }, $db: "off" } planSummary: COLLSCAN cursorid:19694089771 keysExamined:0 docsExamined:1000 numYields:15 nreturned:1000 reslen:10290608 locks:{ Global: { acquireCount: { r: 16 } }, Database: { acquireCount: { r: 16 } }, Collection: { acquireCount: { r: 16 } } } storage:{ data: { bytesRead: 10311365, timeReadingMicros: 300059 } } protocol:op_query 315ms
2019-05-05T15:28:21.281+0000 I COMMAND [conn6] command off.TrunkOff command: getMore { getMore: 19694089771, collection: "TrunkOff", batchSize: 1000, lsid: { id: UUID("02770ee3-0b54-4b38-9b8e-23063b514434") }, $db: "off" } originatingCommand: { find: "TrunkOff", filter: {}, skip: 350000, limit: 100000, returnKey: false, showRecordId: false, lsid: { id: UUID("02770ee3-0b54-4b38-9b8e-23063b514434") }, $db: "off" } planSummary: COLLSCAN cursorid:19694089771 keysExamined:0 docsExamined:1000 numYields:20 nreturned:1000 reslen:11985643 locks:{ Global: { acquireCount: { r: 21 } }, Database: { acquireCount: { r: 21 } }, Collection: { acquireCount: { r: 21 } } } storage:{ data: { bytesRead: 12016653, timeReadingMicros: 414440 } } protocol:op_query 427ms
2019-05-05T15:30:19.713+0000 I COMMAND [conn6] command off.TrunkOff command: getMore { getMore: 19694089771, collection: "TrunkOff", batchSize: 1000, lsid: { id: UUID("02770ee3-0b54-4b38-9b8e-23063b514434") }, $db: "off" } originatingCommand: { find: "TrunkOff", filter: {}, skip: 350000, limit: 100000, returnKey: false, showRecordId: false, lsid: { id: UUID("02770ee3-0b54-4b38-9b8e-23063b514434") }, $db: "off" } planSummary: COLLSCAN cursorid:19694089771 keysExamined:0 docsExamined:1000 numYields:13 nreturned:1000 reslen:9986640 locks:{ Global: { acquireCount: { r: 14 } }, Database: { acquireCount: { r: 14 } }, Collection: { acquireCount: { r: 14 } } } storage:{ data: { bytesRead: 10017227, timeReadingMicros: 270095 } } protocol:op_query 287ms
2019-05-05T15:30:44.023+0000 I COMMAND [conn7] command off.TrunkOff command: getMore { getMore: 19694089771, collection: "TrunkOff", batchSize: 1000, lsid: { id: UUID("02770ee3-0b54-4b38-9b8e-23063b514434") }, $db: "off" } originatingCommand: { find: "TrunkOff", filter: {}, skip: 350000, limit: 100000, returnKey: false, showRecordId: false, lsid: { id: UUID("02770ee3-0b54-4b38-9b8e-23063b514434") }, $db: "off" } planSummary: COLLSCAN cursorid:19694089771 keysExamined:0 docsExamined:1000 numYields:9 nreturned:1000 reslen:9228715 locks:{ Global: { acquireCount: { r: 10 } }, Database: { acquireCount: { r: 10 } }, Collection: { acquireCount: { r: 10 } } } storage:{ data: { bytesRead: 9247329, timeReadingMicros: 102849 } } protocol:op_query 110ms
2019-05-05T15:31:14.048+0000 I COMMAND [conn6] command off.TrunkOff command: getMore { getMore: 19694089771, collection: "TrunkOff", batchSize: 1000, lsid: { id: UUID("02770ee3-0b54-4b38-9b8e-23063b514434") }, $db: "off" } originatingCommand: { find: "TrunkOff", filter: {}, skip: 350000, limit: 100000, returnKey: false, showRecordId: false, lsid: { id: UUID("02770ee3-0b54-4b38-9b8e-23063b514434") }, $db: "off" } planSummary: COLLSCAN cursorid:19694089771 keysExamined:0 docsExamined:1000 numYields:21 nreturned:1000 reslen:11922134 locks:{ Global: { acquireCount: { r: 22 } }, Database: { acquireCount: { r: 22 } }, Collection: { acquireCount: { r: 22 } } } storage:{ data: { bytesRead: 11958959, timeReadingMicros: 410002 } } protocol:op_query 428ms

What is a good way to traverse a collection from end to end?

Here is the CPU/memory usage while the request runs: CPU goes to the max, and virtual memory is several GB higher than mapped memory. cacheSizeGB is 1.

Best Answer

As per the MongoDB documentation: for commonly issued queries, create indexes. If a query searches on multiple fields, create a compound index. Scanning an index is much faster than scanning a collection, because index structures are smaller than the documents they reference and they store those references in order.

For example:

If you have a posts collection containing blog posts, and if you regularly issue a query that sorts on the author_name field, then you can optimize the query by creating an index on the author_name field:

db.posts.createIndex( { author_name : 1 } )
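That index can then serve a sort on author_name, for example:

db.posts.find().sort( { author_name : 1 } )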

Indexes also improve efficiency on queries that routinely sort on a given field.

If you regularly issue a query that sorts on the timestamp field, then you can optimize the query by creating an index on the timestamp field:

Creating this index:

db.posts.createIndex( { timestamp : 1 } )

Optimizes this query:

db.posts.find().sort( { timestamp : -1 } )

Because MongoDB can read indexes in both ascending and descending order, the direction of a single-key index does not matter.
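For example, defining the same single-field index with the opposite direction would serve those sorts equally well:

db.posts.createIndex( { timestamp : -1 } )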

Indexes support queries, update operations, and some stages of the aggregation pipeline.
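For instance, $match and $sort stages at the beginning of an aggregation pipeline can take advantage of indexes much like a find() does. A small illustration on the posts collection above (the author name value here is made up):

db.posts.aggregate( [
   { $match : { author_name : "alice" } },
   { $sort : { timestamp : -1 } }
] )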

Use Projections to Return Only Necessary Data

When you need only a subset of fields from documents, you can achieve better performance by returning only the fields you need:

For example, if in your query to the posts collection you need only the timestamp, title, author, and abstract fields, you would issue the following command:

db.posts.find( {}, { timestamp : 1 , title : 1 , author : 1 , abstract : 1} ).sort( { timestamp : -1 } )

If you want to limit the number of documents to return, or to set the starting point of the result set, use the limit() or skip() method.

Limit the Number of Documents to Return

The limit() method limits the number of documents in the result set. The following operation returns at most 5 documents in the bios collection:

db.bios.find().limit( 5 )

Set the Starting Point of the Result Set

The skip() method controls the starting point of the result set. The following operation skips the first 5 documents in the bios collection and returns all remaining documents:

db.bios.find().skip( 5 )
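The two combine naturally for paging through a collection in chunks; for example, the following returns the second batch of 5 documents:

db.bios.find().skip( 5 ).limit( 5 )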

For more information on using projections, see Project Fields to Return from Query.