MongoDB – Find and insert missing data in a MongoDB collection

mongodb python

I want to write Python 3 code to find and insert missing data. Each document in my MongoDB collection has a field named "height", which is the BTC block number. I want to traverse a range from a start value to the latest block number, find which numbers are missing from that range, and insert them. Can somebody help me with the logic?
I am using MongoDB version 4.

Best Answer

Assuming your numbers are sequential (which appears to be the case for BTC height values), you can use the $range aggregation expression to generate the expected sequence of values for comparison with the existing values.
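The comparison itself is a set difference between the expected range and the heights already stored. As a rough illustration of that logic outside the database, here is a minimal Python sketch; the function name `find_missing_heights` and the idea of passing the existing heights in as a plain list are assumptions made for the example, not part of the original answer.

```python
def find_missing_heights(existing_heights, start, end):
    """Return the heights in [start, end] that are absent from existing_heights."""
    expected = range(start, end + 1)   # inclusive of both start and end
    present = set(existing_heights)    # set membership checks are O(1)
    return [h for h in expected if h not in present]


print(find_missing_heights([550000, 550002], 550000, 550005))
# prints [550001, 550003, 550004, 550005]
```

This works for small ranges held in memory; the aggregation approach performs the same comparison on the server.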

Here's an example aggregation function using the mongo shell:

function findMissingValues(start, end) {
   var results = db.blocks.aggregate([

      // This stage would benefit from a relevant index like `height: 1`
      { $match: {
         height: {$gte: start, $lte: end}
      }},

      // Find existing height values in the given range
      { $group: {
          _id: null,
          heights: {$push: "$height"}
      }},

      // Add the expected range (inclusive of start and end values)
      { $addFields: {
          allHeights: { $range: [ start, end + 1]}
      }},

      // Find any expected values that are missing from the existing data
      { $project: {
          _id: 0,
          missing: { $setDifference: [ "$allHeights", "$heights" ] }
      }}

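   ]).toArray();

   // If no documents fall in the range, $group emits nothing,
   // so every height in the range is missing
   if (results.length === 0) {
      var missing = [];
      for (var h = start; h <= end; h++) { missing.push(h); }
      results = [ { missing: missing } ];
   }

   printjson(results[0]);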
}

Sample usage:

> db.blocks.insert([ { height: 550000 }, { height: 550002 } ])

> findMissingValues(550000,550005)
{ "missing" : [ 550001, 550003, 550004, 550005 ] }
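Since the question asks for Python 3, the same pipeline can be written as a list of dicts and run through a driver. The following is a minimal sketch, assuming a PyMongo-style collection object that exposes an `aggregate` method; the function names `build_missing_pipeline` and `find_missing_values` are illustrative. It returns the full range when no documents match, because `$group` emits nothing in that case, and it sorts the result because `$setDifference` does not guarantee element order.

```python
def build_missing_pipeline(start, end):
    """Build the aggregation pipeline for heights missing from [start, end]."""
    return [
        # Benefits from an index on the height field
        {"$match": {"height": {"$gte": start, "$lte": end}}},
        # Collect the existing height values in the range
        {"$group": {"_id": None, "heights": {"$push": "$height"}}},
        # Add the expected range (inclusive of start and end)
        {"$addFields": {"allHeights": {"$range": [start, end + 1]}}},
        # Keep only the expected values that are not present
        {"$project": {
            "_id": 0,
            "missing": {"$setDifference": ["$allHeights", "$heights"]},
        }},
    ]


def find_missing_values(collection, start, end):
    """Run the pipeline on a PyMongo-style collection and return missing heights."""
    results = list(collection.aggregate(build_missing_pipeline(start, end)))
    if not results:
        # No documents matched, so the whole range is missing
        return list(range(start, end + 1))
    return sorted(results[0]["missing"])
```

With PyMongo installed, this could be called as find_missing_values(MongoClient()["btc"]["blocks"], 550000, 550005), where the database name "btc" is a placeholder.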

With the missing blocks identified, you can now retrieve and insert the relevant BTC data.
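The insertion step might look something like the sketch below. It assumes a PyMongo-style collection with an `insert_many` method and a caller-supplied `fetch_block` callable that returns the block data for a given height as a dict. Both `insert_missing_blocks` and `fetch_block` are hypothetical names, and how the block data is actually retrieved depends on your BTC data source.

```python
def insert_missing_blocks(collection, missing_heights, fetch_block):
    """Fetch each missing block and insert it; return the number inserted."""
    docs = []
    for height in missing_heights:
        # fetch_block is a hypothetical callable provided by the caller
        block = dict(fetch_block(height))
        block["height"] = height       # make sure the height field is set
        docs.append(block)

    if not docs:
        return 0                       # insert_many rejects an empty list

    result = collection.insert_many(docs)
    return len(result.inserted_ids)
```

Batching the documents into a single `insert_many` call avoids one round trip per block; for very large gaps it may be worth inserting in smaller chunks.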