MongoDB aggregate performance compared to Postgres

Tags: aggregate, mongodb, performance, postgresql

I have a collection with about 700,000 documents. These documents represent locations in a hex grid, but there are only 500 distinct locations (ll_16k). Aggregating the 700,000 documents down to 500 groups takes about 2 seconds.

  // "collection" stands in for the actual collection name
  db.collection.aggregate([
    { $group:
      { _id: "$properties.ll_16k",
        total: { $sum: "$properties.price" },
        count: { $sum: 1 }
      }
    }
  ])

The same operation in Postgres is flying… this query routinely completes in under 500 ms.

SELECT
  ll_16k,
  count(ll_16k),
  sum(price)
FROM ppd_2015_perf
GROUP BY ll_16k;
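As a quick sanity check of what this GROUP BY returns, the same aggregation can be reproduced in miniature with Python's standard-library sqlite3 module (toy data; only the table name and column names are taken from the query above):

```python
import sqlite3

# In-memory toy database mirroring the two columns the query touches.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ppd_2015_perf (ll_16k TEXT, price INTEGER)")
conn.executemany(
    "INSERT INTO ppd_2015_perf VALUES (?, ?)",
    [("hex_a", 100), ("hex_a", 200), ("hex_b", 50)],  # made-up rows
)

# Same shape as the Postgres query: one row per location,
# with a row count and a price total.
rows = conn.execute(
    "SELECT ll_16k, count(ll_16k), sum(price) "
    "FROM ppd_2015_perf GROUP BY ll_16k"
).fetchall()
print(sorted(rows))  # [('hex_a', 2, 300), ('hex_b', 1, 50)]
```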

I am just getting started with MongoDB. Am I doing something wrong here or is this the expected performance difference?

This is a test subset. The actual dataset will be 20,000,000 documents/records in size and is expected to grow by about 1,000,000 records per year.

Best Answer

One difference to consider is the amount of IO involved. In Postgres, data is normalised into tables, and tables are read one page at a time. To complete the SQL query it need only read as much disk as is required to store the ppd_2015_perf table.

Mongo's unit of IO, however, is the whole document. All of the denormalised/duplicated data is read in along with the three fields you're actually interested in.
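The size of that gap can be put into rough numbers. Every figure below is an assumption chosen for illustration (average document and row sizes are not given in the question), but the shape of the arithmetic is the point:

```python
# Back-of-envelope IO comparison; every size here is an assumption.
DOCS = 700_000

# Assumed average BSON document size, including all the
# denormalised fields that come along for the ride.
avg_doc_bytes = 1024   # ~1 KB per document (assumption)

# Assumed Postgres row width: a short text key, a numeric
# price, plus per-tuple overhead.
avg_row_bytes = 60     # ~60 bytes per row (assumption)

mongo_io_mb = DOCS * avg_doc_bytes / 1e6
pg_io_mb = DOCS * avg_row_bytes / 1e6

print(f"Mongo reads    ~{mongo_io_mb:.0f} MB")  # whole documents
print(f"Postgres reads ~{pg_io_mb:.0f} MB")     # narrow rows, page at a time
print(f"ratio          ~{mongo_io_mb / pg_io_mb:.0f}x")
```

Under these made-up sizes Mongo scans roughly 17 times as many bytes to answer the same question, which is the kind of factor that turns 500 ms into a couple of seconds.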

Of course, a lot is assumed here - RAM, compression, disk performance and data models among them.