MongoDB aggregate performance compared to Postgres

Tags: aggregate, mongodb, performance, postgresql

I have a collection with about 700,000 documents. These documents represent locations in a hex grid, but there are only 500 distinct locations (ll_16k). Aggregating the 700,000 documents down to 500 groups takes about 2 seconds.

  // "collection" stands in for the actual collection name
  db.collection.aggregate([
    { $group:
      { _id: "$properties.ll_16k",
        total: { $sum: "$properties.price" },
        count: { $sum: 1 }
      }
    }
  ])

The same operation in Postgres is flying… this query routinely completes in under 500 ms.

SELECT
  ll_16k,
  count(ll_16k),
  sum(price)
FROM ppd_2015_perf
GROUP BY ll_16k;
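As a quick sanity check of what this GROUP BY returns, the same aggregation can be reproduced in miniature with Python's standard-library sqlite3 module (toy data; only the table name and column names are taken from the query above):

```python
import sqlite3

# In-memory toy database mirroring the two columns the query touches.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ppd_2015_perf (ll_16k TEXT, price INTEGER)")
conn.executemany(
    "INSERT INTO ppd_2015_perf VALUES (?, ?)",
    [("hex_a", 100), ("hex_a", 200), ("hex_b", 50)],  # made-up rows
)

# Same shape as the Postgres query: one row per location,
# with a row count and a price total.
rows = conn.execute(
    "SELECT ll_16k, count(ll_16k), sum(price) "
    "FROM ppd_2015_perf GROUP BY ll_16k"
).fetchall()
print(sorted(rows))  # [('hex_a', 2, 300), ('hex_b', 1, 50)]
```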

I am just getting started with MongoDB. Am I doing something wrong here or is this the expected performance difference?

This is a test subset. The actual dataset will be 20,000,000 documents/records in size and is expected to grow by about 1,000,000 records per year.

Best Answer

One difference to consider is the amount of IO involved. In Postgres, data is normalised into tables, and tables are read one page at a time. To complete the SQL query it need only read as much disk as is required to store the ppd_2015_perf table.

Mongo's unit of IO, however, is the whole document. All of the denormalised/duplicated data is read in along with the three fields you're actually interested in.
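The size of that gap can be put into rough numbers. Every figure below is an assumption chosen for illustration (average document and row sizes are not given in the question), but the shape of the arithmetic is the point:

```python
# Back-of-envelope IO comparison; every size here is an assumption.
DOCS = 700_000

# Assumed average BSON document size, including all the
# denormalised fields that come along for the ride.
avg_doc_bytes = 1024   # ~1 KB per document (assumption)

# Assumed Postgres row width: a short text key, a numeric
# price, plus per-tuple overhead.
avg_row_bytes = 60     # ~60 bytes per row (assumption)

mongo_io_mb = DOCS * avg_doc_bytes / 1e6
pg_io_mb = DOCS * avg_row_bytes / 1e6

print(f"Mongo reads    ~{mongo_io_mb:.0f} MB")  # whole documents
print(f"Postgres reads ~{pg_io_mb:.0f} MB")     # narrow rows, page at a time
print(f"ratio          ~{mongo_io_mb / pg_io_mb:.0f}x")
```

Under these made-up sizes Mongo scans roughly 17 times as many bytes to answer the same question, which is the kind of factor that turns 500 ms into a couple of seconds.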

Of course, a lot is assumed here - RAM, compression, disk performance and data models among them.