I have a query that computes a number of aggregations over a table. It's very similar to the following example, adapted from the Postgres documentation, just with more entries in the GROUPING SETS clause:
```sql
SELECT
    brand,
    size,
    count(*)
FROM items_sold
GROUP BY GROUPING SETS (
    brand,
    size,
    ()
);
```
If there is a WHERE clause that restricts this query to a small subset of the table, it is pretty obvious how to speed that up with an index. But what options do I have for cases where I run this kind of query against the whole table, or a large enough fraction that an index on the WHERE clause condition isn't useful?
Now, I personally can't really think of a way to speed this kind of query up: in the end it has to scan the entire table to perform these aggregations. But just because I can't think of a way doesn't mean there isn't one. From what I read, dedicated search engines like Elasticsearch or Solr can perform similar aggregations over very large amounts of data, and I assume they're reasonably fast at it.
What options do I have to speed up this kind of query that uses a GROUP BY GROUPING SET over an entire table?
Best Answer
That's basically right, and it isn't unique to GROUPING SETS. If you are scanning a lot of blocks then IO will likely dominate execution time, so you have only two strategies: make the IO faster, or do less of it.
Regarding faster IO, you need to look at hardware or configuration changes. Parallel query was mentioned in the comments, but it can turn out faster or slower depending on the plan, and it isn't a magic bullet for IO.
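For example, on a reasonably recent Postgres (9.6 or later) you might experiment with session settings like the following to see whether a parallel plan helps at all; the specific values here are illustrative, not recommendations:

```sql
-- Illustrative session-level settings to encourage a parallel plan.
-- Extra workers can help when the scan is CPU-bound,
-- but they do not reduce the total amount of IO performed.
SET max_parallel_workers_per_gather = 4;
SET parallel_setup_cost = 100;    -- lower the planner's bias against parallelism
SET parallel_tuple_cost = 0.05;
```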
Regarding reducing IO, note that your query does not necessarily need to look at the table at all — it could use an appropriate covering index instead:
So an index including `brand` and `size` (in either order) is all that is needed.

Test data:
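A minimal sketch of suitable test data (the table definition and row counts here are hypothetical, not the original fiddle's):

```sql
-- Hypothetical test data: ~1 million rows over a handful of brands and sizes.
CREATE TABLE items_sold (
    brand text,
    size  text
);

INSERT INTO items_sold (brand, size)
SELECT
    (ARRAY['Foo', 'Bar', 'Baz'])[1 + i % 3],
    (ARRAY['S', 'M', 'L', 'XL'])[1 + i % 4]
FROM generate_series(1, 1000000) AS s(i);

ANALYZE items_sold;
```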
without covering index:
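With no suitable index, a plan along these lines has to read every block of the table (a sketch; exact plan and buffer counts depend on your version and data):

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT brand, size, count(*)
FROM items_sold
GROUP BY GROUPING SETS (brand, size, ());
-- Expect a Seq Scan on items_sold feeding the aggregate,
-- with the "Buffers" line covering every block of the table.
```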
with covering index:
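One way to set this up (the index name is hypothetical); running VACUUM first keeps the visibility map current, which Postgres needs before it will use an index-only scan:

```sql
CREATE INDEX items_sold_brand_size_idx ON items_sold (brand, size);
VACUUM items_sold;  -- update the visibility map so an index-only scan is possible

EXPLAIN (ANALYZE, BUFFERS)
SELECT brand, size, count(*)
FROM items_sold
GROUP BY GROUPING SETS (brand, size, ());
-- Expect an Index Only Scan, touching far fewer buffers than the table scan.
```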
dbfiddle here
Note the "Buffers" data — the covering index does much less IO here.