Postgresql – Predict how much hard drive space a query might cost

optimizationperformancepostgresqlquery-performance

Once in a while I run into "No space left on the machine" after running a lengthy query. I also like to have a reasonable estimation, so that the IT team can purchase a correct cloud server. My question is: Given a query, is there a way to predict how much hard drive space it might cost? Is adding the size of involved tables a good enough estimation?

EDIT: Currently I am using Postgres 10. The query in question is about creating a materialized view and do some arithmetic, lag and group by on a single table with a size of around 750 GB.

Query looks like

CREATE MATERIALIZED VIEW foo AS
   WITH change_data AS (
             SELECT (
                      CAST(lagged_data.todays_count AS double precision) /                                      CAST(lagged_data.previous_count AS double precision)
                      - CAST(1 AS double precision)
                    ) / GREATEST(EXTRACT(
                        days from (date_belongs_to - previous_date_belongs_to)
                    ), 1) AS change,
                    lagged_data.date_belongs_to,
                    lagged_data.user_id
               FROM lagged_data
              WHERE lagged_data.previous_count IS NOT NULL
        ),
        lagged_change_data AS (
             SELECT change_data.change,
                    LAG(change_data.change) OVER (
                        PARTITION BY change_data.user_id
                            ORDER BY change_data.date_belongs_to
                    ) AS previous_change,
                    change_data.user_id
               FROM change_data
        ),
        acceleration AS (
              SELECT AVG(ABS(lagged_change_data.change - lagged_change_data.previous_change)) AS fgs,
                     lagged_change_data.user_id
                FROM lagged_change_data
               WHERE lagged_change_data.previous_change IS NOT NULL
            GROUP BY lagged_change_data.user_id
        )

Best Answer

It is hard to make a prediction, because that depends on many factors like the chosen execution plan.

You should first find out where the files are that keep growing.

  • If they are in the base/12345 subdirectory of the PostgreSQL data directory (the number will be different), it is the materialized view that is being built. That would mean that your disk space is not sufficient to contain the resulting object.

  • If the growing files are in base/pgsql_tmp, they are temporary files that the query executor creates to cache a large intermediate result.

    That might be an indication that you forgot a join condition somewhere - hard to tell, as we only see part of the query.

    Since CTEs are materialized before v12, it could also be the intermediate result of a WITH expression. You can try them out one after the other and see if they make things go boom.

Sorry, but it is not possible to give you more than these guidelines.