PostgreSQL runs out of memory in the middle of query

memorypostgresql

I am running a PostgreSQL 11 database in AWS RDS in a db.t2.xlarge instance (4 CPU 16 Gb RAM) with 4 Tb of storage. My query is based on a fairly large table (48 Gb — 243.711.955 rows), but nothing that PostgreSQL should not be able to handle.

I'm running a simple query (I add the query plan in case that can give any insight), but after ~10 hours, the storage space drops dramatically to nil, and the query fails with:

ERROR:  could not extend file "base/16401/321099.3525": No space left on device
HINT:  Check free disk space.

I tried to solve the problem with additional storage (I added 1 Tb to the database and ran a VACCUM on all the tables — I know that autovaccum is set on in RDS instances), but the problem persisted. I am afraid that adding space might not be the possible solution since the database is taking less than 1 Tb:

         name         |    owner    |   size
----------------------+-------------+-----------
 rdsadmin             | rdsadmin    | No Access
 <my_database>        | <user_db>   | 317 GB

I do not completely understand the way PostgreSQL creates temporary files, are those hashes or temporary ORDER BY indexes (?), but I have a significant amount of those in my db:

       datname        | Temporary files | Size of temporary files
----------------------+-----------------+-------------------------
 rdsadmin             |               0 |                       0
 template0            |               0 |                       0
 postgres             |               0 |                       0
 <my_database>        |            1079 |            991097557473
 template1            |               0 |                       0

Should I keep adding storage space until the query run (RDS has a "emergency storage" for these cases)? or there is some kind of configuration that I am missing before running my query?

EDIT: Add query.

Query:

    with times as (
        select name_source,
               id_source,
               run,
               init_dt,
               hour_along,
               ntile(3) over (partition by run, date_part('year', init_dt), id_source order by hour_along) as t,
               height,
               traj_dt,
               geom
        from hysplit_process.clean_trajectories
    ),
         dates_for_t as (
             select distinct on (
                id_source,
                name_source,
                run,
                date_part('year', init_dt),
                t) name_source,
                   id_source,
                   run,
                   t,
                   traj_dt
             from times
             order by id_source, name_source, run, date_part('year', init_dt), run, t, traj_dt desc
         ),
         agg_trajs_by_t as (
             select name_source,
                    id_source,
                    run,
                    t,
                    init_dt,
                    avg(st_x(geom)) as avg_lon,
                    avg(st_y(geom)) as avg_lat,
                    avg(height)     as avg_height
             from times
             group by name_source, id_source, run, init_dt, t
         )
    select t.name_source,
           t.id_source,
           t.run,
           t.t,
           t.init_dt,
           d.traj_dt,
           st_setsrid(st_makepoint(t.avg_lon, t.avg_lat), 4326) as geom,
           t.avg_height
    from agg_trajs_by_t as t
             left join dates_for_t as d
                       using (run, t);

Query plan:

                                                                              QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Right Join  (cost=274164346.44..274773926.31 rows=24371195 width=104)
   Merge Cond: ((d.run = t.run) AND (d.t = t.t))
   CTE times
     ->  WindowAgg  (cost=90827593.60..97529672.28 rows=243711952 width=100)
           ->  Sort  (cost=90827593.60..91436873.48 rows=243711952 width=96)
                 Sort Key: clean_trajectories.run, (date_part('year'::text, clean_trajectories.init_dt)), clean_trajectories.id_source, clean_trajectories.hour_along
                 ->  Seq Scan on clean_trajectories  (cost=0.00..6897785.40 rows=243711952 width=96)
   CTE dates_for_t
     ->  Unique  (cost=72144047.24..73971886.88 rows=40000 width=56)
           ->  Sort  (cost=72144047.24..72753327.12 rows=243711952 width=56)
                 Sort Key: times.run, times.t, times.traj_dt DESC
                 ->  CTE Scan on times  (cost=0.00..4874239.04 rows=243711952 width=56)
   CTE agg_trajs_by_t
     ->  GroupAggregate  (cost=88804047.24..95932621.83 rows=24371195 width=80)
           Group Key: times_1.name_source, times_1.id_source, times_1.run, times_1.init_dt, times_1.t
           ->  Sort  (cost=88804047.24..89413327.12 rows=243711952 width=96)
                 Sort Key: times_1.name_source, times_1.id_source, times_1.run, times_1.init_dt, times_1.t
                 ->  CTE Scan on times times_1  (cost=0.00..4874239.04 rows=243711952 width=96)
   ->  Sort  (cost=3857.54..3957.54 rows=40000 width=16)
         Sort Key: d.run, d.t
         ->  CTE Scan on dates_for_t d  (cost=0.00..800.00 rows=40000 width=16)
   ->  Materialize  (cost=6726307.90..6848163.88 rows=24371195 width=80)
         ->  Sort  (cost=6726307.90..6787235.89 rows=24371195 width=80)
               Sort Key: t.run, t.t
               ->  CTE Scan on agg_trajs_by_t t  (cost=0.00..487423.90 rows=24371195 width=80)

Best Answer

Thanks everyone for your suggestions.

Following the comments above, I dismantle the CTE into several smaller steps. That not only allowed me to use more indexes, but also to avoid crashing my RDS instance. I hope this helps people in the future in case they are stucked with PostgreSQL 11.