Postgresql – Gaining visibility into long running INSERT INTO … SELECT

postgresql

I have a large SQL insert statement that updates a summary table. The process normally runs hourly, and takes about 5 to 10 minutes to calculate uniques over that period. I "foolishly" chose to run it over a period of 15 days, and I'm wondering if there's anything I can do to gain visibility into where it's at. The process has been running for 9 hours already. I would simply like to know if it's still the SELECT query that's running or if it's inserting data: I need to know if I should stop it and do smaller batches, or if I should just let it finish.

I know the server's been at it for 9 hours because of that statement:

select
    now() - xact_start
  , procpid
  , client_addr
  , client_port
  , current_query
from pg_stat_activity
where xact_start is not null
order by 1 desc

The actual query is:

INSERT INTO summary_show_unique_personas(period, show_id, persona_id, interactions_count)
  SELECT
      date_trunc('hour', created_at) AS period
    , show_id
    , persona_id
    , COUNT(*)
  FROM
          twitter_interactions
    JOIN  show_bindings        USING(interaction_id)
    JOIN  twitter_personas     USING(screen_name)
  WHERE
        created_at >= '__PERIOD_START_AT__' AND created_at < '__PERIOD_END_AT__'
    AND interaction_created_at >= '__PERIOD_START_AT__' AND interaction_created_at < '__PERIOD_END_AT__'
  GROUP BY
    1, 2, 3;

Best Answer

Postgres doesn't allow dirty reads so we need some other means of checking if rows are being inserted from a second transaction, such as:

select pg_relation_size('summary_show_unique_personas');

The value returned by this query will be increasing if inserts are currently happening (unless, I think, they are filling up space freed up by previous delete statements).