I have a large SQL insert statement that updates a summary table. The process normally runs hourly, and takes about 5 to 10 minutes to calculate uniques over that period. I "foolishly" chose to run it over a period of 15 days, and I'm wondering if there's anything I can do to gain visibility into where it's at. The process has been running for 9 hours already. I would simply like to know if it's still the SELECT query that's running or if it's inserting data: I need to know if I should stop it and do smaller batches, or if I should just let it finish.
I know the server's been at it for 9 hours because of that statement:
select
now() - xact_start
, procpid
, client_addr
, client_port
, current_query
from pg_stat_activity
where xact_start is not null
order by 1 desc
The actual query is:
INSERT INTO summary_show_unique_personas(period, show_id, persona_id, interactions_count)
SELECT
date_trunc('hour', created_at) AS period
, show_id
, persona_id
, COUNT(*)
FROM
twitter_interactions
JOIN show_bindings USING(interaction_id)
JOIN twitter_personas USING(screen_name)
WHERE
created_at >= '__PERIOD_START_AT__' AND created_at < '__PERIOD_END_AT__'
AND interaction_created_at >= '__PERIOD_START_AT__' AND interaction_created_at < '__PERIOD_END_AT__'
GROUP BY
1, 2, 3;
Best Answer
Postgres doesn't allow dirty reads so we need some other means of checking if rows are being inserted from a second transaction, such as:
The value returned by this query will be increasing if inserts are currently happening (unless, I think, they are filling up space freed up by previous
delete
statements).