We do a dump of huge database (~ 2 TB) via pg_dump command:
pg_dump -h 192.168.0.132 -d vodka_factory -U vodka_user -F d -j 10 -v -f /backups/vodka_factory_2021-03-26-23-56.dump/
pg_dump process successfully (return code 0, no errors), but on database we can see ~8 COPY processes and one "select" in 'idle in transaction' which never finish:
No locks. More detailed information for some processes:
postgres=# select * from pg_stat_activity where pid=358683;
-[ RECORD 1 ]----+---------------------------------------------------------------
datid | 16404
pid | 358683
application_name | pg_dump
client_addr | 10.136.131.164
client_hostname |
client_port | 57732
backend_start | 2021-03-26 23:56:43.311535+03
xact_start | 2021-03-26 23:56:43.321329+03
query_start | 2021-03-26 23:57:07.682549+03
state_change | 2021-03-26 23:57:07.683027+03
wait_event_type | Client
wait_event | ClientRead
state | idle in transaction
backend_xid |
backend_xmin | 3802998627
query | SELECT last_value, is_called FROM public.waterfall_task_id_seq
backend_type | client backend
postgres=# select * from pg_stat_activity where pid=358919;
-[ RECORD 1
datid | 16404
pid | 358919
application_name | pg_dump
client_addr | 10.136.131.164
client_hostname |
client_port | 57744
backend_start | 2021-03-26 23:57:07.836322+03
xact_start | 2021-03-26 23:57:07.844714+03
query_start | 2021-03-27 00:29:19.240355+03
state_change | 2021-03-27 00:30:10.458445+03
wait_event_type | Client
wait_event | ClientRead
state | idle in transaction
backend_xid |
backend_xmin | 3802998627
query | COPY public.slr_child_care (id, benefit_type_id, child_count, child_index, start_month, region_index, ent_id, created, modified, external_id, begin, "end", candidate_id, hospital_json, calcs_json, years_replace, first_year_replace, second_year_replace, excluded_days, old_end, compensation, alg_code, state, correction_id, ref_number, date_of_issue, regional_factor, share_by) TO stdout;
backend_type | client backend
Databse version: PostgreSQL 11.6 on x86_64-pc-linux-gnu
pg_dump version: pg_dump (PostgreSQL) 11.8
Could you please advice how we can investigate the issue? The iisue happens every time…
Best Answer
It is slightly surprising that
pg_dump
returned sucessfully with a return code of 0. The only theory I can come up with is that some of the 10 jobs got done early and started to idle, whereupon a misconfigured firewall cut the connection. The backends should die after around 2 hours due to TCP keepalive.What you could do is reduce the keepalive idle time and see if that improves things: