Postgresql – COPY Command spawning multiple processes

copypostgresqlredhat

I have a lengthy data pipeline that imports a CSV, creates a sequence of dependent views and then exports the results using COPY as a CSV. This process had been working fine, completing in around 30-60 seconds. Suddenly, this process has started running much slower when it reaches the copy command. Running SELECT usename, state, query FROM pg_stat_activity; shows three identical, active COPY commands under my username. The process does eventually complete but now takes up to 20 minutes or more. Other than changing some initial sub-setting, the data has remained unchanged. No one else is using this database but there are other users on the cluster.

Has anyone encountered this behavior before? Does anyone know what might cause a sudden slow down in a COPY operation?

Postgres 11.6, RHEL 7

Best Answer

You aren't giving us much to go on here. Like, what is the query? What do the other columns of pg_stat_activity show? What does OS command "top" show?

I was thinking parallel workers, but parallel worker processes won't show up with their own row in pg_stat_activity until v13. So I think the most likely thing to have changed is your workflow is now launching multiple simultaneous COPY commands for unknown reason. Could you crank up logging in the database (log_statement='all', log_connections='on')?