PostgreSQL – Query works locally but not on AWS RDS

amazon-rds, aws, postgresql

We have a large query that works fine locally but deadlocks on AWS RDS. According to CloudWatch, we are not resource-bound on CPU, disk or IOPS, and nothing else should be accessing these tables.

The gist of the query is a three-step process.

Pivoting

The upstream data is millions of records of (site, key, value) with a dozen or so standard keys. The pivot (which works on AWS) creates a new table with (site, value1, value2, value3,...) records. There are over 800k rows in the pivot table.

A modest amount of processing is done at this stage, mostly COALESCE() calls that let some key-value pairs override others.
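
For illustration, a pivot along these lines can be written with conditional aggregation (FILTER needs PostgreSQL 9.4+; MAX(CASE WHEN ...) works on older versions). The names below (raw_data, site_pivot, the color/size/region keys) are invented for the sketch, not taken from our schema:

    CREATE TABLE site_pivot AS
    SELECT
        site,
        -- COALESCE lets an override key win over the standard key
        COALESCE(
            MAX(value) FILTER (WHERE key = 'color_override'),
            MAX(value) FILTER (WHERE key = 'color')
        ) AS color,
        MAX(value) FILTER (WHERE key = 'size')   AS size,
        MAX(value) FILTER (WHERE key = 'region') AS region
    FROM raw_data
    GROUP BY site;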

Indexes and Analyzing

We add the indexes required to optimize the following query and VACUUM ANALYZE the newly created tables.
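
Continuing with the invented names from the sketch above, this stage is essentially:

    -- indexes matched to the lookups in the later denormalization query
    CREATE INDEX ON site_pivot (color);
    CREATE INDEX ON site_pivot (region);

    -- refresh planner statistics (and clean up) on the freshly built table
    VACUUM ANALYZE site_pivot;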

Denormalization

The pivot table values are foreign keys into various other tables. We denormalize the references for subsequent processing by a Java application. For performance reasons we create a table, although a materialized view would probably also work. The table creation is a straightforward CREATE TABLE table_name AS SELECT * FROM view_name.

The foreign tables have a modest amount of data, typically on the order of dozens to hundreds of entries, although one may hit 10,000+ records. All have the necessary indexes to optimize the query (per EXPLAIN).

A modest amount of processing is done at this stage, mostly CASE statements that smooth out data irregularities.
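
A minimal sketch of this stage, again with invented names (colors, regions, site_view, site_denormalized); the real view has far more columns, but the shape is the same:

    CREATE VIEW site_view AS
    SELECT
        p.site,
        -- one scalar subquery per foreign-key column, no other joins
        (SELECT c.name FROM colors  c WHERE c.code = p.color)  AS color_name,
        (SELECT r.name FROM regions r WHERE r.code = p.region) AS region_name,
        -- CASE statements smooth out data irregularities
        CASE WHEN p.size IS NULL OR p.size = '' THEN 'unknown'
             ELSE p.size
        END AS size
    FROM site_pivot p;

    -- materialize the view for the Java application
    CREATE TABLE site_denormalized AS
    SELECT * FROM site_view;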

The denormalization takes a long time (roughly an hour) but always succeeds on my desktop. It has consistently failed on AWS, even after 24+ hours. As I mentioned above, it is not resource-limited, and we have verified that the hold-up is lock waits on the pivot table.
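
For reference, a check against the standard pg_locks and pg_stat_activity views is one way to see which sessions are blocked and on which relation; this is a generic diagnostic rather than the exact query we ran:

    -- sessions waiting on a lock, and the relation they are waiting for
    SELECT a.pid,
           a.state,
           a.wait_event_type,
           a.wait_event,
           l.mode,
           l.relation::regclass AS relation,
           a.query
    FROM pg_locks l
    JOIN pg_stat_activity a ON a.pid = l.pid
    WHERE NOT l.granted;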

Does anyone have any idea what the cause could be, or how to fix it? The view is straightforward: a fair number of scalar subqueries on foreign keys, but no other joins. The table is large, over 800k records, but the work is strictly one record in -> one record out.

Our next step is to check the behavior with a smaller dataset, but success would only tell us that we could try breaking the query into smaller pieces, not why we are seeing the deadlocks or what we can do to prevent them.

Best Answer

Our final solution was to split the query so that pivoting and denormalization run as separate steps. This is probably a cleaner solution anyway: it follows the "just do one thing" principle, so it should be easier to maintain.
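
In outline, the split amounts to running the earlier sketches as independently committed steps (again with invented names; the original combined query is not shown here, so this only illustrates the structure):

    -- step 1: build and commit the pivot table on its own
    BEGIN;
    CREATE TABLE site_pivot AS
    SELECT site,
           MAX(value) FILTER (WHERE key = 'color') AS color
    FROM raw_data
    GROUP BY site;
    COMMIT;

    -- step 2: denormalize in its own transaction, reading the
    -- already-committed pivot table
    BEGIN;
    CREATE TABLE site_denormalized AS
    SELECT p.site,
           (SELECT c.name FROM colors c WHERE c.code = p.color) AS color_name
    FROM site_pivot p;
    COMMIT;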

Thanks for all of the help!