Postgresql – Improve UPDATE performance on big table

amazon-rds, index-tuning, performance, postgresql, postgresql-performance, update

I'm using Postgres 9.5 on Amazon RDS (2vCPU, 8 GB RAM).
I use pganalyze to monitor my performance.
I have around 200K records in the database.

In my dashboard I see that the following queries take 28 and 11 seconds on average to execute:

UPDATE calls SET ... WHERE calls.uuid = ?   telephonist 28035.41    0.01    100%    0.03%

UPDATE calls SET sip_error = ? WHERE calls.uuid = ? telephonist 11629.89    0.44    100%    0.69%

I already tried VACUUM, which found and cleaned up 7,670 dead rows.
Any ideas on how to improve UPDATE performance? This is the query:

UPDATE calls SET X=Y WHERE calls.uuid = 'Z'

How can I improve the query above? Can I add another field to the WHERE clause? For example:

UPDATE calls SET X=Y WHERE calls.uuid = 'Z' AND calls.campaign = 'W'

The column uuid is not indexed.
https://www.tutorialspoint.com/postgresql/postgresql_indexes.htm suggests that indexes should be avoided on tables with frequent UPDATE operations.

CREATE TABLE public.calls (
    id int4 NOT NULL DEFAULT nextval('calls_id_seq'::regclass),
    callsid varchar(128),
    call_start timestamp(6) NOT NULL,
    call_end timestamp(6) NULL,
    result int4 DEFAULT 0,
    destination varchar(256),
    campaign varchar(128),
    request_data varchar(4096),
    uuid varchar(128) NOT NULL,
    status varchar(64),
    duration int4,
    recording_file varchar(256),
    recording_url varchar(256),
    recording_duration int4,
    recording_text varchar(4096),
    recording_download bool DEFAULT false,
    description varchar(4096),
    analysis varchar(4096),
    is_fax bool DEFAULT false,
    is_test bool,
    hangup_cause varchar(128),
    media_detected bool DEFAULT false,
    sip_callid varchar(256),
    hangup_cause_override varchar(256),
    is_blacklisted bool DEFAULT false,
    sip_error varchar(256),
    hangup_cause_report varchar(128),
    summary varchar(1024)
);
EXPLAIN ANALYZE 
    SELECT * FROM calls 
    WHERE calls.uuid='e2ce9eb4-v1lp-p14u-7kkk-lruy-e2ceaae46d';
Seq Scan on calls  (cost=0.00..16716.25 rows=1 width=3301) 
                   (actual time=81.637..81.637 rows=0 loops=1)
  Filter: ((uuid)::text = 'e2ce9eb4-v1lp-p14u-7kkk-lruy-e2ceaae46d'::text)
  Rows Removed by Filter: 99970
Planning time: 0.482 ms
Execution time: 81.683 ms

Best Answer

Assuming the column uuid is supposed to be UNIQUE, this table definition should save some space and help performance:

CREATE TABLE public.calls (
   id serial PRIMARY KEY,
   result int4 DEFAULT 0 NOT NULL,
   uuid uuid UNIQUE NOT NULL,  -- creates the index you need automatically
   call_start timestamp NOT NULL,
   call_end timestamp,  -- so this can be NULL?
   duration int4,
   recording_duration int4,
   callsid varchar(128),
   destination varchar(256),
   campaign varchar(128),
   request_data varchar(4096),
   status varchar(64),
   recording_file varchar(256),
   recording_url varchar(256),
   recording_text varchar(4096),
   recording_download bool DEFAULT false,
   description varchar(4096),
   analysis varchar(4096),
   is_fax bool DEFAULT false,
   is_test bool,
   hangup_cause varchar(128),
   media_detected bool DEFAULT false,
   sip_callid varchar(256),
   hangup_cause_override varchar(256),
   is_blacklisted bool DEFAULT false,
   sip_error varchar(256),
   hangup_cause_report varchar(128),
   summary varchar(1024)
);

The most important feature here is the UNIQUE constraint, which is implemented with a unique index, and an index on uuid is what you need more than anything else (as @ypercube already commented).
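
If you would rather keep the existing table and add the constraint in place, something along these lines should work (the constraint and index names here are only illustrative, and CREATE INDEX CONCURRENTLY cannot run inside a transaction block):

-- simple, but holds a lock that blocks writes while the index builds
ALTER TABLE calls ADD CONSTRAINT calls_uuid_key UNIQUE (uuid);

-- alternative: build the index without blocking writes, then attach it as the constraint
CREATE UNIQUE INDEX CONCURRENTLY calls_uuid_key ON calls (uuid);
ALTER TABLE calls ADD CONSTRAINT calls_uuid_key UNIQUE USING INDEX calls_uuid_key;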

If uuid is not unique, create a plain btree index on it.

If uuid is not a valid uuid, leave it as character type (varchar or text), but still create that index.
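
In either of those cases the plain index could look like this (the index name is just illustrative):

CREATE INDEX calls_uuid_idx ON calls (uuid);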

There are also size and performance benefits to the native uuid data type over varchar: a uuid value is stored in 16 bytes, while its text representation occupies around 37 bytes, and equality comparisons on uuid values are cheaper.
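
If all existing values are valid UUIDs, the column can be converted in place; a sketch (note that this rewrites the table and holds an exclusive lock while it runs):

ALTER TABLE calls ALTER COLUMN uuid TYPE uuid USING uuid::uuid;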

All my other suggested changes are minor improvements.

If you don't need to enforce a particular maximum length, I would just use text for all your character columns. That has hardly any immediate effect on performance, but some of the columns might also be converted to a more fitting type (with an actual performance benefit).
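
For example, dropping the length limit on one column (the choice of column is arbitrary; widening varchar to text needs no table rewrite in modern Postgres):

ALTER TABLE calls ALTER COLUMN description TYPE text;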


Indexes and UPDATE

So why does that tutorial page say:

When should indexes be avoided?
[...]
- Tables that have frequent, large batch update or insert operations.

That's misleading by omission. You desperately need the one index on uuid to support the predicate of your UPDATE. All other indexes slow your updates down, because they require additional work to keep them current after each update. So if you have no use for the PRIMARY KEY on id (to allow FK constraints pointing to it, for instance), you can drop that and make uuid the PK instead (which is also indexed automatically).
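
A minimal sketch of that variant, based on the table definition suggested above (column list abbreviated):

CREATE TABLE public.calls (
   uuid uuid PRIMARY KEY,  -- replaces the id column and the separate UNIQUE constraint
   result int4 DEFAULT 0 NOT NULL,
   call_start timestamp NOT NULL,
   call_end timestamp,
   -- ... remaining columns as above ...
);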