The Setup
I am running PostgreSQL 9.4.15 on an SSD-based, quad-core Virtual Private Server (VPS) with Debian Linux (8). The relevant table has approximately 2-million records.
Records are frequently being inserted and even more frequently (constantly — every few seconds at least) updated. As far as I can tell, I have all appropriate indexes in place for these operations to execute snappily, however, and the vast majority of the time they do execute instantly (in milliseconds).
The Problem
Every hour or so, however, one of the UPDATE
queries takes an excessive amount of time — like 10 seconds or more. And when this happens, it's usually like a "batch" of queries that get "blocked", all terminating at roughly the same time. It's as if one of the queries, or some other background operation (e.g., a vacuum) is blocking them all.
Schema
The table, items
, has many columns, but I think the following are the only ones possibly relevant to the problem:
id INTEGER NOT NULL
(primary key)search_vector TSVECTOR
last_checkup_at TIMESTAMP WITHOUT TIME ZONE
And these are the relevant indexes:
items_pkey PRIMARY KEY, btree (id)
items_search_vector_idx gin (search_vector)
items_last_checkup_at_idx btree (last_checkup_at)
Likely Culprits
Finally, after rigging together a little script to dump the contents of pg_stat_activity
(the list of all active Postgres connections/queries) whenever a "connection leak" warning is emitted to my log-file, I've narrowed down the possible culprit queries/columns (assuming the problem isn't external, like with a misbehaving VPS). These are, roughly, the kinds of queries that seem to appear again and again:
UPDATE items SET last_checkup_at = $1 WHERE items.id = 123245
UPDATE items SET search_vector = [..] WHERE items.id = 78901
Those are slightly paraphrased, but I truly doubt anything relevant is missing. Occasionally other queries (on other tables) appear as well, but those usually look like they were just "unlucky" to get caught in the mix.
Now, even though the first query (setting last_checkup_at
) tends to appear most of the time, the query that sets search_vector
seems to appear every time. (And in addition, there are probably many more instances of the first query being issued in general, making it more likely to just be there on happenstance.)
(I think I'm winnowing in on a solution here, but even if I have it in the bag I wanted to document the incident here for others… I've been mystified by this problem for months, before getting a chance to deep-dive.)
Best Answer
The problem seems to have been Postgres's "
FASTUPDATE
" mechanism.FASTUPDATE
is a setting available onGIN
indexes which, when enabled, causes changes to the index (caused byUPDATE
s and presumablyINSERT
s as well) to be "queued up". Then, once this "queue" becomes too large, the pending entries are properly integrated into theGIN
index.The aim of
FASTUPDATE
is (no surprise) to speed up index updates, but it unfortunately leads to an occasionalUPDATE
query being exceptionally slow. In my case, I found it preferable to take the hit up-front (mainly to avoid warnings of a "slow query" in my logs).FASTUPDATE
is apparently enabled by default and available since PostgreSQL 8.4. I was able to disable it like this:At the time of writing, I have been running as such for almost a week with almost no slow queries. (Aside one query that I expect to take a long time, I've noticed little else.)
You may also find more pertinent information in a related thread from the Postgres mailing list. Interestingly, one of the Postgres devs (Tom Lane) suggests processing of the
FASTUPDATE
pending items "was not supposed to block concurrent insertions", but I'm not sure if that's correct; in my case I would see several queries get "backed up" and then finish all at once.