PostgreSQL – Why is This Simple Insert Trigger Function Causing Deadlocks?

deadlockpostgresqlpostgresql-10

This is a simplified version of my prior question. I removed the many-to-many complexity and still deadlocks. It happens WAY less frequently, but still happens. ?

The situation…
I have a tweet table and one of the columns receives an array[]::text[] of urls.
There is a trigger function on the table that inserts the urls into a url_starting table.

The url_starting table looks like this.

CREATE TABLE public.url_starting(
    id integer NOT NULL GENERATED BY DEFAULT AS IDENTITY,
    url text NOT NULL,
    CONSTRAINT url_starting_pk PRIMARY KEY (id),
    CONSTRAINT url_starting_ak_1 UNIQUE (url)
);

And tweet table the trigger looks like this.

CREATE OR REPLACE FUNCTION public.create_tweet_relationships()
 RETURNS trigger
 LANGUAGE plpgsql
AS $function$
BEGIN
    INSERT INTO url_starting (url) 
    SELECT DISTINCT UNNEST(NEW.urls)
    ORDER BY 1
    ON CONFLICT DO NOTHING;

    RETURN NULL;
END
$function$;

Sometimes I get deadlock errors like this.

deadlock detected
DETAIL:  Process 4540 waits for ShareLock on transaction 4709; blocked by process 4531.
Process 4531 waits for ShareLock on transaction 4710; blocked by process 4540.
HINT:  See server log for query details.
CONTEXT:  while inserting index tuple (2314,101) in relation "url_starting"
SQL statement "INSERT INTO url_starting (url) 
    SELECT DISTINCT UNNEST(NEW.urls)
    ORDER BY 1
    ON CONFLICT DO NOTHING"
PL/pgSQL function create_tweet_relationships() line 12 at SQL statement

Error causing transaction rollback (deadlocks, serialization failures, etc).

Shots in the dark… ?‍♂️

Could this be caused by UNNEST? Am I doing something wrong in the syntax?

Why does the error say in relation "url_starting" when there is no relationship in the table?

There are thousands of tweets committing to the database simultaneously. It shouldn't matter if set up correctly, should it?

Best Answer

Under heavy concurrent write access, the defense against deadlocks is to process rows in consistent order in all writing queries.

You may have sorted rows properly in all your writing queries to the table tweet- did you? (I know about that table from your related question.)

And you are obviously sorting rows to be "upserted" in url_starting in the trigger function.

That should do it for transactions inserting a single row in tweet with multiple URLs to be upserted in url_starting.

But while inserting multiple rows to tweet, each with an arbitrary array of URLs, rows to be upserted in url_starting are still in inconsistent order for the transaction. URLs are only sorted per row in tweet, not for the whole transaction (or even command). You would need to unnest all URLs of the same INSERT, sort them consistently, and then upsert url_starting. This cannot be achieved with a trigger FOR EACH ROW. I don't think you can solve this at all with your current approach while inserting multiple rows with arbitrary arrays of URLs. Concurrent write access to both tables is inherently conflicting.

Single-row inserts to tweet should be fine (each in its own transaction) - though possibly substantially more expensive. Maybe you can get rid of the trigger completely and reorganize the workflow (with data-modifying CTEs): write a sorted list of URLs to url_starting, then write to tweet ...

Arrays are often problematic in a relational table design. Full normalization might be another approach - replacing the array column with a related table. Not sure if that's the way ...

Or you keep your transactions as short and fast as possible, making deadlocks the rare exception, and prepare your application to retry in case of an error.