This is a simplified version of my prior question. I removed the many-to-many complexity and it still deadlocks. It happens far less frequently, but it still happens.
The situation…
I have a tweet table, and one of its columns receives an array[]::text[] of URLs. There is a trigger function on the table that inserts the URLs into a url_starting table.
The url_starting table looks like this.
CREATE TABLE public.url_starting(
id integer NOT NULL GENERATED BY DEFAULT AS IDENTITY,
url text NOT NULL,
CONSTRAINT url_starting_pk PRIMARY KEY (id),
CONSTRAINT url_starting_ak_1 UNIQUE (url)
);
And the trigger on the tweet table looks like this.
CREATE OR REPLACE FUNCTION public.create_tweet_relationships()
RETURNS trigger
LANGUAGE plpgsql
AS $function$
BEGIN
INSERT INTO url_starting (url)
SELECT DISTINCT UNNEST(NEW.urls)
ORDER BY 1
ON CONFLICT DO NOTHING;
RETURN NULL;
END
$function$;
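For completeness, the question doesn't show the CREATE TRIGGER statement itself; presumably the function is wired up roughly like this (the trigger name is a guess):

```sql
-- Assumed wiring (not shown in the question): fire the function after each
-- inserted tweet row, so NEW.urls is available inside it.
CREATE TRIGGER tweet_create_relationships
AFTER INSERT ON tweet
FOR EACH ROW
EXECUTE FUNCTION create_tweet_relationships();
```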
Sometimes I get deadlock errors like this.
deadlock detected
DETAIL: Process 4540 waits for ShareLock on transaction 4709; blocked by process 4531.
Process 4531 waits for ShareLock on transaction 4710; blocked by process 4540.
HINT: See server log for query details.
CONTEXT: while inserting index tuple (2314,101) in relation "url_starting"
SQL statement "INSERT INTO url_starting (url)
SELECT DISTINCT UNNEST(NEW.urls)
ORDER BY 1
ON CONFLICT DO NOTHING"
PL/pgSQL function create_tweet_relationships() line 12 at SQL statement
Error causing transaction rollback (deadlocks, serialization failures, etc).
Shots in the dark…
Could this be caused by UNNEST? Am I doing something wrong in the syntax?
Why does the error say in relation "url_starting" when there is no relationship in the table?
There are thousands of tweets committing to the database simultaneously. It shouldn't matter if set up correctly, should it?
Best Answer
Under heavy concurrent write access, the defense against deadlocks is to process rows in consistent order in all writing queries.
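To illustrate why order matters, here is a hypothetical minimal scenario (the URL values are invented) in which two concurrent transactions take locks on the unique index of url_starting in opposite order:

```sql
-- Hypothetical illustration of the deadlock mechanism.

-- Transaction A upserts in one order ...
BEGIN;
INSERT INTO url_starting (url) VALUES ('a.example'), ('b.example')
ON CONFLICT DO NOTHING;   -- locks the index entry for 'a.example' first

-- ... while transaction B, concurrently, upserts in the opposite order:
BEGIN;
INSERT INTO url_starting (url) VALUES ('b.example'), ('a.example')
ON CONFLICT DO NOTHING;   -- locks 'b.example' first, then waits on 'a.example'

-- Now A waits on B's 'b.example' while B waits on A's 'a.example': deadlock.
-- A consistent sort order (as in the trigger's ORDER BY) makes every
-- transaction acquire these locks in the same sequence, avoiding the cycle.
```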
You may have sorted rows properly in all your writing queries to the table tweet - did you? (I know about that table from your related question.) And you are obviously sorting rows to be "upserted" in url_starting in the trigger function.

That should do it for transactions inserting a single row in tweet with multiple URLs to be upserted in url_starting.

But while inserting multiple rows into tweet, each with an arbitrary array of URLs, rows to be upserted in url_starting are still in inconsistent order for the transaction. URLs are only sorted per row in tweet, not for the whole transaction (or even command). You would need to unnest all URLs of the same INSERT, sort them consistently, and then upsert url_starting. This cannot be achieved with a FOR EACH ROW trigger. I don't think you can solve this at all with your current approach while inserting multiple rows with arbitrary arrays of URLs. Concurrent write access to both tables is inherently conflicting.

Single-row inserts to tweet should be fine (each in its own transaction) - though possibly substantially more expensive.

Maybe you can get rid of the trigger completely and reorganize the workflow (with data-modifying CTEs): write a sorted list of URLs to url_starting, then write to tweet ...

Arrays are often problematic in a relational table design. Full normalization might be another approach - replacing the array column with a related table. Not sure if that's the way ...
Or you keep your transactions as short and fast as possible, making deadlocks the rare exception, and prepare your application to retry in case of an error.