PostgreSQL – How to Prevent Deadlocks in Many-to-Many Insert/Update Trigger Function?

database-designdeadlockmany-to-manypostgresqlpostgresql-10

I'm having a problem with deadlocks on a many-to-many insert and am pretty far out of my league at this point.

I have a tweet table that receives thousands of records per second. One of the columns is a PostgreSQL array[]::text[] type with zero-to-many urls in the array. It looks like {www.blah.com, www.blah2.com}.

What I'm tryin to accomplish from a trigger on the tweet table is to create an entry in a urls_starting table and then adding the tweet/url_starting relationship in a tweet_x_url_starting.

Side note: The url_starting table is linked to a url_ending table where the fully resolved url paths reside.

The problem I face is deadlocks and I don't know what else to try.
I went on an Erwin Brandstetter learning spree. (if you're out there man… THANK YOU! ?)

  1. How to implement a many-to-many relationship in PostgreSQL?
  2. Deadlock with multi-row INSERTs despite ON CONFLICT DO NOTHING
  3. Postgres UPDATE … LIMIT 1 (skip locked help)

I tried adding ORDER BY's for deterministic, stable orders and FOR UPDATE SKIP LOCKED but am not sure I'm doing any of it correctly.

Here's the structure. Using PostgreSQL 10.5.

CREATE TABLE tweet(
    id integer NOT NULL GENERATED BY DEFAULT AS IDENTITY,
    twitter_id text NOT NULL,
    created_at timestamp NOT NULL,
    content text NOT NULL,
    urls text[],
    CONSTRAINT tweet_pk PRIMARY KEY (id)
);

CREATE TABLE url_starting(
    id integer NOT NULL GENERATED BY DEFAULT AS IDENTITY,
    url text NOT NULL,
    CONSTRAINT url_starting_pk PRIMARY KEY (id),
    CONSTRAINT url_starting_ak_1 UNIQUE (url)
);

CREATE TABLE tweet_x_url_starting(
    id_tweet integer NOT NULL,
    id_url_starting integer NOT NULL,
    CONSTRAINT tweet_x_url_starting_pk PRIMARY KEY (id_tweet,id_url_starting)

ALTER TABLE tweet_x_url_starting ADD CONSTRAINT tweet_fk FOREIGN KEY (id_tweet)
REFERENCES tweet (id) MATCH FULL
ON DELETE CASCADE ON UPDATE CASCADE;

ALTER TABLE tweet_x_url_starting ADD CONSTRAINT url_starting_fk FOREIGN KEY (id_url_starting)
REFERENCES url_starting (id) MATCH FULL
ON DELETE CASCADE ON UPDATE CASCADE;

Here's the tweet table trigger.

CREATE TRIGGER create_tweet_relationships
    AFTER INSERT OR UPDATE
    ON tweet
    FOR EACH ROW
    EXECUTE PROCEDURE create_tweet_relationships();

And finally, the function.

CREATE FUNCTION create_tweet_relationships ()
    RETURNS trigger
    LANGUAGE plpgsql
    VOLATILE 
    CALLED ON NULL INPUT
    SECURITY INVOKER
    COST 1
    AS $$
BEGIN
    IF (NEW.urls IS NOT NULL) AND cardinality(NEW.urls::TEXT[]) > 0 THEN
        WITH tmp_url AS (
          INSERT INTO url_starting (url)
          SELECT UNNEST(NEW.urls)
            ORDER BY 1
          ON CONFLICT (url) DO UPDATE
            SET url = EXCLUDED.url
          RETURNING id
        )
        INSERT INTO tweet_x_url_starting (id_tweet, id_url_starting)
        SELECT NEW.id, id
            FROM tmp_url
            ORDER BY 1, 2
            FOR UPDATE SKIP LOCKED
        ON CONFLICT DO NOTHING;
    END IF;

    RETURN NULL;
END
$$;

I blindly threw the stuff I read about into the function without success.

The error looks like this.

deadlock detected
DETAIL:  Process 11281 waits for ShareLock on transaction 1317; blocked by process 11278.
Process 11278 waits for ShareLock on transaction 1316; blocked by process 11281.
HINT:  See server log for query details.
CONTEXT:  while inserting index tuple (494,33) in relation "url_starting"
SQL statement "WITH tmp_url AS (
          INSERT INTO url_starting (url)
          SELECT UNNEST(NEW.urls)
      ORDER BY 1
          ON CONFLICT (url) DO UPDATE
              SET url = EXCLUDED.url
          RETURNING id
        )
        INSERT INTO tweet_x_url_starting (id_tweet, id_url_starting)
        SELECT NEW.id, id
            FROM tmp_url
    ORDER BY 1, 2
        FOR UPDATE SKIP LOCKED
        ON CONFLICT DO NOTHING"
PL/pgSQL function create_tweet_relationships() line 12 at SQL statement

Error causing transaction rollback (deadlocks, serialization failures, etc).

How can I stop the deadlocks? Thanks! ?

Best Answer

This might require careful examination, knowledge of the complete situation and more time than I can spend now. Or maybe I am missing something obvious. Whatever else might go wrong here, some things stand out:

  • Remove the FOR UPDATE SKIP LOCKED completely. It makes no sense where you use it. While selecting from the CTE, which holds rows with exclusive lock already, that makes no sense. It also would not make sense to skip any rows at this stage of the query.

  • COST 1 is misleading. The default is COST 100 and your trigger function is more in the realm of COST 5000. Leave the default or set it higher. Probably unrelated to the deadlock.

  • An AFTER trigger might be more susceptible to deadlocks than rewriting the whole workflow with a single query (with multiple data-modifying CTEs).

  • My shot in the dark: the deadlock is caused by the FK constraint trying to take a ShareLock on url_starting while concurent transactions try to modify the same row after having taken a similar ShareLock themselves and vice versa. A quick and dirty solution might be to drop the FK constraint tweet_x_url_starting if you can afford that. You could at least try that to verify it's part of the problem.

If you want to continue your learning spree - here is one more that seems relevant: