PostgreSQL – Handling INSERT race condition for a generated unique column value

postgresql

I am generating a unique slug during creation of a record like so (based on this answer):

INSERT INTO ks.shares (
      slugbase
    , slugindex
    , slug
    , userid
)
SELECT
      Pslugbase
    , COALESCE((SELECT max(slugindex) + 1
            FROM  ks.shares s
            WHERE s.slugbase = Pslugbase), 0)
    , Pslugbase
        || COALESCE((SELECT '-'::text || (max(slugindex) + 1)::text
            FROM  ks.shares s
            WHERE s.slugbase = Pslugbase), '')
    , Puserid
RETURNING id, modified, slug INTO Vnewshare;

When testing under load this can raise a unique_violation, as two concurrent threads both read the current slug index before either inserts; one then inserts a record with the next available slug, leaving the second thread with a now-duplicate slug.
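The interleaving looks roughly like this (a sketch, assuming a unique constraint on slug; the 'abc' values are made up):

-- Session A: SELECT max(slugindex) + 1 ... WHERE slugbase = 'abc'  => 3
-- Session B: SELECT max(slugindex) + 1 ... WHERE slugbase = 'abc'  => 3 (A's row not yet visible)
-- Session A: INSERT ... slug = 'abc-3'  -- succeeds
-- Session B: INSERT ... slug = 'abc-3'  -- raises unique_violation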

I attempted to solve the issue by LOOPing over the INSERT and catching any unique constraint violations:

LOOP
BEGIN
    INSERT INTO ks.shares (
          slugbase
        , slugindex
        , slug
        , userid
    )
    SELECT
          Pslugbase
        , COALESCE((SELECT max(slugindex) + 1
                FROM  ks.shares s
                WHERE s.slugbase = Pslugbase), 0)
        , Pslugbase
            || COALESCE((SELECT '-'::text || (max(slugindex) + 1)::text
                FROM  ks.shares s
                WHERE s.slugbase = Pslugbase), '')
        , Puserid
    RETURNING id, modified, slug INTO Vnewshare;

    -- INSERT into two other tables

    RETURN json_build_object (
        'data',json_build_object (
            'id',lpad(Vnewshare.id::text,public.padding_constant(),'0'),
            'modified',Vnewshare.modified,
            'slug',Vnewshare.slug
        )
    );
    EXCEPTION WHEN unique_violation THEN
        -- do nothing, and loop to try the INSERT again
END;
END LOOP;

(I took this pattern from an answer on implementing UPSERT but can't find it!)

However, when I deployed this function my DB instance got stuck twice running long queries that involved this function being called concurrently, so I seem to have replaced a race condition with a deadlock or an infinite loop:

pid  |    duration     |                                      query                                      | state
------+-----------------+---------------------------------------------------------------------------------+--------
8187 | 01:47:35.477316 | select ks.post_share($1::text,$2::text,$3::bigint) | active
1188 | 01:57:56.955747 | select ks.post_share($1::text,$2::text,$3::bigint) | active
(2 rows)
  • Is using a LOOP here the wrong approach?
  • Is there any way to understand what is causing the query to get stuck? It does not happen during load testing but has appeared twice in production (which seems counterintuitive).
  • Should I just return the conflict to the calling process and let the application handle retry?

Any thoughts would be greatly appreciated.

Best Answer

You don't mention the isolation level, but if this runs under REPEATABLE READ, the SELECT part will never see the new values that other transactions inserted concurrently, because the transaction keeps using the snapshot taken by its first statement. So this code may be stuck in its loop forever.
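You can see the snapshot behaviour by hand in two psql sessions (a sketch; the 'abc' data is made up):

BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT max(slugindex) FROM ks.shares WHERE slugbase = 'abc';  -- say this returns 2

-- another session now commits a row with slugbase = 'abc', slugindex = 3

SELECT max(slugindex) FROM ks.shares WHERE slugbase = 'abc';  -- still 2: same snapshot
-- so every retry re-derives slug 'abc-3' and raises unique_violation again
ROLLBACK;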

Should I just return the conflict to the calling process and let the application handle retry?

Yes, that would make sense. The key is to retry in a different transaction, so the retry gets a fresh snapshot, and the failed attempt releases whatever locks may be blocking other transactions.
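A minimal sketch of the calling side, reusing the call shape from your pg_stat_activity output (the retry policy itself is an assumption):

-- Each attempt runs in its own transaction, so a retry gets a fresh snapshot.
BEGIN;
SELECT ks.post_share($1::text, $2::text, $3::bigint);
COMMIT;
-- On unique_violation: ROLLBACK, then repeat BEGIN/SELECT/COMMIT,
-- ideally with a capped number of attempts and a short backoff.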

Another approach would be to use a sequence instead of max(slugindex)+1, if the number part of the slug can be globally unique rather than unique per slugbase. The answer you linked is based on the premise that they want abc-1 and xyz-1, as opposed to abc-1 and xyz-2, which is why a sequence is not an option there. But looking at your SELECT, it's not obvious that you share that requirement.
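If a globally unique number is acceptable, the INSERT no longer has to read existing rows at all, so there is nothing to race on. A minimal sketch of that variant (the sequence name ks.shares_slug_seq is made up):

CREATE SEQUENCE ks.shares_slug_seq;

INSERT INTO ks.shares (slugbase, slugindex, slug, userid)
SELECT Pslugbase, n, Pslugbase || '-' || n::text, Puserid
FROM   nextval('ks.shares_slug_seq') AS n
RETURNING id, modified, slug INTO Vnewshare;

This yields abc-1, xyz-2, abc-3, ... rather than abc-1, xyz-1, abc-2, but since nextval never hands out the same value twice, even across concurrent transactions, the unique_violation cannot occur and no retry loop is needed.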