Postgresql – How to avoid duplications using a function

concurrencyduplicationplpgsqlpostgresqlpostgresql-9.3

I am using this function in a bulk insert to avoid duplication of url paths.

CREATE OR REPLACE FUNCTION "univ"."gc_landing"(IN _name text, OUT landing_id int4)
  RETURNS "int4" 
AS $BODY$
DECLARE 
  landing_name TEXT;
BEGIN
landing_name := _name;
LOOP
BEGIN
WITH sel AS (
  SELECT id
  FROM   univ.landings
  WHERE  name = landing_name
  )
, ins AS (
  INSERT INTO univ.landings (name)
  SELECT landing_name
  WHERE  NOT EXISTS (SELECT 1 FROM sel)
  RETURNING id
  )
SELECT id
FROM   sel NATURAL FULL OUTER JOIN ins 
INTO   landing_id;

EXCEPTION WHEN UNIQUE_VIOLATION THEN
    RAISE NOTICE 'It actually happened!';
END;

EXIT WHEN landing_id IS NOT NULL;
END LOOP;
END$BODY$
LANGUAGE plpgsql;

A path looks like this:

%2faccessories%2fliners.html%26sa%3du%26ei%3df1n2voo3fjk1acowgnao%26ved%3d0cesq9qewbg%26usg%3dafqjcnhvnzccoijbs0zvswwjxexhyd-7xw

My table:

 id        | integer                     | not null default nextval('univ.seq_landings_id'::regclass) | plain
 name      | text                        | not null                                                   | extended
 created_at| timestamp without time zone | not null default now()                                     | plain
Indexes:
"landings_pkey" PRIMARY KEY, btree (id)
"uniq_index_landings_on_name" UNIQUE, btree (name)

I have an unique index on name, but the function is still duplicating. How can I make sure that there will be no duplications?

Best Answer

I have re-tested your function and it seems to work as advertised.

If, in fact a duplicate value was entered, you would get an exception that would roll back the transaction.
The function is designed to trap such an exception (UNIQUE_VIOLATION) and to try again, which would find the (now) existing row and return the existing id.

Your report of "duplications" doesn't seem possible. There must be a misunderstanding somewhere.

I only have minor improvements to offer:

Make that a UNIQUE CONSTRAINT, that's cleaner than a unique index, while doing the same for the case at hand. Since you already have a unique index, you can use the special form:

ALTER TABLE univ.landings ADD CONSTRAINT landings_name_key UNIQUE
USING INDEX index_name univ.uniq_index_landings_on_name;

You can simplify the function a bit. Your additional variable landing_name is just a waste:

CREATE OR REPLACE FUNCTION univ.gc_landing(_name text, OUT landing_id int)
  RETURNS int AS
$func$
BEGIN
LOOP
   BEGIN
   WITH sel AS (
     SELECT id
     FROM   univ.landings
     WHERE  name = _name
     )
   , ins AS (
     INSERT INTO univ.landings (name)
     SELECT _name
     WHERE  NOT EXISTS (SELECT 1 FROM sel)
     RETURNING id
     )
   SELECT id
   FROM   sel NATURAL FULL OUTER JOIN ins 
   INTO   landing_id;

   EXCEPTION WHEN UNIQUE_VIOLATION THEN
       RAISE NOTICE 'It actually happened!';
   END;

   EXIT WHEN landing_id IS NOT NULL;
END LOOP;
END
$func$  LANGUAGE plpgsql;

But note that this function is designed for individual inserts. For bulk operations like you mention, a different approach might be much more efficient: fold duplicates in the source before you bulk-insert into the target table.