Greenplum : Inserting non-duplicate rows on unique constraint violation

greenplum

I have a stored procedure that does the following :

INSERT INTO schema.my_unique_values
     SELECT DISTINCT id, value
       FROM schema.a_huge_table
      WHERE NOT EXISTS (SELECT 1
                          FROM schema.my_unique_values)

In summary, this query will insert a unique rows into the my_unique_values table from the a_huge_table.

The problem I have is that when I run this query concurrently in a multi-threaded script for a reason, it can insert duplicate rows. I have tried using serializable to avoid this 'phantom reads', but still no luck. My idea is to allow Postgres to only insert non-duplicate rows on unique constraint violation. But is this possible ?
My current experience is that when a unique constraint violation is hit, it will cancel the whole transaction, so it will not insert the non-duplicate rows. How can I achieve my goal ?

Note: I am using Greenplum 4.3.11 that is using Postgres 8.2, therefore there are limitation on the query that I can use.

Thanks..

Best Answer

First of all, I think you need to change slightly your query, because the WHERE NOT EXISTS, as it is now written, would be false as soon as your table has some row. You need to specify a WHERE clause:

INSERT INTO schema.my_unique_values
     SELECT DISTINCT id, value
       FROM schema.a_huge_table a
      WHERE NOT EXISTS 
             (SELECT 1
                FROM schema.my_unique_values m
               WHERE m.id = a.id AND m.value = a.value)

INSERT, either within a transaction, or as a single-statement-transaction, will be atomic. So, if one row fails, everything will fail. In any case, you cannot have phantom reads in SERIALIZABLE transaction isolation levels. If you're under that isolation level, make sure your NOT EXISTS (...) really has the properly written WHERE clause.

As of PostgreSQL version 9.5, there is a new clause that can be part of an INSERT and that looks exactly for your use-case. You could modify your querty to make use of ON CONFLICT DO NOTHING:

INSERT INTO schema.my_unique_values
     SELECT DISTINCT id, value
       FROM schema.a_huge_table a
ON CONFLICT DO NOTHING ;

NOTE: The ON CONFLICT clause applies to each row, not to the INSERT as a whole.