I have a stored procedure that does the following :
INSERT INTO schema.my_unique_values
SELECT DISTINCT id, value
FROM schema.a_huge_table
WHERE NOT EXISTS (SELECT 1
FROM schema.my_unique_values)
In summary, this query will insert a unique rows into the my_unique_values table from the a_huge_table.
The problem I have is that when I run this query concurrently in a multi-threaded script for a reason, it can insert duplicate rows. I have tried using serializable to avoid this 'phantom reads', but still no luck. My idea is to allow Postgres to only insert non-duplicate rows on unique constraint violation. But is this possible ?
My current experience is that when a unique constraint violation is hit, it will cancel the whole transaction, so it will not insert the non-duplicate rows. How can I achieve my goal ?
Note: I am using Greenplum 4.3.11 that is using Postgres 8.2, therefore there are limitation on the query that I can use.
Thanks..
Best Answer
First of all, I think you need to change slightly your query, because the WHERE NOT EXISTS, as it is now written, would be false as soon as your table has some row. You need to specify a WHERE clause:
INSERT
, either within a transaction, or as a single-statement-transaction, will be atomic. So, if one row fails, everything will fail. In any case, you cannot have phantom reads inSERIALIZABLE
transaction isolation levels. If you're under that isolation level, make sure yourNOT EXISTS (...)
really has the properly writtenWHERE
clause.As of PostgreSQL version 9.5, there is a new clause that can be part of an
INSERT
and that looks exactly for your use-case. You could modify your querty to make use ofON CONFLICT DO NOTHING
:NOTE: The
ON CONFLICT
clause applies to each row, not to theINSERT
as a whole.