Postgresql – Postgres: Insert distinct values from one table into another table with constrains in place

constraintdistinctpostgresql

I'm trying to insert data from one PostgreSQL 9.3 table to another while removing duplicate entries.

I have two tables containing email addresses.

The main table contains emails and a tag for each email address. The combination (email, tag) must be unique, and for this there is a constraint Unique(email, tag) on it.

The second table get created on the fly from text files which contain only the email address. There are many duplicates there.

I need to import the data from the temp table into the main one without breaking the constraint above. For a certain file containing email addresses the tag is constant.

Table structures:

CREATE TABLE emails (   
  email character varying(128),
  tag bigint,
  CONSTRAINT "unique-tag-email" UNIQUE (email, tag) )

and

CREATE TABLE emails_temp (email character varying(128)

Here is my query:

insert into emails(tag,email) 
select 
 655,t.email 
from 
 emails_temp as t 
where 
 not exists ( select email from emails where email = t.email )

Note: 655 is just a tag of a certain group of email addresses.

This is the error I get:

ERROR: duplicate key value violates unique constraint
"unique-tag-email" SQL state: 23505 Detail: Key (email,
tag)=(user@hotmail.com, 655) already exists.

There are indeed, two email addresses user@hotmail.com in the file.

Useless to say, because of this error, there is nothing added in the main table (email).

What am I doing wrong?

Best Answer

There are 3 possible kinds of duplicates:

  1. Duplicates within the rows of the bulk insert.

  2. Duplicates between inserted rows and existing rows.

  3. Duplicates between inserted rows and concurrently inserted / updated rows from other transactions.

Just like I explained in this closely related answer:

But things have become easier for 2. and 3. since Postgres 9.5 introduced UPSERT (INSERT .. ON CONFLICT DO NOTHING).

INSERT INTO emails(tag,email)
SELECT DISTINCT 655, email
FROM   emails_temp
ON CONFLICT (email) DO NOTHING;

If your duplicates only stem from duplicate entries in the source (1.), like you indicated, then all you need is DISTINCT. Works in any version of Postgres:

INSERT INTO emails(tag,email)
SELECT DISTINCT 655, email
FROM   emails_temp;