PostgreSQL doesn't have INSERT ... IGNORE, so that isn't an option. You can use upsert-like operations (search for "postgresql upsert"; newer versions support INSERT ... ON CONFLICT), but they certainly have a performance cost. Inserting and ignoring the resulting error on duplicate key is possible, but it tends to result in pretty spammy logs.
Personally, I'd do what's effectively a bulk upsert, something like this:
BEGIN;
CREATE TEMPORARY TABLE staging_table(...);
COPY staging_table(...) FROM ...;
(or do bulk inserts)
LOCK TABLE real_table IN EXCLUSIVE MODE;
INSERT INTO real_table(...)
SELECT col1, col2, col3, ...
FROM staging_table
WHERE NOT EXISTS (SELECT 1 FROM real_table WHERE real_table.primarykey = staging_table.primarykey);
COMMIT;
You can do an UPDATE before the INSERT if you want, updating existing records with new values.
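That update step might look roughly like this (a sketch; staging_table, real_table, primarykey, and the col... names are placeholders carried over from the outline above):

```sql
-- Refresh rows that already exist in the target table
UPDATE real_table
SET    col1 = staging_table.col1,
       col2 = staging_table.col2
FROM   staging_table
WHERE  real_table.primarykey = staging_table.primarykey;
```

Run it inside the same transaction, after the LOCK TABLE and before the INSERT, so no row can change state between the two statements.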
The LOCK TABLE is necessary to prevent concurrent inserts from slipping in between the NOT EXISTS check and the INSERT, which would fail with duplicate-key errors. EXCLUSIVE mode still permits reads, so you can SELECT from the table during the load.
If you EXPLAIN the plan for your INSERT, you should see that it's been transformed into a JOIN when you're inserting larger numbers of records.
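For instance (again using the placeholder names; the exact plan shape depends on row counts and statistics, so treat the comment as typical rather than guaranteed):

```sql
EXPLAIN
INSERT INTO real_table (col1, col2)
SELECT col1, col2
FROM   staging_table
WHERE  NOT EXISTS (SELECT 1
                   FROM real_table
                   WHERE real_table.primarykey = staging_table.primarykey);
-- With enough rows, the planner usually produces a Hash Anti Join
-- instead of re-running the subquery once per row.
```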
In Postgres, there are two ways to circumvent the chicken-and-egg problem of inserting into tables with circular references.
a) Defer one of the constraints, either declaring it as DEFERRABLE INITIALLY DEFERRED when created, or declaring it as DEFERRABLE when created and then deferring it with SET CONSTRAINTS when needed.
For an example of using deferrable constraints, see my answer in this question:
Implementation of a many-to-many relationship with total participation constraints in SQL.
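A minimal sketch of option (a), assuming the foreign key from department.mgr_ssn to employee.ssn is the one being deferred (the constraint name fk_department_mgr is made up for the example):

```sql
ALTER TABLE department
  ADD CONSTRAINT fk_department_mgr
  FOREIGN KEY (mgr_ssn) REFERENCES employee (ssn)
  DEFERRABLE INITIALLY DEFERRED;

BEGIN;
-- Insert in either order; the deferred FK is not checked per statement
INSERT INTO department (dname, dnumber, mgr_ssn, mgr_start_date)
VALUES ('Headquarter', 1, '888665555', '1981-06-19');

INSERT INTO employee (fname, minit, lname, ssn, bdate, address, sex, salary, super_ssn, dno)
VALUES ('James', 'E', 'Borg', '888665555', '1937-11-10', 'asda', 'M', 55000, NULL, 1);
COMMIT;   -- the deferred constraint is checked here, after both rows exist
```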
b) Use a single statement to insert into the two (or more) tables, using data-modifying CTEs. Example for your case:
with
insert_emp as
( insert into employee
(Fname, Minit, Lname, Ssn, Bdate, Address, Sex, Salary, Super_ssn, Dno)
values
('James', 'E', 'Borg', '888665555', '1937-11-10', 'asda', 'M', 55000, NULL, 1)
returning
*
),
insert_dep as
( insert into sp0090.department
(Dname, Dnumber, Mgr_ssn, Mgr_start_date)
values
('Headquarter', 1, '888665555', '1981-06-19')
returning
*
)
select *
from insert_emp as e
full join insert_dep as d
on e.dno = d.dnumber
and e.ssn = d.mgr_ssn ;
I do not think your question is really database-agnostic. The right answer could depend on implementation details, which may vary from vendor to vendor and change with the next version. I would test under concurrency before choosing any approach on any RDBMS.
Right now, on SQL Server 2008 R2, I am using the following:
Low concurrency and a low volume of modifications. To save a single row, I serialize using sp_getapplock and use MERGE. I stress test under high concurrency to verify that it works.
Higher concurrency and/or volume. To avoid contention and boost performance, I do not save one row at a time. I accumulate changes on my app server and use TVPs to save batches. Still, to avoid concurrency-related issues, I serialize using sp_getapplock before the MERGE. Again, I stress test under high concurrency to verify that it works.
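The sp_getapplock + MERGE pattern can be sketched roughly like this (T-SQL; the table, column, and lock resource names are illustrative, not from the original system, and @batch stands for the TVP holding the accumulated changes):

```sql
BEGIN TRANSACTION;

-- Serialize all writers on an application lock so only one MERGE
-- runs at a time against this table.
EXEC sp_getapplock
     @Resource  = 'real_table_merge',   -- arbitrary lock name
     @LockMode  = 'Exclusive',
     @LockOwner = 'Transaction';

MERGE dbo.real_table AS t
USING @batch AS s                        -- table-valued parameter with the batch
   ON t.primarykey = s.primarykey
WHEN MATCHED THEN
    UPDATE SET t.col1 = s.col1
WHEN NOT MATCHED THEN
    INSERT (primarykey, col1)
    VALUES (s.primarykey, s.col1);

COMMIT;   -- releases the transaction-owned app lock
```

Because every writer takes the same exclusive app lock first, two sessions can never interleave their existence checks and inserts, which is what eliminates the deadlocks and duplicate-key errors described below.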
As a result, we have good performance and zero concurrency-related issues in production: no deadlocks, no PK/unique constraint violations, etc.