PostgreSQL doesn't have INSERT ... IGNORE, so that isn't an option. You can use upsert-like operations (search for "postgresql upsert"; newer versions support INSERT ... ON CONFLICT), but they certainly have a performance cost. Inserting and ignoring the resulting error on duplicate key is possible, but it tends to result in pretty spammy logs.
Personally, I'd do what's effectively a bulk upsert, something like this:
BEGIN;
CREATE TEMPORARY TABLE staging_table(...);
COPY staging_table(...) FROM ...;
(or do bulk inserts)
LOCK TABLE real_table IN EXCLUSIVE MODE;
INSERT INTO real_table(...)
SELECT col1, col2, col3, ...
FROM staging_table
WHERE NOT EXISTS (SELECT 1 FROM real_table WHERE real_table.primarykey = staging_table.primarykey);
COMMIT;
You can do an UPDATE before the INSERT if you want, updating existing records with new values.
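That update step might look roughly like this (a sketch; staging_table, real_table, primarykey, and the col... names are placeholders carried over from the outline above):

```sql
-- Refresh rows that already exist in the target table
UPDATE real_table
SET    col1 = staging_table.col1,
       col2 = staging_table.col2
FROM   staging_table
WHERE  real_table.primarykey = staging_table.primarykey;
```

Run it inside the same transaction, after the LOCK TABLE and before the INSERT, so no row can change state between the two statements.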
The LOCK TABLE is necessary to prevent concurrent inserts from slipping in between the NOT EXISTS check and the INSERT, which would fail with duplicate-key errors. EXCLUSIVE mode still permits reads, so you can SELECT from the table during the load.
If you EXPLAIN the plan for your INSERT, you should see that it's been transformed into a JOIN when you're inserting larger numbers of records.
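For instance (again using the placeholder names; the exact plan shape depends on row counts and statistics, so treat the comment as typical rather than guaranteed):

```sql
EXPLAIN
INSERT INTO real_table (col1, col2)
SELECT col1, col2
FROM   staging_table
WHERE  NOT EXISTS (SELECT 1
                   FROM real_table
                   WHERE real_table.primarykey = staging_table.primarykey);
-- With enough rows, the planner usually produces a Hash Anti Join
-- instead of re-running the subquery once per row.
```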
In Postgres, there are two ways to circumvent the chicken-and-egg problem of inserting into tables with circular references.
a) Defer one of the constraints, either declaring it as DEFERRABLE INITIALLY DEFERRED when created, or declaring it as DEFERRABLE when created and then deferring it with SET CONSTRAINTS when needed.
For an example of using deferrable constraints, see my answer in this question:
Implementation of a many-to-many relationship with total participation constraints in SQL.
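A minimal sketch of option (a), assuming the foreign key from department.mgr_ssn to employee.ssn is the one being deferred (the constraint name fk_department_mgr is made up for the example):

```sql
ALTER TABLE department
  ADD CONSTRAINT fk_department_mgr
  FOREIGN KEY (mgr_ssn) REFERENCES employee (ssn)
  DEFERRABLE INITIALLY DEFERRED;

BEGIN;
-- Insert in either order; the deferred FK is not checked per statement
INSERT INTO department (dname, dnumber, mgr_ssn, mgr_start_date)
VALUES ('Headquarter', 1, '888665555', '1981-06-19');

INSERT INTO employee (fname, minit, lname, ssn, bdate, address, sex, salary, super_ssn, dno)
VALUES ('James', 'E', 'Borg', '888665555', '1937-11-10', 'asda', 'M', 55000, NULL, 1);
COMMIT;   -- the deferred constraint is checked here, after both rows exist
```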
b) Use a single statement to insert into the two (or more) tables, using data-modifying CTEs. Example for your case:
with
insert_emp as
( insert into employee
(Fname, Minit, Lname, Ssn, Bdate, Address, Sex, Salary, Super_ssn, Dno)
values
('James', 'E', 'Borg', '888665555', '1937-11-10', 'asda', 'M', 55000, NULL, 1)
returning
*
),
insert_dep as
( insert into sp0090.department
(Dname, Dnumber, Mgr_ssn, Mgr_start_date)
values
('Headquarter', 1, '888665555', '1981-06-19')
returning
*
)
select *
from insert_emp as e
full join insert_dep as d
on e.dno = d.dnumber
and e.ssn = d.mgr_ssn ;
I do not think your question is really database-agnostic. The right answer could depend on implementation details, which may vary from vendor to vendor and change with the next version. I would test under concurrency before choosing any approach on any RDBMS.
Right now, on SQL Server 2008 R2, I am using the following:
Low concurrency and a low volume of modifications. To save a single row, I serialize using sp_getapplock and use MERGE. I stress test under high concurrency to verify that it works.
Higher concurrency and/or volume. To avoid contention and boost performance, I do not save one row at a time. I accumulate changes on my app server and use TVPs to save batches. Still, to avoid concurrency-related issues, I serialize using sp_getapplock before the MERGE. Again, I stress test under high concurrency to verify that it works.
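The sp_getapplock + MERGE pattern can be sketched roughly like this (T-SQL; the table, column, and lock resource names are illustrative, not from the original system, and @batch stands for the TVP holding the accumulated changes):

```sql
BEGIN TRANSACTION;

-- Serialize all writers on an application lock so only one MERGE
-- runs at a time against this table.
EXEC sp_getapplock
     @Resource  = 'real_table_merge',   -- arbitrary lock name
     @LockMode  = 'Exclusive',
     @LockOwner = 'Transaction';

MERGE dbo.real_table AS t
USING @batch AS s                        -- table-valued parameter with the batch
   ON t.primarykey = s.primarykey
WHEN MATCHED THEN
    UPDATE SET t.col1 = s.col1
WHEN NOT MATCHED THEN
    INSERT (primarykey, col1)
    VALUES (s.primarykey, s.col1);

COMMIT;   -- releases the transaction-owned app lock
```

Because every writer takes the same exclusive app lock first, two sessions can never interleave their existence checks and inserts, which is what eliminates the deadlocks and duplicate-key errors described below.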
As a result, we have good performance and zero concurrency-related issues in production: no deadlocks, no PK/unique constraint violations, etc.