PostgreSQL Insert Condition – Matching Two Columns with Separate Table

Tags: condition, foreign key, postgresql, unique-constraint

I have three tables: a users table, a books table, and a chapters table.

Each book has an id called identifier_id (for good business reasons that I cannot work around). This id is guaranteed to be unique per-user, i.e. no user will ever have two books with the same id. It is not, however, guaranteed to be globally unique. Each book also has a foreign key user_id.

Each chapter has user_id and book_id foreign keys – the book_id foreign key points to identifier_id on the books table.

When I add chapters to a book, I want a database-level constraint that rejects a chapter insert unless the user_id of the chapter matches the user_id of the book AND the book_id of the chapter matches an identifier_id in the books table.

Because the book_id is not globally unique, I cannot put a unique index on it. Is there a way to do this?

Best Answer

You can create a unique compound index:

 CREATE UNIQUE INDEX "book_id_user" ON books (identifier_id,user_id);

And then you can create a composite F.K. constraint that references both columns:

 ALTER TABLE chapters 
   ADD CONSTRAINT "FK_chapter_book" 
     FOREIGN KEY (book_id,user_id)
       REFERENCES books (identifier_id,user_id);
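
To illustrate the effect, here is a minimal sketch with stripped-down tables. The column lists, the chapter_id column and the sample values are assumptions for the demo only; the real tables will have more columns. The columns are declared NOT NULL because, with the default MATCH SIMPLE, a row with NULL in either FK column is not checked at all.

```sql
-- Assumed minimal schema, for illustration only.
CREATE TABLE users (
   user_id int PRIMARY KEY
);

CREATE TABLE books (
   identifier_id int NOT NULL,                 -- unique per user, not globally
   user_id       int NOT NULL REFERENCES users,
   UNIQUE (identifier_id, user_id)             -- target of the composite FK
);

CREATE TABLE chapters (
   chapter_id serial PRIMARY KEY,              -- hypothetical surrogate key
   book_id    int NOT NULL,
   user_id    int NOT NULL,
   FOREIGN KEY (book_id, user_id)
      REFERENCES books (identifier_id, user_id)
);

INSERT INTO users VALUES (1), (2);
INSERT INTO books (identifier_id, user_id) VALUES (10, 1);

-- Accepted: user 1 owns book 10.
INSERT INTO chapters (book_id, user_id) VALUES (10, 1);

-- Rejected with a foreign key violation: user 2 has no book 10.
INSERT INTO chapters (book_id, user_id) VALUES (10, 2);
```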

If you have any bad data, it will need to be corrected before the constraint can be created.
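
One way to find such rows is an anti-join along these lines. This is only a sketch: it assumes the column names from the question, and it skips rows with a NULL in either column because the default MATCH SIMPLE foreign key would not check those anyway.

```sql
-- Chapters whose (book_id, user_id) pair has no matching book.
SELECT c.*
FROM   chapters c
WHERE  c.book_id IS NOT NULL
AND    c.user_id IS NOT NULL
AND    NOT EXISTS (
   SELECT 1
   FROM   books b
   WHERE  b.identifier_id = c.book_id
   AND    b.user_id       = c.user_id
   );
```

If this returns no rows, the ALTER TABLE above should not fail because of existing data.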

Related Solutions

Postgresql – Unique across tables

To enforce unique email addresses, I would remove all competing email columns and store the addresses in one central email table for all active emails, plus another table for deleted emails:

CREATE TABLE users (
  user_id  serial PRIMARY KEY
, username text UNIQUE NOT NULL
, email    text UNIQUE -- FK added below  -- can also be NOT NULL
);

CREATE TABLE email (
  email    text PRIMARY KEY
, user_id  int NOT NULL REFERENCES users ON DELETE CASCADE
, UNIQUE (user_id, email)  -- seems redundant, but required for FK
);

ALTER TABLE users ADD CONSTRAINT users_primary_email_fkey
FOREIGN KEY (user_id, email) REFERENCES email (user_id, email);

CREATE TABLE email_deleted (
  email_id serial PRIMARY KEY
, email    text NOT NULL  -- not necessarily unique
, user_id  int NOT NULL REFERENCES users ON DELETE CASCADE
);

This way:

- Active emails are unique, enforced by the PK constraint of email.
- Each user can have any number of active and deleted emails, but ...
- Each user can only have one primary email.
- Every email is always owned by one user and is deleted with the user.
- To soft-delete an email (without losing it and its affiliation to its user), move the row from email to email_deleted.
  - The primary email of a user cannot be deleted this way, because the FK constraint users_primary_email_fkey still references its row in email.
- I designed the FK constraint users_primary_email_fkey to span (user_id, email), which seems redundant at first. But this way the primary email can only be an email that is actually owned by the same user.
- Due to the default MATCH SIMPLE behavior of FK constraints, you can still enter a user without primary email, because the FK constraint is not enforced if any of the columns is null.
  - Details: Two-column foreign key constraint only when third column is NOT NULL

The UNIQUE constraint on users.email is redundant for this solution, but it may be useful for other reasons. The automatically created index should come in handy (for instance for the last query in this answer).

The only thing that's not enforced this way is that every user has a primary email. You can enforce this, too: add a NOT NULL constraint to users.email.
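
As a sketch, assuming the users table defined above and that no existing row has a NULL email (otherwise the statement fails):

```sql
-- Require every user to have a primary email from now on.
ALTER TABLE users ALTER COLUMN email SET NOT NULL;
```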

UNIQUE (user_id, email) is required for the FK constraint:

How does PostgreSQL enforce the UNIQUE constraint / what type of index does it use?

You have doubtless spotted the circular reference in the above model. Contrary to what one might expect, this just works.

As long as users.email can be NULL, it's trivial:

INSERT user without email.
INSERT email referencing the owning user_id.
UPDATE user to set its primary email if applicable.
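
The three steps above can be sketched as follows. The username and address are made-up sample values, and the transaction wrapper is an addition so the steps succeed or fail together.

```sql
BEGIN;

-- 1. User without primary email: the FK is not enforced while email is NULL.
INSERT INTO users (username) VALUES ('user_bar');

-- 2. Email owned by that user.
INSERT INTO email (email, user_id)
SELECT 'bar@mail.com', user_id
FROM   users
WHERE  username = 'user_bar';

-- 3. Promote it to primary email; (user_id, email) now exists in email.
UPDATE users
SET    email = 'bar@mail.com'
WHERE  username = 'user_bar';

COMMIT;
```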

It even works with users.email set to NOT NULL. You have to insert user and email at the same time though:

WITH u AS (
   INSERT INTO users(username, email)
   VALUES ('user_foo', 'foo@mail.com')
   RETURNING email, user_id
   )
INSERT INTO email (email, user_id)
SELECT email, user_id
FROM   u;

IMMEDIATE FK constraints (the default) are checked at the end of each statement. The above is one statement. That's why it works where two separate statements would fail. Detailed explanation:

How to deal with mutually recursive inserts
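
If separate statements are preferred, a possible alternative (not part of the approach above) is to make the constraint deferrable so that it is checked at commit time instead. A minimal sketch, assuming Postgres 9.4 or later for ALTER CONSTRAINT and made-up sample values:

```sql
-- One-time change: allow the check to be postponed on request.
ALTER TABLE users
   ALTER CONSTRAINT users_primary_email_fkey DEFERRABLE INITIALLY IMMEDIATE;

BEGIN;
SET CONSTRAINTS users_primary_email_fkey DEFERRED;

-- The referenced email row does not exist yet; the check is postponed.
INSERT INTO users (username, email)
VALUES ('user_baz', 'baz@mail.com');

INSERT INTO email (email, user_id)
SELECT email, user_id
FROM   users
WHERE  username = 'user_baz';

COMMIT;  -- the deferred FK check runs here
```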

To get all emails of a user as array, with the primary email first:

SELECT u.*, e.emails
FROM   users u
     , LATERAL (
      SELECT ARRAY (
      SELECT email
      FROM   email
      WHERE  user_id = u.user_id
      ORDER  BY (email <> u.email)  -- sort primary email first
      ) AS emails
   ) e
WHERE  user_id = 1;

You could create a VIEW with this for ease of use.
LATERAL requires Postgres 9.3 or later. Use a correlated subquery in pg 9.2 or older:

SELECT *, ARRAY (
             SELECT email
             FROM   email
             WHERE  user_id = u.user_id
             ORDER  BY (email <> u.email)  -- sort primary email first
             ) AS emails
FROM   users u
WHERE  user_id = 1;
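
A view along those lines might look like this. The view name users_with_emails is a hypothetical choice, and the sketch uses the correlated-subquery form so it does not depend on LATERAL.

```sql
CREATE VIEW users_with_emails AS
SELECT u.*
     , ARRAY (
          SELECT e.email
          FROM   email e
          WHERE  e.user_id = u.user_id
          ORDER  BY (e.email <> u.email)  -- sort primary email first
          ) AS emails
FROM   users u;

-- Usage:
SELECT * FROM users_with_emails WHERE user_id = 1;
```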

To soft-delete an email:

WITH del AS (
   DELETE FROM email
   WHERE  email = 'spam@mail.com'
   RETURNING email, user_id
   )
INSERT INTO email_deleted (email, user_id)
SELECT email, user_id FROM del;

To soft-delete the primary email of a given user:

WITH upd AS (
   UPDATE users u
   SET    email = NULL
   FROM   (SELECT user_id, email FROM users WHERE user_id = 123 FOR UPDATE) old
   WHERE  old.user_id = u.user_id
   AND    u.user_id = 123  -- must match the user_id in the subquery above
   RETURNING old.*         -- the values from before the UPDATE
   )
,    del AS (
   DELETE FROM email
   USING  upd
   WHERE  email.email = upd.email
   )
INSERT INTO email_deleted (email, user_id)
SELECT email, user_id FROM upd;

Details:

Return pre-UPDATE Column Values Using SQL Only - PostgreSQL Version

Quick test for all of the above: SQL Fiddle.

Why Should a Key Be Made Explicit?

You are obviously suggesting that CONSTRAINTs in a database should be enforced by the application(s) that access that database?

There are many reasons why this is a bad (bad, bad...) idea.

1) If you are building a "roll-your-own" constraint "engine" (i.e. within your application code), then you are merely emulating what Oracle/SQL Server/MySQL/PostgreSQL/<.whoever...> have spent years writing. Their CONSTRAINT code has been tested over those years by literally millions of end-users.

2) With all due respect to you and your team, you are not going to get it right even in a matter of years - according to the linked estimate, the MySQL code alone cost 40 million dollars. And MySQL is the cheapest of the 3 servers above, and before version 8.0.16 it did not even enforce CHECK CONSTRAINTs. Obviously, getting R.I. (Referential Integrity) completely right is difficult.

I used to frequent the Oracle forums and I can't tell you the number of times that some poor manager/programmer has had a project thrust upon him where the genius who had his job before had the "bright" idea of doing what you suggest.

Jonathan Lewis (he wrote a 550 page book on the fundamentals of the Oracle optimiser) gives this as no. 2 of his Design Disasters in another book ("Tales of the Oak Table" - the Oak Table is a group of Oracle experts):

We will check data integrity at the application level instead of taking advantage of Oracle's constraint checking abilities.

3) Even if by some miracle you can properly implement RI, you will have to completely reimplement it time and again for every application that touches that database - and if your data is important, then new applications will. Choosing this as a paradigm will lead you and your fellow programmers (not to mention support staff and sales) to a life of constant fire-fighting and misery.
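
By contrast, a rule declared once in the database applies to every client that connects. A minimal PostgreSQL sketch, with hypothetical tables and values that are not taken from the question:

```sql
-- Hypothetical tables, for illustration only.
CREATE TABLE customer (
   customer_id int  PRIMARY KEY,
   email       text NOT NULL UNIQUE
);

CREATE TABLE purchase (
   purchase_id int PRIMARY KEY,
   customer_id int NOT NULL REFERENCES customer,
   quantity    int NOT NULL CHECK (quantity > 0)
);

INSERT INTO customer VALUES (1, 'a@example.com');

-- Accepted.
INSERT INTO purchase VALUES (100, 1, 2);

-- Rejected no matter which application sends it:
INSERT INTO purchase VALUES (101, 999, 1);  -- no such customer (FK)
INSERT INTO purchase VALUES (102, 1, 0);    -- quantity must be positive (CHECK)
```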

You can read more about why implementing data CONSTRAINTs at the application level is nothing short of madness here, here and here.

To specifically answer your question:

Just why are they declared at all? It seems very helpful, but is it actually necessary to have a database that functions

The reason that KEYs (either PRIMARY, FOREIGN, UNIQUE or just ordinary INDEXes) are declared is that, while it is not strictly necessary for a database to have them for it to function, it is absolutely necessary for them to be declared for it to function well.
