Postgresql – Surrogate key resolution, or what am I missing in the Surrogate v Natural argument

database-designpostgresqlrelational-theorysurrogate-key

I've read a lot from both camps in the surrogate vs. natural key debate. What I am struggling to understand, and what seems to be glossed over, especially in explaining how surrogate keys are more efficient, is how do referencing tables resolve a surrogate key?

Say, I am using an ORM in an application, and have a userID in an application object. now we want to insert into a SelectedTask table and a SubTask table that are defined as follows (ignore the awkwardness of the real word example):

CREATE TABLE SelectedTask(
userID INT References User(userID)
taskNumber INT,
name TEXT ,
jobSiteID INT References Site(jobSiteID),
PRIMARY KEY (???)
);

CREATE TABLE SubTask(
assignee INT References User(assignee),
subTaskID INT References Subtasks(subTaskID),
timeAllowed INT,
reportTo INT References Group(reportTo),
-- how do we reference SelectedTask?
);

If we use a surrogate key for the SelectedTask table, how do we use that in the SubTask table? If we use natural keys, the primary key could simply be a compound primary key of userId and taskNumber, and we could insert into both at the same time assuming we have programmatically gathered all of the pieces.

Best Answer

A proper Postgres schema for your example could look like this:

CREATE TABLE selected_task(
  selected_task_id serial PRIMARY KEY  -- surrogate PK
, user_id          int REFERENCES users(user_id)
, task_number      int
, name             text
, jobsite_id       int REFERENCES site(jobsite_id)
);

CREATE TABLE sub_task(
  sub_task_id      serial PRIMARY KEY
, selected_task_id int REFERENCES selected_task  -- this is how
, assignee         int REFERENCES users(assignee)
, time_allowed     int
, report_to        int REFERENCES groups(report_to)
);

Use a serial column as surrogate PK.
You can't use reserved words like user or group as identifier.
Some other improvements. Detailed instructions here:
- How to implement a many-to-many relationship in PostgreSQL?

`INSERT` in both tables at the same time?

Use a data-modifying CTE with the RETURNING clause, or two INSERT statements in a transaction with lastval() (which is the Postgres equivalent for SCOPE_IDENTITY() in SQL server @Aaron mentioned in his comment):

Related answer with code example for either method.

Inserting into related tables

Best Answer

INSERT in both tables at the same time?

Related Solutions

Should every table have a single-field surrogate/artificial primary key

Is a surrogate key better than a natural key in this case

Related Question

`INSERT` in both tables at the same time?