Postgresql – Using table inheritance instead of mapping tables

database-designinheritancepostgresqlsubtypes

This seems like a pretty common scenario: several types that all compose the same child type.

This could typically look like so:

-- 'name' is unique per parent record
CREATE TABLE sometype (
  sometype_id serial PRIMARY KEY,
  name        text
);

CREATE TABLE foo (foo_id serial PK);
CREATE TABLE bar (bar_id serial PK);

CREATE TABLE foo_sometype (
  foo_id      int4,
  sometype_id int4
);

CREATE TABLE bar_sometype (
  bar_id      int4,
  sometype_id int4
);

Which is fine, but cumbersome to query. I'm thinking this could be cleaner:

-- 'name' is unique per parent record
CREATE TABLE sometype (
  name        text
);

CREATE TABLE foo_sometype (
  foo_id      int4
) INHERITS(sometype);

CREATE TABLE bar_sometype (
  bar_id      int4,
) INHERITS(sometype);

What I like about this:

simple to join (with USING)
no need to add a surrogate key to 'sometype', it is explicitly a component of 'foo' and 'bar'

Seems like an atypical use of inheritance, though.

Any reason not to do this?

On the "deficiencies" of inheritance

Note that the standard caveats to Pg inheritance are only relevant when table inheritance is used to directly model class inheritance, which is not what I am doing here. In fact, for this to work I need inheritance to behave the way it does.

I almost wish they called it something other than "inheritance", since the behavior is quite logical, and the "shortcomings" are only relevant to one use-case.

Benefits over manually duplicated table structure

As Evan points out, I could just manually create 'foo_sometype' and 'bar_sometype' that look exactly like what I describe, but I think there are a couple of significant benefits to the inherited structure:

The inherits relationship explicitly defines 'foo_sometype' and 'bar_sometype' as being of the same type, not just two tables that happen to have the same columns.
Making future schema changes through the parent table lessens the chance of accidental divergences (with a little work, this could actually be enforced).
More importantly, client code can be generated against the parent table, and applied to the children with only a change in the table name, with (again) confidence that the table structure is enforced.

So, as a contrived example, Foo and Bar could have a HasSomeTypeList trait, which abstracts all 'sometype' operations, and knows that both tables can be mapped to the SomeType class.

Representing the Foo/Bar relationship, whether it's modeled as a trait or as inheritance, is the ultimate goal here.

Incidentally, to the naive user/query-writer – who is not expected to make schema changes – these two ways will look the same.

Best Answer

Inheritance is one of those features that I wouldn't touch. AFAIK, it's used internally for replication and partitioning in some capacity. I'm not sure if it was even designed with the intent to be used by the end-user.

Concrete Technical Drawbacks

Drawbacks on UNIQUE and REFERENCES

The docs covers some of the drawbacks in the CAVEAT section (below is important).

If we declared parent.name to be UNIQUE or a PRIMARY KEY, this would not stop the child table from having rows with names duplicating rows in parent. And those duplicate rows would by default show up in queries from parent. In fact, by default child would have no unique constraint at all, and so could contain multiple rows with the same name. You could add a unique constraint to child, but this would not prevent duplication compared to parent.
Similarly, if we were to specify that parent.name REFERENCES some other table, this constraint would not automatically propagate to child. In this case you could work around it by manually adding the same REFERENCES constraint to child.
Specifying that another table's column REFERENCES parent(name) would allow the other table to contain parent names, but not child names. There is no good workaround for this case.

Slow progress developing INHERITs

These deficiencies were first mentioned in the docs to 7.3 released in 1996 though they existed since inheritance was implemented

This deficiency will probably be fixed in some future release.

And the only change was to make the deficiencies more explicit and verbose in the docs to 8.0 released in 2010.

These deficiencies will probably be fixed in some future release, but in the meantime considerable care is needed in deciding whether inheritance is useful for your problem.

Good luck waiting for that some future release. And, some of the things you talk about features just aren't unique to composition,

Saving a "key" is moot

no surrogate key on 'sometype', it's explicitly a composition

How is that different from making sometype an attribute list, and linking directly to it?

CREATE TABLE sometype (sometype_name text PRIMARY KEY);
CREATE TABLE foo (foo_id serial PRIMARY KEY);
CREATE TABLE foo_sometype (
  foo_id int REFERENCES foo,
  sometype_name text REFERENCES sometype,
  PRIMARY KEY ( foo_id, sometype_name )
);

Now you don't even have to join foo_sometype to sometype to get sometype.sometype_name.

Table Partitioning

All of those problems aside, it gets even worse with the upcoming PostgreSQL 10 release of table partitioning

Multiple inheritance is not allowed, and partitioning and inheritance can't be mixed

So you want inheritance? Forgo partitioning, which actually has real planner advantages.

ALTER TABLE

Alas, ALTER TABLE has quite a few drawbacks listed in its notes as well,

If a table has any descendant tables, it is not permitted to add, rename, or change the type of a column, or rename an inherited constraint in the parent table without doing the same to the descendants. That is, ALTER TABLE ONLY will be rejected. This ensures that the descendants always have columns matching the parent. [...] A recursive DROP COLUMN operation will remove a descendant table's column only if the descendant does not inherit that column from any other parents and never had an independent definition of the column. A nonrecursive DROP COLUMN (i.e., ALTER TABLE ONLY ... DROP COLUMN) never removes any descendant columns, but instead marks them as independently defined rather than inherited. [...] The TRIGGER, CLUSTER, OWNER, and TABLESPACE actions never recurse to descendant tables; that is, they always act as though ONLY were specified. Adding a constraint recurses only for CHECK constraints that are not marked NO INHERIT.

Conclusion

I don't think many people use inheritance. I've never seen it in the wild. Inheritance in the db adds to the learning curve and some features are just better left alone. You don't have to find an application for them.

You may find this post on Stack Overflow useful, "When to use inherited tables in PostgreSQL?".

Related Solutions

Sql-server – How to represent class table inheritance (current DBMS-specific way please)

Ultimately, I implemented ypercube's suggestion from comments:

The "type" column can be defined as a computed (but constant) column in SQL-Server. I think it has to be PERSISTED though so it can participate in the Foreign Key constraint.

This worked well both for performance and compatibility with with my tools (Entity Framework <= 6.0):

CREATE TABLE [dbo].[Account](
    [Id] [int] IDENTITY(1000,1) NOT NULL PRIMARY KEY CLUSTERED,
    [CommunityRole] [int] NOT NULL,
    [FirstName] [nvarchar](50) NOT NULL,
    [LastName] [nvarchar](50) NOT NULL,
CONSTRAINT [UX_Derived_Relation] UNIQUE ([Id], [CommunityRole]))

CREATE TABLE [dbo].[Recruiter](
    [Id] [int] NOT NULL PRIMARY KEY CLUSTERED,
    [CommunityRole]  AS ((1)) PERSISTED NOT NULL,
    [RecruiterSpecificValue]  [int] NOT NULL,
FOREIGN KEY ([Id], [CommunityRole]) REFERENCES Account([Id], [CommunityRole]))

CREATE TABLE [dbo].[Candidate](
    [Id] [int] NOT NULL PRIMARY KEY CLUSTERED,
    [CommunityRole]  AS ((2)) PERSISTED NOT NULL,
    [CandidateSpecificValue]  [int] NOT NULL,
FOREIGN KEY ([Id], [CommunityRole]) REFERENCES Account([Id], [CommunityRole]))

This mapped well to my implementation of multiple discrete account types, Recruiter and Candidate, on my Job Board.

Proper implementation of child (derived) table PK/FK during single and multiple inheritance hard coded generalization physical data model

I solved this (for single inheritance) like so:

each extension of a class inherits, as its primary key the base classes primary key and always targets with its fk the immediate base class:

class A
class B extends A
class C extends B

A(id,p1)
B(id,p1,p2)
B(id,p1,p2,p3)

Table A:
id: primary key
p1: sth

Table B:
id: primary key + foreign key -> A.id
p2: sth

Table C:
id: primary key + foreign key -> B.id
p3: sth

this would reqiure a join to completely retrieve B or C.

alternatively you can have each extension mirror the properties of the base classes:

Table A:
id: primary key
p1: sth

Table B:
id: primary key + foreign key -> A.id
p1: sth
p2: sth

Table C:
id: primary key + foreign key -> B.id
p1: sth
p2: sth
p3: sth

here you can retrieve any class without joins but data is duplicated and has to be kept consistent.

which solution you can use strongly depends on how you want to retrieve the data and if you can live with the overhead of the join in version 1 or overhead of writing data multiple times in version 2.

I also added a meta-column to the table to know if a row does have extensions or not which is useful for a some queries:

Table A:
id: primary key
p1: sth
class: A,B,C

Table B:
id: primary key
p2: sth
class: B,C

Table C:
id: primary key
p2: sth
class: C