Postgresql – Inheritance and foreign keys in Postgres


This is about using inheritance and foreign keys in Postgresql databases.

Consider the following simplistic example whose structure is based on what I am building at the moment (but the specifics were contrived on the spot just for this question, so please excuse any shortcomings!):

Parent table1: Person (columns: ID, Name). Child tables: Man, Woman.

Parent table2: Relationship (columns: ID, Partner1 and Partner2). Child tables: Gay, Lesbian.

Each table has a primary key set on the column ID.

The table Relationship has two foreign keys set on columns Partner1 and Partner2 which reference the table Person (column ID).

The (inherited) tables Gay and Lesbian also need to have foreign keys set on their Partner1 and Partner2 columns. The question is whether these foreign keys should reference the parent table Person, or whether they should reference (as appropriate) the child tables Man and Woman.
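For reference, a minimal sketch of that design in PostgreSQL DDL. The lowercase table names and the integer/text column types are assumptions, and the FKs on gay and lesbian are pointed at man and woman, which is just one of the two options being asked about. Neither primary keys nor foreign keys are inherited, so each child table has to declare its own.

```sql
-- Sketch of the design in the question (PostgreSQL table inheritance).
-- Lowercase names and column types are assumptions.
CREATE TABLE person (
    id   integer PRIMARY KEY,
    name text    NOT NULL
);

-- Children inherit the columns (and NOT NULL), but not the primary key,
-- so each child declares its own.
CREATE TABLE man   (PRIMARY KEY (id)) INHERITS (person);
CREATE TABLE woman (PRIMARY KEY (id)) INHERITS (person);

CREATE TABLE relationship (
    id       integer PRIMARY KEY,
    partner1 integer NOT NULL REFERENCES person (id),
    partner2 integer NOT NULL REFERENCES person (id)
);

-- Foreign keys are not inherited either. Here they point at the
-- matching child table, one of the two options in question.
CREATE TABLE gay (
    PRIMARY KEY (id),
    FOREIGN KEY (partner1) REFERENCES man (id),
    FOREIGN KEY (partner2) REFERENCES man (id)
) INHERITS (relationship);

CREATE TABLE lesbian (
    PRIMARY KEY (id),
    FOREIGN KEY (partner1) REFERENCES woman (id),
    FOREIGN KEY (partner2) REFERENCES woman (id)
) INHERITS (relationship);
```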

The question comes up because, as stated in the manual for v9.6, section 5.9.1:

Caveats

… A serious limitation of the inheritance feature is that indexes
(including unique constraints) and foreign key constraints only apply
to single tables, not to their inheritance children … These
deficiencies will probably be fixed in some future release …

To me, this (the fact that foreign keys do not apply across inherited tables and must be done separately) is a feature, and not a limitation. And a very useful feature too, as can be seen in the case of the aforementioned example:
When the child table Lesbian references the child table Woman (instead of the parent table Person), it is very easy to prevent errors of the sort where there's a lesbian relationship between two men!
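A small self-contained sketch showing both sides of this: the caveat from the manual (an FK to the parent person does not see rows stored in man or woman) and the effect described above (an FK from lesbian to woman rejects two men). Lowercase names, column types and sample rows are made up for illustration; the DO blocks only trap the expected foreign_key_violation so the script runs to the end.

```sql
-- Self-contained demo (PostgreSQL); names follow the question, types are assumptions.
CREATE TABLE person (id integer PRIMARY KEY, name text);
CREATE TABLE man   (PRIMARY KEY (id)) INHERITS (person);
CREATE TABLE woman (PRIMARY KEY (id)) INHERITS (person);

CREATE TABLE relationship (
    id       integer PRIMARY KEY,
    partner1 integer REFERENCES person (id),
    partner2 integer REFERENCES person (id)
);
CREATE TABLE lesbian (
    PRIMARY KEY (id),
    FOREIGN KEY (partner1) REFERENCES woman (id),
    FOREIGN KEY (partner2) REFERENCES woman (id)
) INHERITS (relationship);

INSERT INTO man   VALUES (1, 'Adam'),  (2, 'Bob');
INSERT INTO woman VALUES (3, 'Carol'), (4, 'Dana');

-- All four rows are visible through the parent ...
SELECT id, name FROM person ORDER BY id;

DO $$
BEGIN
    -- ... but the FK on relationship only looks at rows stored in person
    -- itself, so partners stored in woman are rejected.
    INSERT INTO relationship VALUES (10, 3, 4);
    RAISE NOTICE 'relationship row accepted';
EXCEPTION WHEN foreign_key_violation THEN
    RAISE NOTICE 'relationship row rejected by FK to person';
END $$;

-- The child-level FK behaves as the question hopes:
INSERT INTO lesbian VALUES (11, 3, 4);          -- two women: accepted

DO $$
BEGIN
    INSERT INTO lesbian VALUES (12, 1, 2);      -- two men
    RAISE NOTICE 'lesbian row accepted';
EXCEPTION WHEN foreign_key_violation THEN
    RAISE NOTICE 'lesbian row rejected by FK to woman';
END $$;
```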

Of course, constraints could be imposed to achieve this, but what I described above seems to me a more elegant way of doing things. However, I am concerned by what the manual states towards the end: the development team sees this as a problem and might get rid of it. So I am also worried that my design could break completely after a future upgrade.

Any tips on the design above and suggestions for alternate ways would be most appreciated.

I would also be very grateful if someone from the Postgres dev team is lurking around here and would be kind enough to comment.

Best Answer

Parent table1: Person (columns: ID, Name). Child tables: Man, Woman.
Parent table2: Relationship (columns: ID, Partner1 and Partner2). Child tables: Gay, Lesbian.

I'm staunchly against end users using the inheritance model at all, but even setting that aside, this isn't a valid use case for it. Your gender isn't a child table; it's an attribute on the table. The same can be said of your relationship.

Just add a column, gender, and another column, relationship.
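A minimal sketch of the gender part of that suggestion, assuming lowercase table names and a text gender column restricted by a CHECK constraint (the allowed values are placeholders). With a single person table, ordinary foreign keys see every row, so the inheritance caveat no longer applies.

```sql
-- Gender as an attribute of person instead of a child table.
-- Column name, type and allowed values are assumptions.
CREATE TABLE person (
    id     integer PRIMARY KEY,
    name   text    NOT NULL,
    gender text    NOT NULL CHECK (gender IN ('male', 'female'))
);

-- Plain foreign keys now cover every person.
CREATE TABLE relationship (
    id       integer PRIMARY KEY,
    partner1 integer NOT NULL REFERENCES person (id),
    partner2 integer NOT NULL REFERENCES person (id)
);
```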

To me, this (the fact that foreign keys do not apply across inherited tables and must be done separately) is a feature, and not a limitation. And a very useful feature too, as can be seen in the case of the aforementioned example: When the child table Lesbian references the child table Woman (instead of the parent table Person), it is very easy to prevent errors of the sort where there's a lesbian relationship between two men!

I think this is going to devolve into a political question, but if people's gender can't change (a horrible assumption), then the kind of relationship between two people is known from their genders. Why would you even want an attribute on the relationship table? The class, gay or lesbian, would be inferred from the genders of the participants. Yes, if you put it there you can have an error where there is a lesbian relationship between two men, but that's an error you introduce only because you store data that should be inferred.

Which of these two seems more logical?

f(p1,p2) = class of relationship
f(p1,p2,relationship_class) = class of relationship
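One way to express f(p1,p2) in SQL is a view that derives the class from the partners' genders on every read, so it can never disagree with them. This is a sketch assuming the single-table design with a gender column; the view name relationship_classified and the gender values are hypothetical, and mixed pairs get NULL because the question only defines the two classes.

```sql
-- Assumed single-table design with a gender column.
CREATE TABLE person (
    id     integer PRIMARY KEY,
    name   text    NOT NULL,
    gender text    NOT NULL CHECK (gender IN ('male', 'female'))
);

CREATE TABLE relationship (
    id       integer PRIMARY KEY,
    partner1 integer NOT NULL REFERENCES person (id),
    partner2 integer NOT NULL REFERENCES person (id)
);

-- f(p1, p2): the class is computed from the partners, never stored.
CREATE VIEW relationship_classified AS
SELECT r.id, r.partner1, r.partner2,
       CASE
           WHEN p1.gender = 'male'   AND p2.gender = 'male'   THEN 'gay'
           WHEN p1.gender = 'female' AND p2.gender = 'female' THEN 'lesbian'
           ELSE NULL  -- mixed pairs: not one of the question's two classes
       END AS relationship_class
FROM   relationship r
JOIN   person p1 ON p1.id = r.partner1
JOIN   person p2 ON p2.id = r.partner2;
```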

Related Solutions

Postgresql – Postgres 9.5 foreign table inheritance not using indexes

Index

This index of yours looks good for it:

"toys_201512_new_container_id_created_at" btree (container_id, created_at)

If you have many NULL values you might even make that a partial index by appending WHERE source IS NOT NULL, making the index look even better for the Postgres query planner.
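A sketch of that partial index. Indexes can't be created on a foreign table, so this would run on the remote server; it assumes the remote table is also called toys_201512_new (as the existing index name suggests), and the new index name is made up. The CREATE TABLE is only a hypothetical stand-in so the snippet runs on its own. The planner only considers a partial index when the query's WHERE clause implies the index predicate, which the query's source IS NOT NULL condition does.

```sql
-- Hypothetical stand-in for the remote table so this runs on its own;
-- only the columns the index needs, with guessed types.
CREATE TABLE toys_201512_new (
    container_id integer,
    created_at   timestamp,
    source       text
);

-- On the remote server: index only the rows the query can use.
-- The index name is made up.
CREATE INDEX toys_201512_new_container_id_created_at_nn
    ON toys_201512_new (container_id, created_at)
    WHERE source IS NOT NULL;
```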

Statistics for query planning

Make sure the query planner can work with valid statistics. The numbers in your EXPLAIN output show quite a mismatch:

Foreign Scan on toys_201512_new  (cost=100.00..1143585.38 rows=2831 width=15)
                           (actual time=113.419..1488.445 rows=76593 loops=1)

Postgres expected 2831 rows, but 76593 were actually returned, about 27x as many. The manual:

Running ANALYZE on the foreign table is the way to update the local statistics; this will perform a scan of the remote table and then calculate and store statistics just as though the table were local. Keeping local statistics can be a useful way to reduce per-query planning overhead for a remote table — but if the remote table is frequently updated, the local statistics will soon be obsolete.

Since accessing foreign tables is potentially expensive / delicate, this does not happen automatically. Foreign tables are not covered by autovacuum. The manual:

Foreign tables are analyzed only when explicitly selected.

If the remote table changes a lot, you might want to activate use_remote_estimate. The manual:

This option, which can be specified for a foreign table or a foreign server, controls whether postgres_fdw issues remote EXPLAIN commands to obtain cost estimates. A setting for a foreign table overrides any setting for its server, but only for that table. The default is false.
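Both options as SQL, run locally against the foreign table. toys_201512_new is taken from the EXPLAIN output above; the server name toys_server is hypothetical. ANALYZE contacts the remote server, while the ALTER statements only store the option. ADD raises an error if the option is already set, in which case use SET instead.

```sql
-- Run locally. toys_201512_new is the foreign table from the EXPLAIN output;
-- toys_server is a hypothetical name for its foreign server.

-- Refresh local statistics (scans the remote table; autovacuum won't do it).
ANALYZE toys_201512_new;

-- Alternatively, have postgres_fdw run remote EXPLAINs at plan time,
-- for this foreign table only ...
ALTER FOREIGN TABLE toys_201512_new OPTIONS (ADD use_remote_estimate 'true');

-- ... or as the default for every foreign table on that server.
-- (Use SET instead of ADD if the option is already present.)
ALTER SERVER toys_server OPTIONS (ADD use_remote_estimate 'true');
```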

Finally, test to see what is actually sent to the foreign server:

The query that is actually sent to the remote server for execution can be examined using EXPLAIN VERBOSE.
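For example, run locally against the foreign table; the query here is only a trimmed-down version of the one discussed in the Query section. The "Remote SQL:" line in the output shows which conditions postgres_fdw actually pushes down to the remote server.

```sql
-- Run locally; look for the "Remote SQL:" line in the plan output.
EXPLAIN (VERBOSE)
SELECT count(*)
FROM   toys_201512_new
WHERE  container_id = 857
AND    source IS NOT NULL;
```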

Query

Your query decluttered and formatted, with one minor improvement:

SELECT source, global_action, paid, organic, device
     , count(*) AS count, sum(price) AS sum
FROM   toys
WHERE  container_id = 857
AND    created_at >= '2015-12-02 05:00:00'
AND    created_at <  '2015-12-30 05:00:00'  -- instead of: created_at <= '2015-12-30 04:59:59.999999'
AND    source IS NOT NULL
GROUP  BY source, global_action, paid, organic, device;

The half-open range (>= lower bound, < upper bound) is simpler and cleaner, matches your CHECK constraint better, and avoids possible corner-case problems at the upper boundary.

Postgresql inheritance based database design

Whether you are better off with single-table-inheritance or class-table-inheritance really depends on the particulars of your case. The performance advantage can go either way. The difficulties presented by having lots of NULLs in a table range from the trivial to the overwhelming. It depends on what your data looks like, and what you intend to do.

Kudos for figuring out that the problem is basically due to a mismatch between object modeling and relational modeling.

PS: If you use UserID as the primary key on both Parent and Babysitter, and also declare it as a foreign key, you'll get some benefits, at the cost of a little programming when you go to insert new users. This technique is called shared-primary-key, and it's also presented over on SO.
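A minimal sketch of shared-primary-key, with the subtype tables reusing the user's id as both their primary key and their foreign key. Parent, Babysitter and UserID come from the answer; the users table, its name column and the children_count and hourly_rate columns are hypothetical. The final statement shows one way to handle the "little programming" on insert, using a data-modifying CTE so the new user and its subtype row are created in one statement.

```sql
-- Shared primary key: subtype tables reuse the user's id as their own PK.
-- The users table and all columns except user_id are hypothetical.
CREATE TABLE users (
    user_id serial PRIMARY KEY,
    name    text   NOT NULL
);

CREATE TABLE parent (
    user_id        integer PRIMARY KEY REFERENCES users (user_id),
    children_count integer
);

CREATE TABLE babysitter (
    user_id     integer PRIMARY KEY REFERENCES users (user_id),
    hourly_rate numeric(6, 2)
);

-- Insert the user first, then reuse the generated id for the subtype row.
WITH new_user AS (
    INSERT INTO users (name) VALUES ('Alice')
    RETURNING user_id
)
INSERT INTO babysitter (user_id, hourly_rate)
SELECT user_id, 15.00 FROM new_user;
```

Because user_id is the primary key of babysitter, a user can have at most one babysitter row, and the FK guarantees that row always belongs to an existing user.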
