Postgresql – When normalizing a relational database, should multiple “type” tables be combined into one

database-designnormalizationpostgresqlrelational-theory

Before making substantial changes to the content, functionality and design of my website (I'll be refactoring everything, almost a complete rewrite), I'm setting up a new PostgreSQL database to hold all the data previously in a MySQL database. I also have a heavily-customized WordPress blog that I'm going to move over to this website, effectively merging the two sites. Apart from the front-facing content for the website, there is also a lot of other information that I store in the database for my own use, such as client information, sales history, invoices, event income, etc. I'm taking a lot of time thinking things through carefully and trying to re-design the database properly with both data integrity and performance in mind.

The front-facing website will have many different "content" types: pages, events/gigs, multimedia samples, sheet music/charts (some of which will be available for sale in PDF format on the site), blog articles, etc.

One of the changes I've made is to the way I store venues/places. I used to have a "venues" table that referenced a "venue type" table as well as a "cities" table, which referenced a "states/provinces" table, which referenced a "countries" table. Most of this information was used by the "events", but then I added client information which needed to reference the cities table. Further still, all of my blog articles are geotagged, so they'll need to reference these places too. Since all the tables had a very similar schema, what I've done is create a new "places" table with "id" and "parent_id" columns so that places can be stored in a hierarchal fashion. I also have a "places_type" table so that I can enforce that the parent_id for each row has to be of a higher hierarchal type than the place itself. (So for example, a city can have a state or country as its parent type, but a city can't have a restaurant as its parent type.)

In doing this, I've created a lot of other tables with the exact same schema (id, parent_id, name) to hold "types" of other content in a hierarchy. Genres, payment methods, chart types, event types, media types, invoice types, etc.

Since all the tables have the exact same schema, would it be better to combine all these tables into a single "types" table and add a "category" field to the table?

I think it would be much easier to manage a single "types" table than eleven similar tables. I've calculated that this "types" table would hold only about 60 rows or so, for now.

I'm wondering what would be the best practice for this sort of thing in terms of database design? The data will be retrieved far more frequently than it will be updated/added.

At the same time, since this one "types" table would hold all the types for everything, it will need to be joined (via a view) many times to many tables. For example, to retrieve event information, I already join an "events" table to the "places", "artists" and "media" tables, but would also need to join these tables to the "types" table to get event types, place types, artist types and media types. So would it be best to keep the type tables separate, or combine them into one? Why?

Best Answer

I would go for separate tables for each type. With so few rows performance will not be a consideration either way. Individual tables will only have a page or two each. Similarly for the combined table.

Either way you will have to join a "type" table to complete your views. Inserts are infrequent at this level of abstraction so locking is unlikely to be an issue either way.

For me, though, having different tables keeps logically distinct items separated, which is a cleaner design.

There will be many different tables which will have a relationship to a type table of some sort. If there are many different type tables the foreign key relationships are self-documenting and enforcable through DRI. With one combined table it is difficult to use DRI to say "Venue.TypeID comes from Type.TypeID but only if the Type.Category = 'V'", for example. The complementary side of this relationship - "Type.TypeID with Type.Category = 'V' can only be used in the Venue table" - is also not supported in any RDBMS's DRI that I know of.