Postgresql – How to handle a single `reference` to multiple different types of data

postgresqlschema

I host a file sharing site. The idea is to take a short string (e.g. abcd) and do one of the following:

serve an image
display some text
redirect to another url

Historically, I've hosted this site using non-relational databases, where I'd store it as something like:

{
  short: "abcd",
  type: "url", // or image or text
  // type specific data, e.g. for a url
  url: "https://google.com"
  // or for a file
  store: "/cool_store",
  filesize: 10,
  etc.
}

I've redesigned the platform and decided to use SQLite / Postgres instead. I'm not that used to schema design and my first design of (not actual code, heavily simplified):

CREATE TABLE Items (
    short varchar(64),
    store varchar(255),
    filesize int
);

Which worked great for the image hosting side of things. Now I want to expand to the other types and can't decide the best approach. My two main ideas are:

Replace the singular table with one 'reference' table and separate 'item' tables for each different type, e.g. References { id: 'abcd', type: 'url', ref: 1 } (where the ref represents the primary key of a different table). Items { id: 1, store: "/cool_store", filesize: 10 }. Urls { id: 2, url: "https://google.com" }, etc.
Store all the properties required on a single table with a large proportion of the columns being null. In total we require ~50 columns, with ~30 being null at any point in time.

The database is exceptionally read-heavy, maybe 99.99% reads.

Best Answer

You should group the fields closest related to each other in the same Table and create additional Tables for the remaining fields as they relate to each other as well. This is called database normalization and is effective in reducing data redundancy, improving maintainability, and helps optimize performance. Your additional Tables (kind of reference Tables such as your first option describes) will store the primary key(s) of the Table(s) they relate to. This is known as a foreign key relationship. Your second design option is called denormalization and is not recommended in most cases (there are some specific cases where some denormalization is helpful).

I don't know your data so I'll use an example for a Car Dealer to convey the idea of normalization. Image a Car Dealer business who sells different brands of Cars across multiple Locations (notice how my objects are already taking shape of the Tables I need). So I'd expect their to be a Locations table, with columns regarding each Location, such as its Address, PhoneNumber and maybe ManagerName, in addition to its primary key column LocationId. I'd also expect a Cars table with columns like Manufacturer, Model, ManufacturedYear, with a primary key column of CarId.

Cool, so far so good. But what if we wanted to know more information about the Managers of each Location and more information about the Manufacturers of each Car. To store that in the database, if we added those fields to the Locations and Cars tables respectively would be bad design because that's denormalized. Information about a Manager object doesn't make a lot of sense in a Locations table, same for information about a Manufacturer in the Car Dealership's Cars table. So we should create two more tables, Managers and Manufacturers.

Managers would now store the ManagerName field instead of the Locations table. (Realistically you'd like store ManagerName in two fields, one for FirstName and one for LastName but this is just a basic example, so bear with me :). Managers would also store things like the Manager's Address and PhoneNunber, and maybe even their YearlySalary, all things that wouldn't make sense in a Locations table. Finally, the primary key would be ManagerId which is what the Locations table would store as a foreign key reference to the Managers table, and this is how you relate the two tables.

Same design as above for the Manufacturers table. It would have columns like ManufacturerName, Address, PhoneNumber, maybe a field called NumberOfCarModels etc, and finally a primary key of ManufacturerId which would be a foreign key in the Cars table.

This is loosely a first level of normalization and hopefully conveys how to structure things. Essentially every main object gets it's own table. You can actually further normalize this (though not always necessary) by having more generic tables for your objects. For example a Contacts table could store all the Addresses and PhoneNumbers of the Locations, Managers, and Manufacturers tables in one place (with a column to denote the ContactType) instead of on each individual table. And the Managers table would likely be an Employees table with a field IsManager to denote which ones are Managers. This next level of normalization reduces redundancy and lightens the other tables even more so (but again it isn't a necessity and just depends on your situation).

Ideally, the performance benefits of normalization come from the fact that when you query a table, you're not getting back 30 extra columns of data you don't need and take longer to load off disk, etc. By designing your tables centralized around your primary objects, and storing only what's truly relevant to those objects in those tables, breaking out the additional fields into their own object tables, you will maximize the performance of querying those tables.

Related Solutions

MySQL Schema Design – Reference Different Data in Tables

is this what business process of yours?

enter image description here

that must be

document table :

create table document (
  `id`        int unsigned auto_increment,
  `title`     varchar(128) not null,

  primary key(id)
);

block table

create table block (
  `id`             int unsigned auto_increment,
  `id_document`    int // foreign key to document
  primary key(id)
);

block_html (one-to-one relation to block table)

create table block_html (
  `id`             int unsigned auto_increment, // foreign key to block table and make it primary
  `type`      varchar(32) not null,
  `contents`  TEXT not null,

  primary key(id)
);

block image (one-to-one relation to block table)

create table block_image (
  `id`        int unsigned auto_increment,// foreign key to block table and make it primary
  `type`      varchar(32) not null,
  `url`       varchar(255) not null,
  primary key(id)
);

Database Schema – Designing Objects with Multiple Foreign Key Tables

Simply include in your model a choice table for each type of choice that your Your field choice_type_id is essentially a kind of partitioning attribute. It indicates what type of child table is applicable.

So far you have three tables: MULTIPLE_CHOICE, BINARY_CHOICE, and FREE_TEXT (although you might consider binary choice to be a special case of multiple choice).

Each of these tables is a child of your QUESTION table, i.e. they all have a question_id foreign key. The multiple choice table will have multiple records pointing to QUESTION whereas the other tables will have one - depending on how you model binary choices.

Note that this physical arrangement could be flattened out using a view to look like the one big table with sparse columns that you want to avoid. The one big flat (sparse) table might be handy for reporting, but since you're worried about scale you might want to avoid a sparse table for physical storage.

Your user answer tables would be children of the various choice tables that intersect the USER and the appropriate choice, since each question is answered by multiple users.

For more about this approach, see my answers to this question and this question here on DBA.SE.

Best Answer

Related Solutions

MySQL Schema Design – Reference Different Data in Tables

Database Schema – Designing Objects with Multiple Foreign Key Tables

Related Question