I host a file sharing site. The idea is to take a short string (e.g. abcd
) and do one of the following:
- serve an image
- display some text
- redirect to another url
Historically, I've hosted this site using non-relational databases, where I'd store it as something like:
{
short: "abcd",
type: "url", // or image or text
// type specific data, e.g. for a url
url: "https://google.com"
// or for a file
store: "/cool_store",
filesize: 10,
etc.
}
I've redesigned the platform and decided to use SQLite / Postgres instead. I'm not that used to schema design and my first design of (not actual code, heavily simplified):
CREATE TABLE Items (
short varchar(64),
store varchar(255),
filesize int
);
Which worked great for the image hosting side of things. Now I want to expand to the other types and can't decide the best approach. My two main ideas are:
- Replace the singular table with one 'reference' table and separate 'item' tables for each different type, e.g.
References { id: 'abcd', type: 'url', ref: 1 }
(where the ref represents the primary key of a different table).Items { id: 1, store: "/cool_store", filesize: 10 }
.Urls { id: 2, url: "https://google.com" }
, etc. - Store all the properties required on a single table with a large proportion of the columns being null. In total we require ~50 columns, with ~30 being null at any point in time.
The database is exceptionally read-heavy, maybe 99.99% reads.
Best Answer
You should group the fields closest related to each other in the same Table and create additional Tables for the remaining fields as they relate to each other as well. This is called database normalization and is effective in reducing data redundancy, improving maintainability, and helps optimize performance. Your additional Tables (kind of reference Tables such as your first option describes) will store the primary key(s) of the Table(s) they relate to. This is known as a foreign key relationship. Your second design option is called denormalization and is not recommended in most cases (there are some specific cases where some denormalization is helpful).
I don't know your data so I'll use an example for a Car Dealer to convey the idea of normalization. Image a Car Dealer business who sells different brands of
Cars
across multipleLocations
(notice how my objects are already taking shape of the Tables I need). So I'd expect their to be aLocations
table, with columns regarding eachLocation
, such as itsAddress
,PhoneNumber
and maybeManagerName
, in addition to its primary key columnLocationId
. I'd also expect aCars
table with columns likeManufacturer
,Model
,ManufacturedYear
, with a primary key column ofCarId
.Cool, so far so good. But what if we wanted to know more information about the
Managers
of eachLocation
and more information about theManufacturers
of eachCar
. To store that in the database, if we added those fields to theLocations
andCars
tables respectively would be bad design because that's denormalized. Information about aManager
object doesn't make a lot of sense in aLocations
table, same for information about aManufacturer
in the Car Dealership'sCars
table. So we should create two more tables,Managers
andManufacturers
.Managers
would now store theManagerName
field instead of theLocations
table. (Realistically you'd like storeManagerName
in two fields, one forFirstName
and one forLastName
but this is just a basic example, so bear with me :).Managers
would also store things like the Manager'sAddress
andPhoneNunber
, and maybe even theirYearlySalary
, all things that wouldn't make sense in aLocations
table. Finally, the primary key would beManagerId
which is what theLocations
table would store as a foreign key reference to theManagers
table, and this is how you relate the two tables.Same design as above for the
Manufacturers
table. It would have columns likeManufacturerName
,Address
,PhoneNumber
, maybe a field calledNumberOfCarModels
etc, and finally a primary key ofManufacturerId
which would be a foreign key in theCars
table.This is loosely a first level of normalization and hopefully conveys how to structure things. Essentially every main object gets it's own table. You can actually further normalize this (though not always necessary) by having more generic tables for your objects. For example a
Contacts
table could store all theAddresses
andPhoneNumbers
of theLocations
,Managers
, andManufacturers
tables in one place (with a column to denote theContactType
) instead of on each individual table. And theManagers
table would likely be anEmployees
table with a fieldIsManager
to denote which ones areManagers
. This next level of normalization reduces redundancy and lightens the other tables even more so (but again it isn't a necessity and just depends on your situation).Ideally, the performance benefits of normalization come from the fact that when you query a table, you're not getting back 30 extra columns of data you don't need and take longer to load off disk, etc. By designing your tables centralized around your primary objects, and storing only what's truly relevant to those objects in those tables, breaking out the additional fields into their own object tables, you will maximize the performance of querying those tables.