Best practices for integer key types – mixed or all the same integer type

database-designdatatypesprimary-key

I am designing a postgres database for scientific data. It has about 30-40 tables, many of which are lookup tables with less than 10 records. The bulk of the data are concentrated in just a few tables with ~ 1,000,000 records that reference the lookup tables with foreign keys. There will be web user interface that accesses the data via a GraphQL API.

I always use the same INT data type for all primary and foreign key types. Is there merit in optimizing the primary keys on the lookup tables as SMALLINT to conserve space on the big tables (and perhaps a modest overall performance)? Or is it best to just keep all key columns the same data type to reduce complexity and confusion for humans and software accessing the data?

(The lookup data could change over time so I can't use ENUM.)

Best Answer

It's a little subjective, and there's certainly nothing wrong with reduced complexity by defaulting to INT (since it covers most use cases and there's almost no tangible difference in performance) but generally if your tables are going to be so small in record count, using SMALLINT is a good idea from a best practice standpoint. SMALLINT gets you +-32767 rows (for a total of 65,534). So it is definitely sufficient for your small tables.

When deciding on the size for a datatype, generally planning for anywhere between twice as much up to a magnitude more of the max amount values you think is possible is a decent rule of thumb, depending on how confident you are in that rough estimation.

At the end of the day, if 25% of your tables have 1,000,000 rows in them, switching one of the smaller tables to SMALLINT instead of INT will only save you about 20 MB of space overall (which is peanuts). If you did it for all your small tables, then at most you're saving about half a gigabyte of space. There's probably better optimizations to go after currently and you can let sleeping dogs lie. Moving forward if you want to take SMALLINT into better consideration on new tables, it's not a bad idea.