PostgreSQL – Better Design for Table with Too Many Nullable Columns

database-designpostgresqlpostgresql-11

I have a user table with around 30 columns. Most of these columns are almost always null unless user decides to share additional info about themselves. For example, weight, height, religion, is_smoking, is_drinking, star_sign, school, job, etc.

In my app, I have on option to filter users based on these values. I might be adding one or two columns like these in future.

I was just wondering if there is a better design approach for situations like these. For some reason, it doesn't feel right to have all these nullable columns in the same table with login credentials.

^{PS: I am using PostgreSQL 11.4 as the database management system.}

Best Answer

You could have a table

CREATE TABLE user_info (
   id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
   user_id bigint NOT NULL REFERENCES users,
   property text NOT NULL,
   value TEXT NOT NULL
);

and store the additional properties there. That would be the relational way of doing it.

In recent PostgreSQL versions you could also add a jsonb column for all such information.

With a GIN index on it you can filter efficiently as long as you are looking for certain keys and their value. But you won't be able to do advanced things like substring or similarity matches that way.

It is a matter of taste which one you prefer.

Related Solutions

MySQL Database Design – Replacing Too Many Tables with Too Many Columns

Solution 3 is the best solution provided that you move the deviceId from the variables table into the data table.

The reason why it is better is that when you need to add a variable to one device you can do this without changing your devices and data tables in solution 2. If you want to query the data in the solution 2 way then you can create views that look like the devices and data tables in solution 2.

Schema for 1 User to 0..N Apps

If the ultimate number of apps possible is very small (4-5 would be the limit for me, I think), what you've got can work.

If, at some point, there will be other data to store for each user and each app, and that data would vary considerably from one app to another, then it may be the easiest structure to work with.

If there will never be a need to store additional data per user per app; or, if that data would be basically the same from one app to another, then you could have an app table to store unique information about the apps (not tied to individual users), and have a user_app table to link users to their apps, and store any common data that would apply to all apps (first sign-in, last sign-in, whatever you might need, if anything).

CREATE TABLE app (app_id int PRIMARY KEY, app_name varchar(128), ...);
CREATE TABLE user_app
     ( user_app_id int PRIMARY KEY
      ,app_id -- foreign key to table app
      ,user_id -- foreign key to table user
      ... - other columns, if needed
     );

This is probably the best solution if the number of apps grows large (say, if you're tracking users and their apps in Google Play, rather than users and 4 in-house apps, where you're not expecting more in the foreseeable future).

Update - as noted by ypercubeᵀᴹ, user_app should have a unique index on (user_id, app_id), assuming each user can only have one account per app.

If it was possible for a single user to have multiple accounts on a given application for some reason, then you would want to include the account_id as a column. Then, you would not want the (user_id, app_id) unique index; instead, you would want a unique index on (app_id, account_id) (assuming that duplicate account IDs are fine across different applications; otherwise, the unique index should just be on account_id).

Best Answer

Related Solutions

MySQL Database Design – Replacing Too Many Tables with Too Many Columns

Schema for 1 User to 0..N Apps

Related Question