Postgresql – Joining ~ 40 normalized tables for 200 columns to display detailed user profile. Can this be optimized

database-designpostgresql

A profile can have many profile types. That profile can have many fields, which are all tied to lookup tables. Some fields can have many selections.

profile -> profile_date -> one relationship status, one zodiac, many languages, many countries traveled, many cuisines, many activities, ...

profile -> profile_network -> one company, one school, ...

profile -> ...

A lookup table for the fields looks like this:

id  type
1   ♈ ARIES
2   ♉ TAURUS
3   ♊ GEMINI
4   ♋ CANCER
5   ♌ LEO
6   ♍ VIRGO
7   ♎ LIBRA
8   ♏ SCORPIO
9   ♐ SAGITTARIUS
10  ♑ CAPRICORN
11  ♒ AQUARIUS
12  ♓ PISCES

If the field is a single selection, we join the lookup table and we're done.

-> profile_date (zodiac_type_id) -> zodiac_type.id

If the field is a multiple selection, there's a join table in between for the many-to-many.

-> profile_date -> (profile_date_id) profile_date_languages (language_type_id) -> language_type.id

In addition, I am joining other things like this to get all of their photos:

profile -> personas -> photos

In addition, the query also filters for things like persons who like each other, etc.

The query seems to start performing poorly with just a low number of profiles, like less than 10. If a single person has an obnoxious profile where they check all the multiple selections e.g. speak every language, been to every country, it grinds the database to crippled performance for everyone if their mega profile ends up being fetched.

That's the idea. You can see more of the fields here:

Sample query:
https://pastebin.com/wug86Wrr

Sample query plan:
http://www.sharecsv.com/s/a6333a7f610e049dd81ebcfc19a4c02f/temp_file_limit_exceeded.csv

Users will always be filtering by at least profile.birth_date (age), profile.radius (geolocation). Paid users can filter by any field. 🙂

PostgreSQL 9.6

Some ideas I had were:

  1. Optimize this query (I am not sure what can be done, am mostly wondering if I'm just doing something really stupid/unreasonable)

  2. Right now I am joining many tables per card, so the initial load is slower, but the data is all there if the user decides to explore and things look instant to them. I could instead fetch the minimum preview for the cards (join few tables per card), then only join the full profile (1 card filtered by id, many/the rest of the joins) if the user clicks on the preview to see the detail view (but I feel this is not the best user experience because the user has to now wait for the load every time they click a profile which is more noticeable). I am not sure this would even help because users can also filter by these fields, so I believe I would need to join them beforehand.

  3. Use ElasticSearch (I feel like I wouldn't need this until many more profiles?) I've done something similar to this with Algolia and it handled it like a champ, but I imagine this can just be done with PostgreSQL.

Best Answer

Option 4 - correct your schema design and stop using 'IDs' or whatever other surrogate key, the zodiac table is NOT a lookup table. it's a relation that holds all zodiac signs, and the way you tell the zodiac signs apart is their name, therefore, the key should be:

CREATE TABLE Zodiacs (Zodiac VARCHAR(11) PRIMARY KEY);

Use the Zodiac as the foreign key in your profile table, and now you don't need a join at all. Do that to all the other one-to many as well, like relationship status, company, school etc. and you just saved yourself most of the joins. For the many-to-many, like languages and countries, so the same, and you will now need only one join instead of two for each relation. Give it a try and see how all your queries magically shrink by 90%, and perform orders of magnitude better. you will also explicitly maintain uniqueness and prevent logical duplicates, and you will also see how filters perform orders of magnitude better as the optimizer will be able to correctly evaluate distribution of the actual value, not some random ID from another table.

BTW - there are no fields in a SQL database, there are columns and rows only. These are the counterparts of Tuples and attributes in the relational model

HTH