PostgreSQL – Should Large Tables Be Split for Optimization?

database-designoptimizationperformancepostgresqlquery-performance

I'm designing a PostgreSQL database that will contain information about photos uploaded by my users. All users will have at least one main photo and optionally one or more public and/or private photos. My first schema is as follows:

user
----
id (PK)

photo
-----
id (PK)
user_id   (FK to user)
photo_id  (unique identifier such as "aK1q9")
type      ("main" or "other")
access    ("public" or "private")

The query that will get run the most is:

SELECT p.photo_id FROM photo p INNER JOIN user u ON p.user_id = u.id WHERE p.type = 'main' AND u.id = (some user id);

The next most popular query will be:

SELECT p.photo_id FROM photo p INNER JOIN user u ON p.user_id = u.id WHERE p.type = 'other' AND p.access = 'public' AND u.id = (some user id);

The problem I forsee is that the Photo table will become extremely large over time as it will contain all public and private photos uploaded by all users. Since my most popular query will only be looking for the main photo IDs, would it make more sense to break my photo table up into three tables?

main_photo
----------
id (PK)
user_id   (FK to user but in a one-to-one relationship to user)
photo_id

other_public_photo
------------------
id (PK)
user_id   (FK to user)
photo_id

other_private_photo
-------------------
id (PK)
user_id   (FK to user)
photo_id

I would think this latter schema would be preferable because 1) each photo's type and access information is made explicit by where it's stored thus eliminating additional ANDs in my queries; and 2) my queries would run faster since they would be run against one of the three smaller tables rather than one huge table. Which is the optimal approach from a performance standpoint?

Thanks.

Best Answer

I think, this is a design question rather database technology. So, I provided the answer with what DB technology am current working on. But, you prefer to have it explained using pseudo, so here you go with another answer. I could go and edit the other one, am intentionally leaving as it is so that if SQL Server guy visits this page then he may appreciate it, we never know!

Table Name: User
Reason: Requirement is to always associate the primary photo with the user it make sense to store just the reference of Primary Photo Id along with User record.

Columns
----------------
UserId (PK)
PrimaryPhotoId (FK to Photo table)

Table Name: Photo
Description: It would be lot easier to manage just single table. It is definitely possible to extend columns in this table in future just in-case if you want to store more information like tool-tip, Alt text, image description, etc.. The other tables doesn't get disturbed and requires absolutely no changes.

Columns
----------------
PhotoId (PK)

Table Name: UserPhoto
Description: This table provides you the flexibility to have one to many relationship. It also identifies the record whether every Photo is private or public, using the flag (IsPrivate column). If you prefer to restrict that every user shouldn't associate the same Photo twice then the proposed composite Unique Key make sense, otherwise you need to extend the unique key but the table design should remain the same.

Columns
----------------
UserPhotoId (PK, AutoIncrement)
UserId (FK to User table)       - Composite Unique Key
PhotoId (FK to Photo table)     - Composite Unique Key
IsPrivate (Bit Data type to store 0 or 1. 0 value represents for 'Public', 1 value represents for 'Private' photo)