Postgresql – Nullable One To Many w/Join Table – Indexing

database-designormpostgresql

I have a question regarding database design, I am working with PostgreSQL using Hibernate for ORM, the design I inherited includes the following tables

users

user_id (pk)
user_name
user_dob …

sessions

session_id (pk)
session_date
session_duration …

user_sessions

user_id (fk)
session_id (fk)

As you can guess one user can have many sessions, the reason (I believe) the join table was used is because a session can be added before the user exists (and it is linked to the user after they signup), thus avoiding a nullable field in the sessions table…fine.

But I have recently been looking at creating some multi-column indexes to speed up certain queries and have realised that I cannot include the user in this index as the column is in a different table.

As I am fairly new to DB design I am just wondering if the above design is correct? Or am I in fact better off using a nullable FK in the session table to allow me to index the user relationship?

Best Answer

If a session can only belong to a single user at any given time, the more appropriate model would be without a join table (user_sessions). What's the problem with column user_id being NULL?

If you need it to be NOT NULL and you don't always know the user at creation time, just insert a user "unknown" (with user_id 0 in my example). Can serve as DEFAULT value, too ...

CREATE TABLE sessions
  session_id   serial PRIMARY KEY
 ,user_id      int NOT NULL DEFAULT 0 REFERENCES users(user_id)
 ,session_date timestamp
 ...
)

Note on indexing strategy

Postgres can combine multiple indices with excellent speed. Typically, in your original scenario, with a nested loop:

nested loop join: The right relation is scanned once for every row found in the left relation. This strategy is easy to implement but can be very time consuming. (However, if the right relation can be scanned with an index scan, this can be a good strategy. It is possible to use values from the current row of the left relation as keys for the index scan of the right.)

Say, you have these indices:

sesssions(session_id)
user_sessions(session_id, user_id)

Both should be covered by the pk constraints in your original model automatically. However, the second one could go either way (user_id, session_id) or (session_id, user_id) and that matters:
Is a composite index also good for queries on the first field?

For best performance in a query like:

SELECT u.user_name, s.session_date 
FROM   users         u
JOIN   user_sessions us USING (user_id)
JOIN   sessions      s  USING (session_id)
WHERE  u.user_name = 'foo'

You would need the index (or pk constraint) to be

user_sessions(user_id, session_id)

Or have an additional index with user_id as first column.