If a session can only belong to a single user at any given time, the more appropriate model would be without a join table (`user_sessions`). What's the problem with the column `user_id` being `NULL`?
If you need it to be `NOT NULL` and you don't always know the user at creation time, just insert an "unknown" user (with `user_id = 0` in my example). That can serve as the `DEFAULT` value, too:
```sql
CREATE TABLE sessions (
   session_id   serial PRIMARY KEY
 , user_id      int NOT NULL DEFAULT 0 REFERENCES users(user_id)
 , session_date timestamp
 , ...
);
```
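As a sketch of how that works in practice (assuming the `users` table has columns `user_id` and `user_name`), the dummy user has to exist before any session can reference it:

```sql
-- Seed the placeholder row once (column names are assumptions):
INSERT INTO users (user_id, user_name) VALUES (0, 'unknown');

-- Sessions can now be created before the real user is known;
-- user_id falls back to the DEFAULT of 0:
INSERT INTO sessions (session_date) VALUES (now());
```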
Note on indexing strategy
Postgres can combine multiple indices with excellent speed. Typically, in your original scenario, with a nested loop join:

> **nested loop join:** The right relation is scanned once for every row found in the left relation. This strategy is easy to implement but can be very time consuming. (However, if the right relation can be scanned with an index scan, this can be a good strategy. It is possible to use values from the current row of the left relation as keys for the index scan of the right.)
Say you have these indices:

```sql
sessions(session_id)
user_sessions(session_id, user_id)
```

Both should be covered by the PK constraints in your original model automatically. However, the second one could go either way, `(user_id, session_id)` or `(session_id, user_id)`, and that matters:
Is a composite index also good for queries on the first field?
For best performance in a query like:

```sql
SELECT u.user_name, s.session_date
FROM   users u
JOIN   user_sessions us USING (user_id)
JOIN   sessions s USING (session_id)
WHERE  u.user_name = 'foo';
```

you would need the index (or PK constraint) to be `user_sessions(user_id, session_id)`, or have an additional index with `user_id` as the first column.
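A sketch of the two options (constraint and index names are illustrative):

```sql
-- Option 1: order the PK to lead with user_id:
ALTER TABLE user_sessions
  ADD CONSTRAINT user_sessions_pkey PRIMARY KEY (user_id, session_id);

-- Option 2: keep the existing PK and add a second index led by user_id:
CREATE INDEX user_sessions_user_id_idx ON user_sessions (user_id, session_id);
```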
The thing about indexes is that they are excellent tools for speeding up queries, but they bring overhead as well.
- If you choose to index every column in a table that is involved in a query, you have chosen to store considerably more data. This affects the storage required and the overhead of maintaining all those indexes (or maintaining all columns in one very wide index).
- In past experience, having many indexes, especially several indexes with similar distribution statistics, can make it easier for the server to select a poor index and thus a poor plan.
My experience shows that indexing just about everything will hurt your performance overall, for the reasons mentioned above.
Both table `A` and table `B` should only contain the indexes that are of best use for your queries.
Of course, your first indexing decisions may need to be refined over time as you learn more about your database's behavior. Do not be afraid to add indexes when needed, and do not be afraid to drop an index that turns out not to be useful.
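One way to spot candidates for dropping, assuming statistics have been collected over a representative workload, is the statistics view `pg_stat_user_indexes`:

```sql
-- Indexes that have never been used by an index scan:
SELECT relname      AS table_name
     , indexrelname AS index_name
     , idx_scan
FROM   pg_stat_user_indexes
WHERE  idx_scan = 0
ORDER  BY relname, indexrelname;
```

Bear in mind that a zero scan count does not prove an index is useless; it may still enforce a constraint or serve a rarely run query.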
Best Answer
That's not what indexes are for; you need a check constraint for that.
For the "at most one can be not null" rule you can use a little trick: casting a boolean expression to an integer yields `0` or `1`. If you add up those values, one per `IS NOT NULL` condition, the rule "at most one can be not null" translates to "the sum of those must always be 1".

The check can be simplified by using the `num_nonnulls()` function, which also covers the "not all three can be null" requirement.
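The constraints described above could look like this; the table and column names `tbl`, `a`, `b`, `c` are assumptions for illustration:

```sql
CREATE TABLE tbl (
  a int,
  b int,
  c int,
  -- each boolean cast contributes 0 or 1; the sum must be exactly 1:
  CONSTRAINT one_not_null CHECK (
    (a IS NOT NULL)::int + (b IS NOT NULL)::int + (c IS NOT NULL)::int = 1
  )
);

-- Simplified with num_nonnulls() (Postgres 9.6 or later);
-- requiring = 1 also rules out "all three null":
-- CONSTRAINT one_not_null CHECK (num_nonnulls(a, b, c) = 1)
```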