PostgreSQL – limit an index to only contain the newest relevant row

postgresql

I have a table called user_login_session in PostgreSQL 9.2 that records login sessions. It contains user_id, start_timestamp and end_timestamp columns, amongst others. The table has started growing as my system gains traction, and I can't purge old records for now.

I've defined an index on this table but I'm concerned about its growth over time. Is it possible to define an index which only keeps a handle to the newest login session row per user ID? Older rows are not accessed operationally.

Best Answer

This is a classic case for partitioning.

When you insert a new record, have a trigger move that user's previous record out of the user_login_session_latest sub-table and into the user_login_session_history sub-table, then store the new record in user_login_session_latest. Have both sub-tables inherit from an empty top-level user_login_session table so you can still query the two transparently. You will need an additional trigger-maintained column like is_latest to partition on, since you can't do constraint exclusion on a condition that refers to other rows.
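The layout described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the column types, the index name, and the function and trigger names (user_login_session_route, user_login_session_route_trg) are made up here, and the real table has more columns that would need to be carried along.

```sql
-- Assumed minimal schema; column types are guesses.
CREATE TABLE user_login_session (
    user_id         integer     NOT NULL,
    start_timestamp timestamptz NOT NULL,
    end_timestamp   timestamptz,
    is_latest       boolean     NOT NULL DEFAULT true
);

-- The CHECK constraints describe which rows each child may hold.
CREATE TABLE user_login_session_latest (
    CHECK (is_latest)
) INHERITS (user_login_session);

CREATE TABLE user_login_session_history (
    CHECK (NOT is_latest)
) INHERITS (user_login_session);

-- Small index: at most one row per user lives in the "latest" child.
CREATE UNIQUE INDEX user_login_session_latest_user_id_idx
    ON user_login_session_latest (user_id);

CREATE OR REPLACE FUNCTION user_login_session_route() RETURNS trigger AS $$
BEGIN
    -- Move this user's previous "latest" row (if any) into history.
    WITH moved AS (
        DELETE FROM user_login_session_latest
        WHERE user_id = NEW.user_id
        RETURNING user_id, start_timestamp, end_timestamp
    )
    INSERT INTO user_login_session_history
        (user_id, start_timestamp, end_timestamp, is_latest)
    SELECT user_id, start_timestamp, end_timestamp, false
    FROM moved;

    -- Store the new row in the "latest" child.
    INSERT INTO user_login_session_latest
        (user_id, start_timestamp, end_timestamp, is_latest)
    VALUES (NEW.user_id, NEW.start_timestamp, NEW.end_timestamp, true);

    RETURN NULL;  -- keep the parent table empty
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER user_login_session_route_trg
    BEFORE INSERT ON user_login_session
    FOR EACH ROW EXECUTE PROCEDURE user_login_session_route();
```

Because the trigger returns NULL, an INSERT into the parent reports zero rows affected even though the row was stored in a child, which some client libraries treat as a failure. Two concurrent logins for the same user can also collide on the unique index, so the application would need to retry or serialize those. Operational queries can read user_login_session_latest directly; whether a query on the parent with WHERE is_latest lets the planner skip the history table is worth confirming with EXPLAIN.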

A partial index will work, but not as well as you might hope. It's hard to write a partial index based on the concept of "latest": an index predicate can only look at the row's own columns, so a trigger or the application has to maintain a flag. It also might not speed things up as much as you expect. Each index entry leads to a heap page, and if that page contains 10 old records and one new record, you still read and cache the old ones; you'd need to CLUSTER the table regularly to see much benefit. Partitioning will be a much better option.
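For comparison, a flag-based partial index on the original, unpartitioned table might look something like this. It is a sketch only: is_latest, the index name, and the trigger and function names are assumptions, and existing rows would still need a one-off backfill of the flag.

```sql
-- On 9.2, adding a column with a DEFAULT rewrites the whole table.
ALTER TABLE user_login_session
    ADD COLUMN is_latest boolean NOT NULL DEFAULT false;

-- Only rows flagged as latest get an index entry.
CREATE INDEX user_login_session_latest_idx
    ON user_login_session (user_id)
    WHERE is_latest;

CREATE OR REPLACE FUNCTION user_login_session_flag_latest() RETURNS trigger AS $$
BEGIN
    -- Clear the flag on this user's previous latest row.
    UPDATE user_login_session
       SET is_latest = false
     WHERE user_id = NEW.user_id
       AND is_latest;

    NEW.is_latest := true;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER user_login_session_flag_latest_trg
    BEFORE INSERT ON user_login_session
    FOR EACH ROW EXECUTE PROCEDURE user_login_session_flag_latest();

-- Queries must repeat the predicate for the planner to consider the index.
SELECT *
FROM user_login_session
WHERE user_id = 42
  AND is_latest;
```

Every new login now also updates an old row, which creates a dead tuple in the big table, so this trades index size for extra write and vacuum work.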

Alternatively, you can maintain a materialized view of each user's "latest" session in another table using triggers or application logic. I'd generally prefer partitioning to a materialized view for this.
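A hand-maintained summary table of that kind could be sketched as below. The table name user_latest_login_session, the column types, and the trigger and function names are all hypothetical. The sketch assumes rows arrive in chronological order, and it uses UPDATE followed by INSERT because 9.2 has no built-in upsert.

```sql
-- One row per user, holding a copy of that user's newest session.
CREATE TABLE user_latest_login_session (
    user_id         integer PRIMARY KEY,
    start_timestamp timestamptz NOT NULL,
    end_timestamp   timestamptz
);

CREATE OR REPLACE FUNCTION user_latest_login_session_sync() RETURNS trigger AS $$
BEGIN
    -- Overwrite the user's existing summary row, if there is one.
    UPDATE user_latest_login_session
       SET start_timestamp = NEW.start_timestamp,
           end_timestamp   = NEW.end_timestamp
     WHERE user_id = NEW.user_id;

    -- First login for this user: create the summary row.
    IF NOT FOUND THEN
        INSERT INTO user_latest_login_session
            (user_id, start_timestamp, end_timestamp)
        VALUES (NEW.user_id, NEW.start_timestamp, NEW.end_timestamp);
    END IF;

    RETURN NULL;  -- the return value is ignored for AFTER row triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER user_latest_login_session_sync_trg
    AFTER INSERT ON user_login_session
    FOR EACH ROW EXECUTE PROCEDURE user_latest_login_session_sync();
```

If end_timestamp is filled in later by an UPDATE on the main table, a matching AFTER UPDATE trigger would be needed to keep the copy in step. Two simultaneous first logins for the same user can still raise a unique violation that the caller has to retry.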

Remember to tune autovacuum to vacuum your tables quite frequently if there's lots of updating/deleting going on.
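Per-table autovacuum settings are one way to do this. The numbers below are illustrative starting points rather than recommendations, and the table name assumes the user_login_session_latest sub-table from the partitioning approach.

```sql
-- Vacuum after roughly 2% of rows (plus 1000) are dead,
-- instead of the default 20% scale factor.
ALTER TABLE user_login_session_latest SET (
    autovacuum_vacuum_scale_factor  = 0.02,
    autovacuum_vacuum_threshold     = 1000,
    autovacuum_analyze_scale_factor = 0.02
);

-- Check whether autovacuum is keeping up with dead rows.
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
WHERE relname LIKE 'user_login_session%';
```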