PostgreSQL Index – Can It Refer to Only One or a Subset of Values?

indexindex-tuningpostgresql

I am using Postgres. The table looks something like this:

tran_id SERIAL PRIMARY KEY,
account_id integer NOT NULL CONSTRAINT fk_account_id REFERENCES base.account(account_id) ON DELETE CASCADE,
tran_type base.my_enum not null,
tran_note varchar(64) not null,
process_id integer not null,
created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP

I am interested in being able to quickly find only the rows where process_id is 0. That value means that the row is pending for some kind of processing, non-0 rows are basically old log that I don't want archived because the application also needs to find the number of transactions for a certain account. So it is convenient to have the processed and pending data in the same table. I am not, however, interested in being able to quickly find rows where that field is anything other than 0 and I don't want to use resources on indexing those other values.

This table, in one implementation, has close to 200K accounts, each account around 200 transactions on average so about 40M rows, give or take. At any given time, there are maybe a few dozen new transactions to process, i.e. WHERE process_id = 0. Once it is processed, the process_id is updated to something other than 0.

I will never be searching for process_id that is other than 0. So I would be interested in having an index that points to only rows where that value is 0 and not necessarily bother with other values so that the index is more compact and faster. Is this possible?

Best Answer

You can create a partial index that includes all columns of the table:

CREATE UNIQUE INDEX not_processed
    ON transactions 
        (tran_id, tran_type, tran_note, created_at)
    WHERE process_id = 0 ;