Postgresql – Is it a good idea to create an index just for new data to increase index creation performance

database-designindex-tuningpostgresqlpostgresql-9.5

I have a table with about 350 M rows with 45 GB, it has around 10 columns ,simplified version below

CREATE TABLE PAYMENT (
    ID NUMERIC(20) PRIMARY KEY,
    CREATION TIMESTAMP,
    VALUE DECIMAL(10, 2),
    USER_ID NUMERIC(20)
    ....
)

Once known that

  • Table have a really high concurrency
  • It receives about 3M INSERTS and 5M UPDATES per day, update uses the PK
  • By company policies reasons
    • I must keep old data to grant history
    • I can't partitioning that table
    • I'm really going to have this one table with all data and there is not chance to change that
  • I have to create an index for (CREATION, USER_ID) columns

Is it a good idea to create an index excluding old data to get the index created faster, avoiding table locks and save storage knowing that my queries will just hit the new data?

CREATE INDEX ON PAYMENT(CREATION, USER_ID) WHERE CREATION > TIMESTAMP '2018-03-25 00:00:00';

Best Answer

if you create a partial index then your queries will have also include the condition used to create the index. if they do not the index will be of no use.

when creating the index there will probably need to be a scan of the entire table your only win is a smaller index size.

What index will help queries most depends on what information you want from the table. are you sure that there are enough payments with the exact same creation that including userid as the second term in the index is justified? bear in mind that timestamp is 8 bytes while numeric(20) is 16 bytes, so an index on timestamp only will also be more compact.

An index on payment (userid, creation) will be more useful in the case where you want you find the most recent record, or the payment history, for any particular userid.