Postgresql – Index method for very few updates and many inserts

gin-indexgist-indexindexindex-tuningpostgresql

I am using Postgresql 9.1 with *pg_trgm* extension. I need to create an index on a text-based field. I do not need full-text searches, I use ILIKE queries to make my searches.

I will use pg_trgm but do not have much experience with gin and gist indexes. I will have many INSERT statements (~15000 daily) and very few UPDATE statements (may be 1 or 2 in a week).

What will be the index update overhead for a gin index on a such table? Or is gist more suitable?

Best Answer

The manual for the pg_trgm module has some advice for your question here:

As a rule of thumb, a GIN index is faster to search than a GiST index, but slower to build or update; so GIN is better suited for static data and GiST for often-updated data.

The FASTUPDATE feature of GIN indexes (introduced in Postgres 8.4, ON by default) should be interesting for you. INSERTs (and UPDATEs) are stored in a pending list and integrated into the index in later bulk operations by VACUUM. Makes those operations a lot faster, since direct GIN maintenance tends to be expensive, especially for columns with many indexable elements, i.e. many words in your case.

More basic advice on GIN vs GiST in the manual here.
In particular, if your rows are big and you insert in bulks, you may want to raise the work_mem setting.

For very big loads (large parts of the table), it might pay to drop the index and recreate it afterwards.
Although, 15k rows spread out over a day, shouldn't be a problem at all.

Finally, there is a whole chapter on performance of GIN and GiST in the Full Text Search section. Applies generally. Read this.

The gist of it (no pun intended): GIN is considerably bigger and more expensive to maintain, but faster for most lookups (special exceptions apply for pg_trgm). Since the addition of FASTUPDATE writes are only moderately more expensive.