PostgreSQL – Integer Arrays and Index for Equality

btreegist-indexpostgresql

I have a huge list of integer arrays (300,000,000 records) stored in Postgres 9.2 DB. I want to efficiently search these records for an exact match (equality only). I have heard of the intarray module and the corresponding gist-gin indexes. I would like to ask the following questions:

  • Does PostgreSQL uses a hash function for checking equality of integer arrays or does it perform a brute-force algorithm comparing one-by-one the elements of the array?
  • If PostgreSQL uses a hash function, is there some PostgreSQL function code to actually get the hash function result for a particular array?
  • Which index will be best for such a task? B-tree, or the gist – gin indexes provided by the intarray module? The dataset will be static, i.e, once all records are inserted there will no more insertions. So, building index / updating index time is not important to me.

Best Answer

1) as you already have discovered, you can't use b-tree as the index size is bigger than the page size

2) given:

As a rule of thumb, a GIN index is faster to search than a GiST index, but slower to build or update; so GIN is better suited for static data and GiST for often-updated data.

You would have to use GIN. And no, GIN doesn't use hash functions nor a brute-force algorithm. It's a reverse index:

A GIN index stores a set of (key, posting list) pairs, where a posting list is a set of row IDs in which the key occurs. The same row ID can appear in multiple posting lists, since an item can contain more than one key. Each key value is stored only once, so a GIN index is very compact for cases where the same key appears many times.

Internally, a GIN index contains a B-tree index constructed over keys, where each key is an element of one or more indexed items (a member of an array, for example)