PostgreSQL – Best Index for Lots of Duplicate Values

clustered-indexindexperformancepostgresqlpostgresql-9.6query-performance

Let's make a few assumptions:

I have table that looks like this:

 a | b
---+---
 a | -1
 a | 17
  ...
 a | 21
 c | 17
 c | -3
  ...
 c | 22

Facts about my set:

Size of the whole table is ~ 10¹⁰ rows.
I have ~ 100k rows with value a in column a, similar for other values (e.g. c).
That means ~ 100k distinct values in column 'a'.
Most of my queries will read all or most of the values for a given value in a, e.g. select sum(b) from t where a = 'c'.
The table is written in such a way that consecutive values are physically close (either it's written in order, or we assume CLUSTER was used on that table and column a).
The table is rarely if ever updated, we're only concerned about read speed.
The table is relatively narrow (say ~25 bytes per tuple, + 23 bytes overhead).

Now the question is, what kind of index should I be using? My understanding is:

BTree My issue here is that the BTree index will be huge since as far as I know it will store duplicate values (it has to, since it can't assume the table is physically sorted). If the BTree is huge, I end up having to read both the index and the parts of the table that the index points to. (We can use fillfactor = 100 to decrease the size of the index a bit.)
BRIN My understanding is that I can have a small index here at the expense of reading useless pages. Using a small pages_per_range means that the index is bigger (which is a problem with BRIN since I need to read the whole index), having a big pages_per_range means that I'll read a lot of useless pages. Is there a magic formula to find a good value of pages_per_range that takes into account those trade-offs?
GIN/GiST Not sure those are relevant here since they're mostly used for full-text search, but I also hear that they're good at dealing with duplicate keys. Would either a GIN or GiST index help here?

Another question is, will Postgres use the fact that a table is CLUSTERed (assuming no updates) in the query planner (e.g. by binary searching for the relevant start/end pages)? Somewhat related, can I just store all my columns in a BTree and drop the table altogether (or achieve something equivalent, I believe those are clustered indices in SQL server)? Is there some hybrid BTree/BRIN index that would help here?

I'd rather avoid using arrays to store my values since my query will end up less readable that way (I understand this would reduce the cost of the 23 bytes per tuple overhead by reducing the number of tuples).

Best Answer

BTree

My issue here is that the BTree index will be huge since afaict it will store duplicate values (it has too, since it can't assume the table is physically sorted). If the BTree is huge I end up having to read both the index and the parts of the table that the index points too...

Not necessarily — Having a btree index that is 'covering' will be the fastest read time, and if that is all you want (ie if you can afford the extra storage), then it is your best bet.

BRIN

My understanding is that I can have a small index here at the expense of reading useless pages. Using a small pages_per_range means that the index is bigger (which is a problem with BRIN since I need to read the whole index), having a big pages_per_range means that I'll read a lot of useless pages.

If you can't afford the storage overhead of a covering btree index, BRIN is ideal for you, because you have clustering already in place (this is crucial for BRIN to be useful). BRIN indexes are tiny, so all the pages are likely to be in memory if you choose a suitable value of pages_per_range.

Is there a magic formula to find a good value of pages_per_range that takes into account those trade offs?

No magic formula, but start with pages_per_range somewhat less than the average size (in pages) occupied by the average a value. You are probably trying to minimize: (number of BRIN pages scanned)+(number of heap pages scanned) for a typical query. Look for Heap Blocks: lossy=n in the execution plan for pages_per_range=1 and compare with other values for pages_per_range — i.e. see how many unnecessary heap blocks are being scanned.

GIN/GiST

Not sure those are relevant here since they're mostly used for full text search, but I also hear that they're good at dealing with duplicate keys. Would either a GIN/GiST index help here?

GIN may be worth considering, but probably not GiST — however if the natural clustering really is good, then BRIN will probably be a better bet.

Here is a sample comparison between the different index types for dummy data a bit like yours:

table and indexes:

create table foo(a,b,c) as
select *, lpad('',20)
from (select chr(g) a from generate_series(97,122) g) a
     cross join (select generate_series(1,100000) b) b
order by a;
create index foo_btree_covering on foo(a,b);
create index foo_btree on foo(a);
create index foo_gin on foo using gin(a);
create index foo_brin_2 on foo using brin(a) with (pages_per_range=2);
create index foo_brin_4 on foo using brin(a) with (pages_per_range=4);
vacuum analyze;

relation sizes:

select relname "name", pg_size_pretty(siz) "size", siz/8192 pages, (select count(*) from foo)*8192/siz "rows/page"
from( select relname, pg_relation_size(C.oid) siz
      from pg_class c join pg_namespace n on n.oid = c.relnamespace
      where nspname = current_schema ) z;

name               | size    | pages | rows/page
:----------------- | :------ | ----: | --------:
foo                | 149 MB  | 19118 |       135
foo_btree_covering | 56 MB   |  7132 |       364
foo_btree          | 56 MB   |  7132 |       364
foo_gin            | 2928 kB |   366 |      7103
foo_brin_2         | 264 kB  |    33 |     78787
foo_brin_4         | 136 kB  |    17 |    152941

covering btree:

explain analyze select sum(b) from foo where a='a';

| QUERY PLAN                                                                                                                                      |
| :---------------------------------------------------------------------------------------------------------------------------------------------- |
| Aggregate  (cost=3282.57..3282.58 rows=1 width=8) (actual time=45.942..45.942 rows=1 loops=1)                                                   |
|   ->  Index Only Scan using foo_btree_covering on foo  (cost=0.43..3017.80 rows=105907 width=4) (actual time=0.038..27.286 rows=100000 loops=1) |
|         Index Cond: (a = 'a'::text)                                                                                                             |
|         Heap Fetches: 0                                                                                                                         |
| Planning time: 0.099 ms                                                                                                                         |
| Execution time: 45.968 ms                                                                                                                       |

plain btree:

drop index foo_btree_covering;
explain analyze select sum(b) from foo where a='a';

| QUERY PLAN                                                                                                                        |
| :-------------------------------------------------------------------------------------------------------------------------------- |
| Aggregate  (cost=4064.57..4064.58 rows=1 width=8) (actual time=54.242..54.242 rows=1 loops=1)                                     |
|   ->  Index Scan using foo_btree on foo  (cost=0.43..3799.80 rows=105907 width=4) (actual time=0.037..33.084 rows=100000 loops=1) |
|         Index Cond: (a = 'a'::text)                                                                                               |
| Planning time: 0.135 ms                                                                                                           |
| Execution time: 54.280 ms                                                                                                         |

BRIN pages_per_range=4:

drop index foo_btree;
explain analyze select sum(b) from foo where a='a';

| QUERY PLAN                                                                                                                        |
| :-------------------------------------------------------------------------------------------------------------------------------- |
| Aggregate  (cost=21595.38..21595.39 rows=1 width=8) (actual time=52.455..52.455 rows=1 loops=1)                                   |
|   ->  Bitmap Heap Scan on foo  (cost=888.78..21330.61 rows=105907 width=4) (actual time=2.738..31.967 rows=100000 loops=1)        |
|         Recheck Cond: (a = 'a'::text)                                                                                             |
|         Rows Removed by Index Recheck: 96                                                                                         |
|         Heap Blocks: lossy=736                                                                                                    |
|         ->  Bitmap Index Scan on foo_brin_4  (cost=0.00..862.30 rows=105907 width=0) (actual time=2.720..2.720 rows=7360 loops=1) |
|               Index Cond: (a = 'a'::text)                                                                                         |
| Planning time: 0.101 ms                                                                                                           |
| Execution time: 52.501 ms                                                                                                         |

BRIN pages_per_range=2:

drop index foo_brin_4;
explain analyze select sum(b) from foo where a='a';

| QUERY PLAN                                                                                                                        |
| :-------------------------------------------------------------------------------------------------------------------------------- |
| Aggregate  (cost=21659.38..21659.39 rows=1 width=8) (actual time=53.971..53.971 rows=1 loops=1)                                   |
|   ->  Bitmap Heap Scan on foo  (cost=952.78..21394.61 rows=105907 width=4) (actual time=5.286..33.492 rows=100000 loops=1)        |
|         Recheck Cond: (a = 'a'::text)                                                                                             |
|         Rows Removed by Index Recheck: 96                                                                                         |
|         Heap Blocks: lossy=736                                                                                                    |
|         ->  Bitmap Index Scan on foo_brin_2  (cost=0.00..926.30 rows=105907 width=0) (actual time=5.275..5.275 rows=7360 loops=1) |
|               Index Cond: (a = 'a'::text)                                                                                         |
| Planning time: 0.095 ms                                                                                                           |
| Execution time: 54.016 ms                                                                                                         |

GIN:

drop index foo_brin_2;
explain analyze select sum(b) from foo where a='a';

| QUERY PLAN                                                                                                                         |
| :--------------------------------------------------------------------------------------------------------------------------------- |
| Aggregate  (cost=21687.38..21687.39 rows=1 width=8) (actual time=55.331..55.331 rows=1 loops=1)                                    |
|   ->  Bitmap Heap Scan on foo  (cost=980.78..21422.61 rows=105907 width=4) (actual time=12.377..33.956 rows=100000 loops=1)        |
|         Recheck Cond: (a = 'a'::text)                                                                                              |
|         Heap Blocks: exact=736                                                                                                     |
|         ->  Bitmap Index Scan on foo_gin  (cost=0.00..954.30 rows=105907 width=0) (actual time=12.271..12.271 rows=100000 loops=1) |
|               Index Cond: (a = 'a'::text)                                                                                          |
| Planning time: 0.118 ms                                                                                                            |
| Execution time: 55.366 ms                                                                                                          |

dbfiddle here

Related Solutions

Postgresql – Does spgist extension in PostgreSQL 9.2.3 support float or int array

No, by default SP-GiST only supports these operators over point ranges box and text

Name            Indexed Data Type   Indexable Operators
kd_point_ops    point               << <@ <^ >> >^ ~=
quad_point_ops  point               << <@ <^ >> >^ ~=
range_ops       any range type      && &< &> -|- << <@ = >> @>
box_ops         box                 << &< && &> >> ~= @> <@ &<| <<| |>> |&>
text_ops        text                < <= = > >= ~<=~ ~<~ ~>=~ ~>~

That's not to say that it can not work over float or int[] but it may be awkward unless they have any special internal designation you can exploit for partitioning. That is to say, if your int[] and array are likely to produce balanced trees, or are equally distributed there is no point whatsoever (afaik).

Postgresql table with one integer column, sorted index, with duplicate primary key

I think you're asking how to impliment a solution you'e already decided on for a more general problem you don't describe. If you were to outline the actual problem that this is supposed to solve you might get better suggestions about how to solve it.

Working within the very limited information provided:

Update: I found your other question, which you really should've linked to. You seem to be trying to roll your own message queue. Don't do that. Read these:

Have I convinced you that you shouldn't try to do this yourself yet? Look into:

RabbitMQ
ZeroMQ
Job::Machine
ActiveMQ
PGQ
Celery

Some of what you want isn't available in current PostgreSQL versions. For example:

INSERTs should not do any query in that table or any kind of unique index. INSERTs shall just locate the best page for the main file/main btree for this table and just insert the row in between two other rows, ordered by ID.`

That'd require an index-organized table, which PostgreSQL doesn't have yet. The closest you'll get would be a one-column table with a PRIMARY KEY. With regular VACUUM on PostgreSQL 9.2 you'd be able to use index-only scans to access it most of the time.

As for allowing duplicates, you don't really seem to want to permit them at all, you're just saying you want to work around concurrency issues by temporarily permitting them.

You can remove such duplicates during INSERT so the table its self doesn't need to permit them. However, that'll cause issues with:

INSERTs will happen in bulk (about 1000 per transaction) and must not fail, expect for disc full, etc. There must not be any chance for deadlocks.

... assuming that those inserts occur concurrently from multiple transactions. You'll have races between the checks for existence and the insert that can cause insert batches to fail and have to be re-tried.

I suspect that your best bet is to have a one-column table without a PRIMARY KEY. Just create an ordinary b-tree index on it, and leave the table without a PRIMARY KEY. Since it genuinely has no primary key (the only column may have duplicates) this is entirely reasonable.

(BTW, given that SQL is supposedly all about sets, it astounds me how awful it is at "add this entry to the set if not already present").

Best Answer

BTree

BRIN

GIN/GiST

Related Solutions

Postgresql – Does spgist extension in PostgreSQL 9.2.3 support float or int array

Postgresql table with one integer column, sorted index, with duplicate primary key

Related Question