The problem
Here is a very similar case discussed on pgsql.general. That one is about the limitation in a plain b-tree index, but it amounts to the same thing: a GIN index internally builds a b-tree over its keys and therefore runs into the same size limitation - for keys, rather than for whole index items as in a plain b-tree index.
I quote the manual about GIN index implementation:
Internally, a GIN index contains a B-tree index constructed over keys,
where each key is an element of one or more indexed items
Either way, at least one array element in your column data is too big to be indexed. If this is just a singular freak value or some kind of accident, you may be able to truncate the value and be done with it.
For the purpose of the following demo I'll assume otherwise: lots of long text values in the array.
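To illustrate the failure mode, here is a hypothetical repro (table and column names are made up; exact byte limits vary with page size and Postgres version):

CREATE TABLE gin_demo (data text[]);
CREATE INDEX gin_demo_gin_idx ON gin_demo USING GIN (data);

-- ~96 kB of pseudo-random text compresses poorly and cannot fit a GIN key:
INSERT INTO gin_demo
SELECT ARRAY[string_agg(md5(g::text), '')]
FROM   generate_series(1, 3000) g;
-- fails with an "index row size ... exceeds maximum" error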
Simple solution
You could replace the elements in your array data with corresponding hash values, and send look-up values through the same hash function. Of course, you probably want to store the originals somewhere in addition. With that, we almost arrive at my second variant ...
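A minimal sketch of that idea, using md5() as the hash and a hypothetical table tbl with a text[] column named data:

-- replace the long elements with their hashes:
UPDATE tbl
SET    data = (SELECT array_agg(md5(e)) FROM unnest(tbl.data) AS t(e));

-- ... and send the look-up value through the same hash function:
SELECT * FROM tbl WHERE ARRAY[md5('word1234word')] <@ data;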
Advanced solution
You could create a look-up table for array elements with a serial column as surrogate primary key (effectively a radical kind of hash value) - which is all the more interesting if the element values involved are not unique:
CREATE TABLE elem (
elem_id serial NOT NULL PRIMARY KEY
, elem text UNIQUE NOT NULL
);
Since we want to look up elem, we add an index - but an index on an expression this time, using only the first 10 characters of the long text. That should be enough in most cases to narrow a search down to one or a few hits. Adapt the size to your data distribution, or use a more sophisticated hash function.
CREATE INDEX elem_elem_left10_idx ON elem(left(elem,10));
Your column data would then be of type int[]. I renamed the table to data and got rid of the ominous varchar(50) you had in your example:
CREATE TEMP TABLE data(
data_id serial PRIMARY KEY
, data int[]
);
Each array element in data refers to an elem.elem_id. At this point, you might consider replacing the array column with an n:m table, thereby normalizing your schema and allowing Postgres to enforce referential integrity. Indexing and general handling become easier ...
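A sketch of that normalized layout, with a hypothetical junction table data_elem replacing the array column (temp, like data above, since a permanent table cannot reference a temp table):

ALTER TABLE data DROP COLUMN data;

CREATE TEMP TABLE data_elem (
  data_id int REFERENCES data
, elem_id int REFERENCES elem
, PRIMARY KEY (data_id, elem_id)
);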
However, for performance reasons, the int[] column in combination with a GIN index may be superior. Storage size is a lot smaller. In this case we need the GIN index:
CREATE INDEX data_data_gin_idx ON data USING GIN (data);
Now each key of the GIN index (= array element) is an integer instead of a longish text. The index will be smaller by several orders of magnitude, and searches will consequently be much faster.
The downside: before you can actually perform a search, you have to look up the elem_id from the table elem. Using my newly introduced functional index elem_elem_left10_idx, this, too, will be much faster.
You can do it all in one simple query:
SELECT d.*, e.*
FROM elem e
JOIN data d ON ARRAY[e.elem_id] <@ d.data
WHERE left(e.elem, 10) = left('word1234word', 10) -- match index condition
AND e.elem = 'word1234word'; -- need to recheck, functional index is lossy
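On the write side, you could resolve (or create) the elem_ids in the same statement - a sketch with made-up values, assuming Postgres 9.5+ for ON CONFLICT:

WITH ins AS (
   INSERT INTO elem (elem)
   VALUES ('word1234word'), ('another long value')
   ON CONFLICT (elem) DO UPDATE     -- noop update, just to get RETURNING
      SET elem = EXCLUDED.elem      -- for pre-existing rows as well
   RETURNING elem_id
   )
INSERT INTO data (data)
SELECT array_agg(elem_id) FROM ins;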
You may be interested in the extension intarray, which supplies additional operators and operator classes.
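A sketch of what that might look like with the demo table above:

CREATE EXTENSION intarray;

-- intarray's specialized GIN operator class for int[]:
CREATE INDEX data_data_gin_int_idx ON data USING GIN (data gin__int_ops);

-- contains / overlap operators work as before, typically faster:
SELECT * FROM data WHERE data && ARRAY[1, 2];  -- overlaps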
From your query plans, it looks like you're comparing ints to ints in the first query plan, and int to numeric in the second plan.
Your first comparison:
Index Cond: (("timestamp" >= 1431100800) AND ("timestamp" <= 1431108000))
and
timestamp >= 1431100800 and timestamp <= 1431108000
In the second query, it's numeric values:
Filter: ((numvalues[1] IS NOT NULL) AND (("timestamp")::numeric >= 1431100800.00) AND (("timestamp")::numeric <= 1431108000.00))
and
timestamp >= 1431093600.00 and timestamp <= 1431100800.00
Casting the column to numeric causes the index to be ignored in favor of a sequential scan.
You can see this with a very simple example, set up below:
CREATE TABLE t2 (a int);
CREATE INDEX t2_a_idx ON t2(a);
INSERT INTO t2 (a) SELECT i FROM generate_series(1,1000000) AS i;
VACUUM ANALYZE VERBOSE t2;
My first query plan looks like this:
EXPLAIN ANALYZE SELECT * FROM t2 WHERE a > 750000;
Index Only Scan using t2_a_idx on t2 (cost=0.42..7134.65 rows=250413 width=4)
(actual time=0.019..29.926 rows=250000 loops=1)
Index Cond: (a > 750000)
Heap Fetches: 0
Planning time: 0.137 ms
Execution time: 39.114 ms
(5 rows)
Time: 39.540 ms
While a second query using numerics looks like this:
EXPLAIN ANALYZE SELECT * FROM t2 WHERE a > 750000.00;
Seq Scan on t2 (cost=0.00..19425.00 rows=333333 width=4) (actual time=122.803..175.326 rows=250000 loops=1)
Filter: ((a)::numeric > 750000.00)
Rows Removed by Filter: 750000
Planning time: 0.058 ms
Execution time: 184.194 ms
(5 rows)
Time: 184.487 ms
In the second instance here, the index is ignored in favor of a sequential scan because of the cast to a numeric value, which looks like exactly what's happening in your two examples.
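The fix is to keep the comparison in the column's type - for example by casting the literal (or simply writing an integer literal), so the index applies again:

EXPLAIN ANALYZE SELECT * FROM t2 WHERE a > 750000.00::int;

-- for the original query, compare against integer literals:
-- WHERE "timestamp" >= 1431100800 AND "timestamp" <= 1431108000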
One last aside: you might be able to speed your query up by issuing a SET command before executing it as well:
SET work_mem = '2GB';
Do this only if your server can handle it - your sorts are spilling to disk, as noted in this line from your query plan:
Sort Method: external merge Disk: 1387704kB
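If you only want the setting for one query, scope it to a single transaction:

BEGIN;
SET LOCAL work_mem = '2GB';  -- applies only until COMMIT / ROLLBACK
-- run the expensive query here
COMMIT;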
Hope this helps. =)
Best Answer
Short answer: compression.
The data type text allows (lossless!) compression and storage out of line by default. See the manual about pg_type.typstorage for the possible storage strategies.
Test with pg_column_size() instead of length(). Be sure to test actual table columns (with compression applied), not just input values. See: db<>fiddle here
Note how the value is forced to be unpacked from its storage format with the noop expression pg_column_size(col || '').
The 5th row would be too big to fit the index tuple (even with compression) and would trigger the error message in the title. The 6th row would be too big to fit even the index page and would trigger the related error message.
The test values generated with rpad() have repeating patterns, which allow for massive compression. Even very long strings still easily fit the maximum size after compression this way.
Long answer
I ran more extensive tests, tampering with storage internals to verify my understanding. Only for testing purposes!
dbfiddle does not allow write access to system catalogs. But the queries are there to try "at home".
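Those tests presumably flip storage modes by updating system catalogs directly; the supported way to experiment "at home" is ALTER TABLE ... SET STORAGE - a sketch, reusing the hypothetical comp_test table from above (it only affects newly stored values):

ALTER TABLE comp_test ALTER COLUMN col SET STORAGE PLAIN;    -- no compression, no out-of-line storage
ALTER TABLE comp_test ALTER COLUMN col SET STORAGE EXTERNAL; -- out-of-line storage, but no compression
ALTER TABLE comp_test ALTER COLUMN col SET STORAGE EXTENDED; -- default for text: compression + out-of-line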