The problem
Here is a very similar case discussed on pgsql.general. That one is about the limitation in a plain b-tree index, but it amounts to the same thing: a GIN index internally builds a b-tree over its keys and therefore runs into the same size limitation - for keys, rather than for whole index items as in a plain b-tree index.
I quote the manual about GIN index implementation:
Internally, a GIN index contains a B-tree index constructed over keys,
where each key is an element of one or more indexed items
Either way, at least one array element in your column data is too big to be indexed. If this is just a singular freak value or some kind of accident, you may be able to truncate the value and be done with it.
For the purpose of the following demo I'll assume otherwise: lots of long text values in the array.
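To illustrate the failure mode, here is a hypothetical repro (table and column names are made up; exact byte limits vary with page size and Postgres version):

CREATE TABLE gin_demo (data text[]);
CREATE INDEX gin_demo_gin_idx ON gin_demo USING GIN (data);

-- ~96 kB of pseudo-random text compresses poorly and cannot fit a GIN key:
INSERT INTO gin_demo
SELECT ARRAY[string_agg(md5(g::text), '')]
FROM   generate_series(1, 3000) g;
-- fails with an "index row size ... exceeds maximum" error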
Simple solution
You could replace the elements in your array data with corresponding hash values, and send look-up values through the same hash function. Of course, you probably want to store the originals somewhere in addition. With that, we almost arrive at my second variant ...
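A minimal sketch of that idea, using md5() as the hash and a hypothetical table tbl with a text[] column named data:

-- replace the long elements with their hashes:
UPDATE tbl
SET    data = (SELECT array_agg(md5(e)) FROM unnest(tbl.data) AS t(e));

-- ... and send the look-up value through the same hash function:
SELECT * FROM tbl WHERE ARRAY[md5('word1234word')] <@ data;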
Advanced solution
You could create a look-up table for array elements with a serial column as surrogate primary key (effectively a radical kind of hash value) - which is all the more interesting if the element values involved are not unique:
CREATE TABLE elem (
elem_id serial NOT NULL PRIMARY KEY
, elem text UNIQUE NOT NULL
);
Since we want to look up elem, we add an index - but an index on an expression this time, using only the first 10 characters of the long text. That should be enough in most cases to narrow a search down to one or a few hits. Adapt the size to your data distribution, or use a more sophisticated hash function.
CREATE INDEX elem_elem_left10_idx ON elem(left(elem,10));
Your column data would then be of type int[]. I renamed the table to data and got rid of the ominous varchar(50) you had in your example:
CREATE TEMP TABLE data(
data_id serial PRIMARY KEY
, data int[]
);
Each array element in data refers to an elem.elem_id. At this point, you might consider replacing the array column with an n:m table, thereby normalizing your schema and allowing Postgres to enforce referential integrity. Indexing and general handling become easier ...
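A sketch of that normalized layout, with a hypothetical junction table data_elem replacing the array column (temp, like data above, since a permanent table cannot reference a temp table):

ALTER TABLE data DROP COLUMN data;

CREATE TEMP TABLE data_elem (
  data_id int REFERENCES data
, elem_id int REFERENCES elem
, PRIMARY KEY (data_id, elem_id)
);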
However, for performance reasons, the int[] column in combination with a GIN index may be superior. Storage size is a lot smaller. In this case we need the GIN index:
CREATE INDEX data_data_gin_idx ON data USING GIN (data);
Now each key of the GIN index (= array element) is an integer instead of a longish text. The index will be smaller by several orders of magnitude, and searches will consequently be much faster.
The downside: before you can actually perform a search, you have to look up the elem_id from the table elem. Using my newly introduced functional index elem_elem_left10_idx, this, too, will be much faster.
You can do it all in one simple query:
SELECT d.*, e.*
FROM elem e
JOIN data d ON ARRAY[e.elem_id] <@ d.data
WHERE left(e.elem, 10) = left('word1234word', 10) -- match index condition
AND e.elem = 'word1234word'; -- need to recheck, functional index is lossy
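On the write side, you could resolve (or create) the elem_ids in the same statement - a sketch with made-up values, assuming Postgres 9.5+ for ON CONFLICT:

WITH ins AS (
   INSERT INTO elem (elem)
   VALUES ('word1234word'), ('another long value')
   ON CONFLICT (elem) DO UPDATE     -- noop update, just to get RETURNING
      SET elem = EXCLUDED.elem      -- for pre-existing rows as well
   RETURNING elem_id
   )
INSERT INTO data (data)
SELECT array_agg(elem_id) FROM ins;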
You may be interested in the extension intarray, which supplies additional operators and operator classes.
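A sketch of what that might look like with the demo table above:

CREATE EXTENSION intarray;

-- intarray's specialized GIN operator class for int[]:
CREATE INDEX data_data_gin_int_idx ON data USING GIN (data gin__int_ops);

-- contains / overlap operators work as before, typically faster:
SELECT * FROM data WHERE data && ARRAY[1, 2];  -- overlaps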
From your query plans, it looks like you're comparing ints to ints in the first query plan, and int to numeric in the second plan.
Your first comparison:
Index Cond: (("timestamp" >= 1431100800) AND ("timestamp" <= 1431108000))
and
timestamp >= 1431100800 and timestamp <= 1431108000
In the second query, it's numeric values:
Filter: ((numvalues[1] IS NOT NULL) AND (("timestamp")::numeric >= 1431100800.00) AND (("timestamp")::numeric <= 1431108000.00))
and
timestamp >= 1431093600.00 and timestamp <= 1431100800.00
Casting the column to numeric causes the index to be ignored in favor of a sequential scan.
You can see this with a very simple example, set up below:
CREATE TABLE t2 (a int);
CREATE INDEX t2_a_idx ON t2(a);
INSERT INTO t2 (a) SELECT i FROM generate_series(1,1000000) AS i;
VACUUM ANALYZE VERBOSE t2;
My first query plan looks like this:
EXPLAIN ANALYZE SELECT * FROM t2 WHERE a > 750000;
Index Only Scan using t2_a_idx on t2 (cost=0.42..7134.65 rows=250413 width=4)
(actual time=0.019..29.926 rows=250000 loops=1)
Index Cond: (a > 750000)
Heap Fetches: 0
Planning time: 0.137 ms
Execution time: 39.114 ms
(5 rows)
Time: 39.540 ms
While a second query using numerics looks like this:
EXPLAIN ANALYZE SELECT * FROM t2 WHERE a > 750000.00;
Seq Scan on t2 (cost=0.00..19425.00 rows=333333 width=4) (actual time=122.803..175.326 rows=250000 loops=1)
Filter: ((a)::numeric > 750000.00)
Rows Removed by Filter: 750000
Planning time: 0.058 ms
Execution time: 184.194 ms
(5 rows)
Time: 184.487 ms
In the second instance here, the index is ignored in favor of a sequential scan because of the cast to a numeric value, which looks like exactly what's happening in your two examples.
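The fix is to keep the comparison in the column's type - for example by casting the literal (or simply writing an integer literal), so the index applies again:

EXPLAIN ANALYZE SELECT * FROM t2 WHERE a > 750000.00::int;

-- for the original query, compare against integer literals:
-- WHERE "timestamp" >= 1431100800 AND "timestamp" <= 1431108000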
One last aside: you might be able to speed your query up by issuing a SET command before executing it as well:
SET work_mem = '2GB';
Do this only if your server can handle it - your sorts are spilling to disk, as noted in this line from your query plan:
Sort Method: external merge Disk: 1387704kB
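If you only want the setting for one query, scope it to a single transaction:

BEGIN;
SET LOCAL work_mem = '2GB';  -- applies only until COMMIT / ROLLBACK
-- run the expensive query here
COMMIT;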
Hope this helps. =)
Best Answer
Short answer: compression.
The data type text allows (lossless!) compression and storage out of line by default. See the manual about pg_type.typstorage for the possible storage strategies.
Test with pg_column_size() instead of length(). Be sure to test actual table columns (with compression applied), not just input values. See: db<>fiddle here
Note how the value is forced to be unpacked from its storage format with the noop expression pg_column_size(col || '').
The 5th row would be too big to fit the index tuple (even with compression) and would trigger the error message in the title. The 6th row would be too big to fit even the index page and would trigger the related error message.
The test values generated with rpad() have repeating patterns, which allow for massive compression. Even very long strings still easily fit the maximum size after compression this way.
Long answer
I ran more extensive tests, tampering with storage internals to verify my understanding. Only for testing purposes!
dbfiddle does not allow write access to system catalogs. But the queries are there to try "at home".
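Those tests presumably flip storage modes by updating system catalogs directly; the supported way to experiment "at home" is ALTER TABLE ... SET STORAGE - a sketch, reusing the hypothetical comp_test table from above (it only affects newly stored values):

ALTER TABLE comp_test ALTER COLUMN col SET STORAGE PLAIN;    -- no compression, no out-of-line storage
ALTER TABLE comp_test ALTER COLUMN col SET STORAGE EXTERNAL; -- out-of-line storage, but no compression
ALTER TABLE comp_test ALTER COLUMN col SET STORAGE EXTENDED; -- default for text: compression + out-of-line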