Postgresql – Does index occupy disk memory for rows for which index expression is NULL

postgresqlpostgresql-9.4

Let's assume that I have following pg_trgm index on mydata JSONB column:

CREATE TABLE mytable (mydata JSONB);
CREATE INDEX trgm_index_1 ON mytable
USING gin ((mydata#>>'{myfield}') gin_trgm_ops);

And let's assume that I will insert following data into mytable:

500k rows with mydata = { "myfield": "some_text" }
500k rows with mydata = { "another_field": "some_text" }

So 1M rows in total, but half of them contains value that is used in index expression (mydata#>>'{myfield}'). Does it mean that index will use ~50% less memory on disk than it would use if all of rows contained myfield json field?

Best Answer

Let's simply try it!

I've created the table from your question, then did the following:

INSERT INTO mytable 
SELECT json_object(ARRAY['myfield', i::text || 'bla'])::jsonb 
FROM generate_series (1, 100000) t(i);

\di+ trgm_index_1

 Schema │     Name     │ Type  │ Owner  │  Table  │ Size  │ Description 
────────┼──────────────┼───────┼────────┼─────────┼───────┼─────────────
 test   │ trgm_index_1 │ index │ avaczi │ mytable │ 19 MB │ 

So far, we have only rows where myfield is present. Let's now add some other rows, too:

INSERT INTO mytable 
SELECT json_object(ARRAY['other_field', i::text || 'bla'])::jsonb 
FROM generate_series (100001, 200000) t(i);

After this, the reported index size has changed to 22 MB, meaning that the new rows are also accounted for. If you want to exclude them, you have to create a partial index:

CREATE INDEX trgm_index_partial ON mytable
USING gin ((mydata#>>'{myfield}') gin_trgm_ops)
WHERE mydata ? 'myfield';

The size of this one is initially the same (19 MB) as the non-partial index, but after adding the second group of rows, it does not change:

\di+ trgm_index_*

 Schema │        Name        │ Type  │ Owner  │  Table  │ Size  │ Description 
────────┼────────────────────┼───────┼────────┼─────────┼───────┼─────────────
 test   │ trgm_index_1       │ index │ avaczi │ mytable │ 22 MB │ 
 test   │ trgm_index_partial │ index │ avaczi │ mytable │ 19 MB │