Postgresql – Extract JSONB column into a separate table

postgresqlpostgresql-performance

I have a table with roughly the following columns:

CREATE TABLE records (
    id integer NOT NULL,
    ...
    metadata jsonb DEFAULT '{}'::jsonb
)

... includes the rest of the columns which have either integer, varying or timestamp columns. metadata column holds information about the current state of the system, which means that the properties might change over time and are not the same for all the records.

Currently, records table receives a lot of writes and reads, but very few updates. Current data size is over 70GB (+ 30GB index size) with over 160 million rows.

Recently, the query planner on the records table queries and joins (also few queries with anti-joins) is showing bad estimates and it's generally very slow. I've found that more than 52% of the data is stored under metadata column:

SELECT sum(pg_column_size(metadata)) AS total_size,
       avg(pg_column_size(metadata)) AS average_size,
       sum(pg_column_size(metadata)) * 100.0 / pg_relation_size('records') AS percentage
FROM records;
 total_size  |     average_size     |     percentage      
-------------+----------------------+---------------------
 40108195852 | 218.9854922110055888 | 52.7480535656110357
(1 row)

This column is almost never queried (except rare troubleshooting cases) and only needs to be read few times a week.

Does it make sense to extract this into a separate table and have as many new columns needed there?

CREATE TABLE record_metadata (
  record_id integer,
  column1 ...,
  column2 ...

Will this change ultimately improve the performance of regular read queries on the records table?

Best Answer

Yes it would help performance because you table size will be reduced a lot.

Hi Ali

Here are the possible benefits:

  1. Auto Vacuum/analyze will be faster
  2. Your full table scans will be faster

Your indexes on other columns will stay the same size.

Downside is you have to maintain two tables and total size of table will be a little bit bigger (due to id column in both tables).