PostgreSQL – Optimizing a table for time-series data

postgresql | postgresql-performance | timescaledb

I have the table below, which stores a time-series result. A row is only relevant when signal is true. When signal is false, it simply records that we received a result for that particular timestamp but it is not a valid one, so res and the other columns contain null values. When signal is null, it means we have yet to receive a result for that timestamp. The signal is very sparse: it is true for less than roughly 7% of the records. Also, inserts into this table are not ordered by timestamp; rows for older dates can arrive later.

CREATE TABLE public.res
(
    pid integer NOT NULL,
    aid integer NOT NULL,
    cid integer NOT NULL,
    "time" timestamp without time zone NOT NULL,
    signal boolean,
    price numeric,
    res double precision[] NOT NULL,
    ...<Many more columns of numeric/numeric array data types>
    CONSTRAINT res_pkey PRIMARY KEY (pid, aid, cid, "time")
)

This table can contain millions of records and keeps growing as my database grows. I want to optimize it, so I have the following questions:

  • Is each row the same size? Or, since a row only makes sense when signal is true, can it be dynamically sized, keeping the overall size of the table low? A minimum row would contain (pid, aid, cid, time, signal, price) and a maximum row would additionally contain res and the remaining columns. Is it possible to do this in Postgres with its data types? I do not want to create separate tables, because run-time joins could be very expensive when there are millions of records (see the sketch after this list for the kind of split I mean).

  • This SO answer says "In effect NULL storage is absolutely free for tables up to 8 columns.", but my table has many more than 8 columns. Does that still hold?

  • Any other suggestions you might have for dealing with this kind of problem?

  • I have read about TimescaleDB, but since the records are not inserted in timestamp order, does it have any advantage over plain Postgres in this use case?
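
For clarity, this is roughly the kind of two-table split I am trying to avoid (table and constraint names here are made up for illustration): a narrow table that always gets a row, and a wide table that only gets a row when signal is true, joined back on the primary key at query time.

-- Narrow table: one row per (pid, aid, cid, time), regardless of signal.
CREATE TABLE public.res_head
(
    pid integer NOT NULL,
    aid integer NOT NULL,
    cid integer NOT NULL,
    "time" timestamp without time zone NOT NULL,
    signal boolean,
    price numeric,
    CONSTRAINT res_head_pkey PRIMARY KEY (pid, aid, cid, "time")
);

-- Wide table: only populated when signal is true, holding res and the remaining columns.
CREATE TABLE public.res_detail
(
    pid integer NOT NULL,
    aid integer NOT NULL,
    cid integer NOT NULL,
    "time" timestamp without time zone NOT NULL,
    res double precision[] NOT NULL,
    -- ...remaining numeric / numeric array columns...
    CONSTRAINT res_detail_pkey PRIMARY KEY (pid, aid, cid, "time"),
    CONSTRAINT res_detail_head_fkey FOREIGN KEY (pid, aid, cid, "time")
        REFERENCES public.res_head (pid, aid, cid, "time")
);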

Thanks

Best Answer

If you were seriously considering switching database systems, I could tell you that Microsoft SQL Server already has an out-of-the-box feature, sparse columns, that fits your use case well. But my recommendation would be not to change database systems just to optimize data storage. PostgreSQL is a very capable database system itself.

To answer your question about NULLs: PostgreSQL tracks them in a per-row null bitmap with one bit per column, so a NULL value takes no data space at all; only its bit in the bitmap marks it as absent. The "absolutely free for tables up to 8 columns" figure comes from the 23-byte tuple header being padded to 24 bytes, which leaves room for a one-byte bitmap (8 columns) at no extra cost. Beyond that, the bitmap needs one byte per 8 columns, but since the header is padded to an 8-byte boundary the practical overhead grows only in 8-byte steps (a 32-byte header covers anywhere from 9 to 72 columns). Either way the cost is tiny, so PostgreSQL already reflects your sparseness very efficiently on disk.
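
If you want to verify this on your own data, one rough way (just a sketch using built-in functions, nothing specific to your schema beyond the table name) is to compare average tuple sizes per signal state. pg_column_size() on a whole-row reference reports approximately what each row occupies, so the signal = false and signal IS NULL groups should come out far smaller than the signal = true group:

-- Approximate per-row storage, grouped by signal state.
-- pg_column_size(r.*) measures the row as a composite value, which is close to
-- (though not exactly) the on-disk tuple size, so treat the numbers as a rough guide.
SELECT signal,
       count(*)                        AS row_count,
       round(avg(pg_column_size(r.*))) AS avg_row_bytes
FROM   public.res AS r
GROUP  BY signal
ORDER  BY signal;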