PostgreSQL – Reducing table size (bytes per row) in a simple PostgreSQL database

database-design | disk-space | index | postgresql

I have a pretty simple 3-column table in PostgreSQL 11. It's time-series data and the table can contain many billions of rows. I'm concerned about the table size and the total size (including indexes), and want to optimize my design to improve bytes/row.

I've found a couple of really useful questions and answers on the subject already:

Measure the size of a PostgreSQL table row

Configuring PostgreSQL for read performance

From running some of the queries shown in those discussions, I believe there's room for improvement, but I don't understand enough to make those improvements 🙂

My create script is as follows:

-- table
CREATE TABLE public.vector_events
(
    vector_stream_id integer NOT NULL,
    event_time timestamp without time zone NOT NULL,
    event_data0 real NOT NULL
)
WITH (
    OIDS = FALSE
)
TABLESPACE pg_default;

-- index
CREATE INDEX vector_events_stream_id_event_time_index
ON public.vector_events USING btree
(vector_stream_id, event_time DESC)
TABLESPACE pg_default;

I believe my column widths are optimal – vector_stream_id can exceed 100,000 (so it needs an integer), event_time needs millisecond precision, and our data fits in a 4-byte real.
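If you want to double-check those widths, pg_column_size() reports the on-disk size of individual values; a quick sanity check with placeholder literals:

-- per-value widths of the three column types (literals are placeholders)
SELECT pg_column_size(100000::integer)                    AS stream_id_bytes,  -- 4
       pg_column_size(now()::timestamp without time zone) AS event_time_bytes, -- 8
       pg_column_size(3.1415::real)                       AS event_data_bytes; -- 4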

I chose the index because our queries will only ever be of the form:

SELECT event_time, event_data0
FROM vector_events
WHERE vector_stream_id = @streamId
AND event_time >= @lowerBound
-- (optionally with upper bound) AND event_time <= @upperBound
ORDER BY event_time DESC -- (sometimes ASC)

It's essential that this query is performant when the table has at least a million rows (probably hundreds of millions). TBH the choice of a B-tree index was a bit of a best guess.
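A quick sanity check of that query shape (the literal values below are just placeholders) is to look at the plan; it should use the composite index:

EXPLAIN (ANALYZE, BUFFERS)
SELECT event_time, event_data0
FROM public.vector_events
WHERE vector_stream_id = 42
  AND event_time >= '2020-01-01 00:00:00'
ORDER BY event_time DESC;
-- expect an Index Scan on vector_events_stream_id_event_time_index
-- (an Index Scan Backward when ordering ASC)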

If I use Erwin Brandstetter's query to check table size (query omitted for brevity, but found here: Answer to 'Measure the size of a PostgreSQL table row'), I get the following (this is from a smaller sample table):

metric                             bytes/ct  bytes_pretty  bytes_per_row
core_relation_size                 9076736   8864 kB       52
visibility_map                     8192      8192 bytes    0
free_space_map                     24576     24 kB         0
table_size_incl_toast              9109504   8896 kB       52
indexes_size                       9256960   9040 kB       53
total_size_incl_toast_and_indexes  18366464  18 MB         106
live_rows_in_text_representation   5685353   5552 kB       32
------------------------------
row_count                          172800
live_tuples                        172800
dead_tuples                        0

A naive view of the table would say that I have an int (4 bytes), a timestamp without time zone (8 bytes) and a real (4 bytes), so 16 bytes of actual data.

I understand it's not quite that simple, but a table size of 52 bytes per row seems excessive.

Furthermore, the index is even larger at 53 bytes per row (and that is just the index; it doesn't include the event data, right?).

So I have a total size of 105 bytes for each row – surely there must be something I can do to improve this?

I seem to be able to save a few bytes per row (about 8) by applying the 'column tetris' technique (putting the wider columns first), changing my column order to event_time, vector_stream_id, event_data0.
Still, how can I get this down below 97 bytes? What size should I expect for a well-designed table and index?
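For reference, a sketch of the reordered definition (the _reordered table name is just a placeholder for illustration):

-- 'column tetris': the 8-byte-aligned column first, so no alignment padding is needed
CREATE TABLE public.vector_events_reordered
(
    event_time       timestamp without time zone NOT NULL, -- 8 bytes
    vector_stream_id integer NOT NULL,                     -- 4 bytes
    event_data0      real NOT NULL                         -- 4 bytes
);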

Notes:

Currently I'm using PostgreSQL 11 on Windows; I'm in the process of provisioning a Linux box for comparison.

My 'real' database uses TimescaleDB, but I see the same pattern of table size/index size in a plain PostgreSQL table, so I believe the cause of the excessive table size is in my PostgreSQL schema or index design. (TimescaleDB will split my billions of events into chunk tables, each containing several million rows, but my choice of schema and index is still essential to efficient disk use and performance.)
I expect I can also make improvements to the server configuration, but first I just want to get the best table size.

My 3 considerations right now are (in order of importance):

  1. Read performance, when fetching tens of thousands of rows from a table containing many millions. Also aggregate queries.

  2. Disk usage; this becomes prohibitively expensive as the total number of events gets into the billions.

  3. Write performance, normally in chronological order for any stream, although some streams may lag behind others, and occasionally we might backfill data.

Best Answer

The best thing to do with questions like this is measure:

CREATE TABLE public.vector_events (
   vector_stream_id integer NOT NULL,
   event_time timestamp without time zone NOT NULL,
   event_data0 real NOT NULL
);

INSERT INTO vector_events
SELECT i,
       current_timestamp + i * INTERVAL '1 second',
       3.1415
FROM generate_series(1, 200000) AS i;

SELECT pg_total_relation_size('public.vector_events');

 pg_total_relation_size 
------------------------
               10461184
(1 row)

test=> SELECT 10461184 / 200000.0;

      ?column?       
---------------------
 52.3059200000000000
(1 row)

So the 52 bytes per row are pretty much spot on.
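Those 52 bytes can be accounted for roughly like this (assuming the default 8 kB block size and 8-byte MAXALIGN):

SELECT 4                            AS line_pointer,  -- ItemId in the page header
       24                           AS tuple_header,  -- 23-byte heap tuple header, padded to 24
       4 + 4 + 8 + 4                AS column_data,   -- int4, alignment padding, timestamp, real
       4                            AS trailing_pad,  -- tuple length 44 rounded up to 48
       4 + 24 + (4 + 4 + 8 + 4) + 4 AS bytes_per_row; -- = 52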

About the index:

CREATE INDEX vector_events_stream_id_event_time_index                          
ON public.vector_events (vector_stream_id, event_time DESC);

SELECT pg_total_relation_size('vector_events_stream_id_event_time_index');                      

 pg_total_relation_size 
------------------------
                6324224
(1 row)

test=> SELECT 6324224 / 200000.0;

      ?column?       
---------------------
 31.6211200000000000
(1 row)

That seems pretty normal to me.
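Roughly accounted for (assuming the default B-tree leaf fillfactor of 90 and ignoring inner pages):

SELECT 8                         AS index_tuple_header,   -- IndexTupleData
       4 + 4 + 8                 AS key_data,             -- int4, alignment padding, timestamp
       4                         AS line_pointer,         -- ItemId in the leaf page
       round((8 + 16 + 4) / 0.9) AS approx_bytes_per_row; -- ~31, close to the measured 31.6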

You can expect the data to take up more space over time if you have DELETEs and UPDATEs in your workload, because these cause a certain amount of internal fragmentation (bloat); indexes in particular can become two or three times as big.
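One way to keep an eye on that is the pgstattuple extension, which reports dead tuples and free space directly (it has to be installed first):

CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT * FROM pgstattuple('public.vector_events');                     -- table bloat
SELECT * FROM pgstatindex('vector_events_stream_id_event_time_index'); -- index bloat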

To answer your questions:

  1. Your index is perfect for your query, and it does not matter if you declare it ASC or DESC. So access speed should be optimal.

  2. As you said, you can save about 8 bytes per row by making event_time the first (or last) column: 4 bytes of alignment padding inside the tuple, plus 4 bytes of padding at its end. That's the limit of what is possible.

  3. For good write performance, have fast disks and set max_wal_size high (see the sketch just below this list).
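For point 3, a sketch of raising max_wal_size (the value is only a placeholder; tune it to your checkpoint frequency and available disk space):

ALTER SYSTEM SET max_wal_size = '8GB'; -- placeholder value; the default is 1GB
SELECT pg_reload_conf();               -- max_wal_size can be changed with a reload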

You will need a primary key for the table. The cheapest way would be to use your index for that (if it can be made UNIQUE), but then you'd have to get rid of the DESC.
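A sketch of that, assuming (vector_stream_id, event_time) is actually unique in your data (which the question doesn't guarantee):

CREATE UNIQUE INDEX vector_events_stream_id_event_time_key
    ON public.vector_events (vector_stream_id, event_time); -- plain ASC, no DESC

ALTER TABLE public.vector_events
    ADD CONSTRAINT vector_events_pkey
    PRIMARY KEY USING INDEX vector_events_stream_id_event_time_key;
-- note: the index is renamed to vector_events_pkey by this command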