PostgreSQL – Reducing table size (bytes per row) in a simple PostgreSQL database

database-design | disk-space | index | postgresql

I have a pretty simple 3-column table in PostgreSQL 11. It's time-series data and the table can contain many billions of rows. I'm concerned about the table size and the total size (including indexes), and want to optimize my design to improve bytes/row.

I've found a couple of really useful questions and answers on the subject already:

Measure the size of a PostgreSQL table row

Configuring PostgreSQL for read performance

From running some of the queries shown in those discussions, I believe there's room for improvement, but I don't understand enough to make those improvements 🙂

My create script is as follows:

-- table
CREATE TABLE public.vector_events
(
    vector_stream_id integer NOT NULL,
    event_time timestamp without time zone NOT NULL,
    event_data0 real NOT NULL
)
WITH (
    OIDS = FALSE
)
TABLESPACE pg_default;

-- index
CREATE INDEX vector_events_stream_id_event_time_index
ON public.vector_events USING btree
(vector_stream_id, event_time DESC)
TABLESPACE pg_default;

I believe my column widths are optimal – vector_stream_id can exceed 100,000 (so it needs an integer), event_time needs millisecond precision, and our data fits in a 4-byte real.
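If you want to double-check those widths, pg_column_size() reports the on-disk size of individual values; a quick sanity check with placeholder literals:

-- per-value widths of the three column types (literals are placeholders)
SELECT pg_column_size(100000::integer)                    AS stream_id_bytes,  -- 4
       pg_column_size(now()::timestamp without time zone) AS event_time_bytes, -- 8
       pg_column_size(3.1415::real)                       AS event_data_bytes; -- 4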

I chose the index because our queries will only ever be of the form:

SELECT event_time, event_data0
FROM vector_events
WHERE vector_stream_id = @streamId
AND event_time >= @lowerBound
-- (optionally with upper bound) AND event_time <= @upperBound
ORDER BY event_time DESC -- (sometimes ASC)

It's essential that this query is performant when the table has at least a million rows (probably hundreds of millions). TBH the choice of a B-tree index was a bit of a best guess.
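A quick sanity check of that query shape (the literal values below are just placeholders) is to look at the plan; it should use the composite index:

EXPLAIN (ANALYZE, BUFFERS)
SELECT event_time, event_data0
FROM public.vector_events
WHERE vector_stream_id = 42
  AND event_time >= '2020-01-01 00:00:00'
ORDER BY event_time DESC;
-- expect an Index Scan on vector_events_stream_id_event_time_index
-- (an Index Scan Backward when ordering ASC)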

If I use Erwin Brandstetter's query to check table size (query omitted for brevity, but found here: Answer to 'Measure the size of a PostgreSQL table row'), I get the following (this is from a smaller sample table):

metric                             bytes/ct  bytes_pretty  bytes_per_row
core_relation_size                 9076736   8864 kB       52
visibility_map                     8192      8192 bytes    0
free_space_map                     24576     24 kB         0
table_size_incl_toast              9109504   8896 kB       52
indexes_size                       9256960   9040 kB       53
total_size_incl_toast_and_indexes  18366464  18 MB         106
live_rows_in_text_representation   5685353   5552 kB       32
------------------------------
row_count                          172800
live_tuples                        172800
dead_tuples                        0

A naive view of the table would say that I have an int (4 bytes), a timestamp without time zone (8 bytes) and a real (4 bytes), so 16 bytes of actual data.

I understand it's not quite that simple, but a table size of 52 bytes per row seems excessive.

Furthermore, the index is even larger at 53 bytes per row (and that is just the index; it doesn't include the event data, right?).

So I have a total size of 105 bytes for each row – surely there must be something I can do to improve this?

I seem to be able to save a few bytes per row (about 8) by applying the 'column tetris' technique (putting the wider columns first), changing my column order to event_time, vector_stream_id, event_data0.
Still, how can I get this down below 97 bytes? What size should I expect for a well-designed table and index?
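For reference, a sketch of the reordered definition (the _reordered table name is just a placeholder for illustration):

-- 'column tetris': the 8-byte-aligned column first, so no alignment padding is needed
CREATE TABLE public.vector_events_reordered
(
    event_time       timestamp without time zone NOT NULL, -- 8 bytes
    vector_stream_id integer NOT NULL,                     -- 4 bytes
    event_data0      real NOT NULL                         -- 4 bytes
);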

Notes:

Currently I'm using PostgreSQL 11 on Windows; I'm in the process of provisioning a Linux box for comparison.

My 'real' database uses TimescaleDB, but I see the same pattern of table size/index size in a plain PostgreSQL table, so I believe the cause of the excessive table size is in my PostgreSQL schema or index design. (TimescaleDB will split my billions of events into chunk tables, each containing several million rows, but my choice of schema and index is still essential to efficient disk use and performance.)
I expect I can also make improvements to the server configuration, but first I just want to get the best table size.

My 3 considerations right now are (in order of importance):

  1. Read performance, when fetching tens of thousands of rows from a table containing many millions. Also aggregate queries.

  2. Disk usage; this becomes prohibitively expensive as the total number of events gets into the billions.

  3. Write performance, normally in chronological order for any stream, although some streams may lag behind others, and occasionally we might backfill data.

Best Answer

The best thing to do with questions like this is measure:

CREATE TABLE public.vector_events (
   vector_stream_id integer NOT NULL,
   event_time timestamp without time zone NOT NULL,
   event_data0 real NOT NULL
);

INSERT INTO vector_events
SELECT i,
       current_timestamp + i * INTERVAL '1 second',
       3.1415
FROM generate_series(1, 200000) AS i;

SELECT pg_total_relation_size('public.vector_events');

 pg_total_relation_size 
------------------------
               10461184
(1 row)

test=> SELECT 10461184 / 200000.0;

      ?column?       
---------------------
 52.3059200000000000
(1 row)

So the 52 bytes per row are pretty much spot on.
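Those 52 bytes can be accounted for roughly like this (assuming the default 8 kB block size and 8-byte MAXALIGN):

SELECT 4                            AS line_pointer,  -- ItemId in the page header
       24                           AS tuple_header,  -- 23-byte heap tuple header, padded to 24
       4 + 4 + 8 + 4                AS column_data,   -- int4, alignment padding, timestamp, real
       4                            AS trailing_pad,  -- tuple length 44 rounded up to 48
       4 + 24 + (4 + 4 + 8 + 4) + 4 AS bytes_per_row; -- = 52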

About the index:

CREATE INDEX vector_events_stream_id_event_time_index                          
ON public.vector_events (vector_stream_id, event_time DESC);

SELECT pg_total_relation_size('vector_events_stream_id_event_time_index');                      

 pg_total_relation_size 
------------------------
                6324224
(1 row)

test=> SELECT 6324224 / 200000.0;

      ?column?       
---------------------
 31.6211200000000000
(1 row)

That seems pretty normal to me.
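Roughly accounted for (assuming the default B-tree leaf fillfactor of 90 and ignoring inner pages):

SELECT 8                         AS index_tuple_header,   -- IndexTupleData
       4 + 4 + 8                 AS key_data,             -- int4, alignment padding, timestamp
       4                         AS line_pointer,         -- ItemId in the leaf page
       round((8 + 16 + 4) / 0.9) AS approx_bytes_per_row; -- ~31, close to the measured 31.6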

You can expect the data to take up more space over time if you have DELETEs and UPDATEs in your workload, because these cause a certain amount of internal fragmentation (bloat); indexes in particular can become two or three times as big.
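One way to keep an eye on that is the pgstattuple extension, which reports dead tuples and free space directly (it has to be installed first):

CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT * FROM pgstattuple('public.vector_events');                     -- table bloat
SELECT * FROM pgstatindex('vector_events_stream_id_event_time_index'); -- index bloat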

To answer your questions:

  1. Your index is perfect for your query, and it does not matter if you declare it ASC or DESC. So access speed should be optimal.

  2. As you said, you can save about 8 bytes per row by making event_time the first (or last) column: 4 bytes of alignment padding inside the tuple, plus 4 bytes of padding at its end. That's the limit of what is possible.

  3. For good write performance, have fast disks and set max_wal_size high (see the sketch just below this list).
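For point 3, a sketch of raising max_wal_size (the value is only a placeholder; tune it to your checkpoint frequency and available disk space):

ALTER SYSTEM SET max_wal_size = '8GB'; -- placeholder value; the default is 1GB
SELECT pg_reload_conf();               -- max_wal_size can be changed with a reload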

You will need a primary key for the table. The cheapest way would be to use your index for that (if it can be made UNIQUE), but then you'd have to get rid of the DESC.
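A sketch of that, assuming (vector_stream_id, event_time) is actually unique in your data (which the question doesn't guarantee):

CREATE UNIQUE INDEX vector_events_stream_id_event_time_key
    ON public.vector_events (vector_stream_id, event_time); -- plain ASC, no DESC

ALTER TABLE public.vector_events
    ADD CONSTRAINT vector_events_pkey
    PRIMARY KEY USING INDEX vector_events_stream_id_event_time_key;
-- note: the index is renamed to vector_events_pkey by this command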