PostgreSQL String Compaction – Is There Any String Compaction Option for PostgreSQL?

compressiondisk-spaceencodingpostgresqlstring

I have a table like this:

CREATE TABLE BLOG(
   CATEGORY CHAR(32)
);

Category is a column, which stored enum values.

For example, I have 1 millions of rows and 10 unique values of Category.

Let's populate the table with sample data:

CREATE TABLE TMP_STRINGS(
   id SERIAL PRIMARY KEY,
   CATEGORY CHAR(50)
);

insert into TMP_STRINGS values (1, 'category1'), (2, 'category2'), (3, 'category3'), (4, 'category4'), (5, 'category5'), (6, 'category6'), (7, 'category7'), (8, 'category8'), (9, 'category9');

insert into blog(category)
SELECT category FROM (SELECT floor(random() * 9 + 1)::int as n from generate_series(1,1000000)) AS T INNER JOIN tmp_strings ON T.n = tmp_strings.id;

This database is 42 MB:

SELECT pg_size_pretty(pg_total_relation_size('"blog"'));

A row of database is 44 byte:

SELECT octet_length(t.*::text) FROM blog AS t WHERE category='category1' LIMIT 1;

Categories can be encoded with smallint, like:

category1 -- 1
category2 -- 2
category3 -- 3
category4 -- 4
category5 -- 5
category6 -- 6
category7 -- 7
category8 -- 8
category9 -- 9

Is there any way to do this automatically? I mean some option or module for String interning (String pooling).

Best Answer

if you want an effect like strings stored as integers use an enum type, the SQL syntax for interacting with the table will be the same, except you'll need to cast if you want to do string operations on the new column;

CREATE TYPE example_category AS ENUM (
 'category1', 'category2', 'category3', 'category4', 'category5',
 'category6', 'category7', 'category8', 'category9');

ALTER TABLE tmp_strings ALTER COLUMN category 
  TYPE example_category USING category::example_category;

I think enums use 4 bytes (possibly 5?)