The most natural way is to restrict the domain of the SQL data type with a constraint. You have two options:
- add a constraint to the column(s) of the table(s) in question
- create a domain for each data type you want tailored to the limitations of your programming language

An example of the first option:
ALTER TABLE table_name
ADD CONSTRAINT constraint_name
CHECK (column_name BETWEEN -128 AND 127);
(change the table, constraint and column names as needed, and adjust the bounds of the check)
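For instance, exercising such a constraint on a hypothetical table (`demo_table` and `demo_byte_range` are made-up names):

```sql
-- hypothetical table to illustrate the check constraint above
CREATE TABLE demo_table (byte_col NUMERIC);

ALTER TABLE demo_table
ADD CONSTRAINT demo_byte_range
CHECK (byte_col BETWEEN -128 AND 127);

INSERT INTO demo_table VALUES (127);   -- accepted
INSERT INTO demo_table VALUES (128);   -- rejected: violates demo_byte_range
```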
The second option is a two-step process. First, add a new domain to your DB (in this example the domain is named c_byte8) with the command
CREATE DOMAIN c_byte8 NUMERIC CONSTRAINT c_byte8_constraint
CHECK (VALUE BETWEEN -128 AND 127);
Second, create your table with the appropriate types, i.e. c_byte8 instead of NUMERIC as the attribute's type. E.g.:
CREATE TABLE test_table_1 (id NUMERIC, byte_value c_byte8);
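To see the domain in action: an out-of-range insert is rejected just like with a direct column constraint (a sketch, assuming the c_byte8 domain above has been created):

```sql
-- the domain's CHECK constraint is enforced on every column that uses it
INSERT INTO test_table_1 VALUES (1, 100);   -- accepted
INSERT INTO test_table_1 VALUES (2, 300);   -- rejected: violates c_byte8_constraint
```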
Please keep in mind that the first solution requires you to add a constraint to every attribute you have defined, or will define in the future. It makes your DB creation script less readable, because the same constraint condition is repeated every time you use the restricted domain. A second downside of this approach is that you can easily miss some columns of your schema when adding those constraints to an existing schema.
An upside of the first solution is that you can extend an existing schema without much hassle.
As for the second approach, the plus is a more readable DB creation script, and the same holds for the table definitions of a running production DB; changing an existing production DB, however, is more complicated.
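For instance, an existing NUMERIC column can usually be migrated to the domain after the fact, provided all stored values already satisfy the constraint (a sketch with hypothetical table and column names):

```sql
-- convert an existing column to the restricted domain;
-- fails if any stored value violates the domain's CHECK constraint
ALTER TABLE existing_table
ALTER COLUMN byte_value TYPE c_byte8;

-- later changes to the domain's constraint apply to all columns using it;
-- existing data is re-validated when the new constraint is added
ALTER DOMAIN c_byte8 DROP CONSTRAINT c_byte8_constraint;
ALTER DOMAIN c_byte8 ADD CONSTRAINT c_byte8_constraint
CHECK (VALUE BETWEEN -128 AND 127);
```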
Short answer: `integer` is faster than `varchar` or `text` in every respect. It won't matter much for small tables and/or short keys; the difference grows with the length of the keys and the number of rows.
> string ... 20 characters long, which in memory is roughly 5x that of the integer (if an integer is 4 bytes, and the strings are pure ASCII at 1 byte per character, then this holds)
To be precise, character types (`text` or `varchar`) occupy exactly 21 bytes for 20 ASCII characters on disk (1-byte varlena header) and 24 bytes in RAM (4-byte header). Detailed assessment:
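You can inspect such sizes yourself with `pg_column_size()`; values computed in a query are held with the full 4-byte header, so this shows the in-memory representation:

```sql
-- compare the footprint of an integer and a 20-character ASCII string
SELECT pg_column_size(1::integer)             AS int_bytes,   -- 4
       pg_column_size(repeat('x', 20)::text)  AS text_bytes;  -- 24: 4-byte header + 20 chars
```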
Also important: `COLLATION` rules can make sorting character data more expensive, unlike numeric data types:
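To see the effect, you can compare a locale-aware sort with the binary "C" collation, which compares plain byte values (a sketch; `t` and `name` are hypothetical names):

```sql
-- locale-aware sort: applies (potentially expensive) collation rules
SELECT name FROM t ORDER BY name;

-- byte-wise sort: cheaper, but may order non-ASCII characters differently
SELECT name FROM t ORDER BY name COLLATE "C";
```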
Index size is probably responsible for the lion's share of the performance difference in most cases. Consider the overhead per index tuple (basically the same as for a table): 4 bytes for the item identifier and 8 bytes for the index tuple header. So the index tuple for `integer` would amount to 20 bytes (including 4 bytes of alignment padding), and for `varchar(20)` with 20 ASCII characters it would be 36 bytes (also incl. padding). Details:
All the theory aside: it's best to just test:
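A minimal benchmark sketch (made-up table names and row counts) comparing an `integer` key against a `varchar` key:

```sql
-- two tables with identical content, keyed by integer vs. varchar
CREATE TABLE parent_int (id integer PRIMARY KEY, payload text);
CREATE TABLE parent_str (id varchar(20) PRIMARY KEY, payload text);

INSERT INTO parent_int
SELECT g, 'row ' || g FROM generate_series(1, 100000) g;

INSERT INTO parent_str
SELECT 'key_' || g, 'row ' || g FROM generate_series(1, 100000) g;

-- compare timings of equivalent index lookups ...
EXPLAIN ANALYZE SELECT * FROM parent_int WHERE id = 54321;
EXPLAIN ANALYZE SELECT * FROM parent_str WHERE id = 'key_54321';

-- ... and the size of the primary-key indexes
SELECT pg_size_pretty(pg_relation_size('parent_int_pkey')) AS int_idx,
       pg_size_pretty(pg_relation_size('parent_str_pkey')) AS str_idx;
```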
Postgres 9.5 introduced an optimization for sorting long strings of character data (keyword "abbreviated keys"). But a bug in some C library functions on Linux forced the project to disable the feature for non-C collations in Postgres 9.5.2. Details in the release notes.
However, if you actually use Postgres `enum` types, most of these considerations are irrelevant, since those are implemented with `integer` values internally anyway. The manual:

> An enum value occupies four bytes on disk.
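The four-byte size is easy to confirm (`mood` is a made-up enum type):

```sql
CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy');

SELECT pg_column_size('ok'::mood);  -- 4
```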
Aside: `varchar(255)` used to make sense for early versions of SQL Server, which could use a more efficient data type internally up to the limit of 255 characters. But the odd length restriction of 255 characters has no special meaning in Postgres at all.
Best Answer
No, there is no 1-byte integer in the standard distribution of Postgres. All built-in numeric types of standard Postgres occupy 2 or more bytes.
Extension pguint
But yes, there is the extension pguint, maintained by Peter Eisentraut, one of the Postgres core developers. It's not part of the standard distribution:
In addition to various unsigned integer types, it also provides the 1-byte integer you are looking for:
Be sure to read the chapter "Discussion" at the linked site, explaining possible complications. You need to exercise care with type casts and numeric literals when introducing more integer types ...
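After installing the extension on the server, usage might look like this (a sketch; it assumes pguint has been built and installed, and that its extension name is `uint`):

```sql
CREATE EXTENSION uint;

-- int1: signed 1-byte integer provided by the extension
CREATE TABLE tiny (v int1);

INSERT INTO tiny VALUES (-128), (127);  -- full signed 1-byte range
INSERT INTO tiny VALUES (128);          -- error: out of range
```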
Workaround
A possible, simple workaround would be to encode 1-byte integer values as `"char"` (with double quotes!), an "internal" simplistic 1-character type which actually uses a single byte of storage: the byte values of a signed 1-byte integer, the upper half of which are represented as ASCII characters. You can encode values in the range of -128 to 127. Demo:
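A round trip over the full range might look like this (a sketch, casting directly between `integer` and `"char"`):

```sql
-- encode integers as "char" (1 byte of storage) and decode them back
SELECT i                    AS original
     , i::"char"            AS encoded
     , i::"char"::integer   AS decoded  -- round trip back to the original value
FROM   generate_series(-128, 127) i;
```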
There are several characters not meant for display. So encode before you store and decode before you display ...
Remember: `"char"` is an "internal" type intended for simple and cheap enumeration. It is not officially designed for what we are doing here, and not portable to other RDBMS. There are no guarantees by the Postgres project. But since `"char"` is used all over the system catalogs, the type is not going to change.

Initially, I had assumed the unsigned 1-byte integer range (0 to 255) and used `text` as a stepping stone. But that only works for the numbers 1 - 127. Instead, use the signed integer range (-128 to 127) and cast between `"char"` and `integer` directly.