Postgresql – How exactly does the one-byte “char” type work in PostgreSQL

castdatatypespostgresqlstorage

I often see people talking about "char". I've never used it. It's defined in the docs as,

The type "char" (note the quotes) is different from char(1) in that it only uses one byte of storage. It is internally used in the system catalogs as a simplistic enumeration type.

And further,

"char"  1 byte  single-byte internal type

So, if it's one byte, what is the domain and how would you make use of it? Is it signed or unsigned? In this post by @Erwin Brandstetter he lays it out, but I'm still confused. He's using ascii() and chr(), and provides this

SELECT i
     , chr(i)::"char"        AS i_encoded
     , ascii(chr(i)::"char") AS i_decoded
FROM   generate_series(1,256) i;

That's doing something really weird between 10 and 11.

  i  | i_encoded | i_decoded 
-----+-----------+-----------
...
   8 | \x08      |         8
   9 |           |         9
  10 |          +|        10
     |           |           -- WTF is going on here.
  11 | \x0B      |        11
  12 | \x0C      |        12
...

It also gets really weird here:

 126 | ~         |       126
 127 | \x7F      |       127
 128 |           |       128
 129 |           |       128
 130 |           |       128
 131 |           |       128

Why is everything north of 128 being decoded as 128? But to take the bizzare up a bit, after 192 there is a switch and they get decoded as 192..

 190 |           |       128
 191 |           |       128
 192 |           |       192
 193 |           |       192
 194 |           |       192
 195 |           |       192
 196 |           |       192
 197 |           |       192

Erwin says

There are several characters not meant for display. So encode before you store and decode before you display …

I'm not sure why we should encode at all though if we're doing exactly what that questions asks

CREATE TABLE foo AS
SELECT i::"char"
FROM   generate_series(-128,127) i;

That works fine. We can get the ints back out using

SELECT i::int FROM foo;

So in short,

What is Erwin's code doing between 10-11 where the i goes null?
Why is 128 repeated so many times?
Why is 192 repeated so many times?
How do I trigger the inability to store 0, when Erwin says you cannot encode 0 this way (null character not permitted)
```
CREATE TABLE foo AS SELECT 0::int::"char" AS x;
SELECT x::int FROM foo;
 x 
---
0
```

Best Answer

1. `chr(10)`

... produces the LINEFEED character (a.k.a. escape sequence \n) and psql displays the character with a newline (indicated by +). Everything correct there.

2. & 3. `ascii()` produces 128 or 192?

It starts with a mistake I made. I carelessly assumed "char" would cover the range of an unsigned 1-byte integer (0 to 255) in the referenced answer (now fixed), but it's actually the range of a signed 1-byte integer (-128 to 127) internally.

ascii() takes a text parameter, the implicit cast from "char" to text produces a multibyte-encoded character in unicode, and the function returns (per documentation on ascii()):

ASCII code of the first character of the argument. For UTF8 returns the Unicode code point of the character. For other multibyte encodings, the argument must be an ASCII character.

So we get a lot of truncated values. 128 and 192 are byte values for the leading byte of multibyte characters.

4. The null byte

The inability to store null bytes only affects regular character types (text, char, varchar), not "char". It applies to my buggy example, because I cast to text as stepping stone. While casting between "char" and integer directly, the limitation does not apply. The manual on chr():

The NULL (0) character is not allowed because text data types cannot store such bytes.

Not so for "char", where 0 is mapped to the empty string '':

SELECT ''::"char"::int  -- 0
     , 0::"char" = '';  -- t

Remember: "char" is still an "internal" type intended for simple and cheap enumeration. Not officially designed for what we are doing here, and not portable to other RDBMS. There are no guarantees by the Postgres project for this.

Extension pguint

But yes, there is the extension pguint, maintained by Peter Eisentraut, one of the Postgres core developers. It's not part of the standard distribution:

In addition to various unsigned integer types, it also provides the 1-byte integer you are looking for:

int1 (signed 8-bit integer)
uint1 (unsigned 8-bit integer)
uint2 (unsigned 16-bit integer)
uint4 (unsigned 32-bit integer)
uint8 (unsigned 64-bit integer)

Be sure to read the chapter "Discussion" at the linked site, explaining possible complications. You need to exercise care with type casts and numeric literals when introducing more integer types ...

Workaround

A possible, simple workaround would be to encode 1-byte integer values as "char" (with double-quotes!), an "internal" simplistic 1-character type, which actually uses a single byte of storage, byte values of a signed 1-byte integer, the upper half represented as ASCII characters.

You can encode values in the range of -128 to 127. Demo:

SELECT i
     , i::"char"
     , i::"char"::int
FROM   generate_series(-128,127) i;

There are several characters not meant for display. So encode before you store and decode before you display ...

Remember: "char" is an "internal" type intended for simple and cheap enumeration. Not officially designed for what we are doing here, and not portable to other RDBMS. There are no guarantees by the Postgres project. But since "char" is used all over the system catalogs, the type is not going to change.

_{Initially, I had assumed unsigned 1-byte integer range (0 to 255) and used text as stepping stone. But that only works for numbers 1 - 127. Instead, use signed integer range (-128 to 127) and cast between "char" and integer directly.}

Postgresql – How does cast work with Set Returning Functions (SRF) like generate_series

Cast the actual value, not the set.

SELECT v::date
FROM generate_series('2018-01-01'::date, '2019-01-01'::date, '1 day') as t(v);

Best Answer

1. chr(10)

2. & 3. ascii() produces 128 or 192?

4. The null byte

Related Solutions

Postgresql – How to store one-byte integer in PostgreSQL

Extension pguint

Workaround

Postgresql – How does cast work with Set Returning Functions (SRF) like generate_series

Related Question

1. `chr(10)`

2. & 3. `ascii()` produces 128 or 192?