Postgresql – How exactly does the one-byte “char” type work in PostgreSQL

castdatatypespostgresqlstorage

I often see people talking about "char". I've never used it. It's defined in the docs as,

The type "char" (note the quotes) is different from char(1) in that it only uses one byte of storage. It is internally used in the system catalogs as a simplistic enumeration type.

And further,

"char"  1 byte  single-byte internal type

So, if it's one byte, what is the domain and how would you make use of it? Is it signed or unsigned? In this post by @Erwin Brandstetter he lays it out, but I'm still confused. He's using ascii() and chr(), and provides this

SELECT i
     , chr(i)::"char"        AS i_encoded
     , ascii(chr(i)::"char") AS i_decoded
FROM   generate_series(1,256) i;

That's doing something really weird between 10 and 11.

  i  | i_encoded | i_decoded 
-----+-----------+-----------
...
   8 | \x08      |         8
   9 |           |         9
  10 |          +|        10
     |           |           -- WTF is going on here.
  11 | \x0B      |        11
  12 | \x0C      |        12
...

It also gets really weird here:

 126 | ~         |       126
 127 | \x7F      |       127
 128 |           |       128
 129 |           |       128
 130 |           |       128
 131 |           |       128

Why is everything north of 128 being decoded as 128? But to take the bizzare up a bit, after 192 there is a switch and they get decoded as 192..

 190 |           |       128
 191 |           |       128
 192 |           |       192
 193 |           |       192
 194 |           |       192
 195 |           |       192
 196 |           |       192
 197 |           |       192

Erwin says

There are several characters not meant for display. So encode before you store and decode before you display …

I'm not sure why we should encode at all though if we're doing exactly what that questions asks

CREATE TABLE foo AS
SELECT i::"char"
FROM   generate_series(-128,127) i;

That works fine. We can get the ints back out using

SELECT i::int FROM foo;

So in short,

  1. What is Erwin's code doing between 10-11 where the i goes null?
  2. Why is 128 repeated so many times?
  3. Why is 192 repeated so many times?
  4. How do I trigger the inability to store 0, when Erwin says you cannot encode 0 this way (null character not permitted)

    CREATE TABLE foo AS SELECT 0::int::"char" AS x;
    SELECT x::int FROM foo;
     x 
    ---
    0
    

Best Answer

1. chr(10)

... produces the LINEFEED character (a.k.a. escape sequence \n) and psql displays the character with a newline (indicated by +). Everything correct there.

2. & 3. ascii() produces 128 or 192?

It starts with a mistake I made. I carelessly assumed "char" would cover the range of an unsigned 1-byte integer (0 to 255) in the referenced answer (now fixed), but it's actually the range of a signed 1-byte integer (-128 to 127) internally.

ascii() takes a text parameter, the implicit cast from "char" to text produces a multibyte-encoded character in unicode, and the function returns (per documentation on ascii()):

ASCII code of the first character of the argument. For UTF8 returns the Unicode code point of the character. For other multibyte encodings, the argument must be an ASCII character.

So we get a lot of truncated values. 128 and 192 are byte values for the leading byte of multibyte characters.

4. The null byte

The inability to store null bytes only affects regular character types (text, char, varchar), not "char". It applies to my buggy example, because I cast to text as stepping stone. While casting between "char" and integer directly, the limitation does not apply. The manual on chr():

The NULL (0) character is not allowed because text data types cannot store such bytes.

Not so for "char", where 0 is mapped to the empty string '':

SELECT ''::"char"::int  -- 0
     , 0::"char" = '';  -- t

Remember: "char" is still an "internal" type intended for simple and cheap enumeration. Not officially designed for what we are doing here, and not portable to other RDBMS. There are no guarantees by the Postgres project for this.