Postgresql – Add database-level collation to PostgreSQL

collationpostgresql

So I have added a en_US collation to my system:

# locale -a
C
en_US.utf8
POSIX
sk_SK.utf8

Based on Postgres' documentation I have added it to my Postgres instance:

postgres=# select * from pg_collation;
  collname  | collnamespace | collowner | collencoding | collcollate | collctype  
------------+---------------+-----------+--------------+-------------+------------
 default    |            11 |        10 |           -1 |             | 
 C          |            11 |        10 |           -1 | C           | C
 POSIX      |            11 |        10 |           -1 | POSIX       | POSIX
 sk_SK      |            11 |        10 |            6 | sk_SK.utf8  | sk_SK.utf8
 sk_SK.utf8 |            11 |        10 |            6 | sk_SK.utf8  | sk_SK.utf8
 ucs_basic  |            11 |        10 |            6 | C           | C
 en_US      |          2200 |        10 |            6 | en_US.utf8  | en_US.utf8

As you can see the 'collnamespace' of newly added collation is different than the others. My assumption is that this is why I don't have a choice of 'en_US' while creating new database.

Question is: what I need to do to change collnamespace '2200' to '11' or straightly say to create a database with 'en_US' collation? For database creation I'm using pgAdmin tool.

Best Answer

pg_collation is a system table inside pg_catalog system catalog schema. It lists all available collations: https://www.postgresql.org/docs/9.5/static/catalog-pg-collation.html

SELECT * FROM pg_collation;
SELECT * FROM pg_collation WHERE collname='en_US';

pg_collation table belongs to pg_catalog schema. To list all tables in the pg_catalog schema, use

\dS+

One of the columns of pg_collation table is 'collnamespace'. The information of collnamespace can be obtained from pg_namespace table. https://www.postgresql.org/docs/9.5/static/catalog-pg-namespace.html

pg_namespace table contains a hidden column 'oid'. oid is called object identifier used internally to identify postgres objects. Namespace is also known as Schema in postgres. Essentially, the oid number seen in pg_namespace table will be the same oid seen in collnamespace column of pg_collation table for any particular row (Eg: 'en_US).

postgres=# select * from pg_namespace;
      nspname       | nspowner |               nspacl                
--------------------+----------+-------------------------------------
 pg_toast           |       10 | 
 pg_temp_1          |       10 | 
 pg_toast_temp_1    |       10 | 
 pg_catalog         |       10 | {postgres=UC/postgres,=U/postgres}
 public             |       10 | {postgres=UC/postgres,=UC/postgres}
 information_schema |       10 | {postgres=UC/postgres,=U/postgres}
(6 rows)

postgres=# select oid,nspname,nspowner,nspacl from pg_namespace ;
  oid  |      nspname       | nspowner |               nspacl                
-------+--------------------+----------+-------------------------------------
    99 | pg_toast           |       10 | 
 11320 | pg_temp_1          |       10 | 
 11321 | pg_toast_temp_1    |       10 | 
    11 | pg_catalog         |       10 | {postgres=UC/postgres,=U/postgres}
  2200 | public             |       10 | {postgres=UC/postgres,=UC/postgres}
 12453 | information_schema |       10 | {postgres=UC/postgres,=U/postgres}
(6 rows)

postgres=# select * from pg_collation where collname='en_US';
 collname | collnamespace | collowner | collencoding |   collcollate   |    collctype    
----------+---------------+-----------+--------------+-----------------+-----------------
 en_US    |            11 |        10 |            6 | en_US.utf8      | en_US.utf8
 en_US    |            11 |        10 |            8 | en_US           | en_US
 en_US    |            11 |        10 |           16 | en_US.iso885915 | en_US.iso885915
(3 rows)

You see the number 11 in oid column of pg_namespace table matches with collnamespace column of pg_collation. This is because, pg_collation table belongs to pg_catalog namespace or schema.

Now, in your case, pg_collation table belongs to public schema (2200) instead of pg_catalog schema(11). I do not know how this happened to you. But, it is safe to say that Schema is below database level in the postgres architecture hierarchy, so you should not have any issue with creating a new database with required collation.

When you initialize a database cluster using initdb command, the default encoding is set during this time and it is applied to template1 default database. Template1 database is used to create new database by default and it will not allow changes to encoding, because it has been already set by initdb during cluster initialization. To create a database with a new encoding, you would have to use template0 database, which will allow you to specify the different collation and the relevant encoding while creating a new database as follows.

postgres=# CREATE DATABASE mydb ENCODING 'LATIN1' LC_COLLATE 'en_US' LC_CTYPE 'en_US' TEMPLATE template0;
CREATE DATABASE
postgres=# \l+
                                                                    List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges   |  Size   | Tablespace |                Description                 
-----------+----------+----------+-------------+-------------+-----------------------+---------+------------+--------------------------------------------
 mydb      | postgres | LATIN1   | en_US       | en_US       |                       | 6681 kB | pg_default | 
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |                       | 7100 kB | pg_default | default administrative connection database
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +| 6681 kB | pg_default | unmodifiable empty database
           |          |          |             |             | postgres=CTc/postgres |         |            | 
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +| 6804 kB | pg_default | default template for new databases
           |          |          |             |             | postgres=CTc/postgres |         |            | 
(4 rows)

Notice the size of mydb is same as template0, because template0 is used to create mydb with LATIN1 encoding and 'en_US' collation.

To see the character encoding mapping to internal number encoding (ranging from 0 to 41), use

mydb=# select pg_encoding_to_char(8);
 pg_encoding_to_char 
---------------------
 LATIN1
(1 row)

Hope this helps.