Postgresql – Customize jsonb key sort order involving arrays

arrayindexjsonorder-bypostgresql

I have a table in PostgreSQL with some data:

create table t2 (
    key jsonb,
    value jsonb
);

INSERT INTO t2(key, value)
 VALUES
 ('1', '"test 1"')
,('2', '"test 2"')
,('3', '"test 3"')
,('[]', '"test 4"')
,('[1]', '"test 5"')
,('[2]', '"test 6"')
,('[3]', '"test 7"')
,('[1, 2]', '"test 8"')
,('[1, 2, 3]', '"test 9"')
,('[1, 3]', '"test 10"')
,('[1,2,4]', '"test 11"')
,('[1, 2,4]', '"test 12"')
,('[1,3,13]', '"test 13"')
,('[1, 2, 15]', '"test 15"');

And I try to sort these rows like that:

SELECT key FROM t2 order by key;

The result is:

[]
1
2
3
[1]
[2] <==
[3] <==
[1, 2]
[1, 3] <==
[1, 2, 3]
[1, 2, 4]
[1, 2, 4]
[1, 2, 15]
[1, 3, 13]

But what I need is:

[]
1
2
3
[1]
[1, 2]
[1, 2, 3]
[1, 2, 4]
[1, 2, 4]
[1, 2, 15]
[1, 3] <==
[1, 3, 13]
[2] <==
[3] <==

Is there a way to achieve it?

Best Answer

First off, your question as well as your column name "key" are misleading. The column key does not contain any JSON keys, only values. Else we could use the function jsonb_object_keys(jsonb) to extract keys, but that's not so.

Assuming all your JSON arrays are either empty or hold integer numbers as demonstrated. And the scalar values (non-arrays) are also integer.

Your basic sort order would work with Postgres integer (or numeric) arrays. I use this little helper function to convert jsonb arrays to Postgres int[]:

CREATE OR REPLACE FUNCTION jsonb_arr2int_arr(_js jsonb)
   RETURNS int[] LANGUAGE sql IMMUTABLE AS
'SELECT ARRAY(SELECT j::int FROM jsonb_array_elements_text(_js) j)';

Explanation:

Then add jsonb_typeof(jsonb) to arrive at:

SELECT key
FROM   t2
ORDER  BY key <> '[]'             -- special case for empty array
        , jsonb_typeof(key) DESC  -- 'number' before 'array'
        , CASE jsonb_typeof(key)  -- sort arrays as converted int[]
            WHEN 'array'  THEN jsonb_arr2int_arr(key)
            WHEN 'number' THEN ARRAY[key::text::int]
          END;

Produces the desired result exactly.

Why?

The manual for jsonb explains:

The btree ordering for jsonb datums is seldom of great interest, but for completeness it is:

Object > Array > Boolean > Number > String > Null
Object with n pairs > object with n - 1 pairs
Array with n elements > array with n - 1 elements

Objects with equal numbers of pairs are compared in the order:

key-1, value-1, key-2 ...

Note that object keys are compared in their storage order; in particular, since shorter keys are stored before longer keys, this can lead to results that might be unintuitive, such as:

{ "aa": 1, "c": 1} > {"b": 1, "d": 1}

Similarly, arrays with equal numbers of elements are compared in the order:

element-1, element-2 ...

Bold emphasis mine.
That's why jsonb '[2]' < jsonb '[1, 2]'.
But Postgres arrays just sort element-by-element: '{2}'::int[] > '{1, 2}' - exactly what you were looking for.