Postgresql – Is it possible to select distinct values in a json document using PostgreSQL

greatest-n-per-groupjsonpostgresql

I have a column that is using the JSON type. I would like to perform a query to select all distinct records for a particular field in the JSON string: I.e. given these three documents

{
  id: 1,
  s: "foo"
},
{
  id:2,
  s: "bar"
},
{
  id:3,
  s: "foo"
},

the query must check the "s" key for distinct values and return the documents with id 1 and 2.

Best Answer

Assuming a JSON array in a Postgres 9.4 jsonb column, this would do the job:

SELECT DISTINCT ON (doc->'s') doc
FROM  (
   SELECT '[
    {
      "id":1,
      "s":"foo"
    },
    {
      "id":2,
      "s":"bar"
    },
    {
      "id":3,
      "s":"foo"
    }]'::jsonb AS j
   ) t
   , jsonb_array_elements(t.j) WITH ORDINALITY t1(doc, rn)
ORDER  BY doc->'s', rn;

Or, unless s is a nested object, it's probably cheaper to fold on the text value instead of the jsonb (sub-)record. Just use the operator ->> instead of -> in this case. The result is the same:

 doc
----------------------
'{"s": "bar", "id": 2}'
'{"s": "foo", "id": 1}'

Replace the subquery t with your actual table.

Key elements are jsonb_array_elements() (or json_array_elements()) in a LATERAL join with WITH ORDINALITY and then the Postgres-specific DISTINCT ON.

Related, with more explanation: