PostgreSQL JSON – Query Against JSON Array

jsonpostgresqlpython

The data I'm working with is in a column ('transaction') of JSONB type, containing documents similar to this {"transactions":[{"id":"ABC123", ...}, {"id":"DEF456", ...}, {"id": "GHI789", ...}]}

I'm trying to get the distinct transaction id's but since the "id" field is part of a hash, inside an array, I'm struggling to get data out in a useful format.

I've got a solution that returns me something but it feels ungainly, and as a newcomer to Postgres, I feel like I'm not on the right track:

I'm using this from within a Python application, running sqlalchemy, but to get to grips with the data, I've constructed the following in the psql terminal:

WITH orders AS (SELECT transaction->'transactions' AS t FROM transactions) SELECT jsonb_array_elements(t)->>'id' FROM orders;

This returns me a rowset of JSON items extracted from the nested hashes. However, when trying to run this in sqlalchemy I get the following:

>>> query = sqlalchemy.text("WITH orders AS (SELECT transaction->'transactions' AS t FROM transactions) SELECT jsonb_array_elements(t)->>'id' FROM orders")
>>> rowset = engine.execute(query)
>>> rowset.fetchall()
[(u'ABC123',), (u'DEF456',), (u'GHI789',)]

… meaning I need to deconstruct further nested types from the sqlalhemy response. Which overall feels pretty cludged together.

Is there a better way to be querying this nested data?

Best Answer

You can write the query simpler without CTE which probably would perform better.

select (jsonb_array_elements(transaction->'transactions')->'id')::text as trx_id 
from transactions;

SQL alchemy returns a list of tuples. This is the standard behaviour. A row in PostgreSQL is represented as a tuple. Even if you query a single columns it's going to be a tuple with one element. Hence the (u'ABC123',) representation.

If you need a list of the first column you can easily transform your resultset in python:

[r[0] for r in rowset.fetchall()]

This should return

['ABC123', 'DEF456', 'GHI789']