Postgresql – Get all rows from table where JSON column contains a certain value

jsonpostgresqlsyntax

I'm struggling to get data from a JSON column in my PostgreSQL database.
In our users table, we have a known_ips column, which is a JSON column, holding a flat array of IP addresses known for the given user, like this:

# select email, known_ips from users limit 3;
       email       |          known_ips                                     
-------------------+-------------------------------
 user1@example.com | ["192.168.1.1","192.168.1.2"]
 user2@example.com | ["192.168.1.3"]
 user3@example.com | ["192.168.1.2"]
(3 rows)

What I'm trying to do is to select all the users that have an IP address from an array of IP addresses (in this example, either 192.168.1.1 or 192.168.1.2) in their known_ips column. So in this case, user1 and user3 have IP address .1 or .2, so I want to have them returned, but not user2, since I'm not looking for the .3 address.

I've tried the following queries, all to no avail:

  • database=# select email, ip from users, json_array_elements(known_ips) as ip where ip in("192.168.1.1","192.168.1.2");
    ERROR:  column "192.168.1.1" does not exist
    LINE 1: ... json_array_elements(known_ips) as ip where ip in("192.168.1...
                                                                 ^
    
  • database=# select email, ip from users, json_array_elements(known_ips) as ip where ip in('192.168.1.1','192.168.1.2') limit 5;
    ERROR:  invalid input syntax for type json
    LINE 1: ... json_array_elements(known_ips) as ip where ip in('192.168.1...
                                                                 ^
    DETAIL:  Token "." is invalid.
    CONTEXT:  JSON data, line 1: 192.168....
    
  • database=# select email, ip from users, json_array_elements(known_ips) as ip where ip in(192.168.1.1,192.168.1.2) limit 5;
    ERROR:  syntax error at or near ".168"
    LINE 1: ...array_elements(known_ips) as ip where ip in(192.168.1.1,192...
                                                                 ^
    
  • database=# select email, ip from users, json_array_elements(known_ips) as ip where ip in("192.168.1.1","192.168.1.2");
    ERROR:  column "192.168.1.1" does not exist
    LINE 1: ... json_array_elements(known_ips) as ip where ip in("192.168.1...
                                                                 ^
    
  • database=# select email, ip from users, json_array_elements(known_ips) as ip where ip in(array["192.168.1.1","192.168.1.2"]);
    ERROR:  column "192.168.1.1" does not exist
    LINE 1: ...array_elements(known_ips) as ip where ip in(array["192.168.1...
                                                                 ^
    
  • database=# select email, ip from users, json_array_elements(known_ips) as ip where ip = any(array["192.168.1.1","192.168.1.2"]);
    ERROR:  column "192.168.1.1" does not exist
    LINE 1: ...ay_elements(known_ips) as ip where ip = any(array["192.168.1...
                                                                 ^
    
  • database=# select email, ip from users, json_array_elements(known_ips) as ip where ip in(unnest(array["192.168.1.1","192.168.1.2"]));
    ERROR:  column "192.168.1.1" does not exist
    LINE 1: ...lements(known_ips) as ip where ip in(unnest(array["192.168.1...
                                                                 ^
    
  • database=# select email, ip from users, json_array_elements(known_ips) as ip where ip = any(array["192.168.1.1","192.168.1.2"]);
    ERROR:  column "192.168.1.1" does not exist
    LINE 1: ...ay_elements(known_ips) as ip where ip = any(array["192.168.1...
                                                                 ^
    

I'm feeling like I'm missing something obvious, but I just don't see it. Why does Postgres think that I'm defining columns instead of strings? How do I properly get the data?

Best Answer

Double quotation marks are name delimiters. They are reserved for delimiting names (column names, table names etc.) that contain non-standard characters or those that you want to explicitly make case-sensitive (because that is their effect in PostgreSQL, which is according to the standard, too).

So, that is why all attempts with "192.168.1.1" fail: PostgreSQL indeed interprets those as names (specifically column names in each of those contexts).

The one case without quotes fails simply because 192.168.1.1 is an invalid token sequence. Numbers and some other constants can be represented in PostgreSQL without quotation marks but the tokens you specify there cannot be interpreted either as a number or anything else.

Finally, the one where you are using single quotation marks around the IPs fails because PostgreSQL is trying to interpret those as JSON literals. Why? Because the ip column is of type json – that is the type of column values returned by json_array_elements.

So, in order for that second attempt of yours to succeed, you should first of all represent the IPs as valid JSON string items. That means you need to enclose them in double quotation marks and then in single quotation marks, like this:

where ip in ('"192.168.1.1"','"192.168.1.2"')

However, that will give you this error:

operator does not exist: json = json

Your options are:

  • convert ip to text:

    where ip::text in ('"192.168.1.1"','"192.168.1.2"')
    
  • convert ip to jsonb

    where ip::jsonb in ('"192.168.1.1"','"192.168.1.2"')
    

Either should get you going.

Note, though, that filtering your data like that can give you duplicates in the output. The issue is, the json_array_elements function turns the specified json value into a row set, repeating the source row's columns for each transposed item. So, for your example the FROM clause effectively produces the following row set:

       email       |          known_ips            |      ip
-------------------+-------------------------------+---------------
 user1@example.com | ["192.168.1.1","192.168.1.2"] | "192.168.1.1"
 user1@example.com | ["192.168.1.1","192.168.1.2"] | "192.168.1.2"
 user2@example.com | ["192.168.1.3"]               | "192.168.1.3"
 user3@example.com | ["192.168.1.2"]               | "192.168.1.2"

Since for user1 each ip will match the IN predicate, you will get the corresponding email returned twice.

To resolve that, instead of this:

...
from users, json_array_elements(known_ips) as ip
where ip::jsonb in ('"192.168.1.1"','"192.168.1.2"')

you can do something like this:

...
from users
where exists
(
  select *
  from json_array_elements(known_ips) as ip
  where ip::jsonb in ('"192.168.1.1"','"192.168.1.2"')
)