I'm trying to fetch addresses for certain users only ...
IN
does not scale well with big lists. Compare performance to unnest()
/ JOIN
SELECT a.*
FROM unnest(ARRAY[9681,23824,23760,20098,962,14730,12294,9552,534,
553,5837,6768,6583,956,24179]) AS t(user_id)
JOIN addresses a USING (user_id)
WHERE a.type = 'VerifiedAddress'
AND a.state <> 'manual_verification'
I hope state
and type
are at least enum
types. Otherwise you should normalize your database design. Have lookup-tables for state
and type
and only use a small IDs referencing entries there. Makes the addresses
table smaller and everything faster.
Assuming all columns NOT NULL
for lack of information.
If you need it faster still, and if there are more than a few rows with type <> 'VerifiedAddress'
or state = 'manual_verification'
, a partial index would help some more. The gain grows with the percentage of rows you can exclude from the index.
CREATE INDEX addresses_selection_idx ON addresses (user_id)
WHERE type = 'VerifiedAddress'
AND state <> 'manual_verification';
Be sure to use matching WHERE
conditions in your query to allow Postgres to use this index. And weigh the benefit of tailored indexes against their cost (maintenance, extra disk space, small penalty for write operations).
Double quotation marks are name delimiters. They are reserved for delimiting names (column names, table names etc.) that contain non-standard characters or those that you want to explicitly make case-sensitive (because that is their effect in PostgreSQL, which is according to the standard, too).
So, that is why all attempts with "192.168.1.1"
fail: PostgreSQL indeed interprets those as names (specifically column names in each of those contexts).
The one case without quotes fails simply because 192.168.1.1
is an invalid token sequence. Numbers and some other constants can be represented in PostgreSQL without quotation marks but the tokens you specify there cannot be interpreted either as a number or anything else.
Finally, the one where you are using single quotation marks around the IPs fails because PostgreSQL is trying to interpret those as JSON literals. Why? Because the ip
column is of type json
– that is the type of column values returned by json_array_elements
.
So, in order for that second attempt of yours to succeed, you should first of all represent the IPs as valid JSON string items. That means you need to enclose them in double quotation marks and then in single quotation marks, like this:
where ip in ('"192.168.1.1"','"192.168.1.2"')
However, that will give you this error:
operator does not exist: json = json
Your options are:
convert ip
to text
:
where ip::text in ('"192.168.1.1"','"192.168.1.2"')
convert ip
to jsonb
where ip::jsonb in ('"192.168.1.1"','"192.168.1.2"')
Either should get you going.
Note, though, that filtering your data like that can give you duplicates in the output. The issue is, the json_array_elements
function turns the specified json
value into a row set, repeating the source row's columns for each transposed item. So, for your example the FROM clause effectively produces the following row set:
email | known_ips | ip
-------------------+-------------------------------+---------------
user1@example.com | ["192.168.1.1","192.168.1.2"] | "192.168.1.1"
user1@example.com | ["192.168.1.1","192.168.1.2"] | "192.168.1.2"
user2@example.com | ["192.168.1.3"] | "192.168.1.3"
user3@example.com | ["192.168.1.2"] | "192.168.1.2"
Since for user1 each ip
will match the IN predicate, you will get the corresponding email returned twice.
To resolve that, instead of this:
...
from users, json_array_elements(known_ips) as ip
where ip::jsonb in ('"192.168.1.1"','"192.168.1.2"')
you can do something like this:
...
from users
where exists
(
select *
from json_array_elements(known_ips) as ip
where ip::jsonb in ('"192.168.1.1"','"192.168.1.2"')
)
Best Answer
This is called a left anti-join. You want all users who don't have a verified address. So you do a left join to find all users and whether they do have a verified address, then discard all the ones that have a verified address.
Something like:
This may be easier to understand when expressed with
NOT EXISTS
as:both of which will optimize to the same query plan, in general.