Postgresql – Selecting distinct records

postgresql

I have a user relation to address (1 to 1).

So the address has two types which is a type column.

One is the VerifiedAddress and the other one is the SearchAddress.

So the user who had verified his address has two entries in the addresses table.

In case you wonder why is that, the reason is that user can go to a travel and change the SearchAddress but the VerifiedAddress remains the same.

I'm trying to find all the users who haven't got verified their address. I have some user list from which I know who has verified address and who hasn't got it.

This is what I have tried the last, not exactly what I wanted :

select id from addresses
where user_id in (8112, 18620, 23, 19668, 10212)
and type = 'VerifiedAddress';

Gives me all the verified users, but if I change the type != VerifiedAddress I still get the same results because users who have verified their address have two entries in a addresses table, this time I get the user_ids from the SearchAddress.

How can I get this data from addresses table?

Note: SearchAddress can be null as well as VerifiedAddress. Naturally people who don't have searchaddress don't have the verified as well.

Best Answer

This is called a left anti-join. You want all users who don't have a verified address. So you do a left join to find all users and whether they do have a verified address, then discard all the ones that have a verified address.

Something like:

SELECT *
FROM users
LEFT OUTER JOIN addresses a
ON (users.id = a.user_id AND a.type = 'VerifiedAddress')
WHERE a.user_id IS NULL;

This may be easier to understand when expressed with NOT EXISTS as:

SELECT *
FROM users
WHERE NOT EXISTS (
  SELECT 1 
  FROM addresses a
  WHERE users.id = a.user_id
  AND a.type = 'VerifiedAddress'
)

both of which will optimize to the same query plan, in general.

Partial index

If you need it faster still, and if there are more than a few rows with type <> 'VerifiedAddress' or state = 'manual_verification', a partial index would help some more. The gain grows with the percentage of rows you can exclude from the index.

CREATE INDEX addresses_selection_idx ON addresses (user_id)
WHERE  type  =  'VerifiedAddress'
AND    state <> 'manual_verification';

Be sure to use matching WHERE conditions in your query to allow Postgres to use this index. And weigh the benefit of tailored indexes against their cost (maintenance, extra disk space, small penalty for write operations).

Postgresql – Get all rows from table where JSON column contains a certain value

Double quotation marks are name delimiters. They are reserved for delimiting names (column names, table names etc.) that contain non-standard characters or those that you want to explicitly make case-sensitive (because that is their effect in PostgreSQL, which is according to the standard, too).

So, that is why all attempts with "192.168.1.1" fail: PostgreSQL indeed interprets those as names (specifically column names in each of those contexts).

The one case without quotes fails simply because 192.168.1.1 is an invalid token sequence. Numbers and some other constants can be represented in PostgreSQL without quotation marks but the tokens you specify there cannot be interpreted either as a number or anything else.

Finally, the one where you are using single quotation marks around the IPs fails because PostgreSQL is trying to interpret those as JSON literals. Why? Because the ip column is of type json – that is the type of column values returned by json_array_elements.

So, in order for that second attempt of yours to succeed, you should first of all represent the IPs as valid JSON string items. That means you need to enclose them in double quotation marks and then in single quotation marks, like this:

where ip in ('"192.168.1.1"','"192.168.1.2"')

However, that will give you this error:

operator does not exist: json = json

Your options are:

convert ip to text:

where ip::text in ('"192.168.1.1"','"192.168.1.2"')

convert ip to jsonb

where ip::jsonb in ('"192.168.1.1"','"192.168.1.2"')

Either should get you going.

Note, though, that filtering your data like that can give you duplicates in the output. The issue is, the json_array_elements function turns the specified json value into a row set, repeating the source row's columns for each transposed item. So, for your example the FROM clause effectively produces the following row set:

       email       |          known_ips            |      ip
-------------------+-------------------------------+---------------
 user1@example.com | ["192.168.1.1","192.168.1.2"] | "192.168.1.1"
 user1@example.com | ["192.168.1.1","192.168.1.2"] | "192.168.1.2"
 user2@example.com | ["192.168.1.3"]               | "192.168.1.3"
 user3@example.com | ["192.168.1.2"]               | "192.168.1.2"

Since for user1 each ip will match the IN predicate, you will get the corresponding email returned twice.

To resolve that, instead of this:

...
from users, json_array_elements(known_ips) as ip
where ip::jsonb in ('"192.168.1.1"','"192.168.1.2"')

you can do something like this:

...
from users
where exists
(
  select *
  from json_array_elements(known_ips) as ip
  where ip::jsonb in ('"192.168.1.1"','"192.168.1.2"')
)

Best Answer

Related Solutions

Postgresql – Optimizing Postgres query

Partial index

Postgresql – Get all rows from table where JSON column contains a certain value

Related Question