PostgreSQL – DISTINCT ON two columns: how to get rid of ‘duplicates’


I have a table of messages with 'msg_from' and 'msg_to' fields.

When I want to select all current chats and the last message from each of them, I use the following query:

select distinct on (msg_from, msg_to) array[msg_from, msg_to] 
as participants from mockdata_messages
order by msg_from, msg_to DESC

example: https://www.db-fiddle.com/f/n1Bttz4i9Cd5RVA9qCpNkH/1

This works fine in Postgres 12

The problem is that I get 'duplicates' in the response: 'participants' values like [2, 5] and [5, 2], which obviously refer to the same chat.

I got a very similar result when I tried to use 'group by' with the two columns.
How can I get around this problem? What are good approaches for storing chat messages in a DB?

Best Answer

In this simple example you can use the least and greatest functions to normalize each pair, so that (2, 5) and (5, 2) produce the same distinct on key.

select distinct on (least(msg_from, msg_to), greatest(msg_from, msg_to))
       array[msg_from, msg_to] as participants
from mockdata_messages
order by least(msg_from, msg_to), greatest(msg_from, msg_to)
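The question also asks for the last message of each chat. A minimal sketch of how the same least/greatest key could be extended to do that, assuming the table has a timestamp to order by. The table name demo_messages and the columns id, body and created_at are hypothetical; only msg_from and msg_to come from the question.

```sql
-- Hypothetical demo table: id, body and created_at are assumptions,
-- only msg_from and msg_to appear in the question.
CREATE TEMP TABLE demo_messages (
    id         serial PRIMARY KEY,
    msg_from   integer     NOT NULL,
    msg_to     integer     NOT NULL,
    body       text,
    created_at timestamptz NOT NULL
);

INSERT INTO demo_messages (msg_from, msg_to, body, created_at) VALUES
    (2, 5, 'hi',    '2020-01-01 10:00+00'),
    (5, 2, 'hello', '2020-01-01 10:05+00'),
    (1, 3, 'ping',  '2020-01-01 09:00+00');

-- One row per chat, regardless of message direction.
-- DISTINCT ON keeps the first row of each group, so sorting by
-- created_at DESC inside each pair keeps the newest message.
SELECT DISTINCT ON (least(msg_from, msg_to), greatest(msg_from, msg_to))
       array[least(msg_from, msg_to), greatest(msg_from, msg_to)] AS participants,
       body       AS last_message,
       created_at
FROM   demo_messages
ORDER  BY least(msg_from, msg_to),
          greatest(msg_from, msg_to),
          created_at DESC;
-- participants | last_message
-- {1,3}        | ping
-- {2,5}        | hello
```

Building the array from least and greatest as well (instead of msg_from, msg_to) makes the participants value itself stable, so the same chat is always reported as {2,5} and never as {5,2}.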

Related Solutions

Mysql – Distinct Combination of Two Columns

You can count distinct elements by running:

select count(distinct policy_id, client_id) from policy_client;

Another option would be to group by and count that:

select count(*) from (select policy_id, client_id from policy_client group by 1,2) a;

Run both versions and see which one performs better on your dataset.

A very quick but not fully accurate alternative: if you have an index on (policy_id, client_id), you can check the cardinality of that index, but that is an approximation, not an exact number.
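As a rough sketch of the index-cardinality check in MySQL, assuming a composite index on (policy_id, client_id) already exists on policy_client. The values shown are optimizer statistics, so treat them as estimates; running ANALYZE TABLE first refreshes them.

```sql
-- Refresh the index statistics (they are estimates, not exact counts).
ANALYZE TABLE policy_client;

-- The Cardinality column holds the estimated number of distinct values.
SHOW INDEX FROM policy_client;

-- The same statistics through information_schema. For a composite index,
-- the row for the last column in the index (highest seq_in_index) is the
-- one to read as the estimate for the full column combination.
SELECT index_name, seq_in_index, column_name, cardinality
FROM   information_schema.statistics
WHERE  table_schema = DATABASE()
  AND  table_name   = 'policy_client'
ORDER  BY index_name, seq_in_index;
```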

Postgresql – Calling SELECT DISTINCT on multiple columns

How does this work exactly?

It gives you the distinct combinations of all the expressions in the SELECT list.

SELECT DISTINCT col1, col2, ... 
FROM table_name ;

is also equivalent to:

SELECT col1, col2, ... 
FROM table_name 
GROUP BY  col1, col2, ... ;

Another way to look at how it works, probably a more accurate one, is that it acts like the common bare SELECT (ALL) and then removes any duplicate rows. See the Postgres documentation about SELECT: DISTINCT clause.
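To make the equivalence concrete, here is a small sketch on a throwaway table. The table demo_pairs and its data are hypothetical and serve only as an illustration.

```sql
-- Hypothetical table with one duplicated row.
CREATE TEMP TABLE demo_pairs (col1 integer, col2 text);

INSERT INTO demo_pairs VALUES
    (1, 'a'),
    (1, 'a'),   -- exact duplicate of the row above
    (1, 'b'),
    (2, 'a');

-- Both queries return the same three rows: (1, a), (1, b), (2, a).
SELECT DISTINCT col1, col2
FROM   demo_pairs
ORDER  BY col1, col2;

SELECT col1, col2
FROM   demo_pairs
GROUP  BY col1, col2
ORDER  BY col1, col2;
```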

Should we ever do this?

Of course. If you need it, you can use it.
