Postgresql – Distinct on two columns, how to rid from ‘duplicates’

distinctgroup bypostgresql

I have a table with messages, that have 'msg_from' and 'msg_to' fields.

when I want to select all current chats and last message from each of them I use the following query:

select distinct on (msg_from, msg_to) array[msg_from, msg_to] 
as participants from mockdata_messages
order by msg_from, msg_to DESC

example: https://www.db-fiddle.com/f/n1Bttz4i9Cd5RVA9qCpNkH/1

This works fine in Postgres 12

The problem is that I get 'duplicates' in response. I mean, 'participants' values like: [2, 5] and [5, 2], which obviously refers to the same chat.

I got exactly simillar result when I tried to use 'group by' with two columns.
How can I get arround this problem? What approaches are well to use for storing chat messages in DB?

Best Answer

in this simple example you can use the least and greatest functions to make the distinct on condition match in both cases.

select distinct on ( least(msg_from, msg_to),greatest(msg_from, msg_to)) 
array[msg_from, msg_to] as participants from mockdata_messages
order by least(msg_from, msg_to),greatest(msg_from, msg_to)