Postgresql – Find all values not present for other values of other column

postgresqlselect

Let's suppose I have a table with integer columns a and b (each pair (a,b) is unique) and following data:

 a | b
-------
 1 | 2
 1 | 3
 1 | 4
 1 | 5
 2 | 1
 2 | 3
 2 | 4
 3 | 1
 3 | 3

Now I need to get all the rows whose b value is not contained in the set with some other a value (i.e. they appear just once in the union of rows with the 'a' values in question). From these I should get (1,2), (1,5) and (2,1).

I want to include only two sets (groups with the same a value; I want to use this as part of a function, where both a values are given as parameters) at a time, so I don't want to get (3,1) from the new data.

I have some (working) solution; I post it as an answer. However, there should be something more elegant – the query should be one and exploit the symmetricity, not two unioned queries. I thought of various solutions listed in this answer, but after some hour of playing with code I didn't find how to make anything better and working.

Best Answer

If I interpret your question correctly:

"Pick all rows where a has one of two given values, and b only exists in combination with one of them."

You can use any of the standard techniques laid out in the referenced answer.

Select rows where value of second column is not present in first column

Just restrict your base table to the two given a. For instance

SELECT *
FROM   tbl t1
WHERE  a IN (1,2)
AND    NOT EXISTS (
   SELECT 1
   FROM   tbl t2
   WHERE  a IN (1,2)
   AND    t2.b =  t1.b
   AND    t2.a <> t1.a
   );

SQL Fiddle

This also works with duplicates on (a,b). Multiple identical rows would be returned.

Month, date, timestamp?

A month column as varchar(3) doesn't seem very useful if there can be data for more than a single year. I would use data type date for it. You can format that with to_char() any way you like for presentation. Like:

SELECT to_char(date '2017-12-01', 'Mon');  -- 'Dec'

The column could look like this (also addressing your comment):

...
, inserted_at date DEFAULT CURRENT_DATE
...

The default value is entered when the column is omitted in an INSERT statement.

Or, if really only the month is relevant:

... DEFAULT date_trunc('month', now())::date

Or store the complete timestamptz (8 bytes, that's what I would probably do):

...
, inserted_at timestamptz DEFAULT now()
...

Read the manual here and here.

And be aware that date and timestamp depend on your current time zone setting. Details:

Ignoring timezones altogether in Rails and PostgreSQL

Postgresql – Select unique values for one column

create table test(set_name text, value int);
insert into test values
('a', 1),('a', 2),('b', 2),('b', 3),
('b', 4),('c', 3),('c', 5);

You can use a Window function and a CTE (you can move the CTE part to a subquery), to get the desired result.

;with x as
(
    select set_name, value, 
           count(value) over (partition by value) cnt
    from   test
)
select set_name, value
from   x
where  cnt = 1;

set_name | value
:------- | ----:
a        |     1
b        |     4
c        |     5

dbfiddle here

Best Answer

Related Solutions

Postgresql – Find most frequent values for a given column

Month, date, timestamp?

Postgresql – Select unique values for one column

Related Question