Postgresql – Selecting one variant out of different similar results

postgresqlselect

I have a table that looks as follows:

Book    Format    Contents
A       html      ...
A       markdown  ...
B       html      ...
C       markdown  ...

I want to return only 1 version of a book. If a markdown version exists, I want markdown, otherwise I want html.

I'd love to see a SELECT statement that does this, and also a reason why it works.

I think this query is very close to what I need, but I don't understand how it works.

Thanks!

Best Answer

One option is to use window functions to select those books:

select book, format, contents
from (
  select book, format, contents, 
         row_number() over (partition by book order by case when format = 'markdown' then 1 else 2 end) as rn,
         count(*) over (partition by book) as format_count
  from books
) t
where rn = 1
   or format_count = 1;

The window function row_number() in the inner query assigns a number to each format. markdown will get a 1 everything else will get a 2.

The outer select then gets those rows where there is a markdown available or where only a single format is available.

Another (faster) option is to use Postgres' distinct on operator:

select distinct on (book) book, format, contents
from books
order by book, 
         case when format = 'markdown' then 1 else 2 end;

This again sorts on an expression that will give markdown a higher "priority" than html.

The first solution is portable as it is ANSI SQL. The second one is Postgres specific.

SQLFiddle example: http://sqlfiddle.com/#!15/7a027/3