Postgresql – Retrieve title from rows with a maximum count in other columns

aggregatecountpostgresqlwindow functions

Given the following table archived:

CREATE TABLE archived (
   sid       serial -- row id
   showurl   text   -- can be: btg, tmwwt, gcaf, emda
   title     text
   reference text   -- title id number
);

Sample Input:

sid | showurl | title       | reference
----+---------+-------------+-----------
1   | btg     | 01-Preface  | 01
2   | btg     | 01-Preface  | 01
3   | emda    | 26-Friction | 26
4   | tmwwt   | 13-Helper   | 13
5   | btg     | 05-Finale   | 05
6   | gcaf    | 07-Help     | 07

SQLfiddle to work with.

Desired Output:

1st selection: showurl: 'btg'

2nd selection: reference: '01'

3rd and final selection: title: '01-Preface'


I am trying to determine from table archived which showurl has the greatest number of rows.

Then, for that 'showurl', I need to identify which reference has the greatest number of rows. Obviously, that 'reference' will be duplicated, but the 'sid' associated with each duplicated 'reference' will be unique.
Once that is determined, I wish to select/display its title.


My initial attempt (using php, postgresql 9.2: queried each showurl, then using max to identify the showurl with the greatest number of hits.):

$query = "SELECT * FROM archived WHERE showurl = 'btg' order by sid ASC";
$query = "SELECT * FROM archived WHERE showurl = 'tmwwt' order by sid ASC";
$query = "SELECT * FROM archived WHERE showurl = 'gcaf' order by sid ASC";
$query = "SELECT * FROM archived WHERE showurl = 'emda' order by sid ASC";

$max = max($numrows1,$numrows2,$numrows3,$numrows4);

Using the sample data above, $max would be 3, but showurl, title, reference, and sid are unavailable for further processing.


I have tried using pg_numrows() but I cannot find a way to refer back either to 'title' or 'reference' for the max rows.

I gather some form of UNION would be in order, but have no idea how to implement this.

Best Answer

Depending on your exact requirements, something along these lines should work:

Subqueries to determine most common attributes

SELECT DISTINCT title
FROM   archived
WHERE  showurl = (
   SELECT showurl
   FROM   archived
   GROUP  BY 1
   ORDER  BY count(*) DESC, 1
   LIMIT  1
   )
AND    reference = (
   SELECT reference
   FROM   archived
   GROUP  BY 1
   ORDER  BY count(*) DESC, 1
   LIMIT  1
   );

I added DISTINCT to the outer SELECT to fold duplicate titles. It's undefined what you need exactly.

Shorter with window functions

A more modern / sophisticated / elegant approach would be with two window functions in the ORDER BY clause, allowing for a very terse query:

SELECT title
FROM   archived
ORDER  BY count(*) OVER (PARTITION BY showurl)   DESC
         ,count(*) OVER (PARTITION BY reference) DESC
LIMIT 1;

Not necessarily faster than the first solution, though. Again, guessing what you might want exactly, I only retrieve a single winning title.

The trick here is to partition the rows by showurl and reference, run a count over each partition and order descending by that count. This way, you can determine a winner in a single query level.

This will not work, if there is no row with the most common showurl and the most common reference at the same time. Again, exact requirements would be needed ...

-> SQLfiddle demo.