Postgresql – Query Amount of entries per column from multiple tables + provide table name

postgresql

I have five tables with all having the same column (objart):

I queried for the number of entries and combined the result with UNION, which worked well.

Two questions:

1) Is there a shorter/ better way to do it? Other possibilities than UNION?

2) The selection provides me the "objart"-entry and how many rows of this entry are in the table(s). How can I add the table name from which table the specific "objart" entry is derived? (Up to now I don't know whether a "objart" entry occurs in just one OR in more tables)

SELECT objart, COUNT(*) AS "Number of objects"
FROM "SchemaA".table_1
GROUP BY objart
UNION
SELECT objart, COUNT(*)
FROM "SchemaA".table_2
GROUP BY objart
UNION
SELECT objart, COUNT(*)
FROM "SchemaA".tanle_3
GROUP BY objart
UNION
SELECT objart, COUNT(*)
FROM "SchemaA".table_4
GROUP BY objart
UNION
SELECT objart, COUNT(*)
FROM "SchemaA".table_5
GROUP BY objart
ORDER BY objart;

Best Answer

A UNION (or perhaps a UNION ALL, for reasons that I will note below) is a reasonable way to do this. There are a couple of ways to write the UNION. You did the first, which is to aggregate at each individual table level and then UNION the aggregated results. Is there a possibility that two different tables will have the same object in them? That is, something like this:

"Schema".table_1
OBJART
'Renoir'
'Picasso'

"Schema".table_2
OBJART
'Rodin'
'Picasso'

If this is the case, your query above would actually discard one of the sub-results because of how UNION scans the resultsets it is UNIONing for duplicates and discards them . In other words, this subquery

SELECT objart, COUNT(*) AS "Number of objects" FROM "SchemaA".table_1

Will produce this resultset given the data above:

objart   Number of objects
'Renoir'   1
'Picasso'  1

And this query:

SELECT objart, COUNT(*) AS "Number of objects" FROM "SchemaA".table_2

Will produce this resultset:

objart   Number of objects
'Rodin'   1
'Picasso'  1

However, because UNION scans input resultsets and discards duplicates, you will lose one of the 'Picasso' rows and the overarching UNION query will result in this:

objart   Number of objects
'Renoir'   1
'Picasso'  1
'Rodin'    1

I think what you probably want is this:

objart   Number of objects
'Renoir'   1
'Picasso'  2
'Rodin'    1

Also, a sidenote -- can objart ever be NULL? If so, and if you don't want to count NULL rows, I wouldn't use COUNT(*) because that counts all of the rows in the table. COUNT(objart) would work better as it will only count non-NULL values. Yes, it might work either way but to be safe I always like to be very explicit that I am telling the DB to count the number of non-NULL values for a particular column and not the number of rows in a table overall. e.g.:

SELECT objart, COUNT(objart) AS "Number of objects" FROM "SchemaA".table_1

So, there is another way that might be a bit easier. UNION ALL all of the data in the underlying tables and then compute the aggregate on top of them. Here is the query for that:

WITH subquery_art AS
(SELECT objart FROM "SchemaA".table_1
UNION ALL
SELECT objart FROM "SchemaA".table_2
UNION ALL
SELECT objart FROM "SchemaA".table_3
UNION ALL
SELECT objart FROM "SchemaA".table_4
UNION ALL
SELECT objart FROM "SchemaA".table_5)
SELECT objart, COUNT(objart) AS "Number_of_objects"
FROM subquery_art
GROUP BY objart
ORDER BY objart;

If you want to add the table name that you're sourcing data from, here's how to do it:

WITH subquery_art AS
(SELECT 'table1' as table_name, objart FROM "SchemaA".table_1
UNION ALL
SELECT 'table2', objart FROM "SchemaA".table_2
UNION ALL
SELECT 'table3', objart FROM "SchemaA".table_3
UNION ALL
SELECT 'table4', objart FROM "SchemaA".table_4
UNION ALL
SELECT 'table5', objart FROM "SchemaA".table_5)
SELECT table_name, objart, COUNT(objart) AS "Number_of_objects"
FROM subquery_art
GROUP BY table_name, objart
ORDER BY table_name, objart;

EDIT: Fixed some misspellings of 'Renoir'

Subqueries to determine most common attributes

SELECT DISTINCT title
FROM   archived
WHERE  showurl = (
   SELECT showurl
   FROM   archived
   GROUP  BY 1
   ORDER  BY count(*) DESC, 1
   LIMIT  1
   )
AND    reference = (
   SELECT reference
   FROM   archived
   GROUP  BY 1
   ORDER  BY count(*) DESC, 1
   LIMIT  1
   );

I added DISTINCT to the outer SELECT to fold duplicate titles. It's undefined what you need exactly.

Shorter with window functions

A more modern / sophisticated / elegant approach would be with two window functions in the ORDER BY clause, allowing for a very terse query:

SELECT title
FROM   archived
ORDER  BY count(*) OVER (PARTITION BY showurl)   DESC
         ,count(*) OVER (PARTITION BY reference) DESC
LIMIT 1;

Not necessarily faster than the first solution, though. Again, guessing what you might want exactly, I only retrieve a single winning title.

The trick here is to partition the rows by showurl and reference, run a count over each partition and order descending by that count. This way, you can determine a winner in a single query level.

This will not work, if there is no row with the most common showurl and the most common reference at the same time. Again, exact requirements would be needed ...

-> SQLfiddle demo.

Best Answer

Related Solutions

PostgreSQL insert into table (not origin) based on a condition on fields on different tables

Postgresql – Retrieve title from rows with a maximum count in other columns

Subqueries to determine most common attributes

Shorter with window functions

Related Question