Postgresql – Query Amount of entries per column from multiple tables + provide table name

postgresql

I have five tables with all having the same column (objart):

I queried for the number of entries and combined the result with UNION, which worked well.

Two questions:

1) Is there a shorter/ better way to do it? Other possibilities than UNION?

2) The selection provides me the "objart"-entry and how many rows of this entry are in the table(s). How can I add the table name from which table the specific "objart" entry is derived? (Up to now I don't know whether a "objart" entry occurs in just one OR in more tables)

SELECT objart, COUNT(*) AS "Number of objects"
FROM "SchemaA".table_1
GROUP BY objart
UNION
SELECT objart, COUNT(*)
FROM "SchemaA".table_2
GROUP BY objart
UNION
SELECT objart, COUNT(*)
FROM "SchemaA".tanle_3
GROUP BY objart
UNION
SELECT objart, COUNT(*)
FROM "SchemaA".table_4
GROUP BY objart
UNION
SELECT objart, COUNT(*)
FROM "SchemaA".table_5
GROUP BY objart
ORDER BY objart;

Best Answer

A UNION (or perhaps a UNION ALL, for reasons that I will note below) is a reasonable way to do this. There are a couple of ways to write the UNION. You did the first, which is to aggregate at each individual table level and then UNION the aggregated results. Is there a possibility that two different tables will have the same object in them? That is, something like this:

"Schema".table_1
OBJART
'Renoir'
'Picasso'

"Schema".table_2
OBJART
'Rodin'
'Picasso'

If this is the case, your query above would actually discard one of the sub-results because of how UNION scans the resultsets it is UNIONing for duplicates and discards them . In other words, this subquery

SELECT objart, COUNT(*) AS "Number of objects" FROM "SchemaA".table_1

Will produce this resultset given the data above:

objart   Number of objects
'Renoir'   1
'Picasso'  1

And this query:

SELECT objart, COUNT(*) AS "Number of objects" FROM "SchemaA".table_2

Will produce this resultset:

objart   Number of objects
'Rodin'   1
'Picasso'  1

However, because UNION scans input resultsets and discards duplicates, you will lose one of the 'Picasso' rows and the overarching UNION query will result in this:

objart   Number of objects
'Renoir'   1
'Picasso'  1
'Rodin'    1

I think what you probably want is this:

objart   Number of objects
'Renoir'   1
'Picasso'  2
'Rodin'    1

Also, a sidenote -- can objart ever be NULL? If so, and if you don't want to count NULL rows, I wouldn't use COUNT(*) because that counts all of the rows in the table. COUNT(objart) would work better as it will only count non-NULL values. Yes, it might work either way but to be safe I always like to be very explicit that I am telling the DB to count the number of non-NULL values for a particular column and not the number of rows in a table overall. e.g.:

SELECT objart, COUNT(objart) AS "Number of objects" FROM "SchemaA".table_1

So, there is another way that might be a bit easier. UNION ALL all of the data in the underlying tables and then compute the aggregate on top of them. Here is the query for that:

WITH subquery_art AS
(SELECT objart FROM "SchemaA".table_1
UNION ALL
SELECT objart FROM "SchemaA".table_2
UNION ALL
SELECT objart FROM "SchemaA".table_3
UNION ALL
SELECT objart FROM "SchemaA".table_4
UNION ALL
SELECT objart FROM "SchemaA".table_5)
SELECT objart, COUNT(objart) AS "Number_of_objects"
FROM subquery_art
GROUP BY objart
ORDER BY objart;

If you want to add the table name that you're sourcing data from, here's how to do it:

WITH subquery_art AS
(SELECT 'table1' as table_name, objart FROM "SchemaA".table_1
UNION ALL
SELECT 'table2', objart FROM "SchemaA".table_2
UNION ALL
SELECT 'table3', objart FROM "SchemaA".table_3
UNION ALL
SELECT 'table4', objart FROM "SchemaA".table_4
UNION ALL
SELECT 'table5', objart FROM "SchemaA".table_5)
SELECT table_name, objart, COUNT(objart) AS "Number_of_objects"
FROM subquery_art
GROUP BY table_name, objart
ORDER BY table_name, objart;

EDIT: Fixed some misspellings of 'Renoir'