Storing table names in other (user) tables is an instance of storing structural metadata and user data side by side. This has a variety of interesting effects, and some of those effects make it attractive to designers. The down side is less apparent.
The down side is this: you end up doing a lot of data management "by hand" in the SQL that a different design might have allowed the DBMS to do for you. This makes your code harder to maintain and slower to run.
Having said that, I'll admit to having pulled this stunt several times, and the results were usually good enough so that I didn't regret the choice.
There's a second thing going on in your case. Types 1 and 2 of both widgets and products are cases of class/subclass modeling (or, if you prefer type/subtype modeling). This kind of thing is simple and straightforward in object modeling, because inheritance takes care of most of the difficulties for you. Not so in relational modeling. Relational modeling, as such, has no mechanism for inheritance. Some variants of SQL have extensions that make inheritance easier to model and to implement.
There are many case of questions here in the DBA area that boil down to the question of how to implement subclasses (or subtypes) in SQL tables. Some of those questions are grouped under this tag: subtypes. Over in StackOverflow, there are even more such questions, and there are three tags that relate to three design techniques that might help: single-table-inheritance, class-table-inheritance, and shared-primary-key.
Your design resembles a class table inheritance design, except that you use embedded table names instead of shared primary key to implement the IS-A relationships between subclasses and classes. You might want to explore using shared primary key, and then creating views that collect all the data for each subclass by joining the superclass table with each subclass. I'm not sure you want to go this way. It could get awfully unwieldy once you have hundreds of different product types.
My first thought would be to use a window function like ROW_NUMBER()
, almost identical to your solution.
Here are a few more ways to write this query:
WITH mytable AS
( --- the query --- )
SELECT a_key
FROM mytable AS t
WHERE EXISTS
( SELECT *
FROM mytable AS n
WHERE t.a_value = n.a_value
AND t.a_timestamp < n.a_timestamp
) ;
Using another window function, LEAD()
:
SELECT a_key
FROM
( SELECT a_key,
(LEAD(a_timestamp) OVER (PARTITION BY a_value
ORDER BY a_timestamp)
IS NOT NULL) AS ok
FROM mytable
) AS pointless
WHERE ok ;
A variation on the above, using a different condition to check which rows to keep:
(LEAD(a_value) OVER (ORDER BY a_value, a_timestamp)
= a_value) AS ok
And a rather weird solution:
SELECT a_key
FROM mytable
EXCEPT
( SELECT DISTINCT ON (a_value) a_key
FROM mytable
ORDER BY a_value, a_timestamp DESC
) ;
Modified NOT IN
to NOT EXISTS
:
SELECT a_key
FROM mytable AS t
WHERE NOT EXISTS
( SELECT 1
FROM mytable AS m
WHERE m.a_value = t.a_value
HAVING MAX(m.a_timestamp) = t.a_timestamp
) ;
And modified again to a JOIN
:
SELECT t.a_key
FROM mytable AS t
JOIN
( SELECT a_value, max(a_timestamp) AS a_timestamp
FROM mytable
GROUP BY a_value
) AS m
ON t.a_value = m.a_value
AND t.a_timestamp < m.a_timestamp ;
Regarding performance, I did a test with a small 200K rows table (not a subquery), with and without indexes, of the various methods.
Since the query needs to return a large majority of rows (more than 50% and could be close to 100% depending on the distribution), I wouldn't expect indexes to be particularly helpful.
The window function solutions (ROW_NUMBER()
, RANK()
, LEAD()
) performed quite well and similarly to each other (less than 2 sec).
The EXISTS
method came a bit slower and last was the DISTINCT ON
method (around 3 sec).
The NOT IN
method by @joanolo shows a materialized subplan and was really slow (but it may be more efficient if the mytable
subquery returns fewer rows). Modified to a similar NOT EXISTS
lowered the response time to about 3 seconds. The JOIN
modification was somewhat better, around 2 - 2.5 sec.
The plans showed sequential scans of course and most improved with indexes, doing index scans instead (and lowering response time to about 1-1.5 sec for the window functions and the join methods).
(I used (a_value, a_timestamp)
and (a_value, a_timestamp, a_key)
indexes and variations changing to timestamp DESC
but the actual indexes are more or less irrelevant to the specific example, since we have no idea how complex the sybquery is.)
Best Answer
You can add more conditions to the WHERE clause using the format:
dbfiddle here