The problem is that you cannot use parameters as object names (tables, columns, etc.) in straight SQL statements, like your ALTER TABLE
above.
I use 'straight' here as an opposite of 'dynamic' - the thing you need:
...
LOOP
   EXECUTE format($$ALTER TABLE %I SET SCHEMA data_archived$$
                , table_rec.table_name);
END LOOP;
...
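For context, a self-contained variant of that loop might look like the following. The catalog query selecting the tables, the `LIKE` filter, and the source schema name are my assumptions, not part of the original; only the target schema data_archived is from the question:

```sql
DO
$do$
DECLARE
   table_rec record;
BEGIN
   FOR table_rec IN
      SELECT tablename AS table_name
      FROM   pg_tables
      WHERE  schemaname = 'public'            -- hypothetical source schema
      AND    tablename LIKE 'archive\_%'      -- hypothetical table filter
   LOOP
      EXECUTE format($$ALTER TABLE %I SET SCHEMA data_archived$$
                   , table_rec.table_name);
   END LOOP;
END
$do$;
```

The inner $$ quotes nest inside the outer $do$ quotes without conflict because the tags differ. The schema data_archived must exist before running this.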
Notes:
- Here I am using dollar quoting when building the dynamic query. This allows 'normal' syntax, as opposed to doubling single quotes, for example (not needed in this particular example). This way, most editors also highlight the statement nicely.
- I also use format() with the %I format specifier to have the object name quoted properly where necessary. This approach is far more readable than building the query by concatenating string constants with quote_ident() calls. format() is available in Postgres 9.1 and newer.
- I have a feeling that you compare stuff like quote_ident('This_table') IN ('This_table'), which won't be true. Either use quote_ident() on both sides, or, easier, just omit the quoting and compare the plain string values.
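To illustrate the pitfall:

```sql
SELECT quote_ident('This_table');  -- returns "This_table" (double-quoted: mixed case)
SELECT quote_ident('this_table');  -- returns this_table (no quoting needed)

-- Hence this comparison is false: '"This_table"' <> 'This_table'
SELECT quote_ident('This_table') IN ('This_table');  -- f
```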
Test setup
Your original setup in the fiddle leaves room for improvement. I kept asking for your setup for a reason.
You have these indexes on film_actor:
"film_actor_pkey" PRIMARY KEY, btree (actor_id, film_id)
"idx_fk_film_id" btree (film_id)
Which is pretty helpful already. But to best support your particular query, you would have a multicolumn index on (film_id, actor_id), columns in this order. A practical solution: replace idx_fk_film_id with an index on (film_id, actor_id), or create the PK on (film_id, actor_id) for the purpose of this test, like I do below.
In a read-only table (or mostly read-only, or generally when VACUUM can keep up with write activity), it also helps to have an index on (title, film_id) to allow index-only scans. My test case is now highly optimized for read performance.
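The suggested index changes could be sketched like this; the index names (apart from idx_fk_film_id, which is from your setup) are my own:

```sql
-- replace the single-column index with a multicolumn one
DROP INDEX idx_fk_film_id;
CREATE INDEX film_actor_film_id_actor_id_idx ON film_actor (film_id, actor_id);

-- covering index to allow index-only scans for the test query
CREATE INDEX film_title_film_id_idx ON film (title, film_id);
```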
There is a type mismatch between film.film_id (integer) and film_actor.film_id (smallint). While that works, it makes queries slower and can lead to various complications. It also makes FK constraints more expensive. Never do this if it can be avoided. If you are not sure, pick integer over smallint. While smallint can save 2 bytes per field (often consumed by alignment padding anyway), there are more complications than with integer.
To optimize the performance of the test itself, create indexes and constraints after bulk-inserting lots of rows. It is substantially slower to add tuples incrementally to existing indexes than to create them from scratch with all rows present.
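Schematically, with a made-up table and generated data (not your actual setup):

```sql
CREATE TABLE film_actor_big (actor_id int, film_id int);

-- bulk-insert first ...
INSERT INTO film_actor_big (actor_id, film_id)
SELECT (random() * 199)::int + 1, g       -- 200 hypothetical actors
FROM   generate_series(1, 100000) g;      -- 100k rows

-- ... then build indexes and constraints in one pass over all rows
ALTER TABLE film_actor_big ADD PRIMARY KEY (film_id, actor_id);
CREATE INDEX ON film_actor_big (actor_id);
```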
Unrelated to this test:
Free-standing sequences plus column defaults instead of much simpler and more reliable serial (or IDENTITY) columns. Don't.
timestamp without time zone is typically unreliable for a column like last_update. Use timestamptz instead. And note that a column default does not actually cover the "last update", strictly speaking.
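A column default only fires on INSERT. To actually maintain last_update, you would need a trigger along these lines; the function and trigger names are my own, and EXECUTE FUNCTION requires Postgres 11 (use EXECUTE PROCEDURE in older versions):

```sql
CREATE OR REPLACE FUNCTION touch_last_update()
  RETURNS trigger
  LANGUAGE plpgsql AS
$func$
BEGIN
   NEW.last_update := now();  -- overwrite on every UPDATE
   RETURN NEW;
END
$func$;

CREATE TRIGGER film_touch_last_update
BEFORE UPDATE ON film
FOR EACH ROW EXECUTE FUNCTION touch_last_update();
```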
The length modifier in character varying(255) indicates that the test case was not designed for Postgres to begin with, because the odd length limit is pretty pointless here. (Or the author is clueless.)
Consider the audited test case in the fiddle:
db<>fiddle here - building on your fiddle, optimized and with added queries.
A test setup with 1000 films and 200 actors has limited validity. The most efficient queries take < 0.2 ms; planning time is more than execution time. A test with 100k or more rows would be more revealing.
Why retrieve only the first names of actors? Once you retrieve multiple columns, you already have a slightly different situation.
ORDER BY title makes no sense while filtering for a single title with WHERE title = 'ACADEMY DINOSAUR'. Maybe ORDER BY film_id?
And for total runtime, rather use EXPLAIN (ANALYZE, TIMING OFF) to reduce (potentially misleading) noise from per-node timing overhead.
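For example, applied to the test query's filter:

```sql
EXPLAIN (ANALYZE, TIMING OFF)
SELECT f.film_id, f.title
FROM   film f
WHERE  f.title = 'ACADEMY DINOSAUR';
```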
Answer
It's hard to form a simple rule of thumb, because total performance depends on many factors. Very basic guidelines:
Aggregating all rows in sub-tables carries less overhead but only pays when you actually need all rows (or a very large part).
For selecting few rows (your test!), different query techniques yield better results. That's where LATERAL comes in. It carries more overhead, but only reads required rows from sub-tables. A big win if only a (very) small fraction is needed.
For your particular test case, I would also test an ARRAY constructor in the LATERAL subquery:
SELECT f.film_id, f.title, a.actors
FROM   film f
LEFT   JOIN LATERAL (
   SELECT ARRAY (
      SELECT a.first_name
      FROM   film_actor fa
      JOIN   actor a USING (actor_id)
      WHERE  fa.film_id = f.film_id
      ) AS actors
   ) a ON true
WHERE  f.title = 'ACADEMY DINOSAUR';
-- ORDER BY f.title; -- redundant while we filter for a single title
When only aggregating a single array in the lateral subquery, a simple ARRAY constructor performs better than the aggregate function array_agg().
Or with a lowly correlated subquery for the simple case:
SELECT f.film_id, f.title
     , ARRAY (SELECT a.first_name
              FROM   film_actor fa
              JOIN   actor a USING (actor_id)
              WHERE  fa.film_id = f.film_id) AS actors
FROM   film f
WHERE  f.title = 'ACADEMY DINOSAUR';
Or, very basically, just 2x LEFT JOIN
and then aggregate:
SELECT f.film_id, f.title, array_agg(a.first_name) AS actors
FROM film f
LEFT JOIN film_actor fa USING (film_id)
LEFT JOIN actor a USING (actor_id)
WHERE f.title = 'ACADEMY DINOSAUR'
GROUP BY f.film_id;
These three seem fastest in my updated fiddle (planning + execution time).
Your first attempt (only slightly modified) is typically fastest to retrieve all or most films, but not for a small selection:
SELECT f.film_id, f.title, a.actors
FROM film f
LEFT JOIN (
SELECT fa.film_id, array_agg(first_name) AS actors
FROM actor
JOIN film_actor fa USING (actor_id)
GROUP BY fa.film_id
) a USING (film_id)
WHERE f.title = 'ACADEMY DINOSAUR'; -- not good for a single (or few) films!
Tests with much bigger cardinalities will be more revealing. And don't generalise results lightly; there are many factors for total performance.
Best Answer
Most common ...
In statistics, this is called the mode. Yes, PostgreSQL has an aggregate function called mode(). I haven't messed with it in PostgreSQL.
Retrieves those not wanted ...
You want the resulting set of all movies for a genre, minus the data set containing all movies rented by a customer for that genre.
In PostgreSQL, you'll want to say EXCEPT instead of MINUS.
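A sketch of that set operation, assuming the usual film/rental join path; the inventory, rental, and film_category table names are my assumptions about your schema:

```sql
-- all films of a given genre ...
SELECT f.film_id, f.title
FROM   film f
JOIN   film_category fc USING (film_id)
WHERE  fc.category_id = 1              -- the genre in question

EXCEPT                                 -- ... minus those the customer rented

SELECT f.film_id, f.title
FROM   film f
JOIN   film_category fc USING (film_id)
JOIN   inventory  i USING (film_id)
JOIN   rental     r USING (inventory_id)
WHERE  fc.category_id = 1
AND    r.customer_id  = 1;             -- the customer in question
```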