I have a query that does a lot of duplicate work:
SELECT visitor_id, '1'::text AS filter
FROM events
WHERE id IN (SELECT event_id FROM params
WHERE key = 'utm_campaign' AND value = 'campaign_one')
AND id IN (SELECT event_id FROM params
WHERE key = 'utm_source' AND value = 'facebook')
GROUP BY visitor_id
UNION ALL
SELECT visitor_id, '2'::text AS filter
FROM events
WHERE id IN (SELECT event_id FROM params
WHERE key = 'utm_campaign' AND value = 'campaign_two')
AND id IN (SELECT event_id FROM params
WHERE key = 'utm_source' AND value = 'facebook')
GROUP BY visitor_id
As you can see, it performs different filtering on the params table 4 times. I'm using Redshift and although it scans this table very quickly, I have quite a few of these statements UNION
ed together. Is there a way to rewrite the SQL using CASE
/IF
statements?
The example uses key = 'utm_source' AND value = 'facebook'
in both, but this is not necessarily true for all selects.
Best Answer
I see a possible minor simplification with a CTE (available in Redshift) for the repeated subquery with identical predicates:
Plain joins may also be faster than multiple
IN
semi-joins.This multicolumn index should allow index-only scans on
params
:Add another index on just
(event_id)
if you don't have one, yet.Arsenal of available query techniques for relational division under this related question on SO:
Avoid multiple SELECTs
As commented by @Andriy, we can squeeze out some more:
In Postgres we could use a shorter, faster
VALUES
expression, but that feature is not currently supported in Redshift:For just two
SELECT
in theUNION
, this won't buy much. But it should be a substantial improvement for more - like you mentioned.2nd query without CTE
The CTE is not necessary for the second query. You can simplify to:
About CTEs
Resources for Common Table Expressions (as requested in the comment):
Data-modifying CTEs are particularly useful. Example:
Basics with an added advanced example on this related answer on SO: