Postgresql – Optimizing PostgreSQL query with multiple subqueries

postgresql

I have a rather simple question (I think): I have one query with 2 window functions: COUNT() and AVG(). I also have 10 subqueries, each with a window function (COUNT() or AVG()).

Testing in pgadmin, this query works fast. The problem is: I'm currently testing against only 100 rows.

Here is the query:

SELECT
    COUNT(a.review_id)::int AS total_reviews,
    COALESCE(AVG(a.score), 0)::int AS average_score,
    (SElECT COUNT(review_id) FROM data.users_pro_reviews WHERE user_id = $1 AND score = 1)::int AS total_1,
    (SElECT COUNT(review_id) FROM data.users_pro_reviews WHERE user_id = $1 AND score = 2)::int AS total_2,
    (SElECT COUNT(review_id) FROM data.users_pro_reviews WHERE user_id = $1 AND score = 3)::int AS total_3,
    (SElECT COUNT(review_id) FROM data.users_pro_reviews WHERE user_id = $1 AND score = 4)::int AS total_4,
    (SElECT COUNT(review_id) FROM data.users_pro_reviews WHERE user_id = $1 AND score = 5)::int AS total_5,
    COALESCE((SELECT AVG(score) FROM data.users_pro_reviews WHERE user_id = $1 AND timestamp_reviewed < NOW()), 0)::int as today,
    COALESCE((SELECT AVG(score) FROM data.users_pro_reviews WHERE user_id = $1 AND timestamp_reviewed < NOW() - INTERVAL '1 DAY'), 0)::int as yesterday,
    COALESCE((SELECT AVG(score) FROM data.users_pro_reviews WHERE user_id = $1 AND timestamp_reviewed < NOW() - INTERVAL '1 WEEK'), 0)::int as one_week_ago,
    COALESCE((SELECT AVG(score) FROM data.users_pro_reviews WHERE user_id = $1 AND timestamp_reviewed < NOW() - INTERVAL '1 MONTH'), 0)::int as one_month_ago,
    COALESCE((SELECT AVG(score) FROM data.users_pro_reviews WHERE user_id = $1 AND timestamp_reviewed < NOW() - INTERVAL '6 MONTH'), 0)::int as six_montsh_ago
FROM
    data.users_pro_reviews a
WHERE
    a.user_id = $1

How can this query be optimized? Using JOINS? And how should I use window functions in JOINs?

Best Answer

It's probably more performant to write it this way instead of querying the same table multiple times:

SELECT
    COUNT(a.review_id)::int AS total_reviews,
    COALESCE(AVG(a.score), 0)::int AS average_score,
    SUM(CASE WHEN score = 1 THEN 1 ELSE 0 END)::int AS total_1,
    SUM(CASE WHEN score = 2 THEN 1 ELSE 0 END)::int AS total_2,
    SUM(CASE WHEN score = 3 THEN 1 ELSE 0 END)::int AS total_3,
    SUM(CASE WHEN score = 4 THEN 1 ELSE 0 END)::int AS total_4,
    SUM(CASE WHEN score = 5 THEN 1 ELSE 0 END)::int AS total_5
/* ...similar syntax for your AVG calculations */    
FROM
    data.users_pro_reviews a
WHERE
    a.user_id = $1

Using the SUM function with a well crafted CASE statement becomes logically equivalent to how you were previously trying to COUNT things. You can also implement similar logic for the AVG function (which I'll probably update this answer with an example for).

As per @LaurenzAlbe, this is another example you can use leveraging the FILTER keyword:

SELECT
    COUNT(a.review_id)::int AS total_reviews,
    COALESCE(AVG(a.score), 0)::int AS average_score,
    COUNT(*) FILTER (WHERE score = 1)::int AS total_1,
    COUNT(*) FILTER (WHERE score = 2)::int AS total_2,
    COUNT(*) FILTER (WHERE score = 3)::int AS total_3,
    COUNT(*) FILTER (WHERE score = 4)::int AS total_4,
    COUNT(*) FILTER (WHERE score = 5)::int AS total_5
/* ...similar syntax for your AVG calculations */    
FROM
    data.users_pro_reviews a
WHERE
    a.user_id = $1

I do have to agree this syntax feels a lot cleaner.