I have a small table of 135,000 rows.
Here is the schema:
id integer NOT NULL nextval('history_sum_id_seq'::regclass)
zone_id integer NOT NULL
spot_id character varying(24) NULL::character varying
customer_id integer
broker_id integer
date date NOT NULL
created_date timestamp(0) without time zone NOT NULL
statistics_hits integer 0
statistics_real_hits integer 0
statistics_paid_hits integer 0
statistics_clicks integer 0
earnings_eur double precision '0'::double precision
earnings_usd double precision '0'::double precision
earnings_rub double precision '0'::double precision
broker_name character varying(100) NULL::character varying
customer_email character varying(128) NULL::character varying
customer_commission integer
This is my SQL:
SELECT
"hs"."customer_id",
"hs"."customer_email" AS "account",
CASE WHEN max("monthlyAvg"."monthlyAvgValue") is not null THEN
max("monthlyAvg"."monthlyAvgValue") ELSE 0::int END AS "monthlyAvg",
CASE WHEN max("weeklyAvg"."weeklyAvgValue") is not null THEN
max("weeklyAvg"."weeklyAvgValue") ELSE 0::int END AS "weeklyAvg"
FROM
"history_sum" AS "hs"
LEFT JOIN (
SELECT
"hs"."customer_id" AS "customer_id",
round((SUM("hs"."statistics_real_hits")::numeric / 30::numeric), 2) AS "monthlyAvgValue"
FROM
"history_sum" AS "hs"
WHERE
"hs"."date" BETWEEN '2016-04-29' AND '2016-05-29'
GROUP BY
"hs"."customer_id",
"hs"."date"
ORDER BY
"hs"."date" DESC
) AS "monthlyAvg" ON "monthlyAvg"."customer_id" = "hs"."customer_id"
LEFT JOIN (
SELECT
"hs"."customer_id" AS "customer_id",
round((SUM("hs"."statistics_real_hits")::numeric / 7::numeric), 2) AS "weeklyAvgValue"
FROM
"history_sum" AS "hs"
WHERE
"hs"."date" BETWEEN '2016-05-22' AND '2016-05-29'
GROUP BY
"hs"."customer_id",
"hs"."date"
ORDER BY
"hs"."date" DESC
) AS "weeklyAvg" ON "weeklyAvg"."customer_id" = "hs"."customer_id"
WHERE
"hs"."customer_email" is not null
GROUP BY
"hs"."customer_id",
"hs"."customer_email",
"hs"."customer_commission"
ORDER BY
account ASC
LIMIT 50
And this is the QUERY PLAN, which I couldn't understand:
QUERY PLAN
Limit (cost=35325.56..35325.68 rows=50 width=92) (actual time=6548.611..6548.615 rows=47 loops=1)
-> Sort (cost=35325.56..35341.42 rows=6347 width=92) (actual time=6548.609..6548.610 rows=47 loops=1)
Sort Key: hs.customer_email
Sort Method: quicksort Memory: 28kB
-> HashAggregate (cost=35051.24..35114.71 rows=6347 width=92) (actual time=6548.525..6548.546 rows=47 loops=1)
Group Key: hs.customer_email, hs.customer_id, hs.customer_commission
-> Hash Left Join (cost=7636.01..28512.35 rows=373651 width=92) (actual time=3.648..1710.014 rows=14053634 loops=1)
Hash Cond: (hs.customer_id = "monthlyAvg".customer_id)
-> Hash Left Join (cost=2833.16..10289.12 rows=135914 width=60) (actual time=1.035..104.126 rows=530354 loops=1)
Hash Cond: (hs.customer_id = "weeklyAvg".customer_id)
-> Seq Scan on history_sum hs (cost=0.00..5587.55 rows=135914 width=28) (actual time=0.003..35.919 rows=135925 loops=1)
Filter: (customer_email IS NOT NULL)
Rows Removed by Filter: 30
-> Hash (cost=2831.54..2831.54 rows=130 width=36) (actual time=1.024..1.024 rows=8 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Subquery Scan on "weeklyAvg" (cost=2829.91..2831.54 rows=130 width=36) (actual time=1.020..1.021 rows=9 loops=1)
-> Sort (cost=2829.91..2830.24 rows=130 width=12) (actual time=1.019..1.019 rows=9 loops=1)
Sort Key: hs_1.date DESC
Sort Method: quicksort Memory: 25kB
-> HashAggregate (cost=2823.07..2825.35 rows=130 width=12) (actual time=0.995..0.999 rows=9 loops=1)
Group Key: hs_1.date, hs_1.customer_id
-> Bitmap Heap Scan on history_sum hs_1 (cost=41.66..2813.38 rows=1292 width=12) (actual time=0.186..0.678 rows=1484 loops=1)
Recheck Cond: ((date >= '2016-05-22'::date) AND (date <= '2016-05-29'::date))
Heap Blocks: exact=119
-> Bitmap Index Scan on sum_date_customer_email (cost=0.00..41.34 rows=1292 width=0) (actual time=0.169..0.169 rows=1484 loops=1)
Index Cond: ((date >= '2016-05-22'::date) AND (date <= '2016-05-29'::date))
-> Hash (cost=4795.97..4795.97 rows=550 width=36) (actual time=2.603..2.603 rows=36 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 10kB
-> Subquery Scan on "monthlyAvg" (cost=4789.10..4795.97 rows=550 width=36) (actual time=2.582..2.599 rows=39 loops=1)
-> Sort (cost=4789.10..4790.47 rows=550 width=12) (actual time=2.582..2.583 rows=39 loops=1)
Sort Key: hs_2.date DESC
Sort Method: quicksort Memory: 26kB
-> HashAggregate (cost=4754.44..4764.06 rows=550 width=12) (actual time=2.552..2.569 rows=39 loops=1)
Group Key: hs_2.date, hs_2.customer_id
-> Bitmap Heap Scan on history_sum hs_2 (cost=176.71..4713.25 rows=5492 width=12) (actual time=0.404..1.467 rows=5783 loops=1)
Recheck Cond: ((date >= '2016-04-29'::date) AND (date <= '2016-05-29'::date))
Heap Blocks: exact=215
-> Bitmap Index Scan on sum_date_customer_email (cost=0.00..175.34 rows=5492 width=0) (actual time=0.378..0.378 rows=5783 loops=1)
Index Cond: ((date >= '2016-04-29'::date) AND (date <= '2016-05-29'::date))
Planning time: 0.643 ms
Execution time: 6548.802 ms
41 row(s)
Total runtime: 6,551.098 ms
As you can see, at some point Postgres increases the number of rows to the crazy rows=530354
and I don't know why it happens.
Subqueries, if I run them separately, are very fast, but when I combine them into one query, that rows explosion happens.
I need to add another 3 simple subqueries here, and after I do that, the number of rows for scan will expand to
Sort (cost=4793.22..4794.60 rows=550 width=24) (actual time=2.881..127147.085 rows=3353783690 loops=1)
and as a result I will get
Planning time: 1.490 ms
Execution time: 1593005.255 ms
This is the FROM clause with the additional subqueries:
"history_sum" AS "hs"
LEFT JOIN (
SELECT
"hs"."customer_id" AS "customer_id",
round((SUM("hs"."statistics_real_hits")::numeric / 30::numeric), 2) AS "monthlyAvgValue"
FROM
"history_sum" AS "hs"
WHERE
"hs"."date" BETWEEN :minus30days AND :yesterday
GROUP BY
"hs"."customer_id",
"hs"."date"
ORDER BY
"hs"."date" DESC
) AS "monthlyAvg" ON "monthlyAvg"."customer_id" = "hs"."customer_id"
LEFT JOIN (
SELECT
"hs"."customer_id" AS "customer_id",
round((SUM("hs"."statistics_real_hits")::numeric / 30::numeric), 2) AS "monthlyAvgValue"
FROM
"history_sum" AS "hs"
WHERE
"hs"."date" BETWEEN :minus60days AND :minus31days
GROUP BY
"hs"."customer_id",
"hs"."date"
ORDER BY
"hs"."date" DESC
) AS "prevMonthlyAvg" ON "prevMonthlyAvg"."customer_id" = "hs"."customer_id"
LEFT JOIN (
SELECT
"hs"."customer_id" AS "customer_id",
round((SUM("hs"."statistics_real_hits")::numeric / 7::numeric), 2) AS "weeklyAvgValue"
FROM
"history_sum" AS "hs"
WHERE
"hs"."date" BETWEEN :minus7days AND :yesterday
GROUP BY
"hs"."customer_id",
"hs"."date"
ORDER BY
"hs"."date" DESC
) AS "weeklyAvg" ON "weeklyAvg"."customer_id" = "hs"."customer_id"
LEFT JOIN (
SELECT
"hs"."customer_id" AS "customer_id",
round((SUM("hs"."statistics_real_hits")::numeric / 7::numeric), 2) AS "weeklyAvgValue"
FROM
"history_sum" AS "hs"
WHERE
"hs"."date" BETWEEN :minus15days AND :minus7days
GROUP BY
"hs"."customer_id",
"hs"."date"
ORDER BY
"hs"."date" DESC
) AS "prevWeeklyAvg" ON "prevWeeklyAvg"."customer_id" = "hs"."customer_id"
LEFT JOIN (
SELECT
"hs"."customer_id" AS "customer_id",
SUM("hs"."statistics_real_hits")::int AS "yesterdayHitsValue"
FROM
"history_sum" AS "hs"
WHERE
"hs"."date" = :yesterday
GROUP BY
"hs"."customer_id"
ORDER BY
"hs"."customer_id" ASC
) AS "yesterdayHits" ON "yesterdayHits"."customer_id" = "hs"."customer_id"
Why does it happen? What am I doing wrong?
The PostgreSQL version is 9.5.
Best Answer
As far as I can tell, you only need a single derived table together with a conditional aggregation:
You can add more aggregations without the need to add new derived tables.
Using an
order by
in a sub-select or derived table is useless and can be removed (not sure if Postgres optimizes that away).