Starting with PostgreSQL 9.4 there's a Standard SQL percentile_disc
aggregate function:
percentile_cont(0.25) WITHIN GROUP (ORDER BY column1),
percentile_cont(0.5 ) WITHIN GROUP (ORDER BY column1),
percentile_cont(0.75) WITHIN GROUP (ORDER BY column1),
If you can't upgrade on your Heroku instance you can rewrite it. I did that a few years ago on Teradata, besides the proprietary QUALIFY
it's similar syntax:
According to SQL:2008 PERCENTILE_DISC(x)
is the first value with a CUME_DIST
greater than or equal to x.
This directly translates to the row with a
ROW_NUMBER() OVER (PARTITION BY groupcol ORDER BY ordercol)
= CEILING(COUNT(*) OVER (PARTITION BY groupcol) * x
with
v as (
select * from (values
(1),(2),(3),(4),(5),(6),(7),(8),(9),
(10),(11),(12),(13),(14),(15),(16),(17),(18),(19),
(20),(21),(22),(23),(24),(25),(26),(27),(28),(29),
(30),(31),(32),(33),(34),(35),(36),(37),(38),(39),
(40),(41),(42),(43),(44),(45),(46),(47),(48),(49),
(50),(51),(52),(53),(54),(55),(56),(57),(58),(59),
(60),(61),(62),(63),(64),(65),(66),(67),(68),(69),
(70),(71),(72),(73),(74),(75),(76),(77),(78),(79),
(80),(81),(82),(83),(84),(85),(86),(87),(88),(89)
) as v
),
p as (
select
column1,
row_number() OVER (ORDER BY column1 ASC) as rn,
count(*) OVER () as cnt
from v
)
select *
from p
where rn in (CEILING(cnt * 0.25)
,CEILING(cnt * 0.5)
,CEILING(cnt * 0.75))
See Fiddle
In Postgres 9.4+, you can use the hypothetical-set aggregate function percent_rank()
in a LATERAL
join like this:
SELECT *
FROM daily_data d, LATERAL (
SELECT round(percent_rank(d.daily_val) WITHIN GROUP (ORDER BY daily_val)::numeric
, 6) AS pctl_calc
FROM daily_data
WHERE company_id = d.company_id
AND trade_date < d.trade_date
) x
ORDER BY company_id, trade_date;
The CROSS JOIN LATERAL
(, LATERAL
for short) is safe, because a subquery with an aggregate function always returns a row. (Else you might want LEFT JOIN LATERAL (...) x ON true
.)
Minor difference: The first row in each group returns 0, not 1 like in your example. Easily fixed, if you need it.
This would do the same job in Postgres 9.3:
SELECT d.*
, round(CASE WHEN x.ct = 0 THEN numeric '1' -- or 0 ?
ELSE x.ct_lt / x.ct END, 6) AS pctl_calc
FROM daily_data d, LATERAL (
SELECT count(daily_val) AS ct
, count(daily_val < d.daily_val OR NULL)::numeric As ct_lt
FROM daily_data
WHERE company_id = d.company_id
AND trade_date < d.trade_date
) x
ORDER BY company_id, trade_date;
I use count(daily_val)
for the full count, not count(*)
to ignore NULL values - necessary if daily_val
can be NULL, .
SQL Fiddle.
Or, without LATERAL
join for even older versions. May be faster, too:
SELECT company_id, trade_date, daily_val, pctl
, round(CASE WHEN ct = 0 THEN numeric '1' -- 0 ?
ELSE ct_lt / ct END, 6) AS pctl_calc
FROM (
SELECT d.*
, count(x.daily_val) AS ct
, count(x.daily_val < d.daily_val OR NULL)::numeric AS ct_lt
FROM daily_data d
LEFT JOIN daily_data x ON x.company_id = d.company_id
AND x.trade_date < d.trade_date
GROUP BY d.id -- pg 9.1+
) sub
ORDER BY company_id, trade_date;
Assuming id
is the PK for a simplified GROUP BY
, requires pg 9.1 though.
Best Answer
If you want percentiles per
state
, then usePARTITION BY state_id
in theOVER
clause.And that
GROUP BY
looks spurious at least. I think it needs to be removed if you want percentiles. And group by the PK is a no-operation anyway.