Postgres – Hypothetical-Set Aggregate Function Equivalent to `ntile`

aggregatepostgresql

Is there Hypothetical-Set Aggregate Function equivalent to ntile (or some other good solution) in Postgres?

I have this query:

select
    frctl
    ,*
from
    (select 
        *
     from
            d_al
     where
        not rtn is null
        and not fund_val is null
     ) dx
    ,lateral(
        select
            round(percent_rank(dx.fund_val) WITHIN GROUP (ORDER BY fund_val)::numeric
              , 6) AS frctl
        from 
            d_al
        where
            gen_qtr_end_dt <= dx.gen_qtr_end_dt
            and not rtn is null
            and not fund_val is null
    ) x
order by gen_qtr_end_dt_id, frctl

The query produces periodic historical percentile ranks. Ranking a value in a certain period/date relative to the current period/date plus all the historical periods/dates (periods before it) time series-wise/chronologically.

It works perfectly, except I want fractiles (i.e. the option to create deciles, quartiles, etc.) like ntile(#) does naturally. Do I have to build a case statement to fit the fractiles I want? For example, if I want ntile(4) (quartiles), do I have to build a case statement based off of 0, 0.25, 0.5,0.75,1. Then if I want ntile(10) (deciles), do I have to build a case statement based off of 0, 0.1, 0.2,0.3,0.4 ... etc? Or is there an ntile type Hypothetical-Set Aggregate Function I am missing?

Helpful links:
https://www.postgresql.org/docs/current/static/functions-aggregate.html#FUNCTIONS-HYPOTHETICAL-TABLE

Percentile rank that takes sorted argument (or same functionality) in PostgreSQL 9.3.5
(In the link directly above the problem is a bit different, but very related.)

The data:

  • Big – efficiency is important, but not the focus of my question.
  • Table d_al has three columns, two matter here:
  • gen_qtr_end_dt – not unique, not null, type date
  • fund_val – can be null, type numeric
  • rtn – can be null, type numeric, not important here

I have Postgres 9.6.

PS – this query does all of the history, but my next step is to do a number of days rolling period look back (rather than all of the history).

edit 1:
Here is how I am solving it now (with a case statement as mentioned):

I put the above query in an cte then…

   with pl as (
    select
        x.pctl
        ,dx.fund_val
        , dx.rtn
        ,dx.gen_qtr_end_dt
    from
        (select 
            *
         from
                d_al
         where
            not rtn is null
            and not fund_val is null
         ) dx
        ,lateral(
            select
                round(percent_rank(dx.fund_val) WITHIN GROUP (ORDER BY fund_val)::numeric
                  , 6) AS pctl
            from 
                d_al
            where
                gen_qtr_end_dt <= dx.gen_qtr_end_dt
                and not rtn is null
                and not fund_val is null
        ) x
)
-- , f as( 
    select 
        gen_qtr_end_dt_id
        ,case   when pl.pctl < 0.1 then 1
                when pl.pctl < 0.2 then 2
                when pl.pctl < 0.3 then 3
                when pl.pctl < 0.4 then 4
                when pl.pctl < 0.5 then 5
                when pl.pctl < 0.6 then 6
                when pl.pctl < 0.7 then 7
                when pl.pctl < 0.8 then 8
                when pl.pctl < 0.9 then 9
                else 10
         end
            frctl 
        ,rtn
        ,fund_val
        ,*
    from 
        pl
    order by
        gen_qtr_end_dt, frctl

…which is a bit cumbersome/rigid but doable if need be.

edit 2:
And here is a sample of the output from edit 1 above:

frctl   fund_val    pctl    gen_qtr_end_dt
1   -14.514 0   3/31/2001
2   -8.618  0.142857    3/31/2001
3   1.707   0.285714    3/31/2001
5   26.162  0.428571    3/31/2001
6   141.873 0.571429    3/31/2001
8   216 0.714286    3/31/2001
9   254 0.857143    3/31/2001
1   -15.237 0.071429    6/30/2001
1   -32 0   6/30/2001
3   -6.949  0.285714    6/30/2001
5   6.307   0.428571    6/30/2001
6   28.542  0.571429    6/30/2001
7   140.816 0.642857    6/30/2001
9   239 0.857143    6/30/2001
1   -47 0.043478    9/30/2001
1   -63.367 0   9/30/2001
2   -16.599 0.130435    9/30/2001
4   -6.087  0.347826    9/30/2001
6   31.425  0.565217    9/30/2001
7   47.137  0.608696    9/30/2001
8   150.678 0.73913 9/30/2001
8   200 0.782609    9/30/2001
10  1902.684    0.956522    9/30/2001
1   -246.545    0   12/31/2001
2   -18.731 0.125   12/31/2001
4   -0.043  0.375   12/31/2001
4   -6  0.34375 12/31/2001
5   9.285   0.46875 12/31/2001
6   43.519  0.59375 12/31/2001
7   111 0.65625 12/31/2001
8   154.573 0.78125 12/31/2001
10  1017.514    0.9375  12/31/2001
1   -23.678 0.095238    3/31/2002
4   2.229   0.357143    3/31/2002
5   14  0.428571    3/31/2002
5   17.689  0.452381    3/31/2002
6   67.245  0.595238    3/31/2002
7   130.604 0.642857    3/31/2002
8   156 0.761905    3/31/2002
8   179.399 0.785714    3/31/2002
9   213.756 0.833333    3/31/2002
10  855.2   0.928571    3/31/2002
1   -26.536 0.076923    6/30/2002
3   1.295   0.288462    6/30/2002
4   9   0.365385    6/30/2002
5   16.714  0.423077    6/30/2002
6   64.547  0.557692    6/30/2002
6   103.539 0.596154    6/30/2002
8   181.284 0.769231    6/30/2002
9   203 0.807692    6/30/2002
10  600.194 0.923077    6/30/2002
10  284.306 0.903846    6/30/2002
1   -85 0.016129    9/30/2002
1   -25.475 0.096774    9/30/2002
2   -20.394 0.129032    9/30/2002
4   2.551   0.33871 9/30/2002
6   102.395 0.564516    9/30/2002
7   113.453 0.612903    9/30/2002
8   168.205 0.725806    9/30/2002
9   248 0.854839    9/30/2002
10  800.551 0.935484    9/30/2002
10  460.067 0.903226    9/30/2002

edit 3: As it stands, the way I am doing it here is so slow it is unusable. The slow part is the query with percent_rank() in it.

Best Answer

Try trunc(10 * pl.pctl) + 1, but as percent_rank returns 0 <= n <= 1 the maximum value will be 11 instead of 10.

CUME_DIST is quite similar to PERCENT_RANK but returns 0 < n <= 1, thus you might switch to 1-cume_dist... (ORDER BY fund_val DESC) for the pctl calculation.