PostgreSQL – Ranking each row in Postgres from 1-100 when comparing in groups

postgresql

For each row in my table, I want to calculate its "expensiveness" relative to the other rows that share the same year, month, and size.

Example data:

year,month,size,price
2017,1,300,1000
2017,1,300,1100

I tried a query like this:

select 
*,
ntile(100) over (partition by year, month, size order by price) as percentile
from mytable;

The problem I have is that rows with the same price can get different percentiles, especially if a certain price is very common. Is there an alternative way to rank rows from 1-100 that gives more consistent results, so that equal prices always get the same rank?
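To make the problem concrete, here is a small sketch using made-up rows (only the column names come from my table; the values are invented for illustration). With just four rows in the group, ntile(100) numbers them 1 to 4, so the three rows priced at 1000 land in three different buckets even though they are identical:

```sql
-- Hypothetical sample rows; only the column names come from the question.
with mytable(year, month, size, price) as (
    values
        (2017, 1, 300, 1000),
        (2017, 1, 300, 1000),
        (2017, 1, 300, 1000),
        (2017, 1, 300, 1100)
)
select
    *,
    -- ntile() splits rows into buckets by position, not by value,
    -- so tied prices are spread over consecutive buckets.
    ntile(100) over (partition by year, month, size order by price) as percentile
from mytable;
```

Which of the tied rows gets which bucket is arbitrary, since the ordering among equal prices is not defined.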

I don't mind if I need to create multiple views/queries to achieve this, e.g. perhaps I should calculate every percentile first and then my main query finds the "nearest" percentile, etc.

Also note that the data may be sparse, with fewer than 100 rows in some groups. Prices also tend to cluster, so there will be even fewer distinct values.

Best Answer

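I think I may have found the answer. Unlike ntile(), cume_dist() gives rows with the same price the same value, because it is computed as the number of rows at or below the current row's price divided by the total number of rows in the partition. The following query seems to produce the results I expect: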

select 
*,
(100*cume_dist() over (partition by year, month, size order by price))::int as percentile
from mytable;
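As a quick check, here is a sketch that runs the same expression against a few made-up rows (the values are invented; only the column names come from the question). The three rows priced at 1000 all have a cume_dist() of 3/4 and so get 75, while the row priced at 1100 gets 100. The second column is an optional variation of my own, not part of the query above: casting to int rounds to the nearest integer, so in a group of more than 200 rows the cheapest row could round down to 0, and greatest(1, ...) is one way to keep the result within 1-100.

```sql
-- Hypothetical sample rows; only the column names come from the question.
with mytable(year, month, size, price) as (
    values
        (2017, 1, 300, 1000),
        (2017, 1, 300, 1000),
        (2017, 1, 300, 1000),
        (2017, 1, 300, 1100)
)
select
    *,
    -- Same expression as above: tied prices share one value.
    (100 * cume_dist() over w)::int as percentile,
    -- Assumed variation: clamp to a minimum of 1 for very large groups.
    greatest(1, (100 * cume_dist() over w)::int) as percentile_min_1
from mytable
window w as (partition by year, month, size order by price);
```

For these four rows both columns are identical; the clamp only matters when a group is large enough for the lowest value to round to 0.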