PostgreSQL 9.4 – How to Get 10th and 90th Percentile by Customer

postgresqlpostgresql-9.4

I have a table which contains customers and scores (based on different factors, irrelevant in this case; a customer can have multiple scores), which looks like this:

customer_id | score | score_giver_id
====================================
          1 | 100   | 1
          1 | 102   | 1
          1 | 101   | 1
          1 | 140   | 1
          2 | 131   | 3
          1 | 44    | 1
          3 | 223   | 1
          3 | 1     | 2
          3 | 201   | 1
          3 | 211   | 1
          3 | 231   | 1
          3 | 243   | 1

The score_giver_id is irrelevant, but I'd still like to fetch it.

In the example above, when getting the 50th percentile, grouped by customer_id, the result should be (I picked the 50th percentile in this example, because it illustrates what I want to do better):

customer_id | score | score_giver_id
====================================
          1 | 101   | 1
          2 | 131   | 3
          3 | 223   | 1

I used the method described here.

I need to get the value that is at the 10th percentile, respectively at the 90th percentile in PostgreSQL. I've seen that since 9.4 there is an ntile function, but I don't really understand how it works, what it does, and if it helps me.

I've found a nice snippet for MySQL, which works (even though there are some caveats), but I'd like to use built-in functions if available (for MySQL there are none, hence the snippet).

Best Answer

It seems you are after the percentile_disc() ordered-set aggregate function.

The documentation says the following about it:

percentile_disc(fraction) WITHIN GROUP (ORDER BY sort_expression)

discrete percentile: returns the first input value whose position in the ordering equals or exceeds the specified fraction

The syntax is a bit strange for an aggregate, but using it is easy:

SELECT percentile_disc(0.9) WITHIN GROUP (ORDER BY score)
  FROM customer_score
 GROUP BY customer_id;

You define the column from which to take the percentile in the ORDER BY clause.