Postgresql – Calculate number of consecutive years (not gapless)

gaps-and-islandspostgresqlwindow functions

I have a request that wants to look at numbers of consecutive year buyers. For example, I have a table like this:

Order_year, Contact
2015, 123
2014, 123
2013, 123
2011, 123
2010, 123
2015, 234
2014, 234
2011, 234

I want to calculate for each year how many consecutive order years each contact_id has. The problem is, there are gaps such as contact_id = 123, it has 2010-2011, then 2013-2015.

The output would be:

report_year, contact, consecutive_years
2015, 123, 3
2014, 123, 2
2013, 123, 1
2012, 123, 0
2011, 123, 2
2010, 123, 1
2015, 234, 3
2014, 234, 2
2013, 234, 1
2012, 234, 0
2011, 234, 1

Or if you just want to calculate the max order year with the max consecutive years, that's fine, such as:

order_year, contact, consecutive_years
2015, 123, 3
2011, 123, 2
2015, 234, 2

Best Answer

So I've built a short SQL script to do this operation, based on the original sample input data:

Order_year, Contact
2015, 123
2014, 123
2013, 123
2011, 123
2010, 123
2015, 234
2014, 234
2011, 234

The script is as follows:

WITH RECURSIVE ranked_order AS(
  SELECT order_year, contact, 1 AS counter FROM orders
  UNION
  SELECT orders.order_year, orders.contact, ranked_order.counter + 1
  FROM orders JOIN ranked_order
  ON orders.order_year-1 = ranked_order.order_year
  WHERE orders.contact = ranked_order.contact
),
max_from_ranked_order AS(
  SELECT order_year AS report_year, contact, max(counter) AS consecutive_year 
  FROM ranked_order
  GROUP BY order_year, contact
),
start_and_consecutive_years AS(
  SELECT  contact, max(consecutive_year) AS consecutive_years, 
  report_year - consecutive_year AS start_year
  FROM max_from_ranked_order
  GROUP BY contact, start_year)
SELECT *, start_year + consecutive_years AS end_year 
FROM start_and_consecutive_years 
ORDER BY contact;

Now, for a little explanation:

Recusive Query

First, you need to find all the occasions of consecutive years, so I built the ranked_order table(maybe a bad name... maybe should have called it counted_order?) to count instances where a given contact ordered in back-to-back years.

Filter Results

So, due to the recursive query, I am left with all counted elements, rather than simply the maximum consecutive counts. So, I built max_from_ranked_order to extract the max counter value per order_year,contact grouping.

A little arithmetic...

This still doesn't quite yield what we need. Since we grouped over order_year,contact, you still end up with entries like

report_year, contact, consectuive_year
2015, 123, 3
2014, 123, 2
2013, 123, 1
2015, 234, 2
2014, 234, 1
etc...

What you'll notice is that you have unique, information which you can group on by observing the starting year of the consecutive years. So, I created start_and_consecutive_years to take the maximum number of consecutive purchasing years for any given starting year, on a per customer basis.

Finally, I get my result by adding the consecutive_years to the start_year, and presto, the final SELECT yields:

contact, consecutive_years, start_year, end_year
123, 3, 2012, 2015
123, 2, 2009, 2011
234, 2, 2013, 2015
234, 1, 2010, 2011

Note that it would be very easy to filter out for casesWHERE consecutive years >= 2 to ignore "irrelevant" results.

Now for the caveats: I wrote this series of CTEs for ease of readability and explanation. There may be a more computationally or programatically efficient way to achieve your results, but on even a moderately sized table this should get you your results pretty quick.