I have a request that wants to look at numbers of consecutive year buyers. For example, I have a table like this:
Order_year, Contact
2015, 123
2014, 123
2013, 123
2011, 123
2010, 123
2015, 234
2014, 234
2011, 234
I want to calculate for each year how many consecutive order years each contact_id
has. The problem is, there are gaps such as contact_id = 123
, it has 2010-2011, then 2013-2015.
The output would be:
report_year, contact, consecutive_years
2015, 123, 3
2014, 123, 2
2013, 123, 1
2012, 123, 0
2011, 123, 2
2010, 123, 1
2015, 234, 3
2014, 234, 2
2013, 234, 1
2012, 234, 0
2011, 234, 1
Or if you just want to calculate the max order year with the max consecutive years, that's fine, such as:
order_year, contact, consecutive_years
2015, 123, 3
2011, 123, 2
2015, 234, 2
Best Answer
So I've built a short SQL script to do this operation, based on the original sample input data:
The script is as follows:
Now, for a little explanation:
Recusive Query
First, you need to find all the occasions of consecutive years, so I built the
ranked_order
table(maybe a bad name... maybe should have called itcounted_order
?) to count instances where a given contact ordered in back-to-back years.Filter Results
So, due to the recursive query, I am left with all counted elements, rather than simply the maximum consecutive counts. So, I built
max_from_ranked_order
to extract the max counter value perorder_year,contact
grouping.A little arithmetic...
This still doesn't quite yield what we need. Since we grouped over
order_year,contact
, you still end up with entries likeWhat you'll notice is that you have unique, information which you can group on by observing the starting year of the consecutive years. So, I created
start_and_consecutive_years
to take the maximum number of consecutive purchasing years for any given starting year, on a per customer basis.Finally, I get my result by adding the
consecutive_years
to thestart_year
, and presto, the finalSELECT
yields:Note that it would be very easy to filter out for cases
WHERE consecutive years >= 2
to ignore "irrelevant" results.Now for the caveats: I wrote this series of CTEs for ease of readability and explanation. There may be a more computationally or programatically efficient way to achieve your results, but on even a moderately sized table this should get you your results pretty quick.