Count of null rows preceding each non-null value

gaps-and-islandspartitioningwindow functions

I need to answer the following question:

What is the average number of email sends it take before a customer opens an email?
I know I need to count the number of rows between each Open_Date in order to get the number of email sends in between each open. I realize I need to add the row_number () function for that, but I can't seem to get to the desired column in my second screenshot.

Something to keep in mind, I will need to get one average "CounttilOpens" per email address. In my example below, I have two different email addresses and their respective Send and Open dates and I need the query to function indepdently for each email address.

My data is structured as follows:

enter image description here

I need a query that help me get me a new column like this that restarts for each email address and for each gap of open dates:

enter image description here

Best Answer

This is a type of gaps-and-islands problem. There are many different solutions.

  • Since COUNT(SomeValue) will only count non-null values, you can use a windowed count to calculate a grouping ID for each island.

  • We subtract 1 for each row which is not null, in order to keep it as part of the previous group

  • Then we use another windowed COUNT to get the final result, this time partitioning by the group ID also.

SELECT
  Email,
  Send_Date,
  Open_Date,
  CASE WHEN Open_Date IS NOT NULL THEN
      COUNT(*) OVER (PARTITION BY Email, GroupId ORDER BY Send_Date)
    END AS CountTilOpen
FROM (
    SELECT *,
        COUNT(Open_Date) OVER (PARTITION BY Email ORDER BY Send_Date)
          - CASE WHEN Open_Date IS NULL THEN 0 ELSE 1 END
          AS GroupId
    FROM YourTable t
) t;