SQL Server – Running Ratio of Unique Counts to Total Counts

running-totalssql server

Any ideas on how to code a running ratio of counts (unique and total) for ordered observations (rows) in t-SQL (SQL Server 2017 Community Edition or higher)?

Given a data set, which is necessarily ordered (Ex. a time series indexed/ordered by date):

CREATE TABLE #x (dt DATE, Name VARCHAR(MAX));
INSERT INTO #x VALUES
('2012-01-01', 'a'),('2012-01-02', 'b'),('2012-01-03', 'c'),
('2012-01-04', 'a'),('2012-01-05', 'b'),('2012-01-06', 'd'),
('2012-01-07', 'a'),('2012-01-08', 'b'),('2012-01-09', 'e'),('2012-01-10', 'e');

So, given columns Dt and Names, I need the output Ratio below. For clarity, I also included output columns UCnt that counts unique names to date (i.e. a restricted running count) and TCnt counts all names to date (i.e. a simple running count). The output Ratio field is a just a running proportion of the two.

Dt          Name    UCnt    TCnt    Ratio
2012-01-01  a       1       1       1.000
2012-01-02  b       2       2       1.000
2012-01-03  c       3       3       1.000
2012-01-04  a       3       4       0.750
2012-01-05  b       3       5       0.600
2012-01-06  d       4       6       0.666
2012-01-07  a       4       7       0.571
2012-01-08  b       4       8       0.500
2012-01-09  e       5       9       0.555
2012-01-10  e       5       10      0.500

Best Answer

A combination of OUTER APPLY and ROW_NUMBER seems to do the trick:

WITH CTE AS
(
    SELECT  *,
            Tot_Cnt = ROW_NUMBER() OVER(ORDER BY dt)
    FROM #x
)
SELECT  A.dt,
        A.[Name],
        B.Unq_Cnt,
        A.Tot_Cnt,
        Ratio_of_Cnts = CONVERT(NUMERIC(10,4),B.Unq_Cnt)/A.Tot_Cnt 
FROM CTE A
OUTER APPLY (SELECT Unq_Cnt = COUNT(DISTINCT [Name]) 
             FROM CTE
             WHERE dt <= A.dt) B
;

Here is a demo of this.