SQL Server – Retrieve Last Not-Null Value in Ordered Column

nullrunning-totalssql servert-sqlwindow functions

I have the following input:

 id | value 
----+-------
  1 |   136
  2 |  NULL
  3 |   650
  4 |  NULL
  5 |  NULL
  6 |  NULL
  7 |   954
  8 |  NULL
  9 |   104
 10 |  NULL

I expect the following result:

 id | value 
----+-------
  1 |   136
  2 |   136
  3 |   650
  4 |   650
  5 |   650
  6 |   650
  7 |   954
  8 |   954
  9 |   104
 10 |   104

The trivial solution would be join the tables with a < relation, and then selecting the MAX value in a GROUP BY:

WITH tmp AS (
  SELECT t2.id, MAX(t1.id) AS lastKnownId
  FROM t t1, t t2
  WHERE
    t1.value IS NOT NULL
    AND
    t2.id >= t1.id
  GROUP BY t2.id
)
SELECT
  tmp.id, t.value
FROM t, tmp
WHERE t.id = tmp.lastKnownId;

However, the trivial execution of this code would create internally the square of the count of the rows of the input table ( O(n^2) ). I expected t-sql to optimize it out – on a block/record level, the task to do is very easy and linear, essentially a for loop ( O(n) ).

However, on my experiments, the latest MS SQL 2016 can't optimize this query correctly, making this query impossible to execute for a large input table.

Furthermore, the query has to run quickly, making a similarly easy (but very different) cursor-based solution infeasible.

Using some memory-backed temporary table could be a good compromise, but I am not sure if it can be run significantly quicker, considered that my example query using subqueries didn't work.

I am also thinking on to dig out some windowing function from the t-sql docs, what could be tricked to do what I want. For example, cumulative sum is doing some very similar, but I couldn't trick it to give the latest non-null element, and not the sum of the elements before.

The ideal solution would be a quick query without procedural code or temporary tables. Alternatively, also a solution with temporary tables is okay, but iterating the table procedurally is not.

Best Answer

A common solution to this type of problem is given by Itzik Ben-Gan in his article The Last non NULL Puzzle:

DROP TABLE IF EXISTS dbo.Example;

CREATE TABLE dbo.Example
(
    id integer PRIMARY KEY,
    val integer NULL
);

INSERT dbo.Example
    (id, val)
VALUES
    (1, 136),
    (2, NULL),
    (3, 650),
    (4, NULL),
    (5, NULL),
    (6, NULL),
    (7, 954),
    (8, NULL),
    (9, 104),
    (10, NULL);

SELECT
    E.id,
    E.val,
    lastval =
        CAST(
            SUBSTRING(
                MAX(CAST(E.id AS binary(4)) + CAST(E.val AS binary(4))) OVER (
                    ORDER BY E.id
                    ROWS UNBOUNDED PRECEDING),
            5, 4)
        AS integer)
FROM dbo.Example AS E
ORDER BY
    E.id;

Demo: db<>fiddle