PostGIS – Create 7 Day Rolling Average for Each Feature

postgispostgresqlpostgresql-performancequery-performancewindow functions

I have a timeseries dataset that looks like this:

uid | geom     | date       | count |
-------------------------------------
1   | FeatureA | 2016-02-01 | 1     | 
2   | FeatureA | 2016-02-02 | 2     | 
3   | FeatureA | 2016-02-03 | 3     | 
4   | FeatureA | 2016-02-04 | 4     | 
5   | FeatureA | 2016-02-05 | 5     | 
6   | FeatureA | 2016-02-06 | 9     | 
7   | FeatureA | 2016-02-07 | 11    | 
8   | FeatureA | 2016-02-08 | 15    | 
9   | FeatureA | 2016-02-09 | 17    | 
10  | FeatureA | 2016-02-10 | 20    | 
11  | FeatureB | 2016-02-01 | 2     | 
12  | FeatureB | 2016-02-02 | 2     | 
13  | FeatureB | 2016-02-03 | 8     | 
14  | FeatureB | 2016-02-04 | 4     | 
15  | FeatureB | 2016-02-05 | 5     | 
16  | FeatureB | 2016-02-06 | 15    | 
17  | FeatureB | 2016-02-07 | 11    | 
18  | FeatureB | 2016-02-08 | 15    | 
19  | FeatureB | 2016-02-09 | 19    | 
20  | FeatureB | 2016-02-10 | 25    |

I would like to calculate the 7 day rolling average for each feature in the dataset (~2000 features). To calculate a 7 day rolling average in Postgres windows could be used as described here.

The following code almost works:

SELECT geom,date,count,  
       AVG(count)
            OVER(PARTITION BY geom ORDER BY geom, date ROWS BETWEEN CURRENT ROW AND 7 Following) AS rolling_avg_count
FROM features;

This gives the following output:

 geom    | date       | count | rolling_avg_count
--------------------------------------------------
FeatureA | 2016-02-01 | 1     | 6.25
FeatureA | 2016-02-02 | 2     | 8.25
FeatureA | 2016-02-03 | 3     | 10.5
FeatureA | 2016-02-04 | 4     | 11.57
FeatureA | 2016-02-05 | 5     | 12.83
FeatureA | 2016-02-06 | 9     | 14.4
FeatureA | 2016-02-07 | 11    | 15.75
FeatureA | 2016-02-08 | 15    | 17.33
FeatureA | 2016-02-09 | 17    | 18.5
FeatureA | 2016-02-10 | 20    | 20
FeatureB | 2016-02-01 | 2     | 7.75
FeatureB | 2016-02-02 | 2     | 9.875
FeatureB | 2016-02-03 | 8     | 12.75
FeatureB | 2016-02-04 | 4     | 13.43
FeatureB | 2016-02-05 | 5     | 15
FeatureB | 2016-02-06 | 15    | 17
FeatureB | 2016-02-07 | 11    | 17.5
FeatureB | 2016-02-08 | 15    | 19.67
FeatureB | 2016-02-10 | 25    | 25

However, the output continues calculating averages up to the end of the partition. For example, uid 10 would have a rolling average of 20 (calculated using only one record). I'd like the calculation to stop when there a fewer than 7 following rows.

Ideally the output would look something like this:

 geom    | date       | count | rolling_avg_count
--------------------------------------------------
FeatureA | 2016-02-01 | 1     | 6.25
FeatureA | 2016-02-02 | 2     | 8.25
FeatureA | 2016-02-03 | 3     | 10.5
FeatureA | 2016-02-04 | 4     | 11.57
FeatureA | 2016-02-05 | 5     | 
FeatureA | 2016-02-06 | 9     | 
FeatureA | 2016-02-07 | 11    | 
FeatureA | 2016-02-08 | 15    | 
FeatureA | 2016-02-09 | 17    | 
FeatureA | 2016-02-10 | 20    | 
FeatureB | 2016-02-01 | 2     | 7.75
FeatureB | 2016-02-02 | 2     | 9.875
FeatureB | 2016-02-03 | 8     | 12.75
FeatureB | 2016-02-04 | 4     | 13.43
FeatureB | 2016-02-05 | 5     | 
FeatureB | 2016-02-06 | 15    | 
FeatureB | 2016-02-07 | 11    | 
FeatureB | 2016-02-08 | 15    | 
FeatureB | 2016-02-10 | 25    |

Best Answer

This produces the desired result - after adjusting off-by-1 errors ②:

SELECT geom, date, count
     , CASE WHEN rn < 7 THEN NULL  -- ②
            ELSE round(rolling_avg_count, 2) END AS rolling_avg_count
FROM  (
   SELECT geom, date, count
        , AVG(count) OVER (PARTITION BY geom
                           ORDER BY date -- ①
                           ROWS BETWEEN CURRENT ROW AND 6 FOLLOWING -- ②
                          ) AS rolling_avg_count
        , row_number() OVER (PARTITION BY geom ORDER BY date DESC) AS rn
   FROM   features
   ) sub;

db<>fiddle here

① With PARTITION BY geom, you don't need geom in ORDER BY.

② You had off-by-one error(s):

I would like to calculate the 7 day rolling average

But you show a calculation for the 8 day rolling average (1 + 7).

I'd like the calculation to stop when there a fewer than 7 following rows.

But you show a result for only 6 following rows (displayed values 11.57 and 13.43).

I made it an actual 7 day average, stopping when there are fewer than 6 following rows.

Performance optimization

The above adds an additional sort to the query plan, due to the descending sort order in the second window function.

This alternative query avoids that with an additional count():

SELECT geom, date, count
     , CASE WHEN rn < 6 THEN NULL -- ③
            ELSE round(rolling_avg_count, 2) END AS rolling_avg_count
FROM  (
   SELECT geom, date, count
        , avg(count)   OVER (PARTITION BY geom
                             ORDER BY date
                             ROWS BETWEEN CURRENT ROW AND 6 FOLLOWING
                            ) AS rolling_avg_count
        , count(*)     OVER (PARTITION BY geom)
        - row_number() OVER (PARTITION BY geom ORDER BY date) AS rn
   FROM   features
   ) sub;

db<>fiddle here

A bit more verbose, adding an additional window function. But I expect it to perform better.

③ Adjusted to 0-based row number. (Cheaper than adding + 1.)

The query you have

You could simplify your query using a WINDOW clause, but that's just shortening the syntax, not changing the query plan.

SELECT id, trans_ref_no, amount, trans_date, entity_id
     , SUM(amount) OVER w AS trans_total
     , COUNT(*)    OVER w AS trans_count
FROM   transactiondb
WINDOW w AS (PARTITION BY entity_id, date_trunc('month',trans_date)
             ORDER BY trans_date
             ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING);

Also using the slightly faster count(*), since id is certainly defined NOT NULL?
And you don't need to ORDER BY entity_id since you already PARTITION BY entity_id

You can simplify further, though:
Don't add ORDER BY to the window definition at all, it's not relevant to your query. Then you don't need to define a custom window frame, either:

SELECT id, trans_ref_no, amount, trans_date, entity_id
     , SUM(amount) OVER w AS trans_total
     , COUNT(*)    OVER w AS trans_count
FROM   transactiondb
WINDOW w AS (PARTITION BY entity_id, date_trunc('month',trans_date);

Simpler, faster, but still just a better version of what you have, with static months.

The query you might want

... is not clearly defined, so I'll build on these assumptions:

Count transactions and amount for every 30-day period within the first and last transaction of any entity_id. Exclude leading and trailing periods without activity, but include all possible 30-day periods within those outer bounds.

SELECT entity_id, trans_date
     , COALESCE(sum(daily_amount) OVER w, 0) AS trans_total
     , COALESCE(sum(daily_count)  OVER w, 0) AS trans_count
FROM  (
   SELECT entity_id
        , generate_series (min(trans_date)::timestamp
                         , GREATEST(min(trans_date), max(trans_date) - 29)::timestamp
                         , interval '1 day')::date AS trans_date
   FROM   transactiondb 
   GROUP  BY 1
   ) x
LEFT JOIN (
   SELECT entity_id, trans_date
        , sum(amount) AS daily_amount, count(*) AS daily_count
   FROM   transactiondb
   GROUP  BY 1, 2
   ) t USING (entity_id, trans_date)
WINDOW w AS (PARTITION BY entity_id ORDER BY trans_date
             ROWS BETWEEN CURRENT ROW AND 29 FOLLOWING);

This lists all 30-day periods for each entity_id with your aggregates and with trans_date being the first day (incl.) of the period. To get values for each individual row join to the base table once more ...

The basic difficulty is the same as discussed here:

Referencing current row in FILTER clause of window function

The frame definition of a window cannot depend on values of the current row.

And rather call generate_series() with timestamp input:

Generating time series between two dates in PostgreSQL

The query you actually want

After question update and discussion:
Accumulate rows of the same entity_id in a 30-day window starting at each actual transaction.

Since your data is distributed sparsely, it should be more efficient to run a self-join with a range condition, all the more since Postgres 9.1 does not have LATERAL joins, yet:

SELECT t0.id, t0.amount, t0.trans_date, t0.entity_id
     , sum(t1.amount) AS trans_total, count(*) AS trans_count
FROM   transactiondb t0
JOIN   transactiondb t1 USING (entity_id)
WHERE  t1.trans_date >= t0.trans_date
AND    t1.trans_date <  t0.trans_date + 30  -- exclude upper bound
-- AND    t0.entity_id = 114284  -- or pick a single entity ...
GROUP  BY t0.id  -- is PK!
ORDER  BY t0.trans_date, t0.id

SQL Fiddle.

A rolling window could only make sense (with respect to performance) with data for most days.

This does not aggregate duplicates on (trans_date, entity_id) per day, but all rows of the same day are always included in the 30-day window.

For a big table, a covering index like this could help quite a bit:

CREATE INDEX transactiondb_foo_idx
ON transactiondb (entity_id, trans_date, amount);

The last column amount is only useful if you get index-only scans out of it. Else drop it.

But it's not going to be used while you select the whole table anyway. It would support queries for a small subset.

SQL Server – Date Range Rolling Sum Using Window Functions

Great question, Paul! I used a couple different approaches, one in T-SQL and one in CLR.

T-SQL quick summary

The T-SQL approach can be summarized as the following steps:

Take the cross-product of products/dates
Merge in the observed sales data
Aggregate that data to the product/date level
Compute rolling sums over the past 45 days based on this aggregate data (which contains any "missing" days filled in)
Filter those results to only the product/date pairings that had one or more sales

Using SET STATISTICS IO ON, this approach reports Table 'TransactionHistory'. Scan count 1, logical reads 484, which confirms the "single pass" over the table. For reference, the original loop-seek query reports Table 'TransactionHistory'. Scan count 113444, logical reads 438366.

As reported by SET STATISTICS TIME ON, the CPU time is 514ms. This compares favorably to 2231ms for the original query.

CLR quick summary

The CLR summary can be summarized as the following steps:

Read the data into memory, ordered by product and date
While processing each transaction, add to a running total of the costs. Whenever a transaction is a different product than the previous transaction, reset the running total to 0.
Maintain a pointer to the first transaction that has the same (product, date) as the current transaction. Whenever the last transaction with that (product, date) is encountered, compute the rolling sum for that transaction and apply it to all transactions with the same (product, date)
Return all of the results to the user!

Using SET STATISTICS IO ON, this approach reports that no logical I/O has occurred! Wow, a perfect solution! (Actually, it seems that SET STATISTICS IO does not report I/O incurred within CLR. But from the code, it is easy to see that exactly one scan of the table is made and retrieves the data in order by the index Paul suggested.

As reported by SET STATISTICS TIME ON, the CPU time is now 187ms. So this is quite an improvement over the T-SQL approach. Unfortunately, the overall elapsed time of both approaches is very similar at about half a second each. However, the CLR based approach does have to output 113K rows to the console (vs. just 52K for the T-SQL approach that groups by product/date), so that's why I've focused on CPU time instead.

Another big advantage of this approach is that it yields exactly the same results as the original loop/seek approach, including a row for every transaction even in cases where a product is sold multiple times on the same day. (On AdventureWorks, I specifically compared row-by-row results and confirmed that they tie out with Paul's original query.)

A disadvantage of this approach, at least in its current form, is that it reads all data in memory. However, the algorithm that has been designed only strictly needs the current window frame in memory at any given time and could be updated to work for data sets that exceed memory. Paul has illustrated this point in his answer by producing an implementation of this algorithm that stores only the sliding window in memory. This comes at the expense of granting higher permissions to CLR assembly, but would definitely be worthwhile in scaling this solution up to arbitrarily large data sets.

T-SQL - one scan, grouped by date

Initial setup

USE AdventureWorks2012
GO
-- Create Paul's index
CREATE UNIQUE INDEX i
ON Production.TransactionHistory (ProductID, TransactionDate, ReferenceOrderID)
INCLUDE (ActualCost);
GO
-- Build calendar table for 2000 ~ 2020
CREATE TABLE dbo.calendar (d DATETIME NOT NULL CONSTRAINT PK_calendar PRIMARY KEY)
GO
DECLARE @d DATETIME = '1/1/2000'
WHILE (@d < '1/1/2021')
BEGIN
    INSERT INTO dbo.calendar (d) VALUES (@d)
    SELECT @d =  DATEADD(DAY, 1, @d)
END
GO

The query

DECLARE @minAnalysisDate DATE = '2007-09-01', -- Customizable start date depending on business needs
        @maxAnalysisDate DATE = '2008-09-03'  -- Customizable end date depending on business needs
SELECT ProductID, TransactionDate, ActualCost, RollingSum45, NumOrders
FROM (
    SELECT ProductID, TransactionDate, NumOrders, ActualCost,
        SUM(ActualCost) OVER (
                PARTITION BY ProductId ORDER BY TransactionDate 
                ROWS BETWEEN 45 PRECEDING AND CURRENT ROW
            ) AS RollingSum45
    FROM (
        -- The full cross-product of products and dates, combined with actual cost information for that product/date
        SELECT p.ProductID, c.d AS TransactionDate,
            COUNT(TH.ProductId) AS NumOrders, SUM(TH.ActualCost) AS ActualCost
        FROM Production.Product p
        JOIN dbo.calendar c
            ON c.d BETWEEN @minAnalysisDate AND @maxAnalysisDate
        LEFT OUTER JOIN Production.TransactionHistory TH
            ON TH.ProductId = p.productId
            AND TH.TransactionDate = c.d
        GROUP BY P.ProductID, c.d
    ) aggsByDay
) rollingSums
WHERE NumOrders > 0
ORDER BY ProductID, TransactionDate
-- MAXDOP 1 to avoid parallel scan inflating the scan count
OPTION (MAXDOP 1)

The execution plan

From the execution plan, we see that the original index proposed by Paul is sufficient to allow us to perform a single ordered scan of Production.TransactionHistory, using a merge join to combine the transaction history with each possible product/date combination.

Assumptions

There are a few significant assumptions baked into this approach. I suppose it will be up to Paul to decide whether they are acceptable :)

I am using the Production.Product table. This table is freely available on AdventureWorks2012 and the relationship is enforced by a foreign key from Production.TransactionHistory, so I interpreted this as fair game.
This approach relies on the fact that transactions do not have a time component on AdventureWorks2012; if they did, generating the full set of product/date combinations would no longer be possible without first taking a pass over the transaction history.
I am producing a rowset that contains just one row per product/date pair. I think that this is "arguably correct" and in many cases a more desirable result to return. For each product/date, I have added a NumOrders column to indicate how many sales occurred. See the following screenshot for a comparison of the results of the original query vs. the proposed query in cases where a product was sold multiple times on the same date (e.g., 319 / 2007-09-05 00:00:00.000)

CLR - one scan, full ungrouped result set

The main function body

There isn't a ton to see here; the main body of the function declares the inputs (which must match the corresponding SQL function), sets up a SQL connection, and opens the SQLReader.

// SQL CLR function for rolling SUMs on AdventureWorks2012.Production.TransactionHistory
[SqlFunction(DataAccess = DataAccessKind.Read,
    FillRowMethodName = "RollingSum_Fill",
    TableDefinition = "ProductId INT, TransactionDate DATETIME, ReferenceOrderID INT," +
                      "ActualCost FLOAT, PrevCumulativeSum FLOAT, RollingSum FLOAT")]
public static IEnumerable RollingSumTvf(SqlInt32 rollingPeriodDays) {
    using (var connection = new SqlConnection("context connection=true;")) {
        connection.Open();
        List<TrxnRollingSum> trxns;
        using (var cmd = connection.CreateCommand()) {
            //Read the transaction history (note: the order is important!)
            cmd.CommandText = @"SELECT ProductId, TransactionDate, ReferenceOrderID,
                                    CAST(ActualCost AS FLOAT) AS ActualCost 
                                FROM Production.TransactionHistory 
                                ORDER BY ProductId, TransactionDate";
            using (var reader = cmd.ExecuteReader()) {
                trxns = ComputeRollingSums(reader, rollingPeriodDays.Value);
            }
        }

        return trxns;
    }
}

The core logic

I've separated out the main logic so that's easier to focus on:

// Given a SqlReader with transaction history data, computes / returns the rolling sums
private static List<TrxnRollingSum> ComputeRollingSums(SqlDataReader reader,
                                                        int rollingPeriodDays) {
    var startIndexOfRollingPeriod = 0;
    var rollingSumIndex = 0;
    var trxns = new List<TrxnRollingSum>();

    // Prior to the loop, initialize "next" to be the first transaction
    var nextTrxn = GetNextTrxn(reader, null);
    while (nextTrxn != null)
    {
        var currTrxn = nextTrxn;
        nextTrxn = GetNextTrxn(reader, currTrxn);
        trxns.Add(currTrxn);

        // If the next transaction is not the same product/date as the current
        // transaction, we can finalize the rolling sum for the current transaction
        // and all previous transactions for the same product/date
        var finalizeRollingSum = nextTrxn == null || (nextTrxn != null &&
                                (currTrxn.ProductId != nextTrxn.ProductId ||
                                currTrxn.TransactionDate != nextTrxn.TransactionDate));
        if (finalizeRollingSum)
        {
            // Advance the pointer to the first transaction (for the same product)
            // that occurs within the rolling period
            while (startIndexOfRollingPeriod < trxns.Count
                && trxns[startIndexOfRollingPeriod].TransactionDate <
                    currTrxn.TransactionDate.AddDays(-1 * rollingPeriodDays))
            {
                startIndexOfRollingPeriod++;
            }

            // Compute the rolling sum as the cumulative sum (for this product),
            // minus the cumulative sum for prior to the beginning of the rolling window
            var sumPriorToWindow = trxns[startIndexOfRollingPeriod].PrevSum;
            var rollingSum = currTrxn.ActualCost + currTrxn.PrevSum - sumPriorToWindow;
            // Fill in the rolling sum for all transactions sharing this product/date
            while (rollingSumIndex < trxns.Count)
            {
                trxns[rollingSumIndex++].RollingSum = rollingSum;
            }
        }

        // If this is the last transaction for this product, reset the rolling period
        if (nextTrxn != null && currTrxn.ProductId != nextTrxn.ProductId)
        {
            startIndexOfRollingPeriod = trxns.Count;
        }
    }

    return trxns;
}

Helpers

The following logic could be written inline, but it's a little easier to read when they are split out into their own methods.

private static TrxnRollingSum GetNextTrxn(SqlDataReader r, TrxnRollingSum currTrxn) {
    TrxnRollingSum nextTrxn = null;
    if (r.Read()) {
        nextTrxn = new TrxnRollingSum {
            ProductId = r.GetInt32(0),
            TransactionDate = r.GetDateTime(1),
            ReferenceOrderId = r.GetInt32(2),
            ActualCost = r.GetDouble(3),
            PrevSum = 0 };
        if (currTrxn != null) {
            nextTrxn.PrevSum = (nextTrxn.ProductId == currTrxn.ProductId)
                    ? currTrxn.PrevSum + currTrxn.ActualCost : 0;
        }
    }
    return nextTrxn;
}

// Represents the output to be returned
// Note that the ReferenceOrderId/PrevSum fields are for debugging only
private class TrxnRollingSum {
    public int ProductId { get; set; }
    public DateTime TransactionDate { get; set; }
    public int ReferenceOrderId { get; set; }
    public double ActualCost { get; set; }
    public double PrevSum { get; set; }
    public double RollingSum { get; set; }
}

// The function that generates the result data for each row
// (Such a function is mandatory for SQL CLR table-valued functions)
public static void RollingSum_Fill(object trxnWithRollingSumObj,
                                    out int productId,
                                    out DateTime transactionDate, 
                                    out int referenceOrderId, out double actualCost,
                                    out double prevCumulativeSum,
                                    out double rollingSum) {
    var trxn = (TrxnRollingSum)trxnWithRollingSumObj;
    productId = trxn.ProductId;
    transactionDate = trxn.TransactionDate;
    referenceOrderId = trxn.ReferenceOrderId;
    actualCost = trxn.ActualCost;
    prevCumulativeSum = trxn.PrevSum;
    rollingSum = trxn.RollingSum;
}

Tying it all together in SQL

Everything up to this point has been in C#, so let's see the actual SQL involved. (Alternatively, you can use this deployment script to create the assembly directly from the bits of my assembly rather than compiling yourself.)

USE AdventureWorks2012; /* GPATTERSON2\SQL2014DEVELOPER */
GO

-- Enable CLR
EXEC sp_configure 'clr enabled', 1;
GO
RECONFIGURE;
GO

-- Create the assembly based on the dll generated by compiling the CLR project
-- I've also included the "assembly bits" version that can be run without compiling
CREATE ASSEMBLY ClrPlayground
-- See http://pastebin.com/dfbv1w3z for a "from assembly bits" version
FROM 'C:\FullPathGoesHere\ClrPlayground\bin\Debug\ClrPlayground.dll'
WITH PERMISSION_SET = safe;
GO

--Create a function from the assembly
CREATE FUNCTION dbo.RollingSumTvf (@rollingPeriodDays INT)
RETURNS TABLE ( ProductId INT, TransactionDate DATETIME, ReferenceOrderID INT,
                ActualCost FLOAT, PrevCumulativeSum FLOAT, RollingSum FLOAT)
-- The function yields rows in order, so let SQL Server know to avoid an extra sort
ORDER (ProductID, TransactionDate, ReferenceOrderID)
AS EXTERNAL NAME ClrPlayground.UserDefinedFunctions.RollingSumTvf;
GO

-- Now we can actually use the TVF!
SELECT * 
FROM dbo.RollingSumTvf(45) 
ORDER BY ProductId, TransactionDate, ReferenceOrderId
GO

Caveats

The CLR approach provides a lot more flexibility to optimize the algorithm, and it could probably be tuned even further by an expert in C#. However, there are also downsides to the CLR strategy. A few things to keep in mind:

This CLR approach keeps a copy of the data set in memory. It is possible to use a streaming approach, but I encountered initial difficulties and found that there is an outstanding Connect issue complaining that changes in SQL 2008+ make it more difficult to use this type of approach. It's still possible (as Paul demonstrates), but requires a higher level of permissions by setting the database as TRUSTWORTHY and granting EXTERNAL_ACCESS to the CLR assembly. So there is some hassle and potential security implication, but the payoff is a streaming approach that can better scale to much larger data sets than those on AdventureWorks.
CLR may be less accessible to some DBAs, making such a function more of a black box that is not as transparent, not as easily modified, not as easily deployed, and perhaps not as easily debugged. This is a pretty big disadvantage when compared to a T-SQL approach.

Bonus: T-SQL #2 - the practical approach I'd actually use

After trying to think about the problem creatively for a while, I thought I'd also post the fairly simple, practical way that I would likely choose to tackle this problem if it came up in my daily work. It does make use of SQL 2012+ window functionality, but not in type of groundbreaking way that the question was hoping for:

-- Compute all running costs into a #temp table; Note that this query could simply read
-- from Production.TransactionHistory, but a CROSS APPLY by product allows the window 
-- function to be computed independently per product, supporting a parallel query plan
SELECT t.*
INTO #runningCosts
FROM Production.Product p
CROSS APPLY (
    SELECT t.ProductId, t.TransactionDate, t.ReferenceOrderId, t.ActualCost,
        -- Running sum of the cost for this product, including all ties on TransactionDate
        SUM(t.ActualCost) OVER (
            ORDER BY t.TransactionDate 
            RANGE UNBOUNDED PRECEDING) AS RunningCost
    FROM Production.TransactionHistory t
    WHERE t.ProductId = p.ProductId
) t
GO

-- Key the table in our output order
ALTER TABLE #runningCosts
ADD PRIMARY KEY (ProductId, TransactionDate, ReferenceOrderId)
GO

SELECT r.ProductId, r.TransactionDate, r.ReferenceOrderId, r.ActualCost,
    -- Cumulative running cost - running cost prior to the sliding window
    r.RunningCost - ISNULL(w.RunningCost,0) AS RollingSum45
FROM #runningCosts r
OUTER APPLY (
    -- For each transaction, find the running cost just before the sliding window begins
    SELECT TOP 1 b.RunningCost
    FROM #runningCosts b
    WHERE b.ProductId = r.ProductId
        AND b.TransactionDate < DATEADD(DAY, -45, r.TransactionDate)
    ORDER BY b.TransactionDate DESC
) w
ORDER BY r.ProductId, r.TransactionDate, r.ReferenceOrderId
GO

This actually yields a fairly simple overall query plan, even when looking at the both of the two relevant query plans together:

A few reasons I like this approach:

It yields the full result set requested in the problem statement (as opposed to most other T-SQL solutions, which return a grouped version of the results).
It is easy to explain, understand, and debug; I won't come back a year later and wonder how the heck I can make a small change without ruining the correctness or performance
It runs in about 900ms on the provided data set, rather than the 2700ms of the original loop-seek
If the data were much denser (more transactions per day), the computational complexity does not grow quadratically with the number of transactions in the sliding window (as it does for the original query); I think this addresses part of Paul's concern about wanted to avoid multiple scans
It results in essentially no tempdb I/O in recent updates of SQL 2012+ due to new tempdb lazy write functionality
For very large data sets, it is trivial to split the work into separate batches for each product if memory pressure were to become a concern

A couple potential caveats:

While it technically does scan Production.TransactionHistory just once, it's not truly a "one scan" approach because the #temp table of similar size and will need to perform additional logicion I/O on that table as well. However, I don't see this as too different from a work table that we have more manual control over since we have defined its precise structure
Depending on your environment, the usage of tempdb could be viewed as a positive (e.g., it's on a separate set of SSD drives) or a negative (high concurrency on the server, lots of tempdb contention already)