This question is similar to "Running total with count?", but please allow me to explain some further twists on the issue.
I'm using SQL Server 2008, so the cursor option described by Aaron Bertrand seems to be the most promising in terms of speed.
What's different here, though, is that I have to take two dates into account for a single item. Each OrderID in the Orders table has an Opened Date and a Closed Date. The Opened Date is always populated, but the Closed Date could be NULL.
OrderID  OpenedDate  ClosedDate
654554   12/1/2011   5/4/2012
678451   12/4/2011   3/2/2012
679565   12/8/2011   5/21/2012
701541   5/23/2012   NULL
...
I need to – efficiently – get back how many Orders had what we could term an "Open" status on any given date within a date range. The date range could span a couple of years.
(Yes, I do have a reference table of sequential dates.)
Date       CountOfOpenOrders
12/1/2011  175
12/2/2011  178
12/3/2011  195
12/4/2011  192
12/5/2011  191
...
Best Answer
If your priority is speed of selects, the following approach allows for extremely fast selects. Instead of storing a period, you store two events: the period start and the period end. The Change column is 1 when a period begins and -1 when a period ends. If more than one event occurs on the same day, they must have different EventNumberPerDay values. RunningTotal is the number of open periods after the event has happened.
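As a rough sketch of what such an events table could look like, here is a small runnable example. It uses Python with an in-memory SQLite database purely as a stand-in for SQL Server, so the types and syntax are not T-SQL. The Change, EventNumberPerDay and RunningTotal columns come from the description above; the EventDate column name, the CHECK constraints, and the rule that opens are numbered before closes on the same day are my assumptions.

```python
import sqlite3
from datetime import date

# Sample periods from the question: (OrderID, OpenedDate, ClosedDate or None).
orders = [
    (654554, date(2011, 12, 1), date(2012, 5, 4)),
    (678451, date(2011, 12, 4), date(2012, 3, 2)),
    (679565, date(2011, 12, 8), date(2012, 5, 21)),
    (701541, date(2012, 5, 23), None),
]

def build_events(orders):
    """Turn each period into a +1 event and, if it is closed, a -1 event."""
    raw = []
    for _order_id, opened, closed in orders:
        raw.append((opened, 1))
        if closed is not None:
            raw.append((closed, -1))
    # Assumption: within a day, opens are processed before closes,
    # so the running total never dips below zero.
    raw.sort(key=lambda e: (e[0], -e[1]))

    events, running, prev_day, number_in_day = [], 0, None, 0
    for day, change in raw:
        number_in_day = number_in_day + 1 if day == prev_day else 1
        prev_day = day
        running += change
        events.append((day.isoformat(), number_in_day, change, running))
    return events

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Events (
        EventDate         TEXT    NOT NULL,  -- hypothetical column name
        EventNumberPerDay INTEGER NOT NULL,
        Change            INTEGER NOT NULL CHECK (Change IN (1, -1)),
        RunningTotal      INTEGER NOT NULL CHECK (RunningTotal >= 0),
        PRIMARY KEY (EventDate, EventNumberPerDay)
    )
""")
conn.executemany("INSERT INTO Events VALUES (?, ?, ?, ?)", build_events(orders))

events = conn.execute(
    "SELECT * FROM Events ORDER BY EventDate, EventNumberPerDay"
).fetchall()
```

With the four sample orders this produces seven events. Note that the -1 event sits on the ClosedDate itself, so in this sketch an order no longer counts as open on the day it closes; shift the close event by a day if your definition differs.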
You also need a calendar table.
Once that is in place, your select is very simple and very fast.
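One way such a select could look, again sketched with Python and in-memory SQLite rather than SQL Server: for each calendar date, take the RunningTotal of the latest event on or before that date. The Calendar table, the CalendarDate and EventDate column names, and the hand-entered event rows are assumptions for illustration. In T-SQL the `LIMIT 1` would become `TOP (1)`.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Events (
        EventDate         TEXT    NOT NULL,
        EventNumberPerDay INTEGER NOT NULL,
        Change            INTEGER NOT NULL,
        RunningTotal      INTEGER NOT NULL,
        PRIMARY KEY (EventDate, EventNumberPerDay)
    )
""")
conn.execute("CREATE TABLE Calendar (CalendarDate TEXT PRIMARY KEY)")

# Events for the first three sample orders' open dates (running totals precomputed).
conn.executemany(
    "INSERT INTO Events VALUES (?, ?, ?, ?)",
    [
        ("2011-12-01", 1, 1, 1),
        ("2011-12-04", 1, 1, 2),
        ("2011-12-08", 1, 1, 3),
    ],
)

# A small calendar of sequential dates, one row per day.
start, end = date(2011, 11, 30), date(2011, 12, 9)
days = [(start + timedelta(days=i)).isoformat() for i in range((end - start).days + 1)]
conn.executemany("INSERT INTO Calendar VALUES (?)", [(d,) for d in days])

# For each date, read the running total left by the most recent event.
# Dates before the first event have no matching row, hence COALESCE(..., 0).
rows = conn.execute(
    """
    SELECT c.CalendarDate,
           COALESCE((SELECT e.RunningTotal
                     FROM Events AS e
                     WHERE e.EventDate <= c.CalendarDate
                     ORDER BY e.EventDate DESC, e.EventNumberPerDay DESC
                     LIMIT 1), 0) AS CountOfOpenOrders
    FROM Calendar AS c
    WHERE c.CalendarDate BETWEEN ? AND ?
    ORDER BY c.CalendarDate
    """,
    (start.isoformat(), end.isoformat()),
).fetchall()

counts = dict(rows)
```

The lookup is cheap because the primary key on (EventDate, EventNumberPerDay) lets the engine seek straight to the last event for each date instead of aggregating over all orders.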
Of course, this query is only correct if the data in the Events table is valid. We can use constraints to ensure 100% data integrity. I can explain how if you are interested.
Another alternative is to just load your raw data, your periods, into a client application. Your problem is absolutely trivial in C++/C#/Java.
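To illustrate the client-side route, here is a minimal sketch in Python (chosen only for brevity; the same loop carries over to C++/C#/Java). It marks +1 on each opened date and -1 on each closed date, then takes a running sum over the date range. The function name and the rule that an order stops counting on its closed date are my assumptions.

```python
from datetime import date, timedelta

def open_counts(periods, start, end):
    """Return {day: number of open orders} for every day in [start, end].

    Assumption: an order is open from its opened date up to, but not
    including, its closed date. closed=None means it is still open.
    """
    n_days = (end - start).days + 1
    delta = [0] * (n_days + 1)
    for opened, closed in periods:
        # Clamp the period to the requested range.
        first = max((opened - start).days, 0)
        last = n_days if closed is None else min((closed - start).days, n_days)
        if first < last:  # skip periods that fall entirely outside the range
            delta[first] += 1
            delta[last] -= 1

    counts, running = {}, 0
    for i in range(n_days):
        running += delta[i]
        counts[start + timedelta(days=i)] = running
    return counts

# The four sample orders from the question as (OpenedDate, ClosedDate) pairs.
periods = [
    (date(2011, 12, 1), date(2012, 5, 4)),
    (date(2011, 12, 4), date(2012, 3, 2)),
    (date(2011, 12, 8), date(2012, 5, 21)),
    (date(2012, 5, 23), None),
]
counts = open_counts(periods, date(2011, 12, 1), date(2012, 5, 24))
```

This is a single pass over the periods plus a single pass over the days, so a range of a couple of years costs very little once the raw rows are on the client.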
Yet another approach is to use an RDBMS with fast cursors, such as Oracle. That will allow you to just write a simple cursor and enjoy good performance, but still not always as good as my first solution.