Sql-server – How to create a view with added rows for missing dates

slowly-changing-dimensionsql serversql-server-2012

I'm not sure how to accurately phrase this question.

Essentially, say I have a bunch of salespeople on different sales teams. Like, the Blue Region, the Orange Region, the Red Region, whatever.

Now, these sales people are each connected to one sales region, but may, on occasion, change sales regions over time.

Say Bob Jones was in the Blue Region for January 2015, February 2015, March 2015, but on April 2015, he went to the Green Region.

So I'm trying to create a report that would aggregate sales revenue based on Region (even though at the technical level, its logged to sales person).

So we would have transaction (table: employee_sales) data as follows:

name           date                 sales
Bob Jones      February 18th, 2015   $50,000

etc. etc.

So in order to aggregate by region, I would need a helper dimension table as follows (Remember, region can change over time) (table: employee_region)

name         region      month      year
Bob Jones    Blue        January    2015
Bob Jones    Blue        February   2015
Bob Jones    Blue        March      2015
Bob Jones    Green       April      2015

Then I can simply do a query:

select er.region, er.month, er.year, sum(es.sales)
from employee_sales es
inner join employee_region er
on es.name = er.name and month(es.date) = er.month and year(es.date) = er.year
group by er.region, er.month, er.year

So this would give me the data I need.

HOWEVER, now I have a problem — say I have 500 employees and the REGION data is updated a month in arrears. So right now, February 2016, we only have region data from January 2016.

Would I would LIKE to do, is … for the current month (February 2016) … if the data in the "helper table" for region is missing for the month, take the last month found (which may be January, but sometimes even December or November potentially).

I'm not sure what to do here. Create some kind of view?

Do I restructure the 'helper table' so it's more of an inequality statement?

EDIT: I think this is a slowly changing dimensions problem. Hmm I probably have to reorganize the dimension table.

Best Answer

I wouldn't organize your "helper table" that way. I would keep only {Region, Salesman, StartDate}. To create a view of intervals, join that table to itself to create data ranges where a.StartDate < b.StartDate. Use an outer join, so that NULL represents an EndDate of "now", which you can coalesce with the current date or other, as appropriate. To represent departures (not reassignments) use an artificial region named CiaoBaby or some such.

Now your maintenance chore is much easier; you need only capture staffing changes. The table is oodles smaller (over time, 100 or 1000 oodles). And you automatically have the built-in assumption that, absent contrary information, a salesman remains attached today to the same one as last month.