Sql-server – Obtaining the current information from a table with historical entries

greatest-n-per-groupsql serversql server 2014t-sql

I have a table that stores historical data. I have particular needs regarding how to select this data, as outlined below. The Basic table and the way "current" data is obtained can be found at this rextester link.

A slice of the data:

KeyID   CitizenID RandomSystemID ParentID   Quality   Begin            End              Power
98554   40        0              0          Mediocre  2/19/2010 2:00   2/19/2010 2:00   3
98632   40        11             309        Special   10/5/2010 19:21  10/5/2010 19:21  4.2
98704   40        11             322        Special   10/5/2010 19:22  10/5/2010 19:22  5
99208   40        11             309        Special   2/15/2011 19:56  2/15/2011 19:56  3.9
99279   40        11             322        Special   2/15/2011 19:56  2/15/2011 196    6

This is the table CitizenPowerLevel. The Power value of a given CitizenID can shift up or down, based on User needs and whims. It separately adjusted set per ParentID. ParentID=0 indicates "use this if there is no other value to use for interacting with a particular Parent". This table stores a record for every time Power is changed. No records are deleted from this table.

We need a view that will give us the "current" power level for all CitizenIDs of type Special. The existing SQL for that View (found at the link) looks like this:

SELECT TOP 100 PERCENT 
  MAX(conv.[Power]) AS [Power]
, conv.CitizenID
, conv.ParentID
, 11 as ObjectTypeID
FROM    (SELECT 
      CitizenID 
    , ParentID
    , MAX([End]) AS ConversionEnd
    FROM CitizenPowerLevel conv
    WHERE (Quality = 'Special')
    AND RandomSystemID = 11
    GROUP BY CitizenID, ParentID) DERIVED 
INNER JOIN CitizenPowerLevel conv
    ON DERIVED.CitizenID = conv.CitizenID 
        AND DERIVED.ConversionEnd = conv.[End]
        AND DERIVED.ParentID = conv.ParentID
        AND 11 = conv.RandomSystemID
WHERE (conv.Quality = 'Special')
GROUP BY conv.CitizenID
    , conv.ParentID
    , conv.RandomSystemID
ORDER BY conv.CitizenID

And produces these results:

Power   CitizenID   ParentID 
4.8000  40          309
5.4000  40          322

Which is what we want to see from the records I listed in this post. The SQL strikes me as an unusual way to do this, but I'm no DBA (just a lowly dev). I am not responsible for this SQL, it has existed for probably 5-10 years. There is a special scenario – unlikely to occur – where a given CitzenID can get historical records with matching dates for both the Begin and End columns, because someone thought using fixed far out dates was a better way to represent data in a historical table for some reason.

The link data has additional rows, but these two illustrate my point:

(100000, 40, 11, 322, 'Special', '1900-01-01 00:00:00', '2050-12-31 00:00:00', 6.3),
(100001, 40, 11, 322, 'Special', '1900-01-01 00:00:00', '2050-12-31 00:00:00', 5.4),

I realize that when the above SQL is run at the link, CitizenID 40 is going to return 6.3 for Power level for ParentID 322 and there is no way to tell which one is correct, but that is not the problem I'm trying to solve right now. The design with the arbitrary dates cannot be changed at this time, however this is also one of the objections I have with the SQL.

My other objection is this SQL strikes me as one of those "Engineer has not looked at this in a year/ever, cannot determine easily what the hell is going on". But then as a lowly programmer, my SQL skills are only noteworthy relative to other programmers.

There is also business need to search through all CitizenIDs and then find out current power level regardless of Quality, this strikes me as a situation best met with a Greatest-N-Per-Group solution (where the group is CitizenID, Quality, ParentID, and then looking for the most recent value in a sane date field).

  1. Does anyone find the current SQL objectionable? If so, why?

  2. How would you solve the problem if you were doing it some other way (assuming you wouldn't solve it the way it was done here? This table never grows especially big (low 6 figures tops), and there is a LastModified date (which is sanely set at the time a record is entered, no matter how it gets entered).

Best Answer

I can get the similar results using a window function, ROW_NUMBER but I'm not sure I understand all your logic:

;WITH cte AS
(
SELECT 
    ROW_NUMBER() OVER( PARTITION BY a.CitizenID, a.ParentID, a.RandomSystemID ORDER BY [End] DESC ) rn,
    [Power],
    a.CitizenID,
    a.ParentID,
    a.RandomSystemID
FROM dbo.CitizenPowerLevel a
WHERE a.quality = 'Special'
    AND a.randomsystemid = 11
)
SELECT [Power], CitizenID, ParentID, RandomSystemID
FROM cte
WHERE rn = 1
ORDER BY CitizenID, ParentID, RandomSystemID

The two queries give different results on my scale-up tests so have a look and see what you think - possibly I've misunderstood something.

My results:

Results with window function