Sql-server – how to select top ID from join operation and staging tables

sql serversql-server-2008-r2

I have this inner join from staging and operations table jobsCode I want to select the top surrogate key ID and get the Start_Date from this join my problem is my Query rerunning 2 records enter image description here
original query

SELECT a.JobCode 
    ,b.ID
    ,b.Start_Date   
    ,a.EndDate  
    ,b.Start_Date  
    ,b.End_Date
FROM [RPStage].[dbo].[GLBJobCode] a 
    INNER JOIN [operations].[dbo].[Job_Code] b ON  a.JobCode =  b.Job_Code
        AND a.EndDate = b.End_Date
        AND a.EndDate = '9998-12-31'
        AND a.StartDate != b.Start_Date;

trying to get it using max function

SELECT a.JobCode 
    ,MAX(b.ID) ID
    ,MAX(B.Start_Date) Start_Date  
    ,a.EndDate  
    ,b.Start_Date  
    ,b.End_Date
FROM [Stage].[dbo].[JobCode] a 
    INNER JOIN [operations].[dbo].[Job_Code] b ON  a.JobCode =  b.Job_Code
        AND a.EndDate = b.End_Date
        AND a.EndDate = '9998-12-31'
        AND a.StartDate != b.Start_Date
GROUP BY a.JobCode
    ,b.ID
    ,a.EndDate
    ,b.Start_Date
    ,b.End_Date;

Best Answer

Window Functions, or the OVER() Clause

It appears that what you are looking for is a window function or an OVER() clause.

In your original example, you are trying to use two max() conditions, which doesn't work, because when you try to then apply GROUP BY, you can have a condition where the max of the first column ID and the second column startdate aren't in the same row, and so then GROUP BY simply can't be understood.

However, if you are looking to extract the max over a grouping, and you can define that grouping, and you want to obtain the 'column-wise' maximum for more than one column for that grouping (as you seem to want to do), then here's the solution.

CREATE TABLE test (
  jobcode integer,
  id integer, 
  a_startdate datetime,
  a_enddate datetime,
  b_startdate datetime,
  b_enddate datetime);

  INSERT INTO test VALUES (513801, 7136, '11-01-2011', '12-31-9998', '11-01-2011', '12-31-9998');
  INSERT INTO test VALUES (513801, 7137,'04-26-2014', '12-31-9998','04-26-2014', '12-31-9998');

I first made this table to recreate your input data in the picture. I then applied this query.

SELECT jobcode, max(id) OVER (PARTITION BY jobcode), 
max(a_startdate) OVER(PARTITION BY jobcode), 
a_enddate,  b_startdate, b_enddate FROM test;

Try out this SQL Fiddle, and see if it gives the results you're looking for. I did the best I could with the description I had.

Implied Index Keys

Your index is on id DESC, timeSampled DESC. In SQL Server 2008 and later, partitioning introduces an extra implied leading key on $partition ASC (it is always ascending, it is not configurable) making the full index key $partition ASC, id DESC, timeSampled DESC.

Since id and timeSampled increase together (though there is nothing in the schema to guarantee this) you could rewrite the query as TOP (1) ... ORDER BY $partition DESC, id DESC. Unfortunately, the DESC keys on your index and ASC implied leading key $partition means the index could not be used to scan just one row from the index in order.

If your index keys were instead id ASC, timeSampled ASC the whole index key would be $partition ASC, id ASC, timeSampled ASC. This all-ASC index could be scanned backward, returning just the first row in key order. This row would be guaranteed to have the highest id value in the highest-numbered partition. Given the (unenforced) relationship between id and partition id, this would produce the correct result with an optimal execution plan that reads just a single row.

This 'solution' lacks integrity because the id-timeSampled relationship is not enforced, and you probably do not want to rebuild the nonclustered primary key anyway. Nevertheless, I mention it because it may enhance your understanding of how partitioning interacts with indexes.

Sql-server – How to set up indexed view when SELECTing TOP 1 with ORDER BY from different tables

It seems to ignore any index I put on it

Unless you're using SQL Server Enterprise Edition (or equivalently, Trial and Developer), you will need to use WITH (NOEXPAND) on the view reference in order to use it. In fact, even if you are using Enterprise, there are good reasons to use that hint.

Without the hint, the query optimizer (in Enterprise Edition) may make a cost-based choice between using the materialized view or accessing the base tables. Where the view is as large as the base tables, this calculation may favour the base tables.

Another point of interest is that without a NOEXPAND hint, view references are always expanded to the base query before optimization begins. As optimization progresses, the optimizer may or may not be able to match the expanded definition back to the materialized view, depending on previous optimization activity. This is almost certainly not the case with your simple query, but I mention it for completeness.

So, using the NOEXPAND table hint is your main option, but you might also think about just materializing the base table keys and the columns needed for ordering in the view. Create a unique clustered index on the combined key columns, then a separate nonclustered index on the ordering columns.

This will reduce the size of the materialized view, and limit the number of automatic updates that must be made to keep the view synchronized with the base tables. Your query can then be written to fetch the top 1 keys in the required order from the view (ideally with NOEXPAND), then join back to the base tables to fetch any remaining columns using the keys from the view.

Another variation is to cluster the view on the ordering columns and table keys, then write the query to manually fetch the non-view columns from the base table using the keys. The best option for you depends on the broader context. A good way to decide is to test it with the real data and workload.

Basic solution

CREATE VIEW VI_test
WITH SCHEMABINDING
AS
    SELECT
        t1.PK_ID1,
        t1.something1,
        t1.somethingelse1,
        t2.PK_ID2,
        t2.FK_ID1,
        t2.something2,
        t2.somethingelse2
    FROM dbo.TB_test1 t1
    INNER JOIN dbo.TB_test2 t2 
        ON t1.PK_ID1 = t2.FK_ID1;
GO
-- Brute force unique clustered index
CREATE UNIQUE CLUSTERED INDEX cuq 
ON dbo.VI_test 
    (somethingelse1, somethingelse2, PK_ID1, PK_ID2);
GO
SELECT TOP (1) * 
FROM dbo.VI_test WITH (NOEXPAND)
ORDER BY somethingelse1,somethingelse2;

Execution plan:

Using a nonclustered index

-- Minimal unique clustered index
CREATE UNIQUE CLUSTERED INDEX cuq 
ON dbo.VI_test 
    (PK_ID1, PK_ID2)
WITH (DROP_EXISTING = ON);
GO
-- Nonclustered index for ordering
CREATE NONCLUSTERED INDEX ix 
ON dbo.VI_test (somethingelse1, somethingelse2);

Execution plan:

There is a lookup in this plan, but it is only used to fetch a single row.

Minimal Indexed View

ALTER VIEW VI_test
WITH SCHEMABINDING
AS
    SELECT
        t1.PK_ID1,
        t2.PK_ID2,
        t1.somethingelse1,
        t2.somethingelse2
    FROM dbo.TB_test1 t1
    INNER JOIN dbo.TB_test2 t2 
        ON t1.PK_ID1 = t2.FK_ID1;
GO
-- Unique clustered index
CREATE UNIQUE CLUSTERED INDEX cuq 
ON dbo.VI_test 
    (somethingelse1, somethingelse2, PK_ID1, PK_ID2);

Query:

SELECT TOP (1)
    V.PK_ID1,
    TT1.something1,
    V.somethingelse1,
    V.PK_ID2,
    TT2.FK_ID1,
    TT2.something2,
    V.somethingelse2
FROM dbo.VI_test AS V WITH (NOEXPAND)
JOIN dbo.TB_test1 AS TT1 ON TT1.PK_ID1 = V.PK_ID1
JOIN dbo.TB_test2 AS TT2 ON TT2.PK_ID2 = V.PK_ID2
ORDER BY somethingelse1,somethingelse2;

Execution plan:

This shows the table keys being retrieved (a single row fetch from the view clustered index in order) followed by two single-row lookups on the base tables to fetch the remaining columns.

Best Answer

Window Functions, or the OVER() Clause

Related Solutions

Sql-server – Why does selecting top 1 from composite index DESC also used to partition by month not select the top value

Implied Index Keys

Sql-server – How to set up indexed view when SELECTing TOP 1 with ORDER BY from different tables

Basic solution

Using a nonclustered index

Minimal Indexed View

Related Question