How to Use Joins Instead of Subquery in SQL Server

greatest-n-per-groupmaterialized-viewsql serversql server 2014subquery

I am using SQL Server 2014 and I want to eliminate having to use the following sub query so I can create an indexed view.

The only thing similar to my problem is the greatest-n-per-group queries but I want all the values, instead of only the last measured factor ones.

This is my query:

SELECT
      eu.[Name],
      pp.[MonthDay] --date type,
      pp.[Value] * 
            (SELECT TOP 1 [Factor] FROM [em].[CO] co 
            WHERE co.[EUID] = eu.[Name] 
            AND co.[MonthDay] <= pp.MonthDay 
            ORDER BY [MonthDay] DESC) AS [CO]
            --multiply by the last measured factor
            --I also want to remove the sub query
FROM em.[PPGenData] pp
JOIN em.[PPEUID] eu
ON eu.ID = pp.ID

Best Answer

I think this will do it, but like Aaron said it would be more useful to have the context around the rewrite. You can probably write this more simply using WHERE EXISTS() rather than TOP 1 or ROW_NUMBER(), but without knowing your schema I'd rather not speculate.

SELECT
    EU.Name,
    PP.MonthDay,
    PP.Value * Y.Factor AS CO
FROM
    em.PPGenData AS PP
    INNER JOIN em.PPEUID AS EU ON PP.ID = EU.ID
    LEFT JOIN
        (
        SELECT
            ID, Factor
        FROM
            (
            SELECT
                PP2.ID, CO.Factor, ROW_NUMBER() OVER (PARTITION BY PP2.ID ORDER BY CO.MonthDay DESC) AS Seq
            FROM
                em.PPGenData AS PP2
                INNER JOIN em.PPEUID AS EU2 ON PP2.ID = EU2.ID
                INNER JOIN em.CO AS CO ON EU2.Name = CO.EUID
            WHERE
                CO.MonthDay < PP.MonthDay
            ) AS X
        WHERE
            Seq = 1
        ) AS Y ON PP.ID = Y.ID

Note that subquery X needs to be wrapped in another subquery since a windowed function cannot be used in a WHERE clause.

Related Solutions

Improve performance with the WHERE NOT IN sub-select clause

With queries like this is it often more efficient to perform a LEFT OUTER JOIN instead of the NOT EXISTS style check, it often implies a full index scan (or table scan without the right indexes in place) but with many rows in the main table(s) this is less expensive than the large number of index seeks (one on the reference table for each row returned from the main table) that would otherwise result. Some query planners are quite bright about spotting this equivalence and using the alternate plan where it is the better choice, but it doesn't sound like this has happened in your case.

Try something like:

SELECT t1.CUSTID, COUNT(*)
FROM   CUST_TRX t1
LEFT OUTER JOIN
       CUST_TRX t2 
ON     t2.CUSTID=t1.CUSTID 
AND    t2.DATED<CURRENT_DATE-365
WHERE  t2.CUSTID IS NULL
GROUP BY t1.CUSTID

(note: I'm not familiar with firebird, so the above syntax may need tweaks but should illustrate the point)

Without the WHERE t2.CUSTID IS NULL every row from t1 with matches in t2 will be output once for every match found in t2 and those with no matches in t2 will be output once but with any columns selected from that object set to NULL. The WHERE clause then screens out the matches.

Depending on the DB engine's abilities, especially if the amount of data in the reference object (CUST_TRX with a filter applied here) is huge, this may be significantly less efficient than the WHERE <something> NOT IN or WHERE NOT EXISTS options, so benchmark over realistic data sets first before using the method. It often works out much more efficient with MS SQL Server in cases where the query planner doesn't notice that the WHERE NOT IN arrangement can be performed this way more efficiently.

Also if you do it this way around leave a comment in the code (and/or supporting documentation) to say that you are doing this as an equivalent to WHERE <something> NOT IN or WHERE NOT EXISTS which you expect to be more efficient. You'll remember it and an experienced SQL person will recognise the pattern, but other people looking at the code might not immediately understand the intent/reason and flip it back to using WHERE NOT EXISTS for clarity as that reads better as on English sentence.

SQL Server – Unique Clustered Index on Indexed View Fails Due to Aggregates

Your issue is definitely your function, and you will have to do something to make that work differently. Here is a very basic example to illustrate the problem.

First create your test data.

CREATE TABLE dbo.testResults
(
    id INT IDENTITY(1,1) PRIMARY KEY CLUSTERED
    ,col1 VARCHAR(200)
);
GO
INSERT INTO dbo.testResults
        ( col1 )
VALUES  ( 'test1' ), ( 'test1' ), ( 'test2' ), ( 'test3' ), ( 'test4' ), ( 'test4' );
GO

Now we will create a view (that works correctly) and add the clustered index to that view.

CREATE VIEW vwResults
WITH SCHEMABINDING
AS
SELECT col1
,COUNT_BIG(*) AS cbcol1
FROM dbo.testResults AS tr
GROUP BY tr.col1;
GO
--Works great
CREATE UNIQUE CLUSTERED INDEX cls_tr ON dbo.vwResults (col1);
GO

Next we will add a very simple and basic function that does nothing more than very basic string manipulation (left 3 characters of a given string).

CREATE FUNCTION dbo.left3 (@str varchar(200))
RETURNS varchar(3)
WITH schemabinding
AS
begin
RETURN LEFT(@str, 3)
END;

Now we will get rid of our previous view add this column to the view.

DROP VIEW dbo.vwResults;
GO
CREATE VIEW vwResults
WITH SCHEMABINDING
AS
SELECT col1
,dbo.left3(col1) AS left3col1
,COUNT_BIG(*) AS cbcol1
FROM dbo.testResults AS tr
GROUP BY tr.col1, dbo.left3(col1);
GO

So far, so good. But then by adding even this simplistic function to the view and nothing else, we are no longer able to index the view.

/*FAILURE!!!!!!*/
CREATE UNIQUE CLUSTERED INDEX cls_tr ON dbo.vwResults (col1);
GO

So for this example there is a pretty simple solution where I can fairly easily create a computed column and then everything works. Here is how I would do that.

ALTER TABLE dbo.testResults
ADD left3col1 AS LEFT(col1, 3);

After doing this, I can create the same view and this time I can add the unique clustered index as follows.

DROP VIEW dbo.vwResults;
GO
CREATE VIEW vwResults
WITH SCHEMABINDING
AS
SELECT col1
,left3col1
,COUNT_BIG(*) AS cbcol1
FROM dbo.testResults AS tr
GROUP BY tr.col1, left3col1;
GO
/*IT WORKS!!!!!!*/
CREATE UNIQUE CLUSTERED INDEX cls_tr ON dbo.vwResults (col1, left3col1);
GO

This might not be an option for your scenario, but basically your function is the problem and you need to work through some alternatives (depending on what it does) if you want to make an indexed view work for your scenario.

Best Answer

Related Solutions

Improve performance with the WHERE NOT IN sub-select clause

SQL Server – Unique Clustered Index on Indexed View Fails Due to Aggregates

Related Question