SQL Server – Select MAX/MIN from Multiple Tables with Other Columns

Tags: sql-server, sql-server-2008

I have a query where I JOIN 4 weekly "summary" tables and SELECT a MIN and a MAX of units across them. This gives me the highest and lowest weekly total units sold, and I'm pulling an average from another table.

    SELECT
            L4.TotalUnits / 4   AS AvgMvmt,
            MIN(Ago.TotalUnits) AS LowMvmt,
            MAX(Ago.TotalUnits) AS HighMvmt
    FROM

            RTG_LOOKUP_LAST4 AS L4 INNER JOIN
            (
                    SELECT  UCC12UPCCode, TotalUnits, Date
                    FROM    [CACTUS].[dbo].[RTG_Lookup2_Last1]

                    UNION

                    SELECT  UCC12UPCCode, TotalUnits, Date
                    FROM    [CACTUS].[dbo].[RTG_Lookup2_2_Ago]

                    UNION

                    SELECT  UCC12UPCCode, TotalUnits, Date
                    FROM    [CACTUS].[dbo].[RTG_Lookup2_3_Ago]

                    UNION

                    SELECT  UCC12UPCCode, TotalUnits, Date
                    FROM    [CACTUS].[dbo].[RTG_Lookup2_4_Ago]
            ) AS Ago
                    ON Ago.UCC12UPCCode = L4.UCC12UPCCode
    WHERE
            L4.UCC12UPCCode = '01254601144'
    GROUP BY
            L4.TotalUnits

Now I need to add two more columns, HighDate and LowDate. These columns represent the date for the week that sold the highest/lowest units.

The Ago table looks like this:

UCC12UPCCode  TotalUnits  Date
------------- ----------- ----------
01254601144   90          2018-04-14
01254601144   98          2018-05-05
01254601144   107         2018-04-21
01254601144   132         2018-04-28

How would I pull the Date column from the correct table so that my end result looks like the one below?

LowMvmt     HighMvmt    LowDate    HighDate
----------- ----------- ---------- ----------
90          132         2018-04-14 2018-04-28

Edit: Fiddle
https://dbfiddle.uk/?rdbms=sqlserver_2012&fiddle=749c59e1f11bc2146a605072f1ef7d98

Best Answer

As another option:

WITH Ago AS (
    SELECT  UCC12UPCCode, TotalUnits, Date
    FROM    [CACTUS].[dbo].[RTG_Lookup2_Last1]

    UNION

    SELECT  UCC12UPCCode, TotalUnits, Date
    FROM    [CACTUS].[dbo].[RTG_Lookup2_2_Ago]

    UNION

    SELECT  UCC12UPCCode, TotalUnits, Date
    FROM    [CACTUS].[dbo].[RTG_Lookup2_3_Ago]

    UNION

    SELECT  UCC12UPCCode, TotalUnits, Date
    FROM    [CACTUS].[dbo].[RTG_Lookup2_4_Ago]
    ),
    Ranked AS (
    SELECT *, 
        -- Partition by the actual column name; the Ago CTE has no column called UPC
        ROW_NUMBER() OVER (PARTITION BY UCC12UPCCode ORDER BY TotalUnits, Date) LowUnitRank, 
        ROW_NUMBER() OVER (PARTITION BY UCC12UPCCode ORDER BY TotalUnits DESC, Date) HighUnitRank
    FROM Ago
    )
SELECT L4.UCC12UPCCode,
    MAX(CASE WHEN LowUnitRank = 1 THEN TotalUnits END) LowTotalUnits,
    MAX(CASE WHEN HighUnitRank = 1 THEN TotalUnits END) HighTotalUnits,
    MAX(CASE WHEN LowUnitRank = 1 THEN Date END) LowUnitDate,
    MAX(CASE WHEN HighUnitRank = 1 THEN Date END) HighUnitDate
FROM RTG_LOOKUP_LAST4 L4
    INNER JOIN Ranked r ON L4.UCC12UPCCode = r.UCC12UPCCode
GROUP BY L4.UCC12UPCCode;

I've done this using a CTE query as I find it easier to follow and explain.

The first CTE is just your Ago data.

The second CTE ranks each row of the Ago data using ROW_NUMBER, once by ascending and once by descending TotalUnits. I have included the Date in the ordering so that if there are duplicate TotalUnits values, the row with the earliest date is picked.

In the final query we use CASE expressions to pivot the data, aggregating with MAX to produce a single row for each UCC12UPCCode.

This query can be run over multiple UCC12UPCCode values.
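To try the ranking-and-pivot idea without access to the weekly tables, here is a minimal sketch that feeds the four sample rows from the question through the same two steps. The VALUES list standing in for the UNIONed tables is my assumption, made only so the script is self-contained, and the output aliases follow the names used in the question.

```sql
-- Sample rows copied from the question. The VALUES list is a stand-in
-- (an assumption) for the four UNIONed weekly tables.
WITH Ago AS (
    SELECT UCC12UPCCode, TotalUnits, CAST([Date] AS date) AS [Date]
    FROM (VALUES
        ('01254601144',  90, '2018-04-14'),
        ('01254601144',  98, '2018-05-05'),
        ('01254601144', 107, '2018-04-21'),
        ('01254601144', 132, '2018-04-28')
    ) AS v (UCC12UPCCode, TotalUnits, [Date])
),
Ranked AS (
    -- Rank every row twice: lowest units first, and highest units first.
    -- The date breaks ties in favor of the earliest week.
    SELECT *,
        ROW_NUMBER() OVER (PARTITION BY UCC12UPCCode ORDER BY TotalUnits, [Date])      AS LowUnitRank,
        ROW_NUMBER() OVER (PARTITION BY UCC12UPCCode ORDER BY TotalUnits DESC, [Date]) AS HighUnitRank
    FROM Ago
)
-- Pivot the two rank-1 rows into a single row per UPC.
SELECT UCC12UPCCode,
    MAX(CASE WHEN LowUnitRank  = 1 THEN TotalUnits END) AS LowMvmt,
    MAX(CASE WHEN HighUnitRank = 1 THEN TotalUnits END) AS HighMvmt,
    MAX(CASE WHEN LowUnitRank  = 1 THEN [Date] END)     AS LowDate,
    MAX(CASE WHEN HighUnitRank = 1 THEN [Date] END)     AS HighDate
FROM Ranked
GROUP BY UCC12UPCCode;
```

For the sample data this should return one row with 90, 132, 2018-04-14 and 2018-04-28, matching the result asked for in the question.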

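Related Solutions

Sql-server – Why does selecting top 1 from composite index DESC also used to partition by month not select the top value

Implied Index Keys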

Your index is on id DESC, timeSampled DESC. In SQL Server 2008 and later, partitioning introduces an extra implied leading key on $partition ASC (it is always ascending, it is not configurable) making the full index key $partition ASC, id DESC, timeSampled DESC.

Since id and timeSampled increase together (though there is nothing in the schema to guarantee this) you could rewrite the query as TOP (1) ... ORDER BY $partition DESC, id DESC. Unfortunately, the DESC keys on your index and the ASC implied leading key $partition mean the index could not be used to scan just one row from the index in order.

If your index keys were instead id ASC, timeSampled ASC the whole index key would be $partition ASC, id ASC, timeSampled ASC. This all-ASC index could be scanned backward, returning just the first row in key order. This row would be guaranteed to have the highest id value in the highest-numbered partition. Given the (unenforced) relationship between id and partition id, this would produce the correct result with an optimal execution plan that reads just a single row.

This 'solution' lacks integrity because the id-timeSampled relationship is not enforced, and you probably do not want to rebuild the nonclustered primary key anyway. Nevertheless, I mention it because it may enhance your understanding of how partitioning interacts with indexes.
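As a rough sketch of the all-ascending variant described above: the table name dbo.Samples, the index name, and the partition function name pfMonthly are all hypothetical, since none of them appear here. Only id and timeSampled come from the text. Whether the optimizer actually picks a single-row backward scan would need to be confirmed from the execution plan.

```sql
-- Hypothetical names: dbo.Samples (the partitioned table), pfMonthly
-- (its partition function on timeSampled), and the index name.

-- All-ascending keys. On a partitioned table the effective key becomes
-- ($partition ASC, id ASC, timeSampled ASC).
CREATE NONCLUSTERED INDEX IX_Samples_id_timeSampled
    ON dbo.Samples (id ASC, timeSampled ASC);

-- Ask for the last row in that effective key order, so that a backward
-- scan of the index can stop after reading one row.
SELECT TOP (1) s.id, s.timeSampled
FROM dbo.Samples AS s
ORDER BY $PARTITION.pfMonthly(s.timeSampled) DESC, s.id DESC;
```

The result is only correct if the highest id really does live in the highest-numbered partition, which is the unenforced assumption mentioned above.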

Sql-server – How to speed up query on table with millions of rows

The question says: "The problem is that this query takes so much time to go, despite all of the indexes i've made on different columns."

One reason this can happen is that you're using local variables.

Here's an example using a similar setup. In the Stack Overflow schema there's a narrow-ish table called Votes.

With no index on CreationDate, our only option would be to scan the Clustered Index. But if we create one only on CreationDate, the optimizer can choose to use that index if it thinks doing a Key Lookup for the rest of the columns is cheaper than scanning the Clustered Index and applying a predicate.

CREATE INDEX ix_yourmom ON dbo.Votes(CreationDate)

For this query:

DECLARE @StartDate DATETIME = '2010-07-01';
DECLARE @EndDate DATETIME = '2010-07-02';

SELECT *
FROM   dbo.Votes AS v
WHERE  v.CreationDate BETWEEN @StartDate AND @EndDate;
GO

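The cardinality estimate for unknown variables using BETWEEN is 16.4317% of the table under the newer cardinality estimator (0.3 × √0.3 ≈ 0.164317; the legacy estimator guesses 9%). That leads to a clustered index scan and a missing index request for an index that covers the entire query.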

If you run the query with RECOMPILE, you allow for the parameter embedding optimization.

DECLARE @StartDate DATETIME = '2010-07-01';
DECLARE @EndDate DATETIME = '2010-07-02';

SELECT *
FROM   dbo.Votes AS v
WHERE  v.CreationDate BETWEEN @StartDate AND @EndDate
OPTION ( RECOMPILE );

Which gives us a different query plan, and a more accurate estimate.
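To make the effect concrete, here is roughly what the optimizer gets to work with once the variable values are embedded. This is only an illustration of the idea, not what SQL Server literally rewrites the statement to.

```sql
-- With OPTION (RECOMPILE) the variable values are known at compile time,
-- so the estimate can come from the statistics histogram on CreationDate,
-- much as if the query had been written with literals:
SELECT *
FROM   dbo.Votes AS v
WHERE  v.CreationDate BETWEEN '2010-07-01' AND '2010-07-02';
```

Comparing the estimated row counts of this query and the local-variable version is a quick way to see whether the fixed guess is what pushed the plan toward the clustered index scan.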

Hope this helps!
