Making use of Jeff Moden's Tally-Ho! CSV splitter from here:
CREATE FUNCTION [dbo].[DelimitedSplit8K]
--===== Define I/O parameters
(@pString VARCHAR(8000), @pDelimiter CHAR(1))
--WARNING!!! DO NOT USE MAX DATA-TYPES HERE! IT WILL KILL PERFORMANCE!
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
--===== "Inline" CTE Driven "Tally Table" produces values from 1 up to 10,000...
-- enough to cover VARCHAR(8000)
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front
-- for both a performance gain and prevention of accidental "overruns"
SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER()
OVER (ORDER BY (SELECT NULL)) FROM E4
),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just
-- once for each delimiter)
SELECT 1 UNION ALL
SELECT t.N+1 FROM cteTally t WHERE SUBSTRING(@pString,t.N,1) = @pDelimiter
),
cteLen(N1,L1) AS(--==== Return start and length (for use in substring)
SELECT s.N1,
ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000)
FROM cteStart s
)
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final
-- element when no delimiter is found.
SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
Item = SUBSTRING(@pString, l.N1, l.L1)
FROM cteLen l
;
go
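As a quick sanity check (assuming the function was created as above), splitting a sample string shows the ItemNumber/Item pairs the PIVOT below relies on:

```sql
-- Smoke test of the splitter; ItemNumber numbers the elements left to right.
-- Note that items 2 and 3 keep the leading space after each comma, since the
-- delimiter is just ','. Wrap Item in LTRIM() if that matters to you.
SELECT ItemNumber, Item
FROM dbo.DelimitedSplit8K('A1K2, A1N2, C4Y3', ',');
```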
We can code the solution as a CROSS APPLY against Jeff's function plus a PIVOT, like so:
with data as (
select Code,Location,Quantity,Store from ( values
('L698-W-EA', NULL, 2, 'A')
,('L82009-EA', 'A1K2, A1N2, C4Y3, CBP2', 2, 'A')
,('L80401-A-EA', 'A1S2, SHIP, R2F1, CBP5, BRP, BRP1-20', 17,'A')
,('CWD2132W-BOX-25PK', 'A-AISLE', 1, 'M')
,('GM22660003-EA', 'B1K2', 1, 'M')
)data(Code,Location,Quantity,Store)
)
,shredded as (
select Code,Location,Quantity,Store,t.*
from data
cross apply [dbo].[DelimitedSplit8K](data.Location,',') as t
)
select
pvt.Code,pvt.Quantity,pvt.Store
,cast(isnull(pvt.[1],' ') as varchar(8)) as Loc1
,cast(isnull(pvt.[2],' ') as varchar(8)) as Loc2
,cast(isnull(pvt.[3],' ') as varchar(8)) as Loc3
,cast(isnull(pvt.[4],' ') as varchar(8)) as Loc4
,cast(isnull(pvt.[5],' ') as varchar(8)) as Loc5
,cast(isnull(pvt.[6],' ') as varchar(8)) as Loc6
from shredded
pivot (max(Item) for ItemNumber in ([1],[2],[3],[4],[5],[6])) pvt;
go
yielding this:
Code Quantity Store Loc1 Loc2 Loc3 Loc4 Loc5 Loc6
----------------- ----------- ----- -------- -------- -------- -------- -------- --------
L698-W-EA 2 A
L82009-EA 2 A A1K2 A1N2 C4Y3 CBP2
L80401-A-EA 17 A A1S2 SHIP R2F1 CBP5 BRP BRP1-20
CWD2132W-BOX-25PK 1 M A-AISLE
GM22660003-EA 1 M B1K2
Take a look at the plans. When you use SELECT * it probably uses the clustered index, and when you only want one column, there may be a skinnier index to use instead.
Don't "expect" a certain order. If you don't tell SQL Server how to order, it will return rows in whatever way is most efficient at the time, and that can change for any number of reasons (plan choice, parallelism, data changes, and more).
If you want a certain order, SAY SO. Please read #3 here:
Also, this post by Michael Swart may be an interesting read:
If you want your second query to be more efficient, you can consider creating an index on DateCreated (you may want to include OrderID - not sure of the present index structure).
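A minimal sketch of that index (the index name is just illustrative; adjust to your own naming convention):

```sql
-- Supports the WHERE DateCreated < ... filter;
-- INCLUDE (OrderID) avoids a key lookup when selecting OrderID
CREATE NONCLUSTERED INDEX IX_Order_DateCreated
    ON dbo.[Order] (DateCreated)
    INCLUDE (OrderID);
```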
For your actual goal of deleting n rows at a time, oldest first, and assuming OrderID is an IDENTITY column (so order create date should roughly align with it), why not use this approach (based on this great blog post, also by Michael Swart):
-- pick a datetime for the newest row you want to delete
-- let's say you want to delete all orders before Jan 1 2014:
DECLARE @MaxOrderID INT;

SELECT @MaxOrderID = MAX(OrderID)
FROM dbo.[Order] -- terrible table name, also always use dbo prefix
WHERE DateCreated < '20140101';
DECLARE @BatchSize INT = 1000,
@LargestOrderProcessed INT = -1,
@NextBatchMax INT,
@RC INT = 1;
WHILE (@RC > 0)
BEGIN
SELECT TOP (@BatchSize) @NextBatchMax = OrderID
FROM dbo.[Order]
WHERE OrderID > @LargestOrderProcessed
AND OrderID <= @MaxOrderID
ORDER BY OrderID;
DELETE dbo.[Order]
WHERE OrderID > @LargestOrderProcessed
AND OrderID <= @NextBatchMax;
SET @RC = @@ROWCOUNT;
SET @LargestOrderProcessed = @NextBatchMax;
END
In order to minimize the impact on the log, you may want to add some additional logic in there; see my blog post Break large delete operations into chunks. As for the dbo prefix, see Bad habits to kick: Avoiding the schema prefix.
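One way to add that pacing logic, as a sketch to drop in at the end of each loop iteration (the one-second delay is arbitrary, and the right log strategy depends on your recovery model):

```sql
-- Pacing additions for the batched-delete loop, run once per iteration.
-- In SIMPLE recovery, a checkpoint lets log space be reused between batches;
-- in FULL recovery, take a transaction log backup instead.
CHECKPOINT;

-- A short pause gives concurrent workloads room between batches.
WAITFOR DELAY '00:00:01';
```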
Best Answer
The query you wrote is basically asking the database to give you any two rows of data (any two orders). If you wrote the query with an ORDER BY clause then your results should be the same each time. This would give you the two rows with the lowest OrderID, and you will still get the same two rows even when you add other fields to the SELECT list.
You might be thinking that TOP means that the database will give you the first two, but it actually means that it should stop returning rows after it has returned two rows. And since you didn't have an ORDER BY, it could just return whichever two rows are most convenient.
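For example (assuming a dbo.[Order] table with an OrderID column, as in the question), adding the ORDER BY makes the result deterministic:

```sql
-- Always returns the two orders with the lowest OrderID values,
-- regardless of which other columns appear in the SELECT list
SELECT TOP (2) OrderID, DateCreated
FROM dbo.[Order]
ORDER BY OrderID;
```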