Sql-server – PaaS Server – Random row order while using aggregate function with group by

azure-sql-databasesql server

Result set of a query with an aggregate function and group by (no order by specified) returns different result or at least in different order almost in each execution.

I have a database on Azure PaaS server and it's behaving weirdly. The query in issue is something like this:

SELECT TOP 100 "Org Unit", count(*) FROM "paas-server-12032019"."dbo"."testTable" GROUP BY "Org Unit"

When I check the "Org Unit" order, it's different each time. If I omit, count(*) and just keep the query as

SELECT TOP 100 "Org Unit" FROM "paas-server-12032019"."dbo"."testTable" GROUP BY "Org Unit"

It works as expected. Order of the record is consistent each time.
What I noticed

It happened with a specific database, I have another one and in there it works as expected.
Secondly, I think it has something to do when the column in group by is nvarchar(max)

I have a limited access currently to DB setup but wondering if there is anything that can cause this issue.

Any sort of help/discussion is deeply appreciated!

Best Answer

Without ORDER BY, you tell SQL Server that you don't care about the ordering.

SQL Server can do GROUP BY and aggregation internally by either sorting data or through hashing. SQL Server decides this based on what it expects to be the more efficient way. For your case when the order isn't consistent, SQL server found it more efficient to use hashing, apparently. You can see this in the execution plan.

Bottom line is, as always, if you want some specific order for the rows, then you need to use ORDER BY.

Related Solutions

Does Detach/Attach or Offline/Online Clear Buffer Cache for Database?

I initially thought you were on to something here. Working assumption was along the lines that perhaps the buffer pool wasn't immediately flushed as it requires "some work" to do so and why bother until the memory was required. But...

Your test is flawed.

What you're seeing in the buffer pool is the pages read as a result of re-attaching the database, not the remains of the previous instance of the database.

And we can see that the buffer pool was not totally blown away by the detach/attach. Seems like my buddy was wrong. Does anyone disagree or have a better argument?

Yes. You're interpreting physical reads 0 as meaning there were not any physical reads

Table 'DatabaseLog'. Scan count 1, logical reads 782, physical reads 0, read-ahead reads 768, lob logical reads 94, lob physical reads 4, lob read-ahead reads 24.

As described on Craig Freedman's blog the sequential read ahead mechanism tries to ensure that pages are in memory before they're requested by the query processor, which is why you see zero or a lower than expected physical read count reported.

When SQL Server performs a sequential scan of a large table, the storage engine initiates the read ahead mechanism to ensure that pages are in memory and ready to scan before they are needed by the query processor. The read ahead mechanism tries to stay 500 pages ahead of the scan.

None of the pages required to satisfy your query were in memory until read-ahead put them there.

As to why online/offline results in a different buffer pool profile warrants a little more idle investigation. @MarkSRasmussen might be able to help us out with that next time he visits.

Sql-server – Query to return fields of distinct values per key

Without knowing anything about the source data, perhaps this would do what you want?

USE Test;
GO
CREATE TABLE GENDER
(
    ORG INT NOT NULL
    , GENDER VARCHAR(1) NOT NULL
);

CREATE TABLE AGE
(
    ORG INT NOT NULL
    , AGE TINYINT
);

CREATE TABLE STATES
(
    ORG INT NOT NULL
    , STATENAME VARCHAR(255)
);

INSERT INTO Gender (ORG, GENDER) VALUES (1, 'M');
INSERT INTO Gender (ORG, GENDER) VALUES (1, 'F');
INSERT INTO Gender (ORG, GENDER) VALUES (2, 'M');
INSERT INTO Gender (ORG, GENDER) VALUES (2, 'F');
INSERT INTO Gender (ORG, GENDER) VALUES (3, 'M');
INSERT INTO Gender (ORG, GENDER) VALUES (3, 'F');

INSERT INTO AGE (ORG, AGE) VALUES (1,27);
INSERT INTO AGE (ORG, AGE) VALUES (1,28);
INSERT INTO AGE (ORG, AGE) VALUES (1,29);
INSERT INTO AGE (ORG, AGE) VALUES (1,30);
INSERT INTO AGE (ORG, AGE) VALUES (2,37);
INSERT INTO AGE (ORG, AGE) VALUES (2,38);
INSERT INTO AGE (ORG, AGE) VALUES (2,39);
INSERT INTO AGE (ORG, AGE) VALUES (2,40);
INSERT INTO AGE (ORG, AGE) VALUES (3, 2);

INSERT INTO STATES (ORG, STATENAME) VALUES (1,'FL');
INSERT INTO STATES (ORG, STATENAME) VALUES (1,'GA');
INSERT INTO STATES (ORG, STATENAME) VALUES (1,'MN');
INSERT INTO STATES (ORG, STATENAME) VALUES (1,'NM');
INSERT INTO STATES (ORG, STATENAME) VALUES (2,'FL');
INSERT INTO STATES (ORG, STATENAME) VALUES (2,'MN');
INSERT INTO STATES (ORG, STATENAME) VALUES (2,'NM');
INSERT INTO STATES (ORG, STATENAME) VALUES (3,'FL');
INSERT INTO STATES (ORG, STATENAME) VALUES (3,'GA');
INSERT INTO STATES (ORG, STATENAME) VALUES (3,'NM');

CREATE TABLE FACTS
(
    ORG INT NOT NULL
    , GENDER VARCHAR(1) NULL
    , AGE INT NULL
    , STATENAME VARCHAR(255) NULL
);

INSERT INTO FACTS (ORG, GENDER, AGE, STATENAME)
SELECT ORG, GENDER, NULL, NULL
FROM GENDER
GROUP BY ORG, GENDER
UNION ALL
SELECT ORG, NULL, AGE, NULL
FROM AGE
GROUP BY ORG, AGE
UNION ALL
SELECT ORG, NULL, NULL, STATENAME
FROM STATES;

SELECT *
FROM FACTS
ORDER BY ORG;

The results:

enter image description here

This will create a FACTS table that has all the data from several source tables. As @ypercube and @jon-seigel said, this really doesn't make much sense; perhaps we are missing something compelling about your setup.

If this is not what you were expecting, please provide the source tables, and any other pertinent details.

Best Answer

Related Solutions

Does Detach/Attach or Offline/Online Clear Buffer Cache for Database?

Sql-server – Query to return fields of distinct values per key

Related Question