I am trying to query two tables and get results like the following:
Section Names
shoes AccountName1, AccountName2, AccountName3
books AccountName1
The tables are:
CREATE TABLE dbo.TableA(ID INT, Section varchar(64), AccountId varchar(64));
INSERT dbo.TableA(ID, Section, AccountId) VALUES
(1 ,'shoes','A1'),
(2 ,'shoes','A2'),
(3 ,'shoes','A3'),
(4 ,'books','A1');
CREATE TABLE dbo.TableB(AccountId varchar(20), Name varchar(64));
INSERT dbo.TableB(AccountId, Name) VALUES
('A1','AccountName1'),
('A2','AccountName2'),
('A3','AccountNAme3');
I saw a few questions answered saying to use "XML PATH" and "STUFF" to query the data to get the results I am looking for, but I think there is something missing. I have tried the below query and get the error message:
Column 'a.AccountId' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
I don't have it in the SELECT clause of either query, but I assume the error is because AccountId is not unique in TableA.
Here is the query I am currently trying to get working correctly.
SELECT section, names= STUFF((
SELECT ', ' + Name FROM TableB as b
WHERE AccountId = b.AccountId
FOR XML PATH('')), 1, 1, '')
FROM TableA AS a
GROUP BY a.section
Best Answer
Sorry, I missed a step in the relationship. Try this version (though Martin's will work as well):
An approach that is at least as good, but sometimes better, is switching from
DISTINCT
toGROUP BY
:At a high level, the reason
DISTINCT
applies to the entire column list. Therefore for any duplicates it has to perform the aggregate work for every duplicate before applyingDISTINCT
. If you useGROUP BY
then it can potentially remove duplicates before doing any of the aggregation work. This behavior can vary by plan depending on a variety of factors including indexes, plan strategy, etc. And a direct switch toGROUP BY
may not be possible in all cases.In any case, I ran both of these variations in SentryOne Plan Explorer. The plans are different in a few minor, uninteresting ways, but the I/O involved with the underlying worktable is telling. Here is
DISTINCT
:And here is
GROUP BY
:When I made the tables larger (14,000+ rows mapping to 24 potential values), this difference is more pronounced.
DISTINCT
:GROUP BY
:In SQL Server 2017, you can use
STRING_AGG
:The I/O here is almost nothing:
But, if you're not on SQL Server 2017 (or Azure SQL Database), and can't use
STRING_AGG
, I have to give credit where credit is due... Paul White's answer below has very little I/O and kicks the pants off of both of theFOR XML PATH
solutions above.Other enhancements from these posts:
Also see: