SQL Server – Selecting from Many-to-Many by Matching Field Within a Column

sql server

Having following tables:

Order table:

OrderID
1
2
3
4
5

CustomerOrderLink table:

CustomerID  OrderID
1           3
1           4
2           2
2           4
3           3
4           4

Customer table:

CustomerID    Forename  Surname
1             A         Guy
2             Ted       Something
3             Gaby      Guy
4             Katie     Kitty

the task is to select all orders made by more than one person, but only when all the persons making order have same surname. Let's say that when customers have matching surname, they are a family. So to rephrase it: We want to have all the orders all families made together (we consider a family ordering when there are two or more customers with same surname for single order).

So the expected result for dummy data above would be:

OrderID Customers
3       2

I already have a part for selecting orders that are made by more than one customer:

 select OrderID, COUNT(distinct CustomerID) as CustomersCount
 from CustomerOrderLink group by OrderID having count(distinct CustomerID) > 1

but I struggle to implement checking, wheter they have matching surname:

select OrderID, COUNT(distinct CustomerID) as CustomersCount left join Customer on Customer.CustomerID = CustomerOrderLink.CustomerID 
     from CustomerOrderLink group by OrderID having count(distinct CustomerID) > 1 and ??? 

I'll be thankful for help.

Best Answer

Replace the ??? in your attempted query with

COUNT(DISTINCT Customer.Surname) = 1

So, the whole query, with a couple of improvements, would be

SELECT
  co.OrderID,
  CustomerCount = COUNT(distinct c.CustomerID),
  FamilyName    = MAX(c.Surname)
FROM
  dbo.CustomerOrderLink AS co
  INNER JOIN dbo.Customer AS c ON c.CustomerID = co.CustomerID 
GROUP BY
  co.OrderID
HAVING
  COUNT(DISTINCT c.CustomerID) > 1
  AND COUNT(DISTINCT c.Surname) = 1
;

An inner join would be better here since orders without associated customers would not match the COUNT(DISTINCT c.CustomerID) > 1 condition anyway. Talking of that condition, if the CustomerOrderLink table can never have more than one CustomerID/OrderID pair (and it should not), then you can replace it with COUNT(c.CustomerID) > 1 or even COUNT(*) > 1.

Note that if Customer.Surname is a nullable column and some customers do have a null there, the above may not return what you expect, because COUNT ignores nulls (and so it will return 1 if a group contains a couple of customers with the same surname and the rest without a surname).