Sql-server – Grouping Data Using Joined Tables with Certain Distinct Values

distinctMySQLsql server

I am struggling to get my head around how to do this so please bare with me while I explain.

We have an order system which records orders, order lines, items, users etc.
What I am interested in in this instance are the orders and order lines.

At the moment, we produce reports with sales figures, showing sales, returns etc, but it is slow and inefficient. This is because we are querying individual tables using PHP/MS SQL Server, looping through arrays and then bringing the data together.

I have been tasked to try and make this more efficient, and so I have tried to produce one query with joined, aliased, sub-query tables. These work fine and join together, but I am getting the wrong values due to the way I am grouping tables.

Note, I am unable to change any schema and I am working with an existing system.

This explanation may seem quite long-winded, and I probably will lose you with what I am trying to ask, but please stay with me.

Anyway, here is a diagram of the schema, sample data, and the result of my query at present:

Schema

SELECT 
`ol`.`storeID`,
SUM(DISTINCT `o`.`grossValue`) AS 'Total',
SUM(DISTINCT `o`.`paymentValue`) AS 'paymentTotal'
FROM `orders` AS `o`
LEFT JOIN `orderLines` AS `ol` ON `o`.`orderID` = `ol`.`orderID`
WHERE (`o`.`orderDate` BETWEEN '2015-07-07 00:00:00' AND '2015-07-07 23:59:59')
GROUP BY `ol`.`storeID`

I have the produced the above quickly in MySQL as I am at home at the moment, and I imagine the solution would be similar for SQL server, but please correct me if I'm wrong, or perhaps provide solutions for both, which would be great!

Basically, the orders have many lines to them. The orders record the total value of the lines, and how much has been paid so far.
Each order line is a reference to a stock item.
Each line records which store it is recorded to, as different items could be sourced from different stores.
We want to produce reports that show how much each store makes in turnover.

As you can see in the result of my query store 06 is showing 4.99, however this should be 9.98.

What is happening is the netValue is being made distinct. I DON'T want this.
I DON'T want duplicates of order <=> orderlines, but I DO want duplicates of the netValue as these are individual orders.

Have I confused you yet?

Is there a way to still group by the storeID and sum the netValue, by perhaps grouping by the orderID, or orderLineID?

I have tried all types of combinations, and gotten many errors which I don't remember now, but mostly about aggregate columns and not being able to group etc.

All suggestions and help welcome 🙂

Create table code:

CREATE TABLE `orderLines` (
  `orderLineID` int(10) NOT NULL AUTO_INCREMENT,
  `orderID` int(10) NOT NULL,
  `itemCode` varchar(30) NOT NULL,
  `netValue` decimal(10,2) NOT NULL,
  `taxValue` decimal(10,2) NOT NULL,
  `storeID` varchar(2) NOT NULL,
  PRIMARY KEY (`orderLineID`)
);



CREATE TABLE `orders` (
  `orderID` int(10) NOT NULL AUTO_INCREMENT,
  `grossValue` decimal(10,2) NOT NULL,
  `paymentValue` decimal(10,2) NOT NULL,
  `typeID` int(10) NOT NULL,
  `orderDate` datetime NOT NULL,
  PRIMARY KEY (`orderID`)
);

Expected Result:

+---------+-------+--------------+
| storeID | Total | paymentTotal |
+---------+-------+--------------+
| 02      | 10.00 |  8.00        |
| 06      | 9.98  |  9.98        |
| 07      | 3.00  |  3.00        |
+---------+-------+--------------+

Best Answer

If it is safe to assume that a single order can have only one distinct StoreID, you could resolve your issue by generating the set of distinct OrderID, StoreID pairs from orderLines and join that set instead of the table itself. That way you will not need to use DISTINCT with aggregation:

SELECT 
  `ol`.`storeID`,
  SUM(`o`.`grossValue`) AS 'Total',
  SUM(`o`.`paymentValue`) AS 'paymentTotal'
FROM
  `orders` AS `o`
LEFT JOIN
  (SELECT DISTINCT `OrderID`, `StoreID` FROM `orderLines`) AS `ol`
  ON `o`.`orderID` = `ol`.`orderID`
WHERE
  (`o`.`orderDate` BETWEEN '2015-07-07 00:00:00' AND '2015-07-07 23:59:59')
GROUP
  BY `ol`.`storeID`
;

And, as has already been mentioned, you should probably use INNER JOIN instead of the LEFT JOIN, unless you have orders that do not have order lines but for some reason do have a gross value and a payment (a little strange, but may be there is a reason for that), and you want them included in the results. (They will be represented as a single row with a null Store ID.)