Mysql – Distinct Combination of Two Columns

countdistinctMySQLmysql-5.1

I am currently trying to find distinct combinations within a table as two of the columns have a many to many relationship with each other.

The data is all around backup policies being run on against particularly clients and could be summarized as below:

The answer I would like to generate for the above table would be 7 as there are that number of distinct combinations.

Has anyone got an idea of how this might be done? I have tried experimenting with nested counts and distinct values (where I was able to filter to one column but not both).

Best Answer

You can count distinct elements by running:

select count(distinct policy_id, client_id) from policy_client;

Another option would be to group by and count that:

select count(*) from (select policy_id, client_id from policy_client group by 1,2) a;

Run both version and see which one performs better on your dataset.

A very quick way but not totally accurate if you have a key on (policy_id and client_id) you can also check the cardinality of that index but that's an approximate not exact number.

Related Solutions

Mysql – Is it ok to have 40+ columns in a table

Considering that MySQL is still a relation database management system an (RDBMS if you will), you should considering the following things:

Your tables are defined by the data they contain and the relationships to other tables
Your data should be as non-redundant as possible

This last point leads to the process of normalization, which is a very important thing to do in a RDBMS.

Have a look at the Wikipedia article on Normalization to have a basic understanding of what that process entails: http://en.wikipedia.org/wiki/Database_normalization

I find that, oftentimes, it can be a good idea to take a step back from the "storage" aspect of your database. Try to look at your application, the data models used within and their relationships. Creating such an abstraction is called an ERM: Entity-Relationship Model, or an ERD: an Entity-Relationship Diagram. More info on that: http://en.wikipedia.org/wiki/Entity%E2%80%93relationship_model

For your case specificially:

Apart from the fact that your not using a design that fits a relational database, please consider the fact that you are creating a very inflexible design! What if -- for example -- you were to create a new product in the future? Would you add a new column to the table? Surely you'll agree that that's pretty stupid.

Instead, it makes MUCH more sense to create something along the lines of

+-----------+-------------+----------------+
| client_id | client_name | client_address |
+-----------+-------------+----------------+
| 1         | Snake Logan | ...            |
| 2         | Mad Max     | ...            |
| ...       | ...         | ...            |
+-----------+---------- --+----------------+

+------------+---------------+---------------+
| product_id | product_desc  | product_price |
+------------+---------------+---------------+
| 1          | Silver Bullet | 14.00         |
| 2          | Hand grenade  | 7.00          |
| ...        | ...           | ...           |
+------------+---------------+---------------+

+-------------+-----------+------------+-----------------+
| purchase_id | client_id | product_id | purchase_amount |
+-------------+-----------+------------+-----------------+
| 1           | 1         | 1          | 3               |
| 2           | 1         | 2          | 1               |
| 3           | 2         | 2          | 5               |
+-------------+-----------+------------+-----------------+

I'm not saying this is the greatest design ever, but this is more elegant.

If you want to have more info concerning the purchase, I'd recommand building a view that contains the necessary info by joing the tables together. Please don't create a field like purchase_total, I think that would once again be redundant data. The data is already there (purchase_amount and product_id), you should need to do the math (purchase_amount * product_price). You can put that in the view if you want. Tables are for storing data, views are there to present said data.

Sql-server – Grouping Data Using Joined Tables with Certain Distinct Values

If it is safe to assume that a single order can have only one distinct StoreID, you could resolve your issue by generating the set of distinct OrderID, StoreID pairs from orderLines and join that set instead of the table itself. That way you will not need to use DISTINCT with aggregation:

SELECT 
  `ol`.`storeID`,
  SUM(`o`.`grossValue`) AS 'Total',
  SUM(`o`.`paymentValue`) AS 'paymentTotal'
FROM
  `orders` AS `o`
LEFT JOIN
  (SELECT DISTINCT `OrderID`, `StoreID` FROM `orderLines`) AS `ol`
  ON `o`.`orderID` = `ol`.`orderID`
WHERE
  (`o`.`orderDate` BETWEEN '2015-07-07 00:00:00' AND '2015-07-07 23:59:59')
GROUP
  BY `ol`.`storeID`
;

And, as has already been mentioned, you should probably use INNER JOIN instead of the LEFT JOIN, unless you have orders that do not have order lines but for some reason do have a gross value and a payment (a little strange, but may be there is a reason for that), and you want them included in the results. (They will be represented as a single row with a null Store ID.)

Best Answer

Related Solutions

Mysql – Is it ok to have 40+ columns in a table

Sql-server – Grouping Data Using Joined Tables with Certain Distinct Values

Related Question