MySQL: SELECT with SUM() and COUNT() as new columns

countMySQLselectsum

What is the correct SELECT statement to produce this desired output?

+------------+----------+-------------+-----------------+
| QtyOfParts | FamilyID | NumInFamily | TotalQtyOfParts |
+------------+----------+-------------+-----------------+
  28           1          3               79
  32           1          3               79
  19           1          3               79
  28           2          2               66
  38           2          2               66
  46           3          4              151
  30           3          4              151
  51           3          4              151
  24           3          4              151
...more rows...

'QtyOfParts' and 'FamilyID' are existing columns in the table.

'NumInFamily' is a calculated column for display/report; its
value is the COUNT() of each unique FamilyID. E.g., FamilyID
= 1 has 3 in its family; FamilyID = 2 has 2 in its family;
FamilyID = 3 has 4 in its family.

'TotalQtyOfParts' is a calculated column for display/report;
its values is the SUM() of the QtyOfParts for each FamilyID.
E.g., FamilyID = 1 has 28 + 32 + 19 = 79 parts; FamilyID =
2 has 19 + 28 = 66 parts.

Using 'SELECT … COUNT(FamilyID) … GROUP BY FamilyID' fails
to show each and every row — instead it shows only one row
for each FamilyID. So for example instead of seeing three rows
for FamilyID = 1, only one row is shown. (Note: ROLLUP made no
difference.)

Using SUM() in the select simply tallied the sum of every single
value under QtyOfParts.

CREATE TABLE IF NOT EXISTS `Test`.`NoOps` (
  `ID_NoOps` INT NOT NULL,
  `QtyOfParts` INT(3) NOT NULL,
  `FamilyID` INT(3) NOT NULL,
  PRIMARY KEY (`ID_NoOps`))
ENGINE = InnoDB

Best Answer

If we assume that 'QtyOfParts' and 'FamilyID' are existing columns in the table and furthermore, that the other 2 columns are "calculated", then maybe the following query would be helpful (creating some test data first):

create table qf (
  QtyOfParts integer
, FamilyID integer 
);

insert into qf values 
(28, 1), (32, 1), (19, 1), 
(28, 2), (38, 2), (46, 3),
(30, 3), (51, 3), (24, 3);

Thus:

mysql> select * from qf;
+------------+----------+
| QtyOfParts | FamilyID |
+------------+----------+
|         28 |        1 |
|         32 |        1 |
|         19 |        1 |
|         28 |        2 |
|         38 |        2 |
|         46 |        3 |
|         30 |        3 |
|         51 |        3 |
|         24 |        3 |
+------------+----------+
9 rows in set (0.00 sec)

Suggested query:

mysql> select 
    ->   t1.qtyofparts
    -> , t1.familyid
    -> , t2.nbr NumInFamily
    -> , t2.sum FullQtyOfParts 
    -> from qf t1 join ( 
    ->     select 
    ->       familyid
    ->     , count(familyid) nbr
    ->     , sum(qtyofparts) sum
    ->     from qf 
    ->     group by familyid ) t2 on t1.familyid = t2.familyid
    -> ;
+------------+----------+-------------+----------------+
| qtyofparts | familyid | NumInFamily | FullQtyOfParts |
+------------+----------+-------------+----------------+
|         28 |        1 |           3 |             79 |
|         32 |        1 |           3 |             79 |
|         19 |        1 |           3 |             79 |
|         28 |        2 |           2 |             66 |
|         38 |        2 |           2 |             66 |
|         46 |        3 |           4 |            151 |
|         30 |        3 |           4 |            151 |
|         51 |        3 |           4 |            151 |
|         24 |        3 |           4 |            151 |
+------------+----------+-------------+----------------+
9 rows in set (0.00 sec)

Benchmarks

Postgres 13

db<>fiddle here

Basically the same results as for Postgres 10 below. (I added a test without the new parallelism.)

Postgres 10

I ran a new series of tests for Postgres 10, including the aggregate FILTER clause and demonstrating the role of an index for small and big counts.

Simple setup:

CREATE TABLE tbl (
   tbl_id int
 , amount int NOT NULL
);

INSERT INTO tbl
SELECT g, (random() * 150)::int
FROM   generate_series (1, 1000000) g;

-- only relevant for the last test
CREATE INDEX ON tbl (amount);

Actual times vary quite a bit due to background noise and specifics of the test bed. Showing typical best times from a bigger set of tests. These two cases should capture the essence:

Test 1 counting ~ 1 % of all rows

SELECT COUNT(NULLIF(amount > 148, FALSE))            FROM tbl; -- 140 ms
SELECT SUM((amount > 148)::int)                      FROM tbl; -- 136 ms
SELECT SUM(CASE WHEN amount > 148 THEN 1 ELSE 0 END) FROM tbl; -- 133 ms
SELECT COUNT(CASE WHEN amount > 148 THEN 1 END)      FROM tbl; -- 130 ms
SELECT COUNT((amount > 148) OR NULL)                 FROM tbl; -- 130 ms
SELECT COUNT(*) FILTER (WHERE amount > 148)          FROM tbl; -- 118 ms -- !

SELECT count(*) FROM tbl WHERE amount > 148; -- without index  --  75 ms -- !!
SELECT count(*) FROM tbl WHERE amount > 148; -- with index     --   1.4 ms -- !!!

db<>fiddle here

Test 2 counting ~ 33 % of all rows

SELECT COUNT(NULLIF(amount > 100, FALSE))            FROM tbl; -- 140 ms
SELECT SUM((amount > 100)::int)                      FROM tbl; -- 138 ms
SELECT SUM(CASE WHEN amount > 100 THEN 1 ELSE 0 END) FROM tbl; -- 139 ms
SELECT COUNT(CASE WHEN amount > 100 THEN 1 END)      FROM tbl; -- 138 ms
SELECT COUNT(amount > 100 OR NULL)                   FROM tbl; -- 137 ms
SELECT COUNT(*) FILTER (WHERE amount > 100)          FROM tbl; -- 132 ms -- !

SELECT count(*) FROM tbl WHERE amount > 100; -- without index  -- 102 ms -- !!
SELECT count(*) FROM tbl WHERE amount > 100; -- with index     --  55 ms -- !!!

db<>fiddle here

The last test in each set used an index-only scan, which is why it helped for counting one third of all rows. Plain index or bitmap index scans cannot compete with a sequential scan when involving roughly 5 % or more of all rows.

Old test for Postgres 9.1

To verify I ran a quick test with EXPLAIN ANALYZE on a real life table in PostgreSQL 9.1.6.

74208 of 184568 rows qualified with the condition kat_id > 50. All queries return the same result. I ran each like 10 times in turns to exclude caching effects and appended the best result as note:

SELECT SUM((kat_id > 50)::int)                      FROM log_kat; -- 438 ms
SELECT COUNT(NULLIF(kat_id > 50, FALSE))            FROM log_kat; -- 437 ms
SELECT COUNT(CASE WHEN kat_id > 50 THEN 1 END)      FROM log_kat; -- 437 ms
SELECT COUNT((kat_id > 50) OR NULL)                 FROM log_kat; -- 436 ms
SELECT SUM(CASE WHEN kat_id > 50 THEN 1 ELSE 0 END) FROM log_kat; -- 432 ms

Hardly any real difference in performance.

Mysql – 2 GROUP BY WITH DISTINCT , SUM , COUNT : PHP MYQL

I believe you are asking for the sum of "distinct" values of owgh for each value of mid and pid - a different "granularity" of data.

e.g. for mid = 3 then add 1.5 + 0.6 + 1.2 + 3.0 = 6.3; only count the 1.5 once instead of twice. However if your record ids for the 1.5 were for 2 different pid values, then you would want to count the 1.5 twice.

One way you could do it is instead of querying the "test" table directly, calculate some of the aggregates beforehand:

Select mid,
COUNT(distinct pid) as cpid,
SUM(nwgh) as totalnwgh,
SUM(owgh) as totowgh    -- I added this one
from 
    (select mid, pid, sum(distinct owgh) as owgh, sum(nwgh) as nwgh from test group by mid, pid) as A
GROUP BY mid

Best Answer

Related Solutions

Sql-server – For absolute performance, is SUM faster or COUNT

Benchmarks

Postgres 13

Postgres 10

Old test for Postgres 9.1

Mysql – 2 GROUP BY WITH DISTINCT , SUM , COUNT : PHP MYQL

Related Question