MySQL – “GROUP BY” in ranges

Suppose I have a table with a numeric column (let's call it "score").

I'd like to generate a table of counts that shows how many scores fall in each range.

For example:

score range  | number of occurrences
-------------------------------------
   0-9       |        11
  10-19      |        14
  20-29      |         3
   ...       |       ...

In this example I have given explicit ranges, but I don't know in advance which ranges the data will cover.

Is there an easy way to set this up? What do you recommend?

Best Answer

SUGGESTED QUERY

SELECT
    CONCAT(A.ndx,' - ',A.ndx+9) "score range",
    IFNULL(B.rowcount ,0) "number of occurrences"
FROM
(
    SELECT 0 ndx UNION SELECT 10 UNION SELECT 20 UNION
    SELECT 30 UNION SELECT 40 UNION SELECT 50 UNION SELECT 60
    UNION SELECT 70 UNION SELECT 80 UNION SELECT 90
) A
LEFT JOIN
(
    SELECT ndx,COUNT(1) rowcount FROM
    (SELECT FLOOR(score/10)*10 ndx FROM yoshi_scores) AA
    GROUP BY ndx
) B USING (ndx);

SAMPLE DATA

use test
DROP TABLE IF EXISTS yoshi_scores;
CREATE TABLE yoshi_scores
(id int not null auto_increment,
score int not null,primary key (id), key (score));
INSERT INTO yoshi_scores (score) VALUES
(97),(74),(22),(98),(65),(62),(47),(64),(82),( 8),(60),(12),
(27),(14),(13),(28),(60),(12),(27),(34),(32),(89),(15),( 2);

The data looks like this:

mysql> SELECT * FROM yoshi_scores ORDER BY id;
+----+-------+
| id | score |
+----+-------+
|  1 |    97 |
|  2 |    74 |
|  3 |    22 |
|  4 |    98 |
|  5 |    65 |
|  6 |    62 |
|  7 |    47 |
|  8 |    64 |
|  9 |    82 |
| 10 |     8 |
| 11 |    60 |
| 12 |    12 |
| 13 |    27 |
| 14 |    14 |
| 15 |    13 |
| 16 |    28 |
| 17 |    60 |
| 18 |    12 |
| 19 |    27 |
| 20 |    34 |
| 21 |    32 |
| 22 |    89 |
| 23 |    15 |
| 24 |     2 |
+----+-------+
24 rows in set (0.00 sec)

SUGGESTED QUERY EXECUTED

mysql> SELECT
    ->     CONCAT(A.ndx,' - ',A.ndx+9) "score range",
    ->     IFNULL(B.rowcount ,0) "number of occurrences"
    -> FROM
    -> (
    ->     SELECT 0 ndx UNION SELECT 10 UNION SELECT 20 UNION
    ->     SELECT 30 UNION SELECT 40 UNION SELECT 50 UNION SELECT 60
    ->     UNION SELECT 70 UNION SELECT 80 UNION SELECT 90
    -> ) A
    -> LEFT JOIN
    -> (
    ->     SELECT ndx,COUNT(1) rowcount FROM
    ->     (SELECT FLOOR(score/10)*10 ndx FROM yoshi_scores) AA
    ->     GROUP BY ndx
    -> ) B USING (ndx);
+-------------+-----------------------+
| score range | number of occurrences |
+-------------+-----------------------+
| 0 - 9       |                     2 |
| 10 - 19     |                     5 |
| 20 - 29     |                     4 |
| 30 - 39     |                     2 |
| 40 - 49     |                     1 |
| 50 - 59     |                     0 |
| 60 - 69     |                     5 |
| 70 - 79     |                     1 |
| 80 - 89     |                     2 |
| 90 - 99     |                     2 |
+-------------+-----------------------+
10 rows in set (0.00 sec)

mysql>

Why is the query designed this way? Look at the first subquery:

(
    SELECT 0 ndx UNION SELECT 10 UNION SELECT 20 UNION
    SELECT 30 UNION SELECT 40 UNION SELECT 50 UNION SELECT 60
    UNION SELECT 70 UNION SELECT 80 UNION SELECT 90
) A

I LEFT JOINed this list of ranges to the counts for a reason. Notice that the dataset has nothing in the 50 - 59 range, so that range does not appear at all when the second subquery is run on its own:

mysql> SELECT ndx,COUNT(1) rowcount FROM
    -> (SELECT FLOOR(score/10)*10 ndx FROM yoshi_scores) AA
    -> GROUP BY ndx;
+------+----------+
| ndx  | rowcount |
+------+----------+
|    0 |        2 |
|   10 |        5 |
|   20 |        4 |
|   30 |        2 |
|   40 |        1 |
|   60 |        5 |
|   70 |        1 |
|   80 |        2 |
|   90 |        2 |
+------+----------+
9 rows in set (0.00 sec)

mysql>

I am sure you wanted the range 50 - 59 to show up, so the query is designed to list every range. Any range with no rows is reported with a count of zero.

GIVE IT A TRY !!!

CAVEAT: Unfortunately, the list of ranges in the first subquery has to be hardcoded. That is the only hardcoding the query requires.
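One way to avoid the hardcoded list is to derive the bucket starts from MIN(score) and MAX(score) with a recursive CTE. The sketch below is only an illustration, not part of the original answer: it runs against SQLite through Python's sqlite3 module so that it is self-contained, and it assumes non-negative integer scores. On MySQL the same idea needs version 8.0 or later (for WITH RECURSIVE), FLOOR(score/10)*10 or DIV in place of SQLite's truncating integer division, and CONCAT in place of the || operator.

```python
import sqlite3

# Scores copied from the yoshi_scores sample data above.
SCORES = [97, 74, 22, 98, 65, 62, 47, 64, 82, 8, 60, 12,
          27, 14, 13, 28, 60, 12, 27, 34, 32, 89, 15, 2]

# The recursive CTE starts at the bucket holding the smallest score and
# keeps adding 10 until it passes the largest score (carried along as hi).
# SQLite dialect: "/" on integers truncates and "||" concatenates strings.
BUCKET_SQL = """
WITH RECURSIVE buckets(ndx, hi) AS (
    SELECT (MIN(score) / 10) * 10, MAX(score) FROM yoshi_scores
    UNION ALL
    SELECT ndx + 10, hi FROM buckets WHERE ndx + 10 <= hi
)
SELECT b.ndx || ' - ' || (b.ndx + 9) AS score_range,
       COUNT(s.score) AS occurrences
FROM buckets AS b
LEFT JOIN yoshi_scores AS s ON (s.score / 10) * 10 = b.ndx
GROUP BY b.ndx
ORDER BY b.ndx
"""


def bucket_counts(scores):
    """Load scores into an in-memory table and return (range, count) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE yoshi_scores "
        "(id INTEGER PRIMARY KEY, score INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO yoshi_scores (score) VALUES (?)",
        [(s,) for s in scores],
    )
    rows = conn.execute(BUCKET_SQL).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for score_range, occurrences in bucket_counts(SCORES):
        print(score_range, occurrences)
```

COUNT(s.score) counts only matched rows, so an empty bucket such as 50 - 59 still appears with a count of zero, just as with the IFNULL in the original query.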

Related Solutions

MySQL – best practice for recurring calendar events

Separate the data from the processing. A database contains data; application code does the processing.

In your situation there are effectively two sets of data:

Raw data (especially for recurring events, such as "4pm every 3rd Tuesday of the month until next June")
Calendar events, such as the dozens of entries that the above item implies.

Consider starting with two tables, one for each of those. Then write the messy code that translates the definition of a recurring event into the individual entries. (Note: That is code, not SQL.)
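As a rough illustration of that translation step, the sketch below expands a rule like "4pm every 3rd Tuesday of the month" into one row per occurrence. The function names, the rule_id parameter, and the (rule_id, start) row shape are assumptions made for this example, not something the answer specifies. It also assumes n is at most 4, since a 5th occurrence of a weekday does not exist in every month.

```python
from datetime import date, datetime, time, timedelta


def nth_weekday(year, month, weekday, n):
    """Return the date of the n-th given weekday (Monday=0) in a month."""
    first = date(year, month, 1)
    offset = (weekday - first.weekday()) % 7  # days until the first match
    return first + timedelta(days=offset + 7 * (n - 1))


def expand_monthly(rule_id, start, until, weekday, n, at):
    """Yield (rule_id, start_datetime) for each occurrence in [start, until]."""
    year, month = start.year, start.month
    while (year, month) <= (until.year, until.month):
        day = nth_weekday(year, month, weekday, n)
        if start <= day <= until:
            yield (rule_id, datetime.combine(day, at))
        month += 1
        if month > 12:
            year, month = year + 1, 1


if __name__ == "__main__":
    # Hypothetical rule 1: 4pm on the 3rd Tuesday, January to March 2024.
    for row in expand_monthly(1, date(2024, 1, 1), date(2024, 3, 31),
                              weekday=1, n=3, at=time(16, 0)):
        print(row)
```

Each yielded row carries the id of the recurring definition, so the generated entries can be found and regenerated when the definition changes.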

Obviously, if there is the option to change a recurring event, there needs to be a way to make potentially massive changes to the second table. This implies an id for the recurring event that is stored in the second table. A FOREIGN KEY might be the way to go. (I prefer to do such myself.)

There are probably other tables, such as for 'normalizing' users, etc. (Do not normalize dates or datetimes.)

"What's scheduled for this afternoon?" That turns into a range query.
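A minimal sketch of such a range query follows. The calendar_events table, its columns, and the sample rows are hypothetical, and SQLite stands in for MySQL so the example runs on its own. Timestamps are stored as ISO-8601 text, which sorts in chronological order; in MySQL a DATETIME column with an index would play the same role.

```python
import sqlite3


def events_between(conn, start, end):
    """Return events that start in the half-open window [start, end)."""
    return conn.execute(
        "SELECT title, starts_at FROM calendar_events "
        "WHERE starts_at >= ? AND starts_at < ? "
        "ORDER BY starts_at",
        (start, end),
    ).fetchall()


# Hypothetical schema: one row per individual calendar entry.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE calendar_events ("
    "id INTEGER PRIMARY KEY, recurring_id INTEGER, "
    "title TEXT NOT NULL, starts_at TEXT NOT NULL)"
)
# The index on starts_at is what lets the range condition avoid a full scan.
conn.execute("CREATE INDEX idx_starts_at ON calendar_events (starts_at)")
conn.executemany(
    "INSERT INTO calendar_events (recurring_id, title, starts_at) "
    "VALUES (?, ?, ?)",
    [
        (1, "standup", "2024-01-16 09:00:00"),
        (2, "review", "2024-01-16 16:00:00"),
        (None, "dinner", "2024-01-16 19:30:00"),
    ],
)

afternoon = events_between(conn, "2024-01-16 12:00:00", "2024-01-16 18:00:00")
print(afternoon)
```

The half-open window (>= start, < end) avoids counting an event that begins exactly at the boundary in two adjacent windows.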

Hint: If something is ongoing (e.g., a 2-week vacation), break it into multiple daily events (14 of them, each midnight to midnight) for the second table. Otherwise the SQL can get messy and inefficient.
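The split itself can be a few lines of application code. This is a sketch under the assumption that the ongoing event is given as an inclusive first and last day; the function name is made up for the example.

```python
from datetime import date, datetime, time, timedelta


def split_into_days(first_day, last_day):
    """Yield one (start, end) midnight-to-midnight pair per day, inclusive."""
    day = first_day
    while day <= last_day:
        start = datetime.combine(day, time.min)  # 00:00 on this day
        yield (start, start + timedelta(days=1))  # 00:00 on the next day
        day += timedelta(days=1)


if __name__ == "__main__":
    # A hypothetical two-week vacation, 1 July to 14 July 2024.
    for start, end in split_into_days(date(2024, 7, 1), date(2024, 7, 14)):
        print(start, end)
```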

How does all that feel?

SQL Server – SQL query help: Calculating days between observations

I think in this case your best bet, since you are on SQL Server 2012, is to use window functions, specifically the LEAD function.

Here is an example to try and match your data.

Say I create a table with the columns you have, as follows, and then insert rows into it:

CREATE TABLE #AnimalBehavior
(
    ID int identity(1,1) primary key clustered
    ,AnimalID int
    ,DateID int
    ,Behavior1 int
    ,Behavior2 int
    ,Behavior3 int
);
GO


insert into #AnimalBehavior (AnimalID, DateID, Behavior1, Behavior2, Behavior3)
Values (1, 20150101, 0, 1, 1)
    ,(1, 20150201, 0, 1, 1)
    ,(1, 20150301, 0, 1, 1)
    ,(1, 20150401, 0, 1, 1)
    ,(2, 20150101, 0, 1, 1)
    ,(2, 20150301, 0, 1, 1)
    ,(2, 20150501, 0, 1, 1);
GO

From here, I want to look only at rows where Behavior3 is greater than 0 (so that we know the behavior occurred during that period). For each animal's first such row, I then take the difference between that row's date and the next one (LEAD by 1), and between the next one and the one after it (LEAD by 2). You can do that this way:

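-- Pair each observation with the next two observations for the same animal.
WITH ab AS
(
    SELECT AnimalID
    , DateID
    -- No default value is passed to LEAD: it returns NULL when there is no
    -- later row, so the date conversion below yields NULL instead of failing on 0.
    , LEAD(DateID, 1) OVER (PARTITION BY AnimalID ORDER BY DateID) AS DateID_1
    , LEAD(DateID, 2) OVER (PARTITION BY AnimalID ORDER BY DateID) AS DateID_2
    , ROW_NUMBER() OVER (PARTITION BY AnimalID ORDER BY DateID) AS rown
    FROM #AnimalBehavior
    WHERE Behavior3 > 0
)
SELECT AnimalID
, CONVERT(date, CONVERT(varchar(8), DateID)) AS FirstDate
, DATEDIFF(dd, CONVERT(date, CONVERT(varchar(8), DateID)), CONVERT(date, CONVERT(varchar(8), DateID_1))) AS DaysToSecond
, DATEDIFF(dd, CONVERT(date, CONVERT(varchar(8), DateID_1)), CONVERT(date, CONVERT(varchar(8), DateID_2))) AS DaysSecondToThird
FROM ab
WHERE rown = 1;  -- keep only each animal's first observation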
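To sanity-check the numbers the query should return for the sample rows, the same logic can be mirrored in a few lines of Python. This is only a cross-check sketch, not a replacement for the SQL: the row tuples repeat the inserted sample data (AnimalID, DateID, Behavior3), and the function name is made up for the example.

```python
from datetime import datetime
from itertools import groupby

# (AnimalID, DateID, Behavior3) taken from the sample INSERT above.
ROWS = [
    (1, 20150101, 1), (1, 20150201, 1), (1, 20150301, 1), (1, 20150401, 1),
    (2, 20150101, 1), (2, 20150301, 1), (2, 20150501, 1),
]


def gaps_after_first(rows):
    """Map each animal to [days 1st->2nd, days 2nd->3rd] observation."""
    # Keep rows where the behavior occurred, ordered by animal then date.
    observed = sorted((animal, d) for animal, d, b3 in rows if b3 > 0)
    result = {}
    for animal, group in groupby(observed, key=lambda r: r[0]):
        dates = [datetime.strptime(str(d), "%Y%m%d").date() for _, d in group]
        # Pair (1st, 2nd) and (2nd, 3rd); shorter histories give fewer gaps.
        result[animal] = [
            (later - earlier).days
            for earlier, later in zip(dates[:3], dates[1:3])
        ]
    return result


if __name__ == "__main__":
    print(gaps_after_first(ROWS))
```

For the sample data this gives 31 and 28 days for animal 1, and 59 and 61 days for animal 2.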

I believe this covers what you are looking for. But feel free to ask questions if not.