SUGGESTED QUERY
SELECT
CONCAT(A.ndx,' - ',A.ndx+9) "score range",
IFNULL(B.rowcount ,0) "number of occurrences"
FROM
(
SELECT 0 ndx UNION SELECT 10 UNION SELECT 20 UNION
SELECT 30 UNION SELECT 40 UNION SELECT 50 UNION SELECT 60
UNION SELECT 70 UNION SELECT 80 UNION SELECT 90
) A
LEFT JOIN
(
SELECT ndx,COUNT(1) rowcount FROM
(SELECT FLOOR(score/10)*10 ndx FROM yoshi_scores) AA
GROUP BY ndx
) B USING (ndx);
SAMPLE DATA
use test
DROP TABLE IF EXISTS yoshi_scores;
CREATE TABLE yoshi_scores
(id int not null auto_increment,
score int not null,primary key (id), key (score));
INSERT INTO yoshi_scores (score) VALUES
(97),(74),(22),(98),(65),(62),(47),(64),(82),( 8),(60),(12),
(27),(14),(13),(28),(60),(12),(27),(34),(32),(89),(15),( 2);
Data would look like this
mysql> SELECT * FROM yoshi_scores ORDER BY id;
+----+-------+
| id | score |
+----+-------+
| 1 | 97 |
| 2 | 74 |
| 3 | 22 |
| 4 | 98 |
| 5 | 65 |
| 6 | 62 |
| 7 | 47 |
| 8 | 64 |
| 9 | 82 |
| 10 | 8 |
| 11 | 60 |
| 12 | 12 |
| 13 | 27 |
| 14 | 14 |
| 15 | 13 |
| 16 | 28 |
| 17 | 60 |
| 18 | 12 |
| 19 | 27 |
| 20 | 34 |
| 21 | 32 |
| 22 | 89 |
| 23 | 15 |
| 24 | 2 |
+----+-------+
24 rows in set (0.00 sec)
SUGGESTED QUERY EXECUTED
mysql> SELECT
-> CONCAT(A.ndx,' - ',A.ndx+9) "score range",
-> IFNULL(B.rowcount ,0) "number of occurrences"
-> FROM
-> (
-> SELECT 0 ndx UNION SELECT 10 UNION SELECT 20 UNION
-> SELECT 30 UNION SELECT 40 UNION SELECT 50 UNION SELECT 60
-> UNION SELECT 70 UNION SELECT 80 UNION SELECT 90
-> ) A
-> LEFT JOIN
-> (
-> SELECT ndx,COUNT(1) rowcount FROM
-> (SELECT FLOOR(score/10)*10 ndx FROM yoshi_scores) AA
-> GROUP BY ndx
-> ) B USING (ndx);
+-------------+-----------------------+
| score range | number of occurrences |
+-------------+-----------------------+
| 0 - 9 | 2 |
| 10 - 19 | 5 |
| 20 - 29 | 4 |
| 30 - 39 | 2 |
| 40 - 49 | 1 |
| 50 - 59 | 0 |
| 60 - 69 | 5 |
| 70 - 79 | 1 |
| 80 - 89 | 2 |
| 90 - 99 | 2 |
+-------------+-----------------------+
10 rows in set (0.00 sec)
mysql>
Why is the Query designed this way ??? Look at the first subquery
(
SELECT 0 ndx UNION SELECT 10 UNION SELECT 20 UNION
SELECT 30 UNION SELECT 40 UNION SELECT 50 UNION SELECT 60
UNION SELECT 70 UNION SELECT 80 UNION SELECT 90
) A
I performed a LEFT JOIN
of this to the counts for a reason. Notice that the dataset has nothing in the 50 - 59
range. Its count would not show up in the second query:
mysql> SELECT ndx,COUNT(1) rowcount FROM
-> (SELECT FLOOR(score/10)*10 ndx FROM yoshi_scores) AA
-> GROUP BY ndx;
+------+----------+
| ndx | rowcount |
+------+----------+
| 0 | 2 |
| 10 | 5 |
| 20 | 4 |
| 30 | 2 |
| 40 | 1 |
| 60 | 5 |
| 70 | 1 |
| 80 | 2 |
| 90 | 2 |
+------+----------+
9 rows in set (0.00 sec)
mysql>
I am sure you wanted the range 50 - 59
to show up, so the query is designd to catch all ranges. Any missing range is essentially defaulted to zero.
GIVE IT A TRY !!!
CAVEAT: Unfortunately, the first subquery requires you to hardcode it. It will be the only hardcoding required to make it work.
It's hard to tell if this will hurt or improve performance but adding a composite index on start_date
and end_date
shouldn't worsen your query, the index will either be used or not used. You didn't mention your RDBMS but I don't think there's a big chance your engine's optimizer will pick a worse plan (but there is always a possibility I guess).
That being said, there might a slight impact on the plan generation time as your optimizer has an extra index to consider, and unused indexes could hurt overall performance since they need to be maintained when updating/inserting.
Whether or not the index will help your query is hard to tell, it depends on the selectivity of your query and cardinality statistics. The optimizer of your RDBMS should pick the fastest way to get your data, and even if an index exists on the columns your query selects on a full scan might still be the better option, especially since you are selecting all fields.
Why is selectivity important?
When an index is used, the index points to the actual record and the record might need to be fetched using the pointer. Most RDBMS work that way.
See for example the Oracle documentation
In general, index access paths should be used for statements that
retrieve a small subset of table rows, while full scans are more
efficient when accessing a large portion of the table
or the SQL Server documentation
In many cases, the optimizer forces a table scan for queries with
result sets of approximately 5 percent, although the table scan
becomes more efficient than index access at selectivities of 8 to 10
percent.
Why is select * important
If you fetch all fields from a table your database engine will have to fetch the actual record using the pointer stored in the index and you lose the option of using included columns
(if your RDBMS supports them) or covering indexes
.
See for example this article for SQL server (but the principle goes for a lot of vendors): Using Covering Indexes to Improve Query Performance
However, under some circumstances, the overhead associated with
nonclustered indexes may be deemed too great by the query optimizer
and SQL Server will resort to a table scan to resolve the query.
and
The observed improvement is due to the fact that the nonclustered index
contained all of the required information to resolve the query. No Key
Lookups were required.
Best Answer
I hope I understand this right and you can use this for at least a start.
I haven't quite gotten on how you'd like to specify the dates and intervals though. Maybe you can clarify that, than I can probably edit this in.