OK, I ended up adding another lookup table:
CREATE TABLE IF NOT EXISTS `stops_routes` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`stop_id` varchar(100) NOT NULL,
`route_id` varchar(100) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `stop_route` (`stop_id`,`route_id`),
KEY `stop_id` (`stop_id`),
KEY `route_id` (`route_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Filling it was fairly fast:
mysql> insert into stops_routes (stop_id, route_id)
->
-> select
-> s.stop_id,
-> r.route_id as from_route_id
->
-> from routes r
-> left join trips t on t.route_id = r.route_id
-> left join stop_times st on st.trip_id = t.trip_id
-> left join stops s on s.stop_id = st.stop_id
-> group by s.stop_id, r.route_id;
Query OK, 3496 rows affected (8.38 sec)
Records: 3496 Duplicates: 0 Warnings: 0
Using it is blazingly fast:
mysql> select
-> r.route_id as from_route_id,
-> c_sr.route_id as to_route_id
->
-> from routes r
->
-> left join stops_routes sr on sr.route_id = r.route_id
-> left join stop_connections c_s on c_s.from_stop_id = sr.stop_id
-> left join stops_routes c_sr on c_sr.stop_id = c_s.to_stop_id
->
-> where r.route_id <> c_sr.route_id
-> group by r.route_id, c_sr.route_id
-> limit 10;
+---------------+-------------+
| from_route_id | to_route_id |
+---------------+-------------+
| 0001 | 0002 |
| 0001 | 0003 |
| 0001 | 0004 |
| 0001 | 0005 |
| 0001 | 0006 |
| 0001 | 0008 |
| 0001 | 0009 |
| 0001 | 0011 |
| 0001 | 0014 |
| 0001 | 0031 |
+---------------+-------------+
10 rows in set (0.63 sec)
Now I can fill my last lookup table (set of connections between every routes on my GTFS network):
mysql> insert into route_connections (from_route_id, to_route_id)
-> select
-> r.route_id as from_route_id,
-> c_sr.route_id as to_route_id
->
-> from routes r
->
-> left join stops_routes sr on sr.route_id = r.route_id
-> left join stop_connections c_s on c_s.from_stop_id = sr.stop_id
-> left join stops_routes c_sr on c_sr.stop_id = c_s.to_stop_id
->
-> where r.route_id <> c_sr.route_id
-> group by r.route_id, c_sr.route_id;
Query OK, 2848 rows affected (0.31 sec)
Records: 2848 Duplicates: 0 Warnings: 0
Amazingly fast. I guess the engine couldn't break up the steps to optimize this.
I'd still be interested to know if it would be possible to get the same result (from route to route connections table) using only one sub-second or sub-minute query.
Comments about that Tuner output:
Data in InnoDB tables: 575M (Tables: 194)
Total fragmented tables: 194
Run OPTIMIZE TABLE to defragment tables for better performance
Notice how all InnoDB tables are fragmented? That is the nature of how it is implemented. Do not bother running OPTIMIZE
; it will almost never provide substantive benefit.
Joins performed without indexes: 7320
Irrelevant if the tables are tiny. The important ones will show up in the slowlog even without log_queries_not_using_indexes
. (Leave that setting off; it only clutters the slowlog.)
Temporary tables created on disk: 38% (31K on disk / 84K total)
Slow queries: 1% (476/982K)
Let's see the worst ones. Please provide EXPLAIN
and SHOW CREATE TABLE
. I recommend long_query_time
of 2 or lower.
Don't increase any of the last 5 items from Tuner.
The rest of Tuner's comments are reasonable.
Next step
Get the output from pt-query-digest and let's see the first couple, plus EXPLAIN
and SHOW CREATE TABLE
.
Best Answer
As another answer highlighted, Query Cache is not the only cache. Its use is not even advisable in most cases - in fact, it was removed in MySQL 8.0. The reason is that it has scalability problems (it's governed by a global lock) and it invalidates data far too frequently to be useful in a normal workload.
But InnoDB buffer pool contains indexes and data accessed frequently. After your first query, some indexes and data are cached, so next time they will be read from memory. Probably some data/index pages are accessed only once per query, in which case you should be able to see a slighter difference between second and third execution (first time these pages are not cached).
How to avoid the difference between query execution times? Well, there is no way to make the query faster the very first time it runs, as it needs to read from disk. But then, if your buffer pool is big enough, your query will always be fast. Keep in mind that a big buffer pool is very important for MySQL performance. The general recommendation is to keep it 75-80% of total memory. But in reality, things are more complex:
Another thing is, sometimes every server is restarted. When it happens, if the buffer pool is simply emptied, your queries will be slower after restart - until hot data are cached again. But you can avoid this by setting:
In this way, your buffer pool will be (partially) written to disk on shutdown and reloaded at startup, so your queries should be fast even after a restart.