You have no need to add a "modules" field to your main/users table. Just JOIN
on the tables -- that's what Relational Databases are for. INNER
or OUTER
joins should be used depending on your needs.
For example, this will return a user with their conditions. If the user doesn't have any conditions, it will return NULL for those fields, but it will still return the user:
SELECT *
FROM User U
LEFT JOIN Conditions C ON U.Id = C.User_ID
Good visual representation of joins
OK, I ended up adding another lookup table:
CREATE TABLE IF NOT EXISTS `stops_routes` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`stop_id` varchar(100) NOT NULL,
`route_id` varchar(100) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `stop_route` (`stop_id`,`route_id`),
KEY `stop_id` (`stop_id`),
KEY `route_id` (`route_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Filling it was fairly fast:
mysql> insert into stops_routes (stop_id, route_id)
->
-> select
-> s.stop_id,
-> r.route_id as from_route_id
->
-> from routes r
-> left join trips t on t.route_id = r.route_id
-> left join stop_times st on st.trip_id = t.trip_id
-> left join stops s on s.stop_id = st.stop_id
-> group by s.stop_id, r.route_id;
Query OK, 3496 rows affected (8.38 sec)
Records: 3496 Duplicates: 0 Warnings: 0
Using it is blazingly fast:
mysql> select
-> r.route_id as from_route_id,
-> c_sr.route_id as to_route_id
->
-> from routes r
->
-> left join stops_routes sr on sr.route_id = r.route_id
-> left join stop_connections c_s on c_s.from_stop_id = sr.stop_id
-> left join stops_routes c_sr on c_sr.stop_id = c_s.to_stop_id
->
-> where r.route_id <> c_sr.route_id
-> group by r.route_id, c_sr.route_id
-> limit 10;
+---------------+-------------+
| from_route_id | to_route_id |
+---------------+-------------+
| 0001 | 0002 |
| 0001 | 0003 |
| 0001 | 0004 |
| 0001 | 0005 |
| 0001 | 0006 |
| 0001 | 0008 |
| 0001 | 0009 |
| 0001 | 0011 |
| 0001 | 0014 |
| 0001 | 0031 |
+---------------+-------------+
10 rows in set (0.63 sec)
Now I can fill my last lookup table (set of connections between every routes on my GTFS network):
mysql> insert into route_connections (from_route_id, to_route_id)
-> select
-> r.route_id as from_route_id,
-> c_sr.route_id as to_route_id
->
-> from routes r
->
-> left join stops_routes sr on sr.route_id = r.route_id
-> left join stop_connections c_s on c_s.from_stop_id = sr.stop_id
-> left join stops_routes c_sr on c_sr.stop_id = c_s.to_stop_id
->
-> where r.route_id <> c_sr.route_id
-> group by r.route_id, c_sr.route_id;
Query OK, 2848 rows affected (0.31 sec)
Records: 2848 Duplicates: 0 Warnings: 0
Amazingly fast. I guess the engine couldn't break up the steps to optimize this.
I'd still be interested to know if it would be possible to get the same result (from route to route connections table) using only one sub-second or sub-minute query.
Best Answer
This sounds like a job for a correlated subquery.
Your question said "productmarket" but the query you posted used "marketproduct" so I went with the latter in the example.
If the price table has no prices for a particular product, this query will return null for the price of that product.
The price table needs a multi-column index on (marketproduct_id, the_date_column) to optimize this query.
Update: You can get other columns, too, by repeating the subquery and asking for something different...
Or, you could try it with a join, moving the subquery to a join condition in the where clause to get as many columns as you need from the price table.
Both of these feel slightly messy, because in the first example, we're dipping into the price table twice, and in the second example, we're referencing the price table twice with two different aliases ... but it's hard to see a way to avoid that. The performance of the two queries should be fairly comparable, though. In both cases, having to go twice isn't as bad as it sounds because whichever element hits the table first will have already brought the index and row data into the buffer pool (InnoDB) or OS/key cache (MyISAM) so the penalty for the second hit on the same row shouldn't be significant.