Mysql – Query an intermediate table and join to child table

MySQL

Here's my database structure:

Product
Market
ProductMarket (linked to Product and Market with "product_id" and "market_id"
Price (linked to ProductMarket with "productmarket_id")

I want to query the ProductMarket table and filter by a particular product. Then, I want to associate the latest price (from the Price table) to the product (for each market), based on a date field in the Price table.

Right now, I have that:

SELECT
    mp.id AS id,
    p.name AS productName, 
    m.name AS marketName, 
    mp.identifier
FROM
    marketproduct mp
INNER JOIN
    market m ON mp.market_id = m.id
INNER JOIN
    product p ON mp.product_id = p.id
WHERE
    productName = 'SomeProduct'

I'm having difficulty integrating the Price table in. How can I achieve this?

Best Answer

This sounds like a job for a correlated subquery.

SELECT
    mp.id AS id,
    p.name AS productName, 
    m.name AS marketName, 
    mp.identifier,
    (SELECT the_price_column 
       FROM price pr
      WHERE pr.marketproduct_id = mp.id  
      ORDER by the_date_column DESC LIMIT 1) as latest_price
FROM
    marketproduct mp
INNER JOIN
    market m ON mp.market_id = m.id
INNER JOIN
    product p ON mp.product_id = p.id
WHERE
    productName = 'SomeProduct'

Your question said "productmarket" but the query you posted used "marketproduct" so I went with the latter in the example.

If the price table has no prices for a particular product, this query will return null for the price of that product.

The price table needs a multi-column index on (marketproduct_id, the_date_column) to optimize this query.

Update: You can get other columns, too, by repeating the subquery and asking for something different...

...
    m.name AS marketName, 
    mp.identifier,
    (SELECT the_price_column 
       FROM price pr
      WHERE pr.marketproduct_id = mp.id  
      ORDER by createddate DESC LIMIT 1) as latest_price,
    (SELECT createddate 
       FROM price pr
      WHERE pr.marketproduct_id = mp.id  
      ORDER by createddate DESC LIMIT 1) as latest_date
FROM
    marketproduct mp
...

Or, you could try it with a join, moving the subquery to a join condition in the where clause to get as many columns as you need from the price table.

SELECT
    mp.id AS id,
    p.name AS productName, 
    m.name AS marketName, 
    mp.identifier,
    pr.the_price_column,
    pr.createddate
FROM
    marketproduct mp
INNER JOIN
    market m ON mp.market_id = m.id
INNER JOIN
    product p ON mp.product_id = p.id
INNER JOIN
    price pr ON pr.id = 
    (SELECT id 
       FROM price prx
      WHERE prx.marketproduct_id = mp.id  
      ORDER by createddate DESC LIMIT 1)
WHERE
    productName = 'SomeProduct';

Both of these feel slightly messy, because in the first example, we're dipping into the price table twice, and in the second example, we're referencing the price table twice with two different aliases ... but it's hard to see a way to avoid that. The performance of the two queries should be fairly comparable, though. In both cases, having to go twice isn't as bad as it sounds because whichever element hits the table first will have already brought the index and row data into the buffer pool (InnoDB) or OS/key cache (MyISAM) so the penalty for the second hit on the same row shouldn't be significant.

Related Solutions

Mysql – Best way to set up database tables

You have no need to add a "modules" field to your main/users table. Just JOIN on the tables -- that's what Relational Databases are for. INNER or OUTER joins should be used depending on your needs.

For example, this will return a user with their conditions. If the user doesn't have any conditions, it will return NULL for those fields, but it will still return the user:

SELECT * 
FROM User U
   LEFT JOIN Conditions C ON U.Id = C.User_ID

Good visual representation of joins

MySQL Performance – How to Optimize Slow Query with Joins and GROUP BY

OK, I ended up adding another lookup table:

CREATE TABLE IF NOT EXISTS `stops_routes` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `stop_id` varchar(100) NOT NULL,
  `route_id` varchar(100) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `stop_route` (`stop_id`,`route_id`),
  KEY `stop_id` (`stop_id`),
  KEY `route_id` (`route_id`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8;

Filling it was fairly fast:

mysql> insert into stops_routes (stop_id, route_id)
    ->
    -> select
    ->     s.stop_id,
    ->     r.route_id as from_route_id
    ->
    -> from routes r
    -> left join trips t on t.route_id = r.route_id
    -> left join stop_times st on st.trip_id = t.trip_id
    -> left join stops s on s.stop_id = st.stop_id
    -> group by s.stop_id, r.route_id;
Query OK, 3496 rows affected (8.38 sec)
Records: 3496  Duplicates: 0  Warnings: 0

Using it is blazingly fast:

mysql> select
    ->     r.route_id as from_route_id,
    ->     c_sr.route_id as to_route_id
    ->
    -> from routes r
    ->
    -> left join stops_routes sr on sr.route_id = r.route_id
    -> left join stop_connections c_s on c_s.from_stop_id = sr.stop_id
    -> left join stops_routes c_sr on c_sr.stop_id = c_s.to_stop_id
    ->
    -> where r.route_id <> c_sr.route_id
    -> group by r.route_id, c_sr.route_id
    -> limit 10;
+---------------+-------------+
| from_route_id | to_route_id |
+---------------+-------------+
| 0001          | 0002        |
| 0001          | 0003        |
| 0001          | 0004        |
| 0001          | 0005        |
| 0001          | 0006        |
| 0001          | 0008        |
| 0001          | 0009        |
| 0001          | 0011        |
| 0001          | 0014        |
| 0001          | 0031        |
+---------------+-------------+
10 rows in set (0.63 sec)

Now I can fill my last lookup table (set of connections between every routes on my GTFS network):

mysql> insert into route_connections (from_route_id, to_route_id)
    -> select
    ->     r.route_id as from_route_id,
    ->     c_sr.route_id as to_route_id
    ->
    -> from routes r
    ->
    -> left join stops_routes sr on sr.route_id = r.route_id
    -> left join stop_connections c_s on c_s.from_stop_id = sr.stop_id
    -> left join stops_routes c_sr on c_sr.stop_id = c_s.to_stop_id
    ->
    -> where r.route_id <> c_sr.route_id
    -> group by r.route_id, c_sr.route_id;
Query OK, 2848 rows affected (0.31 sec)
Records: 2848  Duplicates: 0  Warnings: 0

Amazingly fast. I guess the engine couldn't break up the steps to optimize this.
I'd still be interested to know if it would be possible to get the same result (from route to route connections table) using only one sub-second or sub-minute query.

Best Answer

Related Solutions

Mysql – Best way to set up database tables

MySQL Performance – How to Optimize Slow Query with Joins and GROUP BY

Related Question