I want to select a record from one table (customers
table) and join only with the last from other related table (sales
table). I know two ways of doing it, but I don't know which one have better performance. What is the best approach?
Note: in my case I'll get only one row because I want the data for one specific customer but what if I want all the main rows, does it change something?
Option 1, using sub-query in WHERE:
SELECT c.id, c.name, s.id, s.cost
FROM
customers as c
LEFT JOIN sales as s ON s.customer_id = c.id
WHERE
c.id = 42
AND s.id = (SELECT MAX(s2.id) FROM sales as s2 WHERE s2.customer_id = c.id) ;
Option 2, using sub-query in FROM:
SELECT c.id, c.name, s.id, s.cost
FROM
customers as c
LEFT JOIN (
SELECT s2.customer_id as cid, MAX(s2.id) as maxid
FROM sales as s2
WHERE s2.customer_id = 42
GROUP BY s2.customer_id
) maxids ON c.id = maxids.cid
LEFT JOIN sales as s ON s.id = maxids.maxid
WHERE
c.id = 42 ;
Here are the tables CREATE
:
CREATE TABLE `customers` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(120) COLLATE utf8_spanish_ci NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=22149 DEFAULT CHARSET=utf8 COLLATE=utf8_spanish_c
CREATE TABLE `sales` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`customer_id` int(11) DEFAULT NULL,
`status` tinyint(3) unsigned DEFAULT NULL,
`buydate` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`cost` decimal(10,2) NOT NULL,
PRIMARY KEY (`id`),
KEY `customer_key` (`customer_id`) USING BTREE,
KEY `status_key` (`status`) USING BTREE,
CONSTRAINT `sales_ibfk_1` FOREIGN KEY (`customer_id`) REFERENCES `customers` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=11555 DEFAULT CHARSET=utf8 COLLATE=utf8_spanish_ci
And here their EXPLAINS:
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 PRIMARY c NULL const PRIMARY PRIMARY 4 const 1 100.00 NULL
1 PRIMARY s NULL const PRIMARY,customer_key PRIMARY 4 const 1 100.00 NULL
2 DEPENDENT SUBQUERY NULL NULL NULL NULL NULL NULL NULL NULL NULL Select tables optimized away
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 PRIMARY c NULL const PRIMARY PRIMARY 4 const 1 100.00 NULL
1 PRIMARY <derived2> NULL system NULL NULL NULL NULL 1 100.00 NULL
1 PRIMARY s NULL const PRIMARY PRIMARY 4 const 1 100.00 NULL
DERIVED s2 NULL ref customer_key customer_key 5 const 11 100.00
Best Answer
I sometimes find this technique useful for judging the performance of a query, or for comparing two possible queries. It is especially helpful before the table is large enough to get reliable timings.
That will give a few counters. If I see numbers that are about equal to the number of rows in a table I am selecting from, then I know there was a table scan. Numbers near the resultset size tend to be more beneficial.
Handler...write...
indicates that a temp table needed to be generated, and the number says how big it was.Even without understanding what the numbers represent, it is often 'obvious' which of two
SELECTs
is better.