Mysql – Why max(id) with join not working or how to optimize it with join

greatest-n-per-groupjoin;MySQLmysql-8.0optimization

Below is my MESSAGES_THREAD table:

CREATE TABLE `messages_thread` (
  `id` varchar(64) NOT NULL,
  `creator` mediumint(9) NOT NULL,
  `other` mediumint(9) NOT NULL,
  `time` datetime DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  KEY `messages_thread_creator_index` (`creator`),
  KEY `messages_thread_other_index` (`other`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

Below is my MESSAGES table:

CREATE TABLE `messages` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `userId` mediumint(8) unsigned NOT NULL,
  `threadId` varchar(64) NOT NULL,
  `status` tinyint(3) unsigned DEFAULT ''0'' COMMENT ''0->sent, 1->deliveried, 2->seen'',
  `time` datetime DEFAULT CURRENT_TIMESTAMP,
  `message` text NOT NULL,
  `data_url` varchar(512) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `messages_threadId_status_index` (`threadId`,`status`),
  KEY `index5` (`userId`),
  KEY `message_threadId_id_index` (`threadId`,`id` DESC),
  CONSTRAINT `messages_messages_thread_id_fk` FOREIGN KEY (`threadId`) REFERENCES `messages_thread` (`id`),
  CONSTRAINT `messages_user_userId_fk` FOREIGN KEY (`userId`) REFERENCES `user` (`userId`)
) ENGINE=InnoDB AUTO_INCREMENT=2522 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

I want to get the last message id from MESSAGES table according to its threadId and to doing this i tried below two Queries both result are ok but those are not so efficient. Second Query seems quite efficient but not up-to the mark I accepted. How can I write this Query efficiently?


Below is First QUERY:

SELECT max(m.id), mt.other 
FROM messages_thread mt 
join messages m ON mt.id = m.threadId 
WHERE mt.creator=1 
group by mt.other;

Below it's Graphical Execution Plan:

enter image description here

Below it's Session Status:

Handler_commit  1
Handler_delete  0
Handler_discover    0
Handler_external_lock   4
Handler_mrr_init    0
Handler_prepare 0
Handler_read_first  1
Handler_read_key    41
Handler_read_last   0
Handler_read_next   2561
Handler_read_prev   0
Handler_read_rnd    0
Handler_read_rnd_next   0
Handler_rollback    0
Handler_savepoint   0
Handler_savepoint_rollback  0
Handler_update  0
Handler_write   0

Below is Second Query (which seems little bit optimized):

SELECT m2.id, mt.other 
FROM messages_thread mt 
inner join ( select max(id) id, threadId 
           from messages m 
           group by threadId) as m2 on mt.id=m2.threadId
WHERE mt.creator=1;

Below it's Graphical Execution Plan:

enter image description here

Below it's Session Status:

Handler_commit  1
Handler_delete  0
Handler_discover    0
Handler_external_lock   4
Handler_mrr_init    0
Handler_prepare 0
Handler_read_first  1
Handler_read_key    122
Handler_read_last   1
Handler_read_next   0
Handler_read_prev   0
Handler_read_rnd    0
Handler_read_rnd_next   41
Handler_rollback    0
Handler_savepoint   0
Handler_savepoint_rollback  0
Handler_update  0
Handler_write   40

Query Execution details with LATERAL for suggested by member Madhur Bhaiya

Below is the Query suggested by Madhur Bhaiya LATERAL:

SELECT m2.id, mt.other 
FROM messages_thread mt 
join lateral (select max(m.id) as id
               from messages m 
               where m.threadId = mt.id ) as m2
WHERE mt.creator=1;

Below it's Graphical Execution Plan:

enter image description here

Below it's Session Status:

Handler_commit  1
Handler_delete  0
Handler_discover    0
Handler_external_lock   4
Handler_mrr_init    0
Handler_prepare 0
Handler_read_first  0
Handler_read_key    41
Handler_read_last   0
Handler_read_next   2561
Handler_read_prev   0
Handler_read_rnd    0
Handler_read_rnd_next   80
Handler_rollback    0
Handler_savepoint   0
Handler_savepoint_rollback  0
Handler_update  0
Handler_write   40

Below it's Explain Data:

1   PRIMARY mt      ref index5,messages_thread_creator_index,message_creator_other_index    message_creator_other_index 3   const   40  100.00  Using index; Rematerialize (<derived2>)
1   PRIMARY <derived2>      ALL                 2   100.00  
2   DEPENDENT DERIVED   m       ref messages_threadId_status_index,messages_threadId_index,messages_threadId_id_index   messages_threadId_status_index  258 tolet.mt.id 61  100.00  Using index

Best Answer

MySQL 8.0.14 has introduced Lateral Derived Table. In your second query, you are computing max() value for all the threadId, whether you need it or not during the JOIN. You can avoid this materialization of m2 (a costly process, which is basically temp table creation, either in-memory, or disk (if too big)), by using the following instead:

SELECT max(m2.id), mt.other 
FROM messages_thread mt 
join lateral ( select max(m.id) as id
               from messages m 
               where m.threadId = mt.id ) as m2
WHERE mt.creator=1 
group by other;

Also, try defining a new composite index on messages_thread table for Group By Optimization in the main SELECT query: (creator, other). You can benchmark this new query with and without new index suggestion, to decide whether to keep it or not.