It takes around 6 seconds for the following query to execute on a MacBook Air (1.8 GHz dual-core Intel Core CPU, 8 GB 1600 MHz DDR3) running mysql Ver 14.14 Distrib 5.7.27.
I would like to know whether that is to be expected given all of the parameters, or whether it could be optimised somehow.
SELECT DISTINCT `sessions`.`strain_id_id`,
`strains`.`product_cbd_units`,
`strains`.`product_thc_units`,
AVG(`session_symptoms`.`total_efficacy`) AS `efficacy`,
COUNT(`sessions`.`strain_id_id`) AS `uses`,
COUNT(DISTINCT `sessions`.`user_id_id`) AS `users`
FROM `session_symptoms`
INNER JOIN `sessions`
ON (`session_symptoms`.`session_id_id` = `sessions`.`id`)
INNER JOIN `strains`
ON (`sessions`.`strain_id_id` = `strains`.`id`)
INNER JOIN `product_province`
ON (`strains`.`id` = `product_province`.`strain_id`)
WHERE (`session_symptoms`.`symptom_id_id` = 1
AND `strains`.`archived` = 0
AND `strains`.`is_visible` = 1
AND `strains`.`trust_indicator` = '0'
AND `product_province`.`country_of_sale` = 'CA'
)
GROUP BY `sessions`.`strain_id_id`, `strains`.`product_cbd_units`,
`strains`.`product_thc_units`
HAVING (AVG(`session_symptoms`.`total_efficacy`) > '0'
AND COUNT(DISTINCT `sessions`.`user_id_id`) >= 10
)
ORDER BY `efficacy` DESC
LIMIT 10
The EXPLAIN seems OK to me, since the product_province table is the only one that is (nearly) completely scanned (I've tried adding an index on the country_of_sale field but did not see any improvement):
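For reference, the single-column index attempt would have been something along these lines (the index name here is illustrative, not the actual one used):

```sql
-- Illustrative form of the attempted index; a single-column index on
-- country_of_sale leaves strain_id outside the index, so the join column
-- still has to be fetched from the table rows.
ALTER TABLE `product_province` ADD INDEX idx_country (`country_of_sale`);
```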
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: product_province
partitions: NULL
type: ALL
possible_keys: product_province_strain_id_28314e95_fk_strains_id
key: NULL
key_len: NULL
ref: NULL
rows: 2784
filtered: 10.00
Extra: Using where; Using temporary; Using filesort
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: strains
partitions: NULL
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: strainprint.product_province.strain_id
rows: 1
filtered: 5.00
Extra: Using where
*************************** 3. row ***************************
id: 1
select_type: SIMPLE
table: sessions
partitions: NULL
type: ref
possible_keys: PRIMARY,sessions_strain_id_id_524ebbd3_fk_strains_id
key: sessions_strain_id_id_524ebbd3_fk_strains_id
key_len: 4
ref: strainprint.product_province.strain_id
rows: 63
filtered: 100.00
Extra: NULL
*************************** 4. row ***************************
id: 1
select_type: SIMPLE
table: session_symptoms
partitions: NULL
type: ref
possible_keys: session_symptoms_session_id_id_57e72712_fk_sessions_id,session_symptoms_symptom_id_id_af0cc421_fk_symptoms_id
key: session_symptoms_session_id_id_57e72712_fk_sessions_id
key_len: 4
ref: strainprint.sessions.id
rows: 1
filtered: 15.68
Extra: Using where
4 rows in set, 1 warning (0.00 sec)
The session_symptoms table has 1,407,725 rows.
The sessions table has 865,298 rows.
The strains table has 26,130 rows.
The product_province table has 2,831 rows.
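For anyone reproducing this, the exact counts come from COUNT(*) (information_schema.TABLES only stores estimates for InnoDB tables):

```sql
-- Exact row counts; these are full scans, which is fine at these sizes.
SELECT COUNT(*) FROM `session_symptoms`;  -- 1407725
SELECT COUNT(*) FROM `sessions`;          -- 865298
SELECT COUNT(*) FROM `strains`;           -- 26130
SELECT COUNT(*) FROM `product_province`;  -- 2831
```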
I have been looking at the Innodb_buffer_pool_read_requests and Innodb_buffer_pool_reads status variables; these are the values before and after the query runs:
(before running the query)
Innodb_buffer_pool_read_requests 548617058
Innodb_buffer_pool_reads 20782
(after the query was run)
Innodb_buffer_pool_read_requests 555160863
Innodb_buffer_pool_reads 20782
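The counters above can be sampled in one statement immediately before and after running the query:

```sql
-- Both counters match this pattern; they are server-wide
-- (across all connections), not per-query.
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';
```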
If my understanding is correct, this means that InnoDB was able to serve the query entirely from in-memory pages. The buffer pool is set to 3 GB and only about 0.33 GB of it is in use. Here is the SHOW ENGINE INNODB STATUS output:
*************************** 1. row ***************************
Type: InnoDB
Name:
Status:
=====================================
2020-02-17 14:44:00 0x7f8685fb9700 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 57 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 81 srv_active, 0 srv_shutdown, 33534 srv_idle
srv_master_thread log flush and writes: 33615
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 1874
OS WAIT ARRAY INFO: signal count 8043
RW-shared spins 0, rounds 845, OS waits 120
RW-excl spins 0, rounds 2322, OS waits 1211
RW-sx spins 1, rounds 30, OS waits 0
Spin rounds per wait: 845.00 RW-shared, 2322.00 RW-excl, 30.00 RW-sx
------------
TRANSACTIONS
------------
Trx id counter 4257212
Purge done for trx's n:o < 4256947 undo n:o < 0 state: running but idle
History list length 23
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 421694729191056, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421694729190136, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421694729189216, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
--------
FILE I/O
--------
I/O thread 0 state: waiting for completed aio requests (insert buffer thread)
I/O thread 1 state: waiting for completed aio requests (log thread)
I/O thread 2 state: waiting for completed aio requests (read thread)
I/O thread 3 state: waiting for completed aio requests (read thread)
I/O thread 4 state: waiting for completed aio requests (read thread)
I/O thread 5 state: waiting for completed aio requests (read thread)
I/O thread 6 state: waiting for completed aio requests (write thread)
I/O thread 7 state: waiting for completed aio requests (write thread)
I/O thread 8 state: waiting for completed aio requests (write thread)
I/O thread 9 state: waiting for completed aio requests (write thread)
Pending normal aio reads: [0, 0, 0, 0] , aio writes: [0, 0, 0, 0] ,
ibuf aio reads:, log i/o's:, sync i/o's:
Pending flushes (fsync) log: 0; buffer pool: 0
21600 OS file reads, 759 OS file writes, 336 OS fsyncs
0.00 reads/s, 0 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 186, seg size 188, 7 merges
merged operations:
insert 8, delete mark 0, delete 0
discarded operations:
insert 0, delete mark 0, delete 0
Hash table size 796967, node heap has 5307 buffer(s)
Hash table size 796967, node heap has 1278 buffer(s)
Hash table size 796967, node heap has 4 buffer(s)
Hash table size 796967, node heap has 7 buffer(s)
Hash table size 796967, node heap has 3 buffer(s)
Hash table size 796967, node heap has 2 buffer(s)
Hash table size 796967, node heap has 6 buffer(s)
Hash table size 796967, node heap has 1270 buffer(s)
37839.69 hash searches/s, 300.36 non-hash searches/s
---
LOG
---
Log sequence number 15773227988
Log flushed up to 15773227988
Pages flushed up to 15773227988
Last checkpoint at 15773227979
0 pending log flushes, 0 pending chkp writes
131 log i/o's done, 0.00 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total large memory allocated 3298295808
Dictionary memory allocated 2870340
Buffer pool size 196608
Free buffers 167346
Database pages 21385
Old database pages 8040
Modified db pages 0
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 0, not young 0
0.00 youngs/s, 0.00 non-youngs/s
Pages read 21292, created 93, written 500
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
Buffer pool hit rate 1000 / 1000, young-making rate 0 / 1000 not 0 / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 21385, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
This line:
Buffer pool hit rate 1000 / 1000, young-making rate 0 / 1000 not 0 / 1000
should also mean that the buffer pool fully served the query, right? If that is the case, is it normal for the query to take around 6 seconds given that it did not need to read any pages from disk?
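Working out the deltas from the two snapshots supports this: the run added roughly 6.5M logical read requests but zero physical reads, i.e. a 100% buffer pool hit rate for this run (with the caveat that the counters are server-wide, so other connections could contribute):

```sql
-- Delta of the two status counters across the query run:
SELECT 555160863 - 548617058 AS logical_read_requests,  -- 6543805
       20782 - 20782 AS physical_disk_reads;            -- 0
```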
UPDATE: I've updated the indexes as per Rick James' answer:
ALTER TABLE `sessions` ADD INDEX strain_user (`strain_id_id`, `user_id_id`);
ALTER TABLE `session_symptoms` ADD INDEX sym_ses_eff (`symptom_id_id`, `session_id_id`, `total_efficacy`);
ALTER TABLE `session_symptoms` ADD INDEX ses_sym_eff (`session_id_id`, `symptom_id_id`, `total_efficacy`);
ALTER TABLE `strains` ADD INDEX lab_vis_arch_id_thc_cbd (`trust_indicator`, `is_visible`, `archived`, `id`, `product_thc_units`, `product_cbd_units`);
ALTER TABLE `product_province` ADD INDEX country_strain (`country_of_sale`, `strain_id`);
ALTER TABLE `product_province` ADD INDEX strain_country (`strain_id`, `country_of_sale`);
and the query now takes less than 2.5 seconds. The explain now looks like:
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: product_province
partitions: NULL
type: ref
possible_keys: product_province_strain_id_28314e95_fk_strains_id,country_strain,strain_country
key: country_strain
key_len: 4
ref: const
rows: 2386
filtered: 100.00
Extra: Using index; Using temporary; Using filesort
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: strains
partitions: NULL
type: eq_ref
possible_keys: PRIMARY,lab_vis_arch_id_thc_cbd
key: PRIMARY
key_len: 4
ref: strainprint.product_province.strain_id
rows: 1
filtered: 8.74
Extra: Using where
*************************** 3. row ***************************
id: 1
select_type: SIMPLE
table: sessions
partitions: NULL
type: ref
possible_keys: PRIMARY,sessions_strain_id_id_524ebbd3_fk_strains_id,strain_user
key: strain_user
key_len: 4
ref: strainprint.product_province.strain_id
rows: 40
filtered: 100.00
Extra: Using index
*************************** 4. row ***************************
id: 1
select_type: SIMPLE
table: session_symptoms
partitions: NULL
type: ref
possible_keys: session_symptoms_session_id_id_57e72712_fk_sessions_id,session_symptoms_symptom_id_id_af0cc421_fk_symptoms_id,sym_ses_eff,ses_sym_eff
key: sym_ses_eff
key_len: 8
ref: const,strainprint.sessions.id
rows: 1
filtered: 100.00
Extra: Using index
4 rows in set, 1 warning (0.01 sec)
Best Answer
SELECT DISTINCT and GROUP BY do similar things; don't specify both. Keep the GROUP BY.

Suggested indexes. Most are "covering". Some assume one ordering of the tables in the JOIN; some are optimized for the other ordering; the Optimizer can pick whichever is better.

20782 / 555160863 is a very low ratio, thereby indicating that innodb_buffer_pool_size is big enough. But there were about 7M "read requests", possibly indicating the need to touch a lot of data. (Caveat: these numbers are across all connections.)

"10 possible additional filters" -- That could lead to 10 separate optimizations, and possibly 40 more indexes needed! Even if we speed up this version, it may not help the other versions.
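Following the first point, dropping DISTINCT changes only the select list; the GROUP BY already guarantees one output row per group:

```sql
-- Same query with the redundant DISTINCT removed.
SELECT `sessions`.`strain_id_id`,
       `strains`.`product_cbd_units`,
       `strains`.`product_thc_units`,
       AVG(`session_symptoms`.`total_efficacy`) AS `efficacy`,
       COUNT(`sessions`.`strain_id_id`) AS `uses`,
       COUNT(DISTINCT `sessions`.`user_id_id`) AS `users`
-- FROM / JOIN / WHERE / GROUP BY / HAVING / ORDER BY / LIMIT
-- clauses remain exactly as in the original query.
```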