Mysql – Why does this thesql query take so long

MySQLmysql-5.6query-performance

I have a MySQL server (ver 5.6) that is well equipped with hardware (Linux, in AWS, 32GB RAM, 56 CPUs), and according to top and iotop, it isn't even feeling warm; yet this query takes something like 2 hours (the actual query, not the explain):

mysql> explain
    -> SELECT
    -> DATE_FORMAT(DATE('2020-04-14'), '%m/%d/%Y') AS "Gaming Day",
    -> g.name AS "Game name",
    -> u.username AS "User Id",
    -> ga.game_instance_id AS "Game Round Id",
    -> gt.user_transaction_id AS "Transaction Id",
    -> ga.type AS "Transaction Type",
    -> ga.amount AS "Transaction Amount",
    -> CONVERT_TZ(ga.created_timestamp, 'UTC', 'SYSTEM') AS "Transaction Date Time (EST)"
    -> FROM spin.game_action ga
    -> INNER JOIN spin.game_instance gi
    -> ON gi.game_instance_id = ga.game_instance_id
    -> INNER JOIN spin.game_transaction gt
    -> ON gt.game_action_id = ga.game_action_id
    -> INNER JOIN spin.user u
    -> ON ga.user_id = u.user_id
    -> INNER JOIN spin.organisation_site os
    -> ON u.organisation_site_id = os.organisation_site_id
    -> INNER JOIN spin.game g
    -> ON g.game_id = ga.game_id
    -> WHERE os.hostname = 'nyx'
    -> AND gi.end_datetime BETWEEN CONVERT_TZ('2020-04-14 00:00:00', 'SYSTEM', 'UTC') AND CONVERT_TZ('2020-04-21 23:59:59', 'SYSTEM', 'UTC')
    -> AND gi.status IN ('RESOLVED', 'AUTO_COMPLETED');
+----+-------------+-------+--------+---------------------------------+---------+---------+-----------------------------+------+-------------+
| id | select_type | table | type   | possible_keys                   | key     | key_len | ref                         | rows | Extra       |
+----+-------------+-------+--------+---------------------------------+---------+---------+-----------------------------+------+-------------+
|  1 | SIMPLE      | g     | index  | PRIMARY                         | BG_UK1  | 202     | NULL                        |   60 | Using index |
|  1 | SIMPLE      | ga    | ref    | PRIMARY,GA_IX01,GA_IX02,GA_IX03 | GA_IX01 | 4       | spin.g.game_id              |  674 | Using where |
|  1 | SIMPLE      | u     | eq_ref | PRIMARY,U_UK01,U_IX_04          | PRIMARY | 4       | spin.ga.user_id             |    1 | NULL        |
|  1 | SIMPLE      | os    | eq_ref | PRIMARY                         | PRIMARY | 4       | spin.u.organisation_site_id |    1 | Using where |
|  1 | SIMPLE      | gi    | ref    | PRIMARY                         | PRIMARY | 8       | spin.ga.game_instance_id    |    1 | Using where |
|  1 | SIMPLE      | gt    | ref    | GT_IX03                         | GT_IX03 | 9       | spin.ga.game_action_id      |    1 | Using index |
+----+-------------+-------+--------+---------------------------------+---------+---------+-----------------------------+------+-------------+
6 rows in set (0.01 sec)

I tried running it under profiling, but that is basically useless:

mysql> show profile for query 2;
+----------------------+------------+
| Status               | Duration   |
+----------------------+------------+
| starting             |   0.000160 |
| checking permissions |   0.000005 |
| checking permissions |   0.000002 |
| checking permissions |   0.000003 |
| checking permissions |   0.000003 |
| checking permissions |   0.000003 |
| checking permissions |   0.000005 |
| Opening tables       |   0.000067 |
| init                 |   0.000133 |
| System lock          |   0.000201 |
| optimizing           |   0.000049 |
| statistics           |   0.000416 |
| preparing            |   0.000050 |
| executing            |   0.000005 |
| Sending data         | 999.999999 |
| end                  |   0.000010 |
| query end            |   0.000008 |
| closing tables       |   0.010543 |
| freeing items        |   0.000062 |
| logging slow query   |   0.000002 |
| logging slow query   |   0.110968 |
| cleaning up          |   0.003693 |
+----------------------+------------+
22 rows in set, 1 warning (0.01 sec)

As I understand it, Sending data covers both the actual transmission of results to the client and a major part of the processing. I am now considering how to analyse this further with the performance_schema; it is likely to reveal more useful details?

Edit

Output of SHOW GLOBAL STATUS\G: https://pastebin.com/QejRk9RA

Output of SHOW GLOBAL VARIABLES\G: https://pastebin.com/b3B76v21

Output of SHOW FULL PROCESSLIST\G: https://pastebin.com/N85YUwFj

MySQLtuner report: https://pastebin.com/HJJsCCzL

Edit 2

$ ulimit -a
time(seconds)        unlimited
file(blocks)         unlimited
data(kbytes)         unlimited
stack(kbytes)        8192
coredump(blocks)     0
memory(kbytes)       unlimited
locked memory(kbytes) 64
process              128223
nofiles              1024
vmemory(kbytes)      unlimited
locks                unlimited
rtprio               0
db3 root = iostat -xm 5 3
Linux 4.9.0-11-amd64 (db3.spin-production.gamingrealms.org)     05/27/2020      _x86_64_        (56 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.72    0.00    0.07    0.04    0.00   99.17

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.11    11.30   20.83   17.03     3.66     0.66   233.37     0.05    1.24    1.41    1.04   0.51   1.94
dm-0              0.00     0.00    0.04    0.04     0.00     0.00     8.13     0.01   99.38    3.62  184.79   1.05   0.01
dm-1              0.00     0.00   20.88   27.29     3.66     0.66   183.42     0.05    1.05    1.42    0.76   0.40   1.94

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.60    0.00    0.16    0.08    0.00   96.16

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00    18.60  121.40   13.80    30.00     0.27   458.59     0.13    0.93    0.92    1.10   0.49   6.64
dm-0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-1              0.00     0.00  121.40   32.40    30.00     0.27   403.13     0.13    0.82    0.91    0.47   0.43   6.64

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.59    0.00    0.16    0.03    0.00   98.23

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00    15.00   49.00   17.60    12.02     0.21   375.95     0.04    0.64    0.87    0.00   0.37   2.48
dm-0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-1              0.00     0.00   49.00   32.60    12.02     0.21   306.84     0.04    0.52    0.87    0.00   0.30   2.48
db3 root = top -c
top - 04:36:25 up 167 days, 21:28,  1 user,  load average: 1.97, 2.28, 2.18
Tasks: 584 total,   2 running, 582 sleeping,   0 stopped,   0 zombie
%Cpu(s):  3.4 us,  0.2 sy,  0.0 ni, 96.4 id,  0.1 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 32848684 total,  5421432 free, 19970372 used,  7456880 buff/cache
KiB Swap: 66916348 total, 66844172 free,    72176 used. 12286984 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
43108 root      20   0    4440   1312   1044 R  89.2  0.0 243:45.28 /bin/gzip -
18933 root      20   0    4440   1332   1048 S  61.0  0.0 107:19.61 /bin/gzip -
43107 root      20   0  249232  40072  11788 S   9.8  0.1  24:21.13 /usr/bin/innobackupex --stream=tar --user=backupuser --password=x xxxxxx --safe-slave-backup --slave-info --databa+
18932 root      20   0  249232  40024  11740 S   7.2  0.1  10:17.08 /usr/bin/innobackupex --stream=tar --user=backupuser --password=x xxxxxx --safe-slave-backup --slave-info --databa+
18934 root      20   0 1058508 147260   9648 S   4.3  0.4   6:46.18 /usr/bin/python3 /usr/bin/aws s3 cp - s3://spin.db.backup/2020-05-27-spin.tar.gz --region us-east-1
43109 root      20   0 1069768 174488   9404 S   3.6  0.5  14:16.26 /usr/bin/python3 /usr/bin/aws s3 cp - s3://spin.db.backup/2020-05-26-spin.tar.gz --region us-east-1
54915 root      20   0   45472   4396   3240 R   1.6  0.0   0:00.29 top -c
39686 mysql     20   0 21.178g 0.018t   8980 S   1.3 58.2   2979:59 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib/mysql/plugin --user=mysql --log-err+
41601 root      20   0   33196  11696   5100 S   0.7  0.0  70:42.86 /usr/share/filebeat/bin/filebeat -c /etc/filebeat/filebeat.yml -path.home /usr/share/filebeat -path.config /etc/fi+
  312 root      20   0       0      0      0 S   0.3  0.0   0:36.25 [ksoftirqd/50]
 1661 sensu     20   0 1594004  28004   5676 S   0.3  0.1 127:00.25 /opt/sensu/embedded/bin/ruby /opt/sensu/bin/sensu-client -c /etc/sensu/config.json -d /etc/sensu/conf.d -e /etc/se+
11120 root      20   0  109512    684    660 S   0.3  0.0 132:39.67 /var/ossec/bin/wazuh-modulesd
24552 sensu     20   0  135144  24696      0 S   0.3  0.1 217:07.55 /usr/sbin/sensu-agent start
    1 root      20   0   57396   5144   3668 S   0.0  0.0  11:17.41 /sbin/init
...

I have run this query both when ther server seems under no load (according to top and show processlist) and when some heavy SQL jobs push it up to about 1000% CPU, and it seems not to make a lot of difference, believe it or not. The ~2 hours is what it takes with nothing going on.

Edit3

SHOW GLOBAL STATUS: https://pastebin.com/5PeBEkz7

SHOW GLOBAL VARIABLES: https://pastebin.com/SnGS28rD

SHOW FULL PROCESSLIST: https://pastebin.com/AR2WZbnM

Edit4

db3 root = hdparm -I /dev/sda

/dev/sda:
SG_IO: bad/missing sense data, sb[]:  70 00 05 00 00 00 00 0d 00 00 00 00 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

ATA device, with non-removable media
Standards:
        Likely used: 1
Configuration:
        Logical         max     current
        cylinders       0       0
        heads           0       0
        sectors/track   0       0
        --
        Logical/Physical Sector size:           512 bytes
        device size with M = 1024*1024:           0 MBytes
        device size with M = 1000*1000:           0 MBytes
        cache/buffer size  = unknown
Capabilities:
        IORDY not likely
        Cannot perform double-word IO
        R/W multiple sector transfer: not supported
        DMA: not supported
        PIO: pio0

Edit

show create table and show table status: https://pastebin.com/s9Y61GTb

Best Answer

Your ulimit Open Files limit of 1024 is starving MySQL for handles. From OS command prompt, ulimit -n 36000 and press Enter will dynamically make 36000 handles available - on the next stop/start of MySQL.

To make 36000 handles available over OS restart, follow this URL

 https://glassonionblog.wordpress.com/2013/01/27/increase-ulimit-and-file-descriptors-limit/

DO NOT use 500,000 as in their example, use 36000 to enable MySQL to use up to 20000 - your open_files_limit requested.

Rate Per Second = RPS

Suggestions to consider for your AWS Parameters group

innodb_io_capacity=1900  # from 200 - you will likely need to work with AWS to get here
innodb_buffer_pool_size=16G  # from ~5 G to reduce innodb_buffer_pool_reads RPS of 36
innodb_buffer_pool_instances=8  # from 1 to reduce mutex contention
read_rnd_buffer_size=128K  # from 256K to reduce handler_read_rnd_next RPS of 15,887
thread_cache_size=16  # from 8 to reduce threads created
innodb_open_files=512  # from 300 to match table_open_cache

Applying these changes will reduce CPU busy. Your version 5.5.62-38.14 is 18 months past End of Life. New versions have better performance and additional Global Variables to improve and better manage performance. View my profile, Network profile for contact info and free downloadable Utility Scripts to assist with performance tuning.

It looks like your slow query is missing gi combination index on end_datetime and status. If you can move the TWO AND's up about 8 rows to make the JOIN selective, it may help.