MySQL – Slow fetch time for MySQL query with large result set

innodb, mysql, mysql-5.7

I'm aware I should expect slower fetch times for large result sets generally, but I don't understand why fetching should be this slow. The workload fetches a large number of rows (1-10M) from a large table (100M+ rows).

mysql> describe testdb.test_table;
+-------+---------+------+-----+---------+-------+
| Field | Type    | Null | Key | Default | Extra |
+-------+---------+------+-----+---------+-------+
| col1  | int(11) | NO   | PRI | NULL    |       |
| col2  | int(11) | NO   | PRI | NULL    |       |
| col3  | int(11) | NO   | PRI | NULL    |       |
+-------+---------+------+-----+---------+-------+

For the test case I use as simple a query as possible:

select * from test_table limit 1000000;

However, the client appears to bottleneck at around 1.5-2.5M rows/sec per query (1.5M in a Python client, 2.5M in MySQL Workbench). I know 2.5M rows/sec seems pretty fast, but that only works out to 30 MB/sec (2.5M * 3 cols * 4-byte ints). I'm on macOS 10.15.4, MySQL 5.7.29 installed via Homebrew, and connecting to the server over localhost. Python is using the mysqlclient DB driver.
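
For reference, the Python-side timing is along these lines -- a minimal sketch, with the connection parameters and schema name as placeholders for my local setup:

# Rough sketch of how I time the Python client (mysqlclient / MySQLdb).
# Connection parameters and the schema name are placeholders.
import time
import MySQLdb

conn = MySQLdb.connect(host="127.0.0.1", user="root", passwd="", db="testdb")
cur = conn.cursor()  # the default cursor buffers the whole result set client-side

start = time.time()
cur.execute("select * from test_table limit 1000000")
executed = time.time()
rows = cur.fetchall()  # with the buffered cursor, most of the work already happened in execute()
done = time.time()

print("execute: %.2fs  fetchall: %.2fs  rows/sec: %.0f"
      % (executed - start, done - executed, len(rows) / (done - start)))

(MySQLdb.cursors.SSCursor would stream rows instead, which shifts the time from execute() into the fetch loop, more like Workbench's Duration vs. Fetch Time split.)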

The query plan shows the results selected from the primary index, as expected:

+----+-------------+----------+------------+-------+---------------+---------+---------+------+----------+----------+-------------+
| id | select_type | table    | partitions | type  | possible_keys | key     | key_len | ref  | rows     | filtered | Extra       |
+----+-------------+----------+------------+-------+---------------+---------+---------+------+----------+----------+-------------+
|  1 | SIMPLE      | my_table | NULL       | index | NULL          | PRIMARY | 12      | NULL | 10821795 |   100.00 | Using index |
+----+-------------+----------+------------+-------+---------------+---------+---------+------+----------+----------+-------------+

The query profile doesn't indicate any hot spots.

+----------------------+----------+
| Status               | Duration |
+----------------------+----------+
| starting             | 0.000055 |
| checking permissions | 0.000008 |
| Opening tables       | 0.000016 |
| init                 | 0.000015 |
| System lock          | 0.000021 |
| optimizing           | 0.000011 |
| statistics           | 0.000012 |
| preparing            | 0.000019 |
| explaining           | 0.000028 |
| end                  | 0.000007 |
| query end            | 0.000007 |
| closing tables       | 0.000009 |
| freeing items        | 0.000015 |
| cleaning up          | 0.000023 |
+----------------------+----------+

And as best I can tell, the entire test db fits in the buffer pool, and no disk IO occurs during the query. The Innodb_buffer_pool_reads value is unchanged after executing the query, and the InnoDB Status metrics from MySQL Workbench's performance dashboard are all zero throughout its runtime.

In MySQL Workbench's Duration / Fetch Time columns, the duration stays consistently under 1 ms regardless of the number of rows selected. The fetch time, however, is proportional to the rows returned: ~0.5 sec for 1M rows and ~5.0 sec for 10M rows.

When I observe processes with top, I can see mysqld spiking to 100% CPU for a short time, followed by MySQLWorkbench spiking to 100% for the rest of the elapsed time, after the query itself has completed. The same test with the Python client (which doesn't do any additional work) shows the time split a little more evenly, but it's hard to measure.

That seems to leave only the DB client driver or the network connection itself as the bottleneck. I assume it's not the network since I'm testing over localhost (though I have not tested the connection path in isolation). Does it make sense that the client bottlenecks at ~30 MB/sec while processing rows? Can anything be done to improve throughput?
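
Since I haven't isolated the connection path yet, one thing I plan to try is comparing the Unix domain socket against TCP to 127.0.0.1 -- a minimal sketch, where the socket path and credentials are placeholders for my Homebrew install:

# Sketch: fetch the same result set over the Unix socket and over TCP,
# to see whether the local "network" path matters at all.
# Socket path and credentials are placeholders.
import time
import MySQLdb

QUERY = "select * from test_table limit 1000000"

def timed_fetch(**connect_kwargs):
    conn = MySQLdb.connect(user="root", passwd="", db="testdb", **connect_kwargs)
    cur = conn.cursor()
    start = time.time()
    cur.execute(QUERY)
    cur.fetchall()
    elapsed = time.time() - start
    conn.close()
    return elapsed

print("unix socket  :", timed_fetch(unix_socket="/tmp/mysql.sock"))
print("tcp 127.0.0.1:", timed_fetch(host="127.0.0.1"))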

Update

Including the requested global status, variables, processlist, and InnoDB status, but note that this is not a dedicated server. My tests are on a MacBook Pro with 16 GB RAM and 4 cores with Hyper-Threading (i.e. macOS sees 8 hardware threads). The drive is an NVMe SSD (~232k read IOPS, RND4K@QD32), but as I noted above, I observe no disk IO during these queries (and that includes swapping/paging by the OS).

I wanted to further isolate duration vs fetch time, so I trimmed the table to exactly 10M rows, and then compared a large select with an aggregate version of the same query using a "cheap" aggregate function.

select col1, col2, col3 
from test_table;

Duration: 0.00082 sec, Fetch Time: 4.729 sec

select count(col1), count(col2), count(col3) 
from test_table;

Duration: 2.692 sec, Fetch Time: 0.000011 sec

I think the duration/fetch time metrics are a little confusing. I assume the first query's duration only covers identifying the rows (not buffering them), while the second query has to actually read every row into memory; that work overlaps with the fetch phase in the first query, even though it is counted as duration in the second.

Assuming ~25% overhead for count(), that leaves about 2 seconds to walk through 10M rows, i.e. 5M rows/sec or 60 MB/sec. Assuming the same row-access cost in the first query, the remaining ~2.7 seconds spent copying rows into a buffer and fetching them works out to roughly 45 MB/sec.

In any case, even assuming additional data overhead per row, it seems like at a minimum simply accessing rows in memory in InnoDB is a significant bottleneck, regardless of the driver or network. mysqld CPU% maxes out one thread for the duration of the aggregate query, so it appears to be a CPU-bound operation. Does that sound right? Is this just the cost of doing business with B+ trees? For (an unfair) comparison, the same operation takes about 200ms in Python using Pandas.
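
For concreteness, the Pandas comparison I mean is roughly the following sketch; the random DataFrame is just a stand-in for the same 10M x 3 ints, already held in memory:

# Rough sketch of the (unfair) Pandas comparison: a count over 10M rows of
# three int columns that are already in memory. Random data stands in for
# the real table contents.
import time
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 1_000_000, size=(10_000_000, 3)),
                  columns=["col1", "col2", "col3"])

start = time.time()
counts = df.count()  # non-null count per column, analogous to count(col1), count(col2), count(col3)
print(counts, "in %.3f sec" % (time.time() - start))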

Additional info

>>show create table test_table;
...
CREATE TABLE `test_table` (
  `col1` int(11) NOT NULL,
  `col2` int(11) NOT NULL,
  `col3` int(11) NOT NULL,
  PRIMARY KEY (`col1`,`col3`,`col2`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 
...
>ulimit
unlimited
>ulimit -n
256
>ulimit -u
2784
>iostat
              disk0               disk2       cpu    load average
    KB/t  tps  MB/s     KB/t  tps  MB/s  us sy id   1m   5m   15m
   24.24   31  0.73    12.61    0  0.00   4  3 93  2.46 2.21 2.42
>top
Processes: 541 total, 2 running, 3 stuck, 536 sleeping, 2564 threads                                                                                                              11:40:37
Load Avg: 1.71, 1.53, 1.53  CPU usage: 6.94% user, 6.94% sys, 86.11% idle  SharedLibs: 364M resident, 61M data, 16M linkedit.
MemRegions: 246424 total, 6527M resident, 153M private, 2132M shared. PhysMem: 16G used (3318M wired), 86M unused.
VM: 7190G vsize, 1995M framework vsize, 26292674(64) swapins, 27667013(0) swapouts. Networks: packets: 137115709/125G in, 167115774/85G out.
Disks: 13216718/253G read, 8333988/245G written.

Best Answer

Some more things to try:

SHOW GLOBAL STATUS;   -- and capture somewhere
SELECT ....;
SHOW GLOBAL STATUS;   -- and capture somewhere else

Then subtract the before values from the after values for the Handler_% and Innodb_% counters. In a similar test (of only 3.1M rows), I got these:

Handler_read_next      3.1M
Innodb_rows_read       3.1M
Innodb_buffer_pool_bytes_data  53M
Innodb_data_read               53M  -- These matched the "Using index" it used

A second run showed a zero difference instead of 53M. This is because the first run had to read everything from disk; the second found it all in the buffer pool.
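
If you want to automate that capture, a rough sketch from Python (mysqlclient) follows; the connection parameters are placeholders, and the SHOW statements themselves add a little noise to the counters:

# Sketch: capture SHOW GLOBAL STATUS before and after the query, then
# print the Handler_% and Innodb_% counters that changed.
import MySQLdb

conn = MySQLdb.connect(host="127.0.0.1", user="root", passwd="", db="testdb")
cur = conn.cursor()

def global_status(cur):
    cur.execute("SHOW GLOBAL STATUS")
    return dict(cur.fetchall())

before = global_status(cur)
cur.execute("select col1, col2, col3 from test_table")
cur.fetchall()
after = global_status(cur)

for name in sorted(after):
    if not name.startswith(("Handler_", "Innodb_")):
        continue
    try:
        delta = int(after[name]) - int(before.get(name, 0))
    except ValueError:
        continue  # skip non-numeric status values
    if delta:
        print(name, delta)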

(I suggest AVG(col) if the column is numeric; this makes it clear that the aggregate had to read every row. First I tried MAX(col); it simply went to the end of the index, so virtually zero time, effort, or data read.)

Semantically, COUNT(col) checks each item for being NOT NULL. However, your columns are part of the PRIMARY KEY, which is, by definition, composed of NOT NULL columns. So I am a bit suspicious of the effort taken with that aggregate.

Back to your main question. Why does reading a row take so long?

  • Assuming it is walking through a B+Tree and it is sitting at the 'next' row.
  • Check for transaction locks, history list, etc. (There could be multiple copies of the row, some waiting for COMMIT/ROLLBACK.)
  • Pick apart the record. (You asked for 3 columns.)
  • Move on to the next block (when appropriate)
  • Evaluate any expressions (COUNT(col1))
  • Convert from internal format to external (SELECT col1)
  • Hand off the row to something else, which will buffer it for transmission
  • Deal with localhost or TCP/IP.

Note: each of those is fast, but there are a lot of details. Also, it is single-threaded. Well, not totally -- fetching the next block from disk (if needed) may be performed by a separate thread.

Bringing a block from disk:

  • Issue the read
  • Lock the buffer_pool (buffer_pool_instances helps a little here)
  • Get an empty block (or wait for flushing a block to disk)
  • Finish the read
  • Update various flags, hashes, etc
  • Unlock the buffer_pool

Again, this is mostly single-threaded.

That brings me to other points:

  • Reading a million rows from disk to a program is not normal.
  • Normally one tries to get SQL to do more of the work (e.g. aggregates).
  • It is possible (but clumsy) to have multiple connections, each reading part of the data and processing it. (Probably should not have more threads than CPU cores.) 8 threads might run only 4 times as fast as a single thread -- due to extra overhead and contention.
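
Here is a rough sketch of that multi-connection idea, splitting the scan into PRIMARY KEY ranges; the ranges, thread count, and connection parameters are placeholders only:

# Sketch: several connections, each scanning a different range of the
# PRIMARY KEY (col1). Ranges and connection parameters are placeholders.
from concurrent.futures import ThreadPoolExecutor
import MySQLdb

RANGES = [(0, 2_500_000), (2_500_000, 5_000_000),
          (5_000_000, 7_500_000), (7_500_000, 10_000_000)]

def read_chunk(bounds):
    lo, hi = bounds
    conn = MySQLdb.connect(host="127.0.0.1", user="root", passwd="", db="testdb")
    cur = conn.cursor()
    cur.execute("select col1, col2, col3 from test_table "
                "where col1 >= %s and col1 < %s", (lo, hi))
    rows = cur.fetchall()
    conn.close()
    return rows

with ThreadPoolExecutor(max_workers=len(RANGES)) as pool:
    chunks = list(pool.map(read_chunk, RANGES))

print("fetched", sum(len(c) for c in chunks), "rows")

Note that in CPython the GIL can limit how much of the client-side row conversion really runs in parallel, so separate processes (or a different client language) may scale better.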