I have a single table containing around 280 million rows in MariaDB 10.4.10.
I need to process this full table in an external program, sorted by timestamp. The external program is fast; the largest factor is the query speed. This query takes around 37 minutes to run (old laptop, SSD, virtual machine with 8GByte RAM). This is normal, I think.
However, the first result is only returned after 17 minutes, in which the client program is just waiting.
Is there a way to make MySQL return the first results faster, even if the total query time is unchanged, or even slightly longer?
Table structure:
CREATE TABLE `eventlog` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`moment` datetime NOT NULL,
`source_id` int(11) NOT NULL,
`code` binary(5) NOT NULL,
`category` binary(2) DEFAULT NULL,
`type` enum('single','double','triple') CHARACTER SET binary NOT NULL,
PRIMARY KEY (`id`),
KEY `idx_moment` (`moment`)
) ENGINE=InnoDB AUTO_INCREMENT=283252852 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
It is around 12GByte on disk (data length, excluding indexes). The index is 6.5GByte.
Query:
SELECT
id,
moment,
source_id,
code
FROM
eventlog
ORDER BY
moment ASC
EXPLAIN
shows it is not using the index on moment
to run the query.
MariaDB [test]> explain select id, moment, source_id, code from eventlog order by moment asc\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: eventlog
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 282499875
Extra: Using filesort
1 row in set (0.017 sec)
Profiling the query shows it spends practically all time on "Creating sort index", even though an index on the ORDER BY
field already exists. (All other stages in the profile are each less than 0.02s):
*************************** 14. row ***************************
Status: Creating sort index
Duration: 999.999999
CPU_user: 999.999999
CPU_system: 110.469717
Context_voluntary: 307668
Context_involuntary: 18439
Block_ops_in: 74089616
Block_ops_out: 50343784
Messages_sent: 0
Messages_received: 0
Page_faults_major: 111
Page_faults_minor: 6653
Swaps: 0
Source_function: <unknown>
Source_file: sql_select.cc
Source_line: 21167
I have tried:
- Forcing the index: I thought this would make sense, since the query is basically returning the rows in index order.
EXPLAIN
then shows the index will be used, but the first result is even later (175 minutes?!). - Increasing sort_buffer_size from 2M to 20M, without effect.
- Running with
LIMIT
andOFFSET
: the first batch of 500 000 rows is returned in 9 seconds, which is nice, but the total query time (getting all 280 million rows in batches of 500 000) increases drastically as theOFFSET
increases, making the total time the external program is idle even longer.
How can I make make MySQL return the first results faster without making the total runtime several times longer?
Best Answer
First, I have to ask what the client will do with 280M rows? If it is a web page, that much data will crash the user's machine.
Here are two things to speed up getting data from that query:
Change the indexes to these:
What you have now will fetch all the data, then sort it by
moment
, before delivering even the first row. With the indexing change, the sort is gone, and the rest can be 'streamed' directly. But will the client take advantage of that? Well...Drawbacks of "chunking":
Consider this alternative:
SELECT ... INTO OUTFILE ...
and then process the resulting file as you might process a CSV file.