I am struggling with query performance. I'm working with historical records of who has voted (or not) in an election. I have data for nearly 40 million ballots over 22 election. When I get additional data in the future, the oldest elections will drop off of the new data set, but I want to retain that information in my own database. So I need a system that will let me merge new data into the existing data.
I have a table 'elections' with 22 rows:
CREATE TABLE `elections` (
`election_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`date` date NOT NULL,
`election_type` enum('primary','general','special','board') NOT NULL,
`whose` enum('state','other') NOT NULL,
PRIMARY KEY (`election_id`),
UNIQUE KEY `date` (`date`,`whose`),
KEY `whose` (`whose`)
) ENGINE=InnoDB AUTO_INCREMENT=32 DEFAULT CHARSET=latin1;
And a table 'ballots' which is initially empty:
CREATE TABLE `ballots` (
`voter_id` int(10) unsigned NOT NULL,
`election_id` int(10) unsigned NOT NULL,
`status` enum('unreturned','invalid','valid') NOT NULL,
PRIMARY KEY (`voter_id`,`election_id`),
KEY `election_id` (`election_id`),
KEY `status` (`status`),
CONSTRAINT `ballots_ibfk_1` FOREIGN KEY (`election_id`) REFERENCES `elections` (`election_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
My raw data obviously does not contain an election_id. I need to look up the election_id based on the date. I load my raw data initially into a 'temp_ballots' table which has ~40M rows:
CREATE TABLE `temp_ballots` (
`voter_id` int(10) unsigned NOT NULL,
`date` date NOT NULL,
`status` enum('unreturned','invalid','valid') NOT NULL,
KEY `date` (`date`),
KEY `voter_id` (`voter_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
And now, the part I struggle with. I want to take the data from temp_ballots, look up the correct election_id, and insert it into ballots. (Then I will drop temp_ballots.) The following operation is taking hours, and I don't know when it might finish, and I don't understand why this is slow. I think I've indexed the appropriate columns. What am I doing wrong?
INSERT INTO `ballots` (voter_id, election_id, status)
SELECT voter_id, election_id, status
FROM elections NATURAL JOIN temp_ballots
WHERE whose = 'state'
ON DUPLICATE KEY UPDATE ballots.status = temp_ballots.status;
I am using MariaDB 10.0.36, so I do not have the ANALYZE statement. But here is the output from EXPLAIN:
+------+-------------+--------------+------+---------------+-------+---------+-----------------------+------+-----------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+--------------+------+---------------+-------+---------+-----------------------+------+-----------------------+
| 1 | SIMPLE | elections | ref | date,whose | whose | 1 | const | 22 | Using index condition |
| 1 | SIMPLE | temp_ballots | ref | date | date | 3 | voters.elections.date | 1123 | |
+------+-------------+--------------+------+---------------+-------+---------+-----------------------+------+-----------------------+
Best Answer
Instead of depending on IODKU to update rows, do the messy work in the
SELECT
:Or...
Notes:
MAX
to choose among thestatus
values, since you did not have any way of picking among them.