I have a large table with about 100 million rows.
I have done indexing and partitioning in the table, but still queries sometimes take >100 or even >200 seconds to run.
So I was thinking, what is the next step after indexing and partitioning to improvise the MySQL performance.
- Changing the code logic is one option (but that is reducing the data, I am concerned about how to tune MySQL to work with this data only)
- Upgrading the hardware and MySQL versions in another.(I am already using SSD, 8 core CPU and 32GBs RAM, and resources are available most of the time. )
And this question is not only about this table, but about general MySQL practices. Like doing indexing is one of the basic things, and after indexing second thing is partitioning. But what after that? My question can be little vague, but I think this will be helpful to many people searching for similar ansers.
To give you an idea, here is an overview of my table large_table
:
+------+------------+---------------+---------------------+
| user | mobile | is_first_time | time_send |
+------+------------+---------------+---------------------+
| a | xxxxxxxxxx | 0 | 2018-03-12 00:00:00 |
+------+------------+---------------+---------------------+
| b | xxxxxxxxxx | 1 | 2018-04-02 07:08:09 |
+------+------------+---------------+---------------------+
| c | xxxxxxxxxx | 0 | 2018-01-03 01:02:03 |
+------+------------+---------------+---------------------+
Following is the output of SHOW INDEXES FOR large_table
+-------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| large_table | 0 | PRIMARY | 1 | user | A | 493 | NULL | NULL | | BTREE | | |
| large_table | 0 | PRIMARY | 2 | mobile | A | 105682194 | NULL | NULL | | BTREE | | |
| large_table | 1 | userid | 1 | user | A | 188718 | NULL | NULL | | BTREE | | |
+-------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
The four type of queries that generally run on this table :
INSERT IGNORE INTO large_table (user,mobile) VALUES ('1234','9876543210')
UPDATE large_table SET is_first_time = 0 WHERE (user,mobile) IN(('xxxx','XXXXXXXXXX'),('zzzz','ZZZZZZZZZZ'),('yyyy','YYYYYYYYYY'),('aaaa','AAAAAAAAAA')
SELECT mobile FROM large_table WHERE user = 168807
SELECT count(mobile) FROM large_table WHERE user =xxxx
Thanks in advance
(this is more of a knowledge-seeking question rather than problem's solution)
DDL of table: https://gist.github.com/kadambkaluskar/4e2ff8d5be6f625c123b58bc10cb32b0
Edit 1: Couldn't run EXPLAIN on UPDATE query, as I am using MySQL 5.5
Edit 2:
Whenever there is a query like UPDATE large_table SET is_first_time = 0 WHERE (`user`,`mobile`) IN
, it goes into updating
state, and all other queries start to line-up after that.
Following is the output of SHOW PROCESSLIST
+------------+---------+------------------+--------+---------+------+------------+------------------------------------------------------------------------------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+------------+----------------------+------------------+-----------------+---------+------+------------------+--------------------------------------------------------------------------+
| 1307948736 | my_user | 10.0.0.48:29134 | my_db | Query | 91 | Updating | UPDATE large_table SET is_first_time = 0 WHERE (`user`,`mobile`) IN(('21xx67','919xxxxxx002'),('2177 |
| 1308045382 | my_user | 10.0.0.62:42912 | my_db | Query | 50 | update | INSERT IGNORE INTO large_table(user,mobile) VALUES ('9xx75','917xxxxxx805') |
| 1308064919 | my_user | 10.0.0.48:56362 | my_db | Query | 42 | update | INSERT IGNORE INTO large_table(user,mobile) VALUES ('105xx7','9189xxxxxx59') |
| 1308066190 | my_user | 10.0.0.21:63342 | my_db | Query | 41 | update | INSERT IGNORE INTO large_table(user,mobile) VALUES ('91xx0','9196xxxxxx60') |
| 1308069898 | my_user | 10.0.0.62:48648 | my_db | Query | 39 | update | INSERT IGNORE INTO large_table(user,mobile) VALUES ('95xx5','9192xxxxxx14') |
| 1308073279 | my_user | 10.0.0.88:30996 | my_db | Query | 38 | update | INSERT IGNORE INTO large_table(user,mobile) VALUES ('57xx4','9170xxxxxx43') |
| 1308073424 | my_user | 10.0.0.24:4738 | my_db | Query | 38 | update | INSERT IGNORE INTO large_table(user,mobile) VALUES ('91xx0','919xxxxxx494') |
| 1308073776 | my_user | 10.0.0.88:31118 | my_db | Query | 38 | update | INSERT IGNORE INTO large_table(user,mobile) VALUES ('11xxx3','919xxxxxx224') |
| 1308076906 | my_user | 10.0.0.72:26482 | my_db | Query | 37 | update | INSERT IGNORE INTO large_table(user,mobile) VALUES ('1xx17','9197xxxxxx62') |
| 1308077527 | my_user | 10.0.0.9:28928 | my_db | Query | 37 | update | INSERT IGNORE INTO large_table(user,mobile) VALUES ('76xx0','9198xxxxxx66') |
| 1308082112 | my_user | 10.0.0.113:4230 | my_db | Query | 34 | update | INSERT IGNORE INTO large_table(user,mobile) VALUES ('10xxx8','9197xxxxxx60') |
| 1308083588 | my_user | 10.0.0.72:27516 | my_db | Query | 34 | update | INSERT IGNORE INTO large_table(user,mobile) VALUES ('11xxx4','919xxxxxx557') |
| 1308088011 | my_user | 10.0.0.101:60448 | my_db | Query | 32 | update | INSERT IGNORE INTO large_table(user,mobile) VALUES ('12xxx0','9184xxxxxx48') |
| 1308090557 | my_user | 10.0.0.9:32118 | my_db | Query | 31 | update | INSERT IGNORE INTO large_table(user,mobile) VALUES ('11xxx0','9193xxxxxx59') |
| 1308093171 | my_user | 10.0.0.72:29535 | my_db | Query | 30 | update | INSERT IGNORE INTO large_table(user,mobile) VALUES ('11xx84','9193xxxxxx07') |
| 1308095347 | my_user | 10.0.0.113:5454 | my_db | Query | 29 | Updating | UPDATE large_table SET is_first_time = 0 WHERE `user`=12xxx1 AND `mobile` = '917xxxxxx482' |
| 1308096238 | my_user | 10.0.0.62:56668 | my_db | Query | 29 | update | INSERT IGNORE INTO large_table(user,mobile) VALUES ('10xxx8','9190xxxxxx03') |
| 1308096342 | my_user | 10.0.0.78:16782 | my_db | Query | 29 | update | INSERT IGNORE INTO large_table(user,mobile) VALUES ('1xxx6','9193xxxxxx54') |
| 1308106957 | my_user | 10.0.0.100:17930 | my_db | Query | 25 | update | INSERT IGNORE INTO large_table(user,mobile) VALUES ('7xxx8','9189xxxxxx33') |
| 1308109743 | my_user | 10.0.0.107:29496 | my_db | Query | 23 | update | INSERT IGNORE INTO large_table(user,mobile) VALUES ('1xxx29','9197xxxxxx75') |
+------------+---------+------------------+-----------------+---------+------+------------------+---------------------------------------------------------------------------------------+
Userwise mobile distribution :
+-------------+----------------+
| no.of.users | mobile.no.count|
+-------------+----------------+
| 1 | 1.5milion |
| 2 | 1 milion |
| 10 | 500-1000k |
| 147 | 100k-500k |
| 173 | 50k-100k |
| 1336 | 10k-50k |
| 2610 | 5k-10k |
| 6500 | 1k-5k |
| 12k | 100-1k |
| 12k | 100-1k |
| 10k | 10-100 |
| 23k | 1-10 |
+-------------+----------------+
Best Answer
PARTITIONing
buys you no performance. Get rid of it. (Also, 400 partitions is so many that that, itself, slows things down. More discussion.)You already have this; keep it:
PRIMARY KEY(user, mobile)
UPDATE
. Although "row constructors" as inWHERE(a,b) IN ((1,2),...)
is tempting syntax, it is awful at performance. The Optimizer does not (until version 5.7.3) do anything useful with that. It scans the entire table! (Note how thek firstUPDATE
in thePROCESSLIST
is at 91 seconds, and seems to be blocking all the other queries.)One way to 'fix' the update is to build a table (
tmp
) with theIN
items, then do a multi-tableUPDATE
withand have
PRIMARY KEY(user, mobile)
ontmp
.The other solution to the Update is to upgrade to 5.6, then 5.7. Anyway, 5.5 will soon be End-Of-Life'd.
To save a little space, and a little speed, consider
DECIMAL(11)
formobile
. It would take 5 bytes instead ofBIGINT
, which takes 8.Get rid of the redundant
KEY(user)
. A rule: If you haveINDEX(a,b)
, you don't needINDEX(a)
. AndPRIMARY KEY
is Unique and a key`.Don't say
count(mobile)
, simply sayCOUNT(*)
. When you specify an expression, it checks that for beingNULL
, which is not relevant in your case. AlthoughSELECT COUNT(*) FROM lt WHERE user = xx
might useINDEX(key)
, the overhead of that otherwise-unneeded index makes it probably not worth having.Be cautious about
MAX(user)
. 100M is not terribly far from 2 billion. Note, making itINT UNSIGNED
would raise the limit to 4 billion, while still taking only 4 bytes.I hope I have provided you some 'knowledge' as well as multiple solutions toward improving the table's performance.
After all these suggestions, I suspect you will rarely see a
Time
greater than "1" inPROCESSLIST
.