Here is my situation:
I got over 10G of web logs (around 30 million rows) and I mainly search it with various conditions.
e.g.:
select * from tbl
where ip = '123.123.12.3'
and agent = 'mozillar'
and body like '%script%'
As data is getting bigger, a simple query takes minutes, so I am trying to create some distributed database with MySQL cluster.
I have built 1 management server, 1 SQL (api) server and 4 data nodes on a Google cloud VM. They have 2 CPUs and 8 GB per each instance.
PROBLEM:
When I try to restore data from an .sql file, each insert statement takes almost 2~3 seconds!
3 sec * 30m row = FOREVER!
The select query also disappointed me, it takes almost 100 times longer than normal InnoDB.
Am I doing something wrong?
Is there any better solution for my purpose?
config.ini
[ndbd default]
NoOfReplicas=2
DataMemory=5G
IndexMemory=128M
MaxNoOfConcurrentOperations=1000000
MaxNoOfLocalOperations=1100000
MaxNoOfConcurrentTransactions=327680
NoOfFragmentLogFiles=256
FragmentLogFileSize=32M
SharedGlobalMemory=512M
DiskPageBufferMemory=512M
DiskIoThreadPool=8
[ndb_mgmd]
NodeId=1
hostname=10.142.0.7
datadir=/var/lib/mysql-cluster/mgmt
[ndbd]
NodeId=11
hostname=10.142.0.8
datadir=/var/lib/mysql-cluster/data1
[ndbd]
NodeId=12
hostname=10.142.0.9
datadir=/var/lib/mysql-cluster/data2
[ndbd]
NodeId=13
hostname=10.142.0.10
datadir=/var/lib/mysql-cluster/data3
[ndbd]
NodeId=14
hostname=10.142.0.11
datadir=/var/lib/mysql-cluster/data4
[mysqld]
NodeId=100
All the nodes are connected to management server as expected
-- NDB Cluster -- Management Client --
ndb_mgm> show
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)] 4 node(s)
id=11 @10.142.0.8 (mysql-5.6.29 ndb-7.4.11, Nodegroup: 0, *)
id=12 @10.142.0.9 (mysql-5.6.29 ndb-7.4.11, Nodegroup: 0)
id=13 @10.142.0.10 (mysql-5.6.29 ndb-7.4.11, Nodegroup: 1)
id=14 @10.142.0.11 (mysql-5.6.29 ndb-7.4.11, Nodegroup: 1)
[ndb_mgmd(MGM)] 1 node(s)
id=1 @10.142.0.7 (mysql-5.6.29 ndb-7.4.11)
[mysqld(API)] 1 node(s)
id=100 @10.142.0.2 (mysql-5.6.29 ndb-7.4.11)
Additional info:
Create Table: CREATE TABLE `tbl` (
`no` int(11) NOT NULL AUTO_INCREMENT,
`rule_name` varchar(50) DEFAULT NULL,
`ip` varchar(20) DEFAULT NULL,
`detection_time` datetime NOT NULL,
`uri` text,
`site` varchar(500) DEFAULT NULL,
`country` varchar(50) DEFAULT NULL,
`body` text CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
`detection_string` text,
`detection_type` int(11) NOT NULL,
`action` int(11) NOT NULL,
`category` int(10) NOT NULL,
PRIMARY KEY (`no`),
KEY `no` (`no`),
KEY `ip` (`ip`)
) ENGINE=InnoDB AUTO_INCREMENT=21682043 DEFAULT CHARSET=utf8
1 row in set (0.03 sec)
Best Answer
Redundant index. A
PRIMARY KEY
is an index, so there is not need for the redundantKEY no(no)
. (DROPping
this will speed up things a little.)Agent. Where's the
agent
in the schema? Is it a varchar? or enum? or normalizedSMALLINT UNSIGNED
. The last would save space, hence improve speed, but would complicate the insert.Better index. Replace
INDEX(ip)
withINDEX(ip, agent)
. This will speed up the givenSELECT
some. (But I suspect the agent is usually the same for a given ip.)Country code There are very standard 2-letter country codes; use them instead of the bulky
varchar(50)
.Batching. For now for the real speedup... Batch the inserts. Collect them in a table, swap tables,
INSERT..SELECT
, etc. Details here. That technique is self-adjusting; so if things get slower, they actions also get more efficient. That link also explains how to efficiently normalize.Tuning.
IndexMemory=128M
seems unreasonably small.Back to InnoDB?
The techniques I have mentioned are well tuned for InnoDB; I have less experience with NDB Cluster. Maybe you should go back to InnoDB and we can work on making that better?
innodb_buffer_pool_size
should be about 70% of RAM.IPv6. A side note: IPv4 needs only
VARCHAR(15)
; IPv6 needs(39)
. And ip addresses may as well beCHARACTER SET ascii
, not utf8. (No significant performance benefit here.)