MySQL NDB Cluster – Why All Queries Are Slow

MySQLndbcluster

Here is my situation:

I got over 10G of web logs (around 30 million rows) and I mainly search it with various conditions.

e.g.:

select * from tbl 
where ip = '123.123.12.3' 
  and agent = 'mozillar' 
  and body like '%script%'

As data is getting bigger, a simple query takes minutes, so I am trying to create some distributed database with MySQL cluster.

I have built 1 management server, 1 SQL (api) server and 4 data nodes on a Google cloud VM. They have 2 CPUs and 8 GB per each instance.

PROBLEM:

When I try to restore data from an .sql file, each insert statement takes almost 2~3 seconds!

3 sec * 30m row = FOREVER!

The select query also disappointed me, it takes almost 100 times longer than normal InnoDB.

Am I doing something wrong?

Is there any better solution for my purpose?

config.ini

[ndbd default]
NoOfReplicas=2    
DataMemory=5G    
IndexMemory=128M   
MaxNoOfConcurrentOperations=1000000
MaxNoOfLocalOperations=1100000
MaxNoOfConcurrentTransactions=327680
NoOfFragmentLogFiles=256
FragmentLogFileSize=32M
SharedGlobalMemory=512M
DiskPageBufferMemory=512M
DiskIoThreadPool=8

[ndb_mgmd]
NodeId=1
hostname=10.142.0.7
datadir=/var/lib/mysql-cluster/mgmt


[ndbd]
NodeId=11
hostname=10.142.0.8
datadir=/var/lib/mysql-cluster/data1
[ndbd]
NodeId=12
hostname=10.142.0.9
datadir=/var/lib/mysql-cluster/data2
[ndbd]
NodeId=13
hostname=10.142.0.10
datadir=/var/lib/mysql-cluster/data3
[ndbd]
NodeId=14
hostname=10.142.0.11
datadir=/var/lib/mysql-cluster/data4

[mysqld]
NodeId=100

All the nodes are connected to management server as expected

-- NDB Cluster -- Management Client --
ndb_mgm> show
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)]     4 node(s)
id=11   @10.142.0.8  (mysql-5.6.29 ndb-7.4.11, Nodegroup: 0, *)
id=12   @10.142.0.9  (mysql-5.6.29 ndb-7.4.11, Nodegroup: 0)
id=13   @10.142.0.10  (mysql-5.6.29 ndb-7.4.11, Nodegroup: 1)
id=14   @10.142.0.11  (mysql-5.6.29 ndb-7.4.11, Nodegroup: 1)
[ndb_mgmd(MGM)] 1 node(s)
id=1    @10.142.0.7  (mysql-5.6.29 ndb-7.4.11)
[mysqld(API)]   1 node(s)
id=100  @10.142.0.2  (mysql-5.6.29 ndb-7.4.11)

Additional info:

    Create Table: CREATE TABLE `tbl` (
  `no` int(11) NOT NULL AUTO_INCREMENT,
  `rule_name` varchar(50) DEFAULT NULL,
  `ip` varchar(20) DEFAULT NULL,
  `detection_time` datetime NOT NULL,
  `uri` text,
  `site` varchar(500) DEFAULT NULL,
  `country` varchar(50) DEFAULT NULL,
  `body` text CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
  `detection_string` text,
  `detection_type` int(11) NOT NULL,
  `action` int(11) NOT NULL,
  `category` int(10) NOT NULL,
  PRIMARY KEY (`no`),
  KEY `no` (`no`),
  KEY `ip` (`ip`)
) ENGINE=InnoDB AUTO_INCREMENT=21682043 DEFAULT CHARSET=utf8
1 row in set (0.03 sec)

Best Answer

Redundant index. A PRIMARY KEY is an index, so there is not need for the redundant KEY no(no). (DROPping this will speed up things a little.)

Agent. Where's the agent in the schema? Is it a varchar? or enum? or normalized SMALLINT UNSIGNED. The last would save space, hence improve speed, but would complicate the insert.

Better index. Replace INDEX(ip) with INDEX(ip, agent). This will speed up the given SELECT some. (But I suspect the agent is usually the same for a given ip.)

Country code There are very standard 2-letter country codes; use them instead of the bulky varchar(50).

Batching. For now for the real speedup... Batch the inserts. Collect them in a table, swap tables, INSERT..SELECT, etc. Details here. That technique is self-adjusting; so if things get slower, they actions also get more efficient. That link also explains how to efficiently normalize.

Tuning. IndexMemory=128M seems unreasonably small.

Back to InnoDB?

The select query also disappointed me, it takes almost 100 times longer than normal InnoDB.

The techniques I have mentioned are well tuned for InnoDB; I have less experience with NDB Cluster. Maybe you should go back to InnoDB and we can work on making that better? innodb_buffer_pool_size should be about 70% of RAM.

IPv6. A side note: IPv4 needs only VARCHAR(15); IPv6 needs (39). And ip addresses may as well be CHARACTER SET ascii, not utf8. (No significant performance benefit here.)