Mysql – Data usage increases initially when reorganize partition thesql cluster

clusteringMySQLmysql-5.5mysql-clusterndbcluster

In my MySQL cluster, the data usage of 4 data nodes is approximately 77%. I added a new data node online, and attempted to reorganize partition to redistribute data among all data nodes.

But before the query alter table customer_tbl reorganize partition; is successfully executed, the data usage of existing data nodes started increasing, and reached up to 95% before ERROR customer_tbl is FULL is encountered.

How can I bring a data node online and reorganize table successfully? My table has ~100 million rows.

Why the data usage of existing data nodes increased in the first place and what is the internal working of mysql cluster when it reorganizes partitions?

Best Answer

You're correct that the on-line repartitioning of data temporarily uses extra memory on the data nodes. In most cases the application data is split over many tables and so relatively little memory is needed when repartitioning each table. If the data is in one big table and you don't have the available RAM then there are a number of options...

Perform the data migration yourself in an iterative fashion. e.g.

create cdr2 (....) partition by ...;

loop { insert into cdr2 select * from cdr order by .... LIMIT 1000; delete from cdr order by .... LIMIT 1000; } until cdr empty.

Some extra memory will be used, but shouldn't be double. Best effect would be if the order by .... could follow insert-order in some way.
Backup the data and restore it on another Cluster (with more memory), repartition and then revers the process.
Use mysqldump, empty the table, repartition it and then load the data back in
Convert the table to be disk-based (on-line operation), repartition and then convert it back into an in-memory table
(Temporarily) add extra RAM to the data nodes (this is an on-line operation).

Regards, Andrew.

Related Solutions

MySQL Node Not Connecting to Management Nodes in NDB Cluster

I would remove the [mysql_cluster] section in the my.cnf file entirely, as it's redundant and can only lead to confusion (you already have two different settings for ndb-connectstring there). So I would go from this config block:

ndb-connectstring=dbc1.enhancier.net,dbc2.enhancier.net

[mysql_cluster]

ndb-connectstring=dbc1.enhancier.net,dbc.enhancier.net

To this:

ndb-connectstring="10.80.45.5:1186,10.80.45.7:1186"

That should eliminate a number of potential causes by specifying the same IP addresses that the management server lists, explicitly specifying the port, and wrapping the string in quotes.

If things still don't work, then at least we've eliminated a few basic things. The next thing to double check is to ensure that the mysqld processes can communicate with the data nodes over the network, given the specified IP address and port combinations.

But now that I look at the ndb_mgm SHOW output again, and what you even noted yourself, the data nodes are not running! When the mysqld (SQL API) node starts up, it contacts the management server (ndb_mgmd) in order to discover the data nodes (ndbd/ndbmtd processes). It then attempts to contact the data nodes in order to retrieve the data dictionary (pulls in .frm blobs for the tables) and other object and cluster metadata. That's what is timing out after 30 seconds:

[Warning] NDB : Tables not available after 30 seconds.  Consider increasing --ndb-wait-setup value

Of course it's not able to pull in the table metadata because the data nodes are not responding as they're not running. :) The management nodes only serve as a discovery mechanism in this sequence, they don't hold any data.

FWIW, the docs aren't very good on this topic, so I don't blame you for getting confused here.

P.S. If you just want to get a running and working system, I would also recommend trying the Auto Installer.

MySQL Cluster – How to Resolve NDB Engine Connection Issues

OK, so I managed to resolve the problem.

Since it's an Oracle Linux 7.3 there is a so called Security-Enhanced Linux (SELinux) option. I set this to "disabled" and restarted the node:

[root@mysqld-mgmt-1 ~]# vi /etc/selinux/config

SELINUX=disabled

[root@mysqld-mgmt-1 ~]# shutdown -r now

After the node restart, I started the services again and used the ndb_mgm to list the nodes:

ndb_mgm> show
Connected to Management Server at: mysqld_mgmt_1:1186
Cluster Configuration
---------------------
[ndbd(NDB)]     2 node(s)
id=2    @172.16.1.103  (mysql-5.6.37 ndb-7.3.18, Nodegroup: 0, *)
id=3    @172.16.1.104  (mysql-5.6.37 ndb-7.3.18, Nodegroup: 0)

[ndb_mgmd(MGM)] 1 node(s)
id=1    @172.16.1.101  (mysql-5.6.37 ndb-7.3.18)

[mysqld(API)]   1 node(s)
id=4    @172.16.1.101  (mysql-5.6.37 ndb-7.3.18)