Mysql – Data usage increases initially when reorganize partition thesql cluster

clusteringMySQLmysql-5.5mysql-clusterndbcluster

In my MySQL cluster, the data usage of 4 data nodes is approximately 77%. I added a new data node online, and attempted to reorganize partition to redistribute data among all data nodes.

But before the query alter table customer_tbl reorganize partition; is successfully executed, the data usage of existing data nodes started increasing, and reached up to 95% before ERROR customer_tbl is FULL is encountered.

How can I bring a data node online and reorganize table successfully? My table has ~100 million rows.

Why the data usage of existing data nodes increased in the first place and what is the internal working of mysql cluster when it reorganizes partitions?

Best Answer

You're correct that the on-line repartitioning of data temporarily uses extra memory on the data nodes. In most cases the application data is split over many tables and so relatively little memory is needed when repartitioning each table. If the data is in one big table and you don't have the available RAM then there are a number of options...

  1. Perform the data migration yourself in an iterative fashion. e.g.

    create cdr2 (....) partition by ...;

    loop { insert into cdr2 select * from cdr order by .... LIMIT 1000; delete from cdr order by .... LIMIT 1000; } until cdr empty.

    Some extra memory will be used, but shouldn't be double. Best effect would be if the order by .... could follow insert-order in some way.

  2. Backup the data and restore it on another Cluster (with more memory), repartition and then revers the process.

  3. Use mysqldump, empty the table, repartition it and then load the data back in

  4. Convert the table to be disk-based (on-line operation), repartition and then convert it back into an in-memory table

  5. (Temporarily) add extra RAM to the data nodes (this is an on-line operation).

Regards, Andrew.