MySQL Cluster – How the Sharding works

MySQLmysql-clustersharding

Right now I have a MySQL database with a couple of tables with around 20MM rows each; This DB is located on a single server ( 4gb RAM, 60GB storage, 2 CPUs) each table size is 4gb +-, but my server storage is about 75%, and I'll need to upgrade the server.

Instead of growing vertically I want to grow horizontally, after reading about sharding, partitions and clusters i try the cluster solution, using MySQL cluster server.

I quote the MySQL Cluster docs

MySQL Cluster automatically shards (partitions) tables across nodes,
enabling databases to scale horizontally on low cost, commodity
hardware to serve read and write-intensive workloads, accessed both
from SQL and directly via NoSQL APIs.

My current setup is:

server-db0 - management 
server-db1 - SQL node
server-db2 - SQL node
server-db3 - Data node
server-db4 - Data node

Each server 2gb ram 2CPUs 40GB storage

My cluster config file is:

[ndbd default]
NoOfReplicas=2
DataMemory=750M
IndexMemory=200M

[tcp default]
portnumber=2202

[ndb_mgmd]
hostname=10.132.36.121
datadir=/var/lib/mysql-cluster

[ndbd]
hostname=10.132.36.251
datadir=/usr/local/mysql/data

[ndbd]
hostname=10.132.36.104
datadir=/usr/local/mysql/data

[mysqld]
hostname=10.132.36.112

[mysqld]
hostname=10.132.36.117

All is working as expected

-- NDB Cluster -- Management Client --
ndb_mgm> show
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)] 2 node(s)
id=2    @10.132.36.251  (mysql-5.6.28 ndb-7.4.10, Nodegroup: 0, *)
id=3    @10.132.36.104  (mysql-5.6.28 ndb-7.4.10, Nodegroup: 0)

[ndb_mgmd(MGM)] 1 node(s)
id=1    @10.132.36.121  (mysql-5.6.28 ndb-7.4.10)

[mysqld(API)]   2 node(s)
id=4    @10.132.36.112  (mysql-5.6.28 ndb-7.4.10)
id=5    @10.132.36.117  (mysql-5.6.28 ndb-7.4.10)

Memory Usage

ndb_mgm> all report memory
Node 2: Data usage is 0%(68 32K pages of total 24000)
Node 2: Index usage is 0%(81 8K pages of total 25632)
Node 3: Data usage is 0%(68 32K pages of total 24000)
Node 3: Index usage is 0%(81 8K pages of total 25632)

In my tests, I saw that all the data is replicated on each SQL node I inserted 750K rows and the Data usage, and Index usage are the same on both servers.

My main question is

If my big table size is 4GB, I will need to upgrade my node servers to a bigger server with more RAM? For what I see this is not a horizontal growing, I'll still need bigger servers instead of more small servers.

I don't see the how this shards works, I missing something?

Would be great if you point me to the right path.

Thanks in advance.

Best Answer

NDB may be the right solution for you. Or...

Sharding is splitting up the data -- some rows go here; some go there. NDB does sharding, and it also does redundancy. That is, the rows that are going 'here' can be put on 2 servers for redundancy. So think through whether you want both, and how many servers you want to put into the system.

You have not mentioned anything about running out of CPU or I/O horsepower on your small machine. 16GB/1TB/8-core machines are about the minimum you can buy today; that give you a lot of growth over what you have.

Also, keep in mind that there is a lot of complexity in building and maintaining NDB. Plus code changes may be necessary.

"2gb ram 2CPUs 40GB" -- Do you run a junk yard?