Mysql – Should I use DynamoDB or stick with MySQL via webservice for the mobile app syncs

MySQL

I asked this earlier at Stack Overflow but a user advised me to ask it here since this website is more relevant to database related questions.

I made an Android app, name of which I'll not mention at this stage. It syncs with a central MySQL database via a webservice on LAMP, which also I created. It all works just great, no problems whatsoever. As the users use the app and perform certain actions, data from their devices gets uploaded to this central server which is running in a data center. As other mobile devices syncs with the database, they get updated with the newer data. At the initial install, app gets a few hundred K of data downloaded via few hundred queries.

Now the problem is that the app has started to get so many downloads that I can foresee I'll not be able to keep up with the sync. For a while I have been optimizing the code both on the Android end and on Web end (with iPhone version still under development), and also using caching techniques, but still, the communication between thousands of mobiles devices and one single database is reaching its limits. Everyday the number of downloads are more than previous day. Now I am extremely worried how to handle this situation.

I have been looking into using Amazon DynamoDB, have also went over its SDK, tutorials, etc. and am ready to spend a few days and switch this whole sync thingy to Amazon. But this will eventually cost me heavily, for which I'll have to charge for the app or use a better advertisement scheme. But that's another issue.

Unfortunately Googling for weeks hasn't helped me on figuring out how to efficiently handle large syncs with thousands of mobile devices, which might get into tens of thousands or probably hundreds of thousands.

In short, what to do to, lets say, deal with 100K devices using my app and in need of keeping in sync with a central database?

Kindly answer if you have faced a similar situation and know how to deal with it in real life.

Thanks in advance.

Best Answer

I don't have experience with DynamoDB so the best I can do is offer some pointers on optimizations to look for if you stay in the hosted MySQL world.

  • Memory allocation. If you're using innodb be sure you have adequate buffer pool size with two log files as the lesser of 25% buffer pool size or 1 Gig. In a perfect world your buffer pool would be as large as all the .ibd files (aka sum(data_length+index_length) from information_schema.tables where engine='innodb'). Understandably this is not always possible, in which case try to allocate 60-70% of your system memory to buffer pool while leaving the rest available for disk cache. (This of course assumes this is a dedicated MySQL server not hosting apache or running your php scripts).

If you're running MyISAM tables then at least make sure your key_buffer_size can accommodate the index_length (or size of your .MYI files).

  • Disks. If you're needing to fall back to disk you want those to be as fast as possible. In a perfect world you'd have PCI-E based SSD cards. I imagine this is out of your budget if you're worried about charging after switching to amazon, so maybe see if your hosting provider has any options above what you currently have. SSD based SAS drives > 15k hdd sas > nl-sas.

  • Service layer between mysql and clients. There is memory overhead to each connection to the database that could be used toward buffer pool or disk cache. Try to have a service layer that maintains a bounded connection pool and providing application layer caching where possible.

  • Distribute the work load. This is tantamount to going to the "cloud" just that you're managing all machines yourself. If your app is mostly read heavy and not always required to be up to the microsecond up to date you can setup replication between many physical machines and load balance the reads across them.

Look toward getting slaves setup at different physical datacenters across the country/world/what ever's applicable to your demographic. Load balance connections to the nearest one. (This assumes traffic is first routed to your front end in these locations otherwise the problem might just be worsened).

  • Partitioning. This is especially true if you're using myisam and dealing with table level locks. It's also particularly useful if you have a table that has patterns of pruning old rows periodically (you can just drop partitions instead of running individual delete queries which fragment the data).