Mysql – Porting a MySQL back-end to AWS DynamoDB

amazon-rdsawsdynamodbMySQL

I'm setting up a back-end for a mobile application using Amazon Web Services. I'm fine with S3 and Lambda, but about using DynamoDB I have some problems. Before using AWS, I was developing the core of the mobile app using a little server of my own for testing, with MySQL as database engine. There were two tables: one was used to keep track of registered users and retrieve login informations, but now I'm using AWS Cognito, that handles all that things, so I need no more a "Users" table; the other table, the one that is causing me troubles, is composed in this way:

With MySQL was pretty simple to obtain what i needed with some simple queries, however with DynamoDB even performing simple stuff immediately becomes an hell of Global Secondary (Costly) Indexes. The reason because I'd like to use DynamoDB is that I've already imported 3rd party frameworks to be able to use Amazon Web Services (AWS), so I wouldn't introduce other 3rd party stuff just for the database back-end (even if some database services can be used through a normal HTTP request I think). Furthermore, using DynamoDB from iOS and Android apps is very handy.

The operations that i need to perform on this table are the following:

1) Given a STRING, return ID, NAME and BRAND of every Item (Row) that contains STRING in the NAME attribute
(e.g. In the example table, if STRING = "Incredible", the returned Items will be those with NAME "My Incredible Name" and "A Name Very Incredible");

2) Return ID, NAME and BRAND of Items that satisfy a contraint on LATITUDE AND LOGNITUDE
(e.g. (LATITUDE > 25 AND < 26) AND (LONGITUDE > 16 AND < 17), needed to return Items with coordinates nearby the user position);

Not indispensable but would be very appreciated:

3) Given a STRING, return IDs of all items with USERNAME == STRING
(e.g. all Items' IDs where USERNAME == "TheLegend27");

From what I can read on the web (please correct me politely if I'm wrong), this is a perfect example of a strongly relational schema that works better in a relational database engine such as MySQL, that in one NoSQL based like DynamoDB. I would be really glad to use a MySQL based one, but the relational alternative that AWS offers is RDS. RDS seems perfect for this purpose, however (for my understanding) has a free tier only for the first 12 months (with a micro-instance), and the cost seems to me a bit high (considering the absence of a free tier after the first year). Even DynamoDB seems to be able to become very costly in a short time, but at least the free tier is there forever. If my traffic will be discrete but not huge, chances are that the free tier would be enough to grant me the service for free, while with RDS, I will pay in every case (again, please correct if I'm not understanding anything at all, I'd be very glad).
Furthermore, there is the scalability issue. Let's say my app will be an incredible success, I'll make a lot of traffic out of it and I'll have all the money to pay for the backend resources. In one year from now I'll have 500'000'000 users that continuosly put an incredible amount of data in my table. In theory DynamoDB would be a perfect choice against the scalability problems associated with relational databases, so RDS would suffer from my huge number of users where DynamoDB doesn't at all. So the conclusion seems to me that Dynamo is perfect on the performances side, but lacks that bit of flexibility that allows me to perform what in MySQL are pretty simple queries. My fear is that even if it's possible to make those queries with DynamoDB, the resulting number of Read and Write Capacity Units needed to use the additional Global Secondary Indexes will increase the cost of the service by a lot.

So, after this long monologue, my questions are these:

1) Can DynamoDB execute the queries exposed above? If Yes, I suppose that global secondary indexes are needed on more than one attribute. Please, can you say me EXACTLY, referring to my specific table showed above:

a – what I should set as "main" Partition Key and Sort Key (now I'm using the ID atribute as the only Primary Key, which is the only thing I suppose to be correct);

b – what Global Secondary Indexes to setup, with which Partition Key (and maybe Sort Key) and which attributes to Project on that Indexes;

c – what worries me the most, how many Read Capacity Units will each of the three provided queries need, and how many Write Capacity Units will adding an Item (Row) need.

2) If DynamoDB is not the way to go for these requirements, can you provide me some alternatives? Should I use RDS? Should I use a service external to AWS? Should I organize my project's backend in a different manner to be able to use DynamoDB (maybe using multiple tables or who knows what)?

I've just started to look at NoSQL databases, surely I'm not understanding their full potential. I'm stuck with this problem and I cannot end the project without some help on this.

I hope it's all clear, I've tried my best. If anything is confusing or not fully understandable, please ask me for clarifications. Thanks to all in advance for your help and time.

Best Answer

I will approach your question from this point of view "MySQL does the work; 3rd party software tends to 'get in the way'"...

1) -- MySQL's FULLTEXT index would make that query very fast and efficient. Using LIKE is slow because it requires a table scan.

2) -- There are many discussions of lat/lng; the short answer is two single-column INDEX(latitude), INDEX(longitude). That will be "good enough" for medium-sized datasets.

3) -- INDEX(username).

500M users is about how many Internet-savvy people there in the world. That's more than anyone can expect. Still, the table you describe would fit in a single server (but not a 'micro' instance). With millions of "active" users, you would need to consider sharding and availability zones, etc.

Seriously... You are at the beginning stages; it is too soon to worry about 500M, or even 5M. Within a few months, even with fantastic growth, you will discover that the schema, the UI, the features, the everything else, will need revising. Sooooo... Plan on a major overhaul within 6 months.

Meanwhile, focus on getting a working prototype going on cheap hardware, with simple techniques.

Back to my first comment... This (and other) forum is littered with questions about how 3rd party software can't scale. Often my answer is "Here's what you need to do in MySQL; can you reverse engineer it into the 3rd party constraints?".

I don't know what "Global Secondary Indexes" means. It smells like a index across shards. If so, then when and if you get to sharding, it is necessary. But... Is it "one size fits all" and is it the best for your application? (My first choice in sharding is a 'hybrid' between 'hashing' and 'dictionary lookup'.)

Not even 1 application in 1000 needs sharding. Build your prototype without sharding. As you do, keep in mind what will have to happen when you do shard.

The operations that i need to perform on this table are the following:

So, after this long monologue, my questions are these:

Best Answer

Related Solutions

DynamoDB: searching with primary key or range key, which way is faster

Sql-server – Migrating development and production databases to AWS

Related Question