Sql-server – Is this workflow suitable for Amazon RDS?

amazon ec2, amazon-rds, aws, r, sql server

I am considering setting up a MySQL database on Amazon Web Services (AWS). The following steps characterize my workflow:

  1. There is a daily data inflow into several AWS EC2 instances
  2. On each EC2 instance, incoming data is stored in an RData file (tabular format)
  3. RData files from all EC2 instances should be exported as tables to a central database (file size is relatively small, less than 10 MB per file)
  4. Using R/RStudio, data cleaning and data aggregation routines need to be performed on the central database
  5. All steps must be automated via cron jobs

Steps 3 and 4 are my main concern.

Is this a standard workflow that can be integrated easily with Amazon Relational Database Service (RDS)?

Or should I consider a different approach (for example, running the SQL database on a separate EC2 instance)?

Best Answer

The difference between RDS and MySQL on EC2 is that RDS is a managed deployment of MySQL: you can't change server settings, and it is not quite vanilla MySQL either.

So if you do not need to alter MySQL's server settings, RDS would be fine, mostly because it is backed by EBS (Elastic Block Store). EC2 instance storage is ephemeral (not persistent) unless you set the instance up with an EBS-backed volume for all your data files, which is a best practice.

If you need to process the data in some other way, rather than just transfer and store it, you will need an EC2 instance anyway. So you could do it all on one EC2 instance, or separate the concerns.
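To make the "separate the concerns" option concrete, here is a minimal sketch of steps 3 and 4: each instance appends its daily rows to one central table, and the aggregation then runs as SQL against that table. Everything in it is an assumption for illustration. The table name, the columns, and the sample rows are hypothetical, and Python's built-in `sqlite3` stands in for the central MySQL database so the sketch runs without a server. Against RDS you would use a MySQL client library instead (from R, for example, through a DBI connection), and the placeholder syntax would differ.

```python
import sqlite3


def export_rows(conn, table, rows):
    """Append one instance's daily rows to a table in the central database."""
    # Hypothetical schema: which instance sent the row, the day, and a value.
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} "
        "(source TEXT, day TEXT, value REAL)"
    )
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", rows)
    conn.commit()


def daily_totals(conn, table):
    """Aggregate inside the database: one total per day across all instances."""
    cur = conn.execute(
        f"SELECT day, SUM(value) FROM {table} GROUP BY day ORDER BY day"
    )
    return cur.fetchall()


# An in-memory SQLite database stands in for the central MySQL database.
central = sqlite3.connect(":memory:")

# Step 3: two hypothetical EC2 instances export their rows.
export_rows(central, "measurements",
            [("ec2-a", "2024-01-01", 1.5), ("ec2-a", "2024-01-02", 2.0)])
export_rows(central, "measurements",
            [("ec2-b", "2024-01-01", 0.5)])

# Step 4: aggregation runs against the central table.
totals = daily_totals(central, "measurements")
print(totals)
```

The point of the split is that the export step only needs write access to the database, while the cleaning and aggregation step can run on whichever instance the cron job lives on.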

As for the 10 MB data transfers, you will be fine, and they are potentially all free within Amazon's network. If you start using their other services for data processing in and out of their network, though, make sure to keep all your resources together in one Availability Zone, because data transfer costs go from free to a per-GB (or per-TB) monthly charge once you are transferring across the wire.
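For a sense of scale, the monthly volume can be estimated from the figures in the question (one file of at most 10 MB per instance per day). The instance count of 5 below is an assumption, since the question only says "several", and no prices are included because they depend on region and current AWS pricing.

```python
def monthly_transfer_mb(instances, mb_per_file=10, days=30):
    """Upper bound on monthly upload volume: one file per instance per day."""
    return instances * mb_per_file * days


# Assumed: 5 instances; the question only says "several".
volume_mb = monthly_transfer_mb(instances=5)
print(f"{volume_mb} MB per month (~{volume_mb / 1024:.2f} GB)")
```

Even with many more instances, the total stays in the low gigabytes per month, so where the resources sit matters more than the volume itself.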

Check out their free tier; you could possibly do all this for 12 months and never spend a dime.

EDIT: To correct myself slightly: you can't change ALL the server variables on RDS, but you can still change some of the more important ones. See: RDS Server Variables Article
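On RDS, server variables are changed through a DB parameter group rather than a config file. As a rough sketch, assuming the `boto3` library, configured AWS credentials, and a custom parameter group already attached to the instance, a change could be scripted as below. The group name `my-mysql-params` and the chosen value are hypothetical. `max_allowed_packet` is used only as an example of a variable that RDS lets you modify.

```python
def build_parameter_changes(settings, apply_method="pending-reboot"):
    """Turn {name: value} into the list shape modify_db_parameter_group expects."""
    return [
        {
            "ParameterName": name,
            "ParameterValue": str(value),  # the API takes values as strings
            "ApplyMethod": apply_method,   # "immediate" or "pending-reboot"
        }
        for name, value in settings.items()
    ]


def apply_changes(group_name, settings):
    """Send the changes to RDS. Needs boto3 and AWS credentials (assumed)."""
    import boto3  # imported here so the helper above works without boto3

    rds = boto3.client("rds")
    return rds.modify_db_parameter_group(
        DBParameterGroupName=group_name,
        Parameters=build_parameter_changes(settings),
    )


# Hypothetical example: raise max_allowed_packet to 16 MiB.
changes = build_parameter_changes({"max_allowed_packet": 16 * 1024 * 1024})
print(changes)
# apply_changes("my-mysql-params", {"max_allowed_packet": 16 * 1024 * 1024})
```

The call to `apply_changes` is left commented out because it modifies a live parameter group. Variables that RDS does not allow you to change will be rejected by the service.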

Good luck.