Cloud computer to speed up sql big data queries

cloudperformance

i have been doing a lot of data mining at my new job.
we collect TONS of data, and it's my job to visualize that sea of information in real time with statistic/heuristic analysis.

we're currently using mysql on a dedicated virtual server.
i'm commonly getting the sum of 100,000+ rows in a single query. and let me say, that some of my more complex multi-table queries are taking 14sec+ 😛 i'm constantly trying to optimize my sql routines to increase performance, but there's only so much i can do.

my boss keeps asking me if we need to move our data to the cloud. and honestly, i'm not sure. i have read a lot about "how great it is", "how fast it is", etc… but it's hard to tell what's marketing and what's legit.

i have also been thinking about caching data. i have done a bit of this in the past and can see a real benefit from it. but i'm not really sure how applicable it is in my situation. we're getting live data all the time, and give the users the ability to mine that data in a verity of ways. (e.g. hourly, daily, weekly, monthly, quarterly, yearly, all time, or a user defined custom time range) if you looking at a data-set from say last month, that data is finalized. there will be no new data added to it ever. but for this month, new transactions are happening every moment, and i'm not sure if i can even cache that type of thing… sigh

i'm not opposed to moving away from sql either. it's just the model we're currently using. but as our company grows (thus the volume of data read+write) i want to continue to offer a very fast service for both the businesses using it as well as their users.

so can i get some feedback (and suggestions) from real world users?
cloud sql? or using/creating some caching engine?

Best Answer

You've asked a question which often results in real projects (and I freely admit to being a consultant in this area).

I'm just going to give you some things to think about:

  • If you are getting your results from the cloud, you have to get your data there. This may involve additional lags.
  • How real-time is real-time?
  • Do you have programming expertise in the group/company your work for? For complex data manipulation tasks, Hadoop (and its derivatives) is a cloud-based technology that works well.
  • Once you've found something of interest, do you have a way to "productionize" it, so updating the results does not waste analysts' time.
  • The cloud (such as AWS) can provide very scalable hardware. However, hardware is pretty cheap. Part of the question is the balance between expenses versus capital investment.
  • Are you tied to MySQL? Other databases have better functionality for complex queries on large data sets.