Amazon-rds – Do I need SQS queues to store remote data in the Amazon Web Services (AWS) cloud

amazon-ec2, amazon-rds, amazon-web-services

My first question is, do I need SQS queues to receive my remote data, or can it go directly into an Amazon cloud storage solution like S3 or EC2?

Currently, my company uses a third-party vendor to gather and report on our remote data. By remote data, I mean data coming from our machines out in the wilderness. These data are uploaded a few times each day to Amazon Web Services SQS queues (set up by the third-party vendor), and then the vendor polls the data from the queues, removing it and saving it in their own on-premises databases for one year only. The vendor only provides reporting services to us, so they don't need to store the data long-term.

Going forward, we want to own the data and store it permanently in Amazon Web Services (AWS). Then we want to use machine learning to monitor the data and report any potential problems with the machines.

To repeat my first question, do we need SQS queues to receive this data, or can it go directly into an Amazon cloud storage solution like S3 or EC2?

My second question is, can an SQS queue send data to two different places? That is, can the queue send the data to the third-party vendor, and also to an Amazon Web Services database?

I am an analyst/data scientist, so I know how to use the data once it's in a database. I just don't know the best way of getting it into a database.

Best Answer

No, SQS is not required. SQS is a publicly reachable service that only requires an authenticated request. You can grant access using an IAM user with access keys or, if the caller runs in another AWS account, through an IAM role. S3 is also a public service that works similarly to SQS with regard to access. The process for setting up cross-account access is a bit different, but the concept is the same. You can also send data directly to an EC2 instance, either by assigning it a public IP address in a public subnet or by placing it behind an Elastic Load Balancer (ELB).
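As a rough illustration of writing a reading straight to S3 without a queue, here is a minimal sketch. It assumes the boto3 library; the bucket name, the `raw/<machine>/<date>/` key layout, and the function names are hypothetical choices, not anything described in the question. The client is passed in as a parameter so the function can be tried without AWS credentials.

```python
import json
from datetime import datetime, timezone


def build_key(machine_id: str, ts: datetime) -> str:
    """Build an object key grouped by machine and UTC date (hypothetical layout)."""
    ts = ts.astimezone(timezone.utc)
    return f"raw/{machine_id}/{ts.strftime('%Y/%m/%d/%H%M%S')}.json"


def upload_reading(s3_client, bucket: str, machine_id: str,
                   reading: dict, ts: datetime) -> str:
    """Store one reading as a JSON object in S3 and return the key used.

    `s3_client` is expected to behave like boto3.client("s3").
    """
    key = build_key(machine_id, ts)
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=json.dumps(reading).encode("utf-8"),
        ContentType="application/json",
    )
    return key
```

With boto3 installed and credentials configured, a call might look like `upload_reading(boto3.client("s3"), "my-machine-data", "machine-7", {"temp_c": 41.5}, datetime.now(timezone.utc))`, where the bucket name is a placeholder.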

SQS is a queue service that only holds data until another service pulls it down. If you want a service to broadcast a dataset to multiple destinations, however, you are looking for SNS. SNS lets you send to multiple destinations, and each destination can have its own type of delivery. For example, if you post a message to SNS, you could have it delivered to a Lambda function in AWS and to a web endpoint that accepts JSON.
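The fan-out idea can be sketched as follows, again assuming boto3. The topic, queue, and Lambda ARNs are placeholders, and the choice of an SQS queue for the vendor plus a Lambda function for your own storage is just one possible arrangement. The sketch leaves out permissions: the queue's access policy and the Lambda function's resource policy would also have to allow SNS to deliver to them.

```python
import json


def subscribe_destinations(sns_client, topic_arn: str,
                           vendor_queue_arn: str, lambda_arn: str) -> None:
    """Attach two destinations to one topic (one-time setup).

    `sns_client` is expected to behave like boto3.client("sns").
    """
    # Destination 1: the queue the vendor already polls.
    sns_client.subscribe(TopicArn=topic_arn, Protocol="sqs",
                         Endpoint=vendor_queue_arn)
    # Destination 2: a function that writes readings to your own storage.
    sns_client.subscribe(TopicArn=topic_arn, Protocol="lambda",
                         Endpoint=lambda_arn)


def publish_reading(sns_client, topic_arn: str, reading: dict) -> str:
    """Publish one reading; SNS delivers a copy to every subscriber."""
    response = sns_client.publish(TopicArn=topic_arn,
                                  Message=json.dumps(reading))
    return response["MessageId"]
```

In this arrangement the machines publish to the topic instead of writing to the queue directly, so the vendor's polling code would not need to change, apart from the message format it receives.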

The only caveat is that SNS forwards the message it receives to every destination without any transformation, so each destination receives the exact same message and is responsible for extracting the data it needs.
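To illustrate what "extracting the data" can mean for a consumer, here is a small sketch for an SQS queue subscribed to an SNS topic. By default SNS wraps the published text in a JSON notification envelope whose `Message` field holds the original string; with raw message delivery enabled on the subscription, the body is the original string itself. The function name and the assumption that readings are JSON objects are illustrative.

```python
import json


def extract_reading(sqs_body: str) -> dict:
    """Return the original reading from an SQS message body.

    Handles both the default SNS envelope and raw message delivery.
    """
    body = json.loads(sqs_body)
    if isinstance(body, dict) and body.get("Type") == "Notification":
        # Default delivery: the published text sits in the "Message" field.
        return json.loads(body["Message"])
    # Raw message delivery: the body is already the published JSON.
    return body
```

A consumer would call this on the body of each message it receives from the queue before storing or analyzing the reading.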