Since no one else has answered, I'll share some opinions and do some hand-waving.
As long as you aren't locking common resources, or are locking resources in the same order, you shouldn't have problems with deadlocks.
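As a tiny illustration of the lock-ordering rule, here's a sketch with Python threads standing in for database sessions (everything here is invented for the demo; it isn't how SQL Azure takes locks, it just shows why a consistent acquisition order prevents deadlock):

```python
import threading

lock_a = threading.Lock()   # stand-ins for two shared resources
lock_b = threading.Lock()
LOCKS = {"a": lock_a, "b": lock_b}

def update(resources):
    # Always acquire locks in the same (sorted) order, no matter
    # which order the caller listed the resources in. Two sessions
    # can then never each hold a lock the other needs next.
    for name in sorted(resources):
        LOCKS[name].acquire()
    try:
        pass  # ... do the work on the locked resources ...
    finally:
        for name in sorted(resources, reverse=True):
            LOCKS[name].release()

# Without the sorting, these two callers could deadlock.
t1 = threading.Thread(target=update, args=(["a", "b"],))
t2 = threading.Thread(target=update, args=(["b", "a"],))
t1.start(); t2.start()
t1.join(); t2.join()
print("both finished without deadlock")
```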
I'd look at separate tables before separate databases. Each additional database will definitely cost you more, but additional tables won't necessarily. You might need to use more than one database because of the sheer volume of data you will store, or because of the rate at which you need to store your burst traffic. If you can manage it, I think that table-level granularity will be more flexible, and possibly a good deal cheaper, than starting with database-level granularity.
The problem with putting each device's data into its own table is that it makes reporting hard, since all of the table names will be different.
I presume that you have some way of detecting when you get a "failure resend" of data. You don't want to put the same value in a table twice, and I'm sure that the devices can fail (local power failure?) in ways that have nothing to do with whether or not earlier values were properly stored.
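One common way to make resends harmless is an idempotent insert keyed on (device, timestamp). A minimal sketch using SQLite as a local stand-in (table and column names are made up; on SQL Azure you'd use a unique constraint with `IGNORE_DUP_KEY` or a `MERGE` instead of `INSERT OR IGNORE`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE readings (
        device_id INTEGER NOT NULL,
        read_at   TEXT    NOT NULL,   -- timestamp of the sample
        value     REAL    NOT NULL,
        PRIMARY KEY (device_id, read_at)   -- the dedupe key
    )
""")

def store(rows):
    # INSERT OR IGNORE silently drops rows whose key already exists,
    # so a "failure resend" of the same batch is harmless.
    conn.executemany(
        "INSERT OR IGNORE INTO readings VALUES (?, ?, ?)", rows)
    conn.commit()

batch = [(1, "2012-05-01T00:00:00", 3.5),
         (1, "2012-05-01T00:00:01", 3.6)]
store(batch)
store(batch)  # resend of the same batch after a failure
count = conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
print(count)  # 2, not 4 -- the duplicates were ignored
```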
WAG: Assuming each "value" is 4 bytes, I calculated about 11.5 MB of collected data per device, per day. (This ignores all kinds of stuff, like device identifiers and timestamps, but I think it is OK as a rough estimate.) So, with "thousands" of sites, we are looking at tens of GB, per day. You don't mention any kind of lifetime on that data. The largest Azure database currently maxes out at 150 GB. You could fill those up pretty quickly.
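For what it's worth, the arithmetic behind that WAG (the ~33 values per second per device is my assumption, back-solved to match the 11.5 MB figure, and 2,000 sites is just a placeholder for "thousands"):

```python
bytes_per_value = 4
values_per_second = 33.3   # assumed per-device sample rate
seconds_per_day = 86_400

per_device_per_day = bytes_per_value * values_per_second * seconds_per_day
print(per_device_per_day / 1e6)   # ~11.5 MB per device, per day

sites = 2_000                     # placeholder for "thousands" of sites
print(per_device_per_day * sites / 1e9)   # ~23 GB per day, total
```

At that rate a 150 GB database fills in about a week, before you even account for identifiers, timestamps, and indexes.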
Getting anything to happen in a web browser in a short period of time is iffy. When you are reading from (possibly multiple) databases with GBs of data, continuously inserting lots of new data into the tables you are reading from and interacting with web servers across the open internet, "real time" is wishful thinking. IMO. "Fast enough" is the usual goal.
If you can't keep all of the data you need in a single report in one SQL Azure database, it's a problem. There are no linked servers or distributed views (at this point). There is no simple way to aggregate across many Azure databases. You'd have to pull all of the data to a central location and report from there. I'd guess that the aggregated data would be too large to store in a single SQL Azure database, so you'd have to go on-premises or maybe to EC2. A data mart or warehouse with a star-schema structure would be the classic answer there, but that takes significant processing time, and that means no "real time". Also, that's potentially a lot more data transfer from Azure to wherever it goes, and that will cost you.
I wouldn't commit to this strategy without a pilot program first. The first thing to do would be to build a single instance, see how it handles insertions at your expected rates, and see how the reporting might work. Can it handle 400 sensor values a second? How about 4,000? (Is that a series of rows, a big denormalized row, an XML document, or something else? The format of the incoming data will affect how fast it can be stored. Can you do bulk inserts, or does it have to be row-by-row?) It's possible that a single SQL Azure instance might not be able to store that much that quickly. And I'd talk to Microsoft too. Just dealing with the billing for hundreds or thousands of separate databases might be quirky.
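If you do pilot it, the bulk-insert question is easy to get a feel for locally. A rough sketch of the row-by-row vs. batched comparison, again using SQLite as a stand-in (table and column names are invented; against SQL Azure you'd compare single-row INSERTs with table-valued parameters or SqlBulkCopy):

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (device_id INT, read_at TEXT, value REAL)")
rows = [(d, f"t{i}", 1.0) for d in range(10) for i in range(10_000)]

start = time.perf_counter()
for r in rows:                     # one round-trip per value
    conn.execute("INSERT INTO readings VALUES (?, ?, ?)", r)
conn.commit()
row_by_row = time.perf_counter() - start

conn.execute("DELETE FROM readings")
start = time.perf_counter()
conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)  # one batch
conn.commit()
batched = time.perf_counter() - start

print(f"row-by-row: {row_by_row:.2f}s  batched: {batched:.2f}s")
```

The gap only widens once a network sits between the client and the database, since row-by-row pays a round-trip per value.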
I don't know if this is applicable to you, but have you looked at Microsoft's "Stream Insight" product? It seems to be aimed at situations like yours. Caveat: I've never used it.
The marketing blurb:
Effectively analyze large amounts of event data streaming in from multiple sources. Derive insights from critical information in near real time by using Microsoft StreamInsight. Monitor, analyze, and act on data in motion and make informed decisions almost instantaneously.
While doing some quick googling, I noticed a blog posting which states that StreamInsight became available on SQL Azure as a CTP last year. It might be ready for prime time by now.
Good luck, it sounds like an interesting project.
Best Answer
We looked, and didn't find a packaged product that met our similar needs. We wanted to get away from having so many developers with dbowner (and even sysadmin) access to databases and instances. We ended up writing our own desktop application. It doesn't quite match your requirements, but it is close.
The user runs the application, which authenticates them via Windows authentication. There is a lookup that then determines which instances/databases they can request elevated permissions to, and how far they can elevate. All of which has to be approved ahead of time by their supervisor. There are four options for elevation level (lowest to highest).
They also have to fill out a field to explain why they need the elevated permissions. The program then creates a temporary SQL login for them that they can use to do their elevated work. (We tried to do it by adding them to an Active Directory group, but something didn't work right. I don't remember what it was. We'll try that again some day.)
It logs the fact that they asked for elevated rights to a table that we can then run reports from. We also set up our third-party compliance auditing tool to specifically audit all activities from these temporary elevated logins. The process also sends an e-mail message to the person's supervisor whenever they request elevated access.
We then have a scheduled job that removes these temporary logins at 6:00 a.m. the next day.