SQL Server – Insert streamed RFID data

memory-optimized-tables | sql-server | sql-server-2016

I have RFID readers currently writing data to files on a hard drive, which SSIS then reads and inserts into the database. Instead, I'd prefer that SQL Server (2016+) read the data stream directly into a memory-optimized table, to improve performance and scalability.

What is the best method to do this while handling multiple readers sending data all at once and making sure no data gets lost due to bottlenecks, deadlocks, etc.?

Each reader can handle approximately 60 reads per second, then transmit via Wi-Fi (or later via USB cradle). So per minute, we're talking 60 × 60 × 512 bits of data per reader. For the future, assume that in large warehouses there could be 10+ readers going at once.
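(60 × 60 × 512 bits is about 1.8 Mbit, i.e. roughly 230 KB per reader per minute, so even 10 readers would only generate around 2.3 MB of raw payload per minute.)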

There is simultaneous processing and reading of the data. While the reader data is coming in, it needs to be checked for inconsistencies and redundancies, de-duplicated, updated (e.g. if a product's location has moved), etc. Then queries/reports will be run against the DB – various user searches, inventory and location reports, analysis.

There is also the handling of bottlenecks due to simultaneous delivery of data by readers – e.g. five readers delivering their daily data at the same time through cradles while the DB is doing various other tasks.

Best Answer

How confident are you that max concurrent connections will stay at 10 in the future?

If you're interested in raw speed - which is part of the domain of In-Memory OLTP - then transmitting via Wi-Fi will likely be a bottleneck. Even if you're mainly interested in scalability of the write workload, Wi-Fi or a USB cradle will have its limits, preventing you from getting the full (potential) benefit of using In-Memory OLTP.

While Tony Hinkle is correct that an IDENTITY is more "efficient" than a GUID, clustering on an IDENTITY is an anti-pattern for highly concurrent, write-intensive workloads. It's exactly this anti-pattern that In-Memory OLTP is meant to solve, but any consideration of In-Memory OLTP should start a dialog that takes into account all of the many gotchas, e.g. no database mirroring, no database snapshots, and the fact that the memory-optimized filegroup can never be removed once added.
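To make that filegroup gotcha concrete, here's a minimal sketch of the kind of memory-optimized landing-pad table you'd be creating. All database, filegroup, table, and column names are hypothetical, and the BUCKET_COUNT would need to be sized for your actual row volumes:

```
-- The memory-optimized filegroup is a one-way door: once added, it can never be removed.
ALTER DATABASE RfidDb
    ADD FILEGROUP RfidDb_mod CONTAINS MEMORY_OPTIMIZED_DATA;

ALTER DATABASE RfidDb
    ADD FILE (NAME = 'RfidDb_mod_container', FILENAME = 'C:\Data\RfidDb_mod')
    TO FILEGROUP RfidDb_mod;

-- A hash index on the IDENTITY avoids the last-page hotspot of a clustered B-tree.
CREATE TABLE dbo.RfidReadStaging
(
    ReadId      BIGINT IDENTITY(1,1) NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1048576),
    ReaderId    INT          NOT NULL,
    TagId       VARCHAR(64)  NOT NULL,
    ReadTimeUtc DATETIME2(3) NOT NULL,
    INDEX IX_Reader NONCLUSTERED (ReaderId, ReadTimeUtc)
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
```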

Queries can produce different results under any form of snapshot isolation than they would under locking isolation levels, so you'd have to verify that the way your app works can handle that.
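One practical consequence: inside explicit transactions, memory-optimized tables can't be accessed under plain READ COMMITTED, so you typically either turn on automatic elevation to SNAPSHOT or hint it per table reference. A sketch, using the hypothetical names from the previous block:

```
-- Elevate READ COMMITTED access to memory-optimized tables to SNAPSHOT automatically.
ALTER DATABASE RfidDb
    SET MEMORY_OPTIMIZED_ELEVATE_TO_SNAPSHOT = ON;

-- Or request SNAPSHOT per table reference:
SELECT COUNT(*)
FROM dbo.RfidReadStaging WITH (SNAPSHOT)
WHERE ReaderId = 42;
```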

So while the volume of data to be inserted is not particularly high, if you throw enough concurrent connections at it, scalability will be an issue. This is easily demonstrated with ostress (as long as you don't output to files). As concurrency ramps up, say from 100 to 500 connections, INSERT performance simply crawls. This is due to having to latch the last page of the clustered index, which essentially serializes all writes to the table.
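If you want to reproduce that yourself, the test is essentially many connections each running a single-row INSERT against a disk-based table clustered on an IDENTITY. A rough sketch - the table, database, and server names are hypothetical, and ostress ships with the RML Utilities:

```
-- Disk-based table clustered on an IDENTITY: every INSERT lands on the last page.
CREATE TABLE dbo.RfidReadDisk
(
    ReadId      BIGINT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
    ReaderId    INT          NOT NULL,
    TagId       VARCHAR(64)  NOT NULL,
    ReadTimeUtc DATETIME2(3) NOT NULL
);
GO

-- Driven from a command prompt with something like
-- (-n = connections, -r = iterations per connection, -q = discard result sets):
--   ostress.exe -SYourServer -E -dRfidDb -n500 -r1000 -q
--     -Q"INSERT dbo.RfidReadDisk (ReaderId, TagId, ReadTimeUtc) VALUES (1, 'TAG-0001', SYSUTCDATETIME());"
-- The PAGELATCH_EX contention on the table's last page is what serializes the writes.
```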

Given what you've described, In-Memory OLTP might be best used like a "shock absorber", as a landing pad for your data. Then you could validate it, before sending it to its final destination. If that final destination is another memory-optimized table, be advised that all writes to a memory-optimized table are serial, no matter what the source. So you would have to roll your own parallelism, and ensure that your processing does not step on its own toes. You'll essentially have a queue, and due to its lock-free architecture, In-Memory OLTP is poorly suited to queue processing. There must be some form of serialization to make sure your concurrent processes don't attempt to process the same set of rows.
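One simple way to get that serialization is to let a single "mover" own the queue via an application lock, validate and de-dup, and then clear what its snapshot saw. A sketch under the same hypothetical names as above, where dbo.RfidReadFinal is a hypothetical disk-based destination and the de-dup logic is only a placeholder for your real validation rules:

```
CREATE PROCEDURE dbo.MoveValidatedReads
AS
BEGIN
    SET NOCOUNT ON;
    BEGIN TRAN;

    -- Serialize the queue: only one mover session processes staging rows at a time.
    DECLARE @lock INT;
    EXEC @lock = sp_getapplock @Resource = 'RfidMover',
                               @LockMode = 'Exclusive',
                               @LockTimeout = 0;
    IF @lock < 0
    BEGIN
        ROLLBACK;   -- another mover already holds the lock; try again later
        RETURN;
    END;

    -- Placeholder validation/de-dup: keep one row per (ReaderId, TagId, ReadTimeUtc).
    INSERT dbo.RfidReadFinal (ReaderId, TagId, ReadTimeUtc)
    SELECT DISTINCT s.ReaderId, s.TagId, s.ReadTimeUtc
    FROM dbo.RfidReadStaging AS s WITH (SNAPSHOT)
    WHERE NOT EXISTS
          (SELECT 1 FROM dbo.RfidReadFinal AS f
           WHERE f.ReaderId = s.ReaderId
             AND f.TagId = s.TagId
             AND f.ReadTimeUtc = s.ReadTimeUtc);

    -- Remove what this transaction's snapshot saw; rows arriving later stay queued.
    DELETE s
    FROM dbo.RfidReadStaging AS s WITH (SNAPSHOT);

    COMMIT;  -- also releases the applock (it is transaction-owned by default)
END;
```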

All traditional/disk-based tables can suffer from deadlocks, which can usually be reduced, and perhaps eliminated, but there is no guarantee of that. Deadlocks are not possible with memory-optimized tables, due to the lock/latch-free architecture (the same goes for indexes on memory-optimized tables).

Queries will probably not be any faster with In-Memory OLTP, as it's not designed for that.

I have blogged extensively about In-Memory OLTP. If you are seriously considering deploying it, I strongly suggest you study what I've written.