SQL Server – Better Parallelization Than a Sequence Container in SSIS 2008

sql serversql-server-2008ssis

We currently have a process that has to process hundreds of thousands of rows an hour. For valid reasons, it has to call a stored procedure for each row. We're using a SSIS package (SQL Server 2008) to run this in parallel. If there's 400k rows to do, we divide that by 10, then tell each SQL Task within the container (we have 10 tasks, so that each one basically monopolizes a CPU) to do rows 1-40000, 40001-80000, etc. Naturally, not all rows are the same, so we frequently have 8 of the 10 SPIDs idle, waiting for the last 2 to finish, so that the job can start again and process the rows that have come in since it started processing.

There has GOT to be a better way to do this. I can't move this to SSIS 2012/14 yet, and it's CPU bound so I can't run the SSIS package on a different box (plus, overhead would probably be nightmarish). Also assume I'm not doing it in Service Broker.

One idea we'd had is to save each block of 1000 as a row to a table, then have multiple jobs running. Each would process a block of 1000, then claim a new row (and thus the next 1000) when they finish. No need for SSIS, either. But I'm unsure how the overhead/development cost of doing that would compare to a simple design change in SSIS.

Thanks in advance!

Best Answer

Assuming I understand the situation, you have lots of source data to process but it all needs to go through a slow destination (stored procedure) then there's a native component, Balanced Data Distributor, that might be of interest.

As a row flows into a BDD, it can detect whether the downstream consumers of data are busy (whether there is backpressure). It's somewhat similar to a Multicast in that you can have multiple output paths but you won't have duplicate data flowing into them. Instead, it'll be using backpressure detection to know whether a worker is available. That should address the issue you're encountering with your manual segmentation of the problem.

It doesn't come preinstalled with SSIS so you do have to work through that but it is a MS supported component.