SSIS Balanced Data Distributor – How It Splits Records

sql serverssis

I have been using the balanced data distributor control in SSIS, but am coming across an issue in which my parallel destinations (Salesforce bulk api via TaskFactory) are getting in lock conflicts. I was told by Salesforce support to try sorting such that there are no conflicting locks in separate batches.

How does the BDD control distribute? Does it split using fetch/limit, or does it do a modulus split across all records?

Best Answer

You don't really have much control over how the SSIS Balanced data distributer control distributes data, but you could influence the order in which the data by the BDD is received.

From the docs (emphasis mine):

This transform takes a single input and distributes the incoming rows to one or more outputs uniformly via multithreading. The transform takes one pipeline buffer worth of rows at a time and moves it to the next output in a round robin fashion.

The pipeline buffer size can be influenced by properties such as the DefaultBufferMaxRows and DefaultBufferSize or AutoAdjustBufferSize. Check the docs on data flow performance features for details on how it's all calculated.

So in short, You'll have a hard time figuring out exactly how many records are sent out each time the BDD switches to a new output.

Knowing how the BDD distributes data by just taking whatever is in the incoming buffer and sending it to an output destination you could maybe figure out ways to make sure the incoming data is sorted in a way that you can avoid locks later on, but remember that each flow after the BDD will probably have some processing in it too, and there is no way you can tell if at the end the data will be processed in the same amount of time by each path the data takes, so sorting it before the BDD doesn't guarantee it's still in the same order later on.