I need to associate customers' order with their "level" (Silver, Gold, etc.) when they placed the order:
CRM server::CRM db::CRM table
----------
CustomerID PreviousLevel NewLevel NewLevelGrantedOn
Order server::Order db::Order table
----------
OrderID CustomerID OrderPlacedOn
In an SSIS package I did this:
- Execute SQL against Order db to extract the orders and put them in an object variable;
- Loop through each order using "foreach container", in which I put a data flow task
select top 1 * where CustomerID = ? and LevelGrantedOn < ? order by LevelGrantedOn desc
that extract data from CRM db (both parameters come from step one), derive some columns and write the output to another table.
There are more then twenty thousand records in the order db, which means the data flow task will be executed for more than twenty thousand times. The CRM db will be queried for more than twenty thousand times, too. It takes more than an hour to do these.
Can I utilize some built-in features to speed up these (or do it in a "smart" way)? And, is an hour a long time, in the context of ETL and / or SSIS?
Best Answer
You can achieve this without using Foreach enumerator, just use one data flow task to achieve this.
Building the Package
First of all, add a Data Flow Task to the control Flow
1. OLEDB Sources
In the DataFlow task add an
OLEDB Source
that Read from Orders Table (the same command used in theExecute SQL Task
(first step in your question)Also add a second
OLEDB Source
that read from customer table:2. Sort
After each OLEDB Source add a source component:
CustomerID
andOrderID
columns for sorting and select the sort type as ascendingCustomerID
(sort type = ascending) andLevelGrantedOn
(sort type = descending) columns for sorting3. Merge Join
Add a merge Join component to join both sorted outputs. and select
CustomerID
column from both outputs as Join Key and select the output columns you need from both tables4. Conditional Split
After merge Join add a conditional split to filter rows only that match the following expression
5. Script Component
The final component we have to use is a script component to get only the first row for each customerID (because both source are well sorted just grabbing the first row is similar for
Select top 1 ... ORDER BY LevelGrantedOn desc
)In the script component add an Output Column
OutFlag
of typeDT_BOOL
and Use the following script:This script will set OutFlag to True when the customerID occur for the first time (similar to
TOP 1
)6. Conditional Split
Add a second Conditional Split to filter Top 1 rows:
7.OLEDB Destination
Connect the Conditional Split Output top the OLEDB Destination and map the columns.
The Dataflow task must looks like