Sql-server – SSIS data flow to update source table rows after copying to destination

sql-server ssis

I have a simple data flow that copies a subset of data from a source table on an internal database to a table on a web-facing database.

If there is a problem, the error is written to an errors table.

That's all fine.

In the source table there is a bit column for SSIS_TRANSFERRED that I wish to set to 1 when the copy process completes. However, I'm unsure how to approach this.

My instinct is to craft an SQL Statement that runs against each Unique ID for every row successfully transferred as part of that package – is there a simple approach to this (i.e. as part of the data flow) or do I need to create a new Control Flow with OLE DB Command that queries the web-facing table and marks the corresponding internal rows as 'transferred' accordingly?

Best Answer

If you want to keep all the components within the current Data Flow Task, you could add a Multicast to it, with one output going to the destination and the other going to an OLE DB Command that updates the source records based on the rows transferred to the destination. However, the Multicast is a synchronous transformation, so the records go to both outputs simultaneously, and this could lead to blocking or deadlock issues.

A simpler approach may be to add an Execute SQL Task after the Data Flow Task that updates the source table based on the transferred records in the destination table. For the update, you'll want a set-based statement, such as the SQL statement below.

To restrict this update to rows transferred within that package execution, a Multicast could be used in the Data Flow Task to output only the unique IDs to a staging table. The subsequent Execute SQL Task then updates the source based on the matching IDs in the staging table. Just make sure to add a step at the beginning of the package that truncates the staging table, to clear data from the prior execution.

UPDATE SRC
SET SRC.SSIS_TRANSFERRED = 1
FROM dbo.SourceTable SRC
INNER JOIN dbo.DestinationTable DEST 
ON SRC.ID = DEST.ID
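
The staging-table variant described above could be sketched roughly as follows. This is only an illustration under stated assumptions: dbo.TransferStaging is a hypothetical table name, the ID column is assumed to be an INT, and the staging table is assumed to live in the same database as the source table so that the join is local.

```sql
-- One-time setup: hypothetical staging table (name and INT type are assumptions).
IF OBJECT_ID('dbo.TransferStaging', 'U') IS NULL
    CREATE TABLE dbo.TransferStaging (ID INT NOT NULL PRIMARY KEY);

-- Step 1: Execute SQL Task at the start of the package.
-- Clears the IDs left over from the prior execution.
TRUNCATE TABLE dbo.TransferStaging;

-- Step 2: Data Flow Task (not shown here).
-- A Multicast sends the ID column of each row to dbo.TransferStaging
-- while the full row goes to the destination table.

-- Step 3: Execute SQL Task after the Data Flow Task.
-- Set-based update limited to the IDs staged by this execution.
UPDATE SRC
SET SRC.SSIS_TRANSFERRED = 1
FROM dbo.SourceTable AS SRC
INNER JOIN dbo.TransferStaging AS STG
    ON SRC.ID = STG.ID
WHERE ISNULL(SRC.SSIS_TRANSFERRED, 0) = 0; -- skip rows already flagged
```

Because the Multicast feeds every row to both outputs, rows redirected to the error output by the destination would still have their IDs staged, so it may be worth confirming how errors are handled before relying on the staged IDs alone.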

Sql-server – ETL: extracting from 200 tables – SSIS data flow or custom T-SQL

SSIS Design considerations

Generally speaking, I try to make each package focus on solving a single task (for example, loading sales data). If that requires two data flows, so be it. What I hate inheriting is a package from the Import and Export Wizard with many unrelated data flows in a single package. Decompose them into packages that each solve one very specific problem. It makes future enhancements less risky because the surface area is reduced. An additional benefit is that I can be working on loading DimProducts while my minion deals with the SnowflakeFromHell package.

Then use master package(s) to orchestrate the child workflows. I know you're on 2005, but SQL Server 2012's release of SSIS is the cat's pajamas. I love the project deployment model and the tight integration it allows between packages.

TSQL vs SSIS (my story)

As for the pure TSQL approach: at a previous job, they used a 73-step Agent job to replicate all of their Informix data into SQL Server. It generally took about 9 hours but could stretch to 12 or so. After they bought a new SAN, it went down to a little over 7 hours. The same logical process, rewritten in SSIS, consistently ran in under 2 hours. Easily the biggest factor in driving down that time was the "free" parallelization we got from SSIS. The Agent job ran all of those tasks in serial. The master package basically divided the tables into processing units (5 parallel sets of serialized tasks: "run replicate table 1", table 2, etc.), and I tried to divide the buckets into roughly equal-sized units of work. This allowed the 60 or so lookup reference tables to get populated quickly, and then the processing slowed down as it got into the "real" tables.

Other pluses for me in using SSIS are that I get "free" configuration, logging, and access to the .NET libraries for square data I need to bash into a round hole. I think an SSIS package can be easier to maintain (and to pass off for maintenance) than a pure TSQL approach, by virtue of the graphical nature of the beast.

As always, your mileage may vary.

Sql-server – SSIS Data Flow Task Excel to SQL table NULL value will not work for small INT datatype

My original answer worked (changing INT to NVARCHAR), but I ran into another column in the Excel source file that contained dates, where some cells held the string "NULL" (and I did not want to store dates in an NVARCHAR column). When SSIS got to this date column, it raised an error because it could not convert the string "NULL" to a date. It was reading "NULL" as a literal string rather than as a NULL value. I resolved this by adding a Derived Column component to the package that replaces the date field using the following expression:

(DT_WSTR,255)Auth_from_date == "NULL" ? NULL(DT_WSTR,50) : (DT_WSTR,255)Auth_from_date

This is just an IF statement that changes the string "NULL" to an actual NULL value. The date is then passed through a Data Conversion component and converted to DATETIME.
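
If the Excel column is instead landed as text in a staging table first, a similar substitution could be done in T-SQL rather than in a Derived Column. This is a different technique from the one above, shown only as a rough sketch: dbo.ExcelStaging and dbo.Authorizations are hypothetical table names, and only the Auth_from_date column comes from the expression above.

```sql
-- Hypothetical tables: dbo.ExcelStaging holds Auth_from_date as NVARCHAR,
-- dbo.Authorizations holds it as DATETIME.
INSERT INTO dbo.Authorizations (Auth_from_date)
SELECT
    -- NULLIF returns a real NULL when the cell text is the string 'NULL';
    -- any other value is converted to DATETIME.
    CONVERT(DATETIME, NULLIF(LTRIM(RTRIM(Auth_from_date)), N'NULL'))
FROM dbo.ExcelStaging;
```

CONVERT still raises an error for any other text that is not a valid date, so this only covers the "NULL" string case described above.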
