Sql-server – ny way to tell which derived column failed

sql serverssis-2012

In an SSIS dataflow task we have a derived column transformation with approximately 100 columns (basically converting raw input string data into typed variables). When this task fails, is there any way to tell which column caused the failure, for logging purposes? The only other alternatives I can think of are a custom script task to perform each conversion individually (yuck) or a separate derived column transformation for each data point (double yuck).

Basically I just want to be able to re-direct failure rows and know why they failed.


So an example. Our package is being used to allow users to bulk-upload to our database using Excel spreadsheets. So let's say the spread sheet coming in looks like this (except there's hundreds of columns):

+--------+-----------------+---------+------------+---------+
| Text1  |     Number1     | Number2 | DateTime1  |  Text2  |
+--------+-----------------+---------+------------+---------+
| Spring | 1               |       1 | 1/1/0001   | Flowers |
| Summer | 2               |       2 | 6/1/2015   | Sweaty  |
| Fall   | N/A             |       3 | 10/31/2099 | Crunchy |
| Winter | This is garbage |       4 | 12/12/2020 | Icy     |
+--------+-----------------+---------+------------+---------+

In this instance we want Spring, Summer, and Fall to succeed. Fall with have a null value for Number1. The derived column will have logic that looks something like this (not valid syntax, just the logic)

sanitizedNumber1 = Number1 == "N/A" ? null : cast(Number1 as int)

Winter will be redirected down the error path and logged. Is there any way to know which derived column failed? Again, we have about 100 inputs that are being processed in this transformation in a similar fashion. I'd like to be able to log something like:

Import record "Winter" failed due to invalid data in "Number1"

It doesn't necessarily have to be this format, but anything that would allow a user to be able to uniquely identify the bad data point would be acceptable. I know this would be possible using a script component and performing the conversions manually (which is what we're going to have to do if there's no better option) but if it's possible to just modify the Derived Column Transformation to provide something along these lines I'd rather do that instead of re-implementing the entire component in a script.

Best Answer

You have to manually identify the failures.

Redirect the rows (as you currently are) into a table that matches your columns, with the single addition of an IDENTITY column. This table should be all VARCHAR data types, so that you retain all original values.

Now you can identify your failures using TRY_CAST when querying the table.

For example:

SELECT
Identity,
TRY_CAST(Winter As INT) As WinterConverted,
Winter
--------------------------------------
Identity  |  WinterConverted   | Winter
   1      |    NULL            | NaN

TRY_CAST failed to cast the value in Winter column to an int leaving you with a null. You can do this for all columns.

If it's worth your time, you could make a more complex query to return each column name that contains a NULL. If not, you should be able to relatively easily visually inspect.

Of course, if most of your errors are 'NaN' or 'N/A', there's a good chance you're wasting time identifying errors you already knew about. Cleanup the issues before they become one:

Winter == "NaN" || Winter == "N/A" ? -1 : (DT_I4)Winter

As a side note, you can add a TRY_CAST to 100 columns very easily by using vertical editing: hold Alt+Shift in SSDT or SSMS (some other apps too) and then or , then edit hundreds of lines at the same time.