I'm frequently getting errors along the lines of
Error: 0xC020901E at Build Test Data, Lookup ID String[120]: Row yielded no match during lookup.
To resolve these, I'm going back to the source and making intelligent guesses, but it would be far easier if I could see the row that fails.
What's the best way to do this in SSIS (using MSSQL2012)?
Thanks.
Best Answer
There's nothing I've encountered that provides out of the box capability for this and I'll be delighted if someone proves me wrong.
Therefore, what I do is a controlled failure.
My lookup against DimEmployee should always yield a match. Data was loaded to that table before this dependent package executed. There's no opportunity for source data to change betwixt that load and but I was still running into situations where the lookup yields no match. And no, it's not a late arriving dimension, just they had a poor understanding of what the requirements were.
At any rate, what I do is set no match to "Redirect Rows to No Match Output". For those of you on 2005, you'll have to make do with the "Redirect Rows to Error Output"
I then count how many rows flow out of the failed lookup because if there's one failure, there might be more. That's the downside to even if the lookup could capture this row failed - it's only going to show you the first row and I generally want to know all the failures.
Controlled failure
I use a script task for this and as I type this up, I can see how this could be made into a re-usable component... My Script Task acts as a Transformation but I could have just as easily specified a Destination. I map in the columns that I used in the lookup and yielded no match. For this example, I have an
EmployeeID
and theirEffectiveDT
What I'm going to do in this component is fire a Warning event for every row that comes through. The default for 2012 projects run in the SSISDB will capture warning events. Once all the rows have gone through, in my
PostExecute
method, I'll raise theError
event which will cause the DataFlow as a whole to fail.I then run my package on the server and observe the execution/operation id. If you go into the All Executions report in SSMS, this is the value in the first column, ID. In this case, I see 938 so I then run the following query in the SSISDB
That's going to give me all the relevant bits I'd see in the "All Messages" report but with the advantages of I can click and select things and I've already parsed out what I need.
Running in interactive mode
If you're running this from the context of Visual Studio/SSDT/BIDS, I wouldn't even worry about the script's output, just add a Data Viewer and then you don't have to worry about crawling through logs.
Alternative to modifying your package (2012+)
The 2012 release of SSIS gave us a new tool for troubleshooting already deployed packages: Data Taps. These are a per-execution mechanism that allows you to generate a CSV (only) extract at a particular point in a Data Flow. This is amazingly helpful in highly regulated environments (SOX/SAS 70/HIPPA/PCI) where you can't just deploy an updated package without signoff from everyone on planet Earth. Just run the package with a special bit set and it'll generate an CSV in a well known location.
For the referenced screenshot above, if I wanted to see all the source data that was going to hit "LKP DimEmployee", I'd identify my PackagePath:
\Package\DFT Load FactJobAction
and my IdentificationString:Paths[ALL DateSK.Union All Output 1]
and then use the script in Josh's article