Sql-server – Doing lookup transformation

sql serverssis

I have a contactTypeDescription and I want to find the corresponding contactTypeId for that description. I am doing the following, however it throws a warning saying that atleast one column must be mapped. What am I doing incorrectly?

enter image description here

Best Answer

Left hand item is the "Available Input Columns". That's the data in your pipeline.

The right hand item is the "Available Lookup Columns" - that's the reference data.

When you click the check button, that adds the lookup entry into the pipeline. Thus, you are adding ContactTypeId as a new column and you are replacing ContactTypeDescription.

That's likely not what you want to be doing.

Instead, you want to click and drag the ContactTypeDescription column from the left side to the matching column on the left. You then want to unclick the ContactTypeDescription from the right hand side because, why replace the value with the same thing?

That will tell the SSIS engine that a match is performed by these two things being equal.

Here comes the nuances of the lookup.

Depending on whether you specify a Full Cache (default) or a Partial/None cache, will determine what matching rules are used. Full cache is going to be case sensitive as it uses .NET equality. Partial/None is going to rely on your database rules (generally case insensitive in my world).

The choice on the Connection tab of selecting a table is fine for small data sets but if you don't need LastModifierIdentityId, why have the source system read it from disk, stream it across the network, have SSIS cache that in memory if you're never going to use it? Instead, write a simple SELECT there. SELECT T.ContactTypeId, T.ContactTypeDescription FROM dbo.Table AS T; Now you're only pulling the data you need, the source system can possibly use an index to satisfy that query, there's less traffic over the network and you're not bloating your lookup cache.

Speaking of bloating, the Full Cache, which is generally what you want, does have an impact on the start time of your package. Before it begins executing, all the requested data from the source system must be brought into the local memory before the first bit of data starts moving. That's a great strategy if you're loading a 50 GB file and you need to compare it against 1000 possible codes. On the otherhand, when you're loading trading data and the source is 1000 rows and your reference table is 50 GB, that's a sizeable delay before your package can even start processing. In a situation like that, you might be interested in a partial/none cache.