Data Flow
Here's the general approach I'd take to solving your problem.
I started with your source data and added some other conditions - a NULL as well as a 14 and 15 year to ensure my logic later is correct.
SELECT
D.DrvDOB
FROM
(
VALUES
('470324')
, ('470324')
, ('470209')
, ('140209')
, ('150209')
, ('101')
, ('0')
, (NULL)
) D(DrvDOB);
This is going to seem like a lot of work, but for me, this is the most maintainable approach. If something's wrong, I can simply inspect between the derived columns and find the failing bit of logic. Many people mistakenly believe the amount of components on a data flow decrease the processing time. The reverse case can actually be true, see the sqlblog post at the end for more details.
Create a column called IsValidLength
This is going to provide us a boolean to tell us whether we even need to try and parse the year, month and day from the string. I'm ensuring we don't have a NULL value and that the length is 6. Otherwise, we're going to flag as invalid
!IsNull([DrvDOB]) && LEN([DrvDOB]) == 6
I then am going to generate 3 more string columns (der_Year, der_Month, der_Day) with the parsed year, month and day. Otherwise, I'm going to use 00 as my value. Again, I only attempt to parse the string if it's valid.
(IsValidLength) ? SUBSTRING([DrvDOB], 1, 2) : "00"
(IsValidLength) ? SUBSTRING([DrvDOB], 3, 2) : "00"
(IsValidLength) ? SUBSTRING([DrvDOB], 5, 2) : "00"
Now comes the actual fun part, determining the actual year. Since your field is called date of birth, I'm assuming there are no future dated records - my apologies if you have psychic data. I took the route SQL Server, .NET library and COM use and have a prescribed year to use as the marker between what gets tagged as 19XX vs 20XX. I create an SSIS Variable called TwoDigitYearCutoff
and assign it a value of 2014. In short, any years that are parsed from 00 to 14 will be tagged with the 21st century, while the rest are left in the 20th century. Adjust this value as needed.
Back in my data flow, I create yet another column, this one der_ActualYear which will be the 4 digit year.
(@[User::TwoDigitYearCutoff] / 100
- ((DT_I4)der_Year > @[User::TwoDigitYearCutoff] % 100 ? 1 : 0))
* 100 + (DT_I4)der_Year
At this point, we know the full year, the month and day along with whether this data is any good. The final piece then is to put it all together and make it an actual Date. I build out a YYYY-MM-DD string and cast that as a date. Otherwise, I cast NULL as a date (to get my metadata consistent)
(IsValidLength) ? (DT_DATE) ((DT_WSTR, 4) [der_ActualYear]
+ "-" + [der_Month] + "-" + [der_Day]) : NULL(DT_DATE)
You can now wire der_DrvDOB into your target destination knowing that you have a valid date or a NULL.
Results
As expected, valid dates or null
Biml
Biml, the Business Intelligence Markup Language, describes the platform for business intelligence. Here, we're going to use it to describe the ETL. BIDS Helper, is a free add on for Visual Studio/BIDS/SSDT that addresses a host of shortcomings with it. Specifically, we're going to use the ability to transform a Biml file describing ETL into an SSIS package. This has the added benefit of provding you a mechanism for being able to generate exactly the solution I'm describing versus clicking through many tedious dialogue boxes.
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<!-- 74383 -->
<Connections>
<OleDbConnection ConnectionString="Provider=SQLNCLI11;Data Source=localhost\dev2014;Integrated Security=SSPI;Initial Catalog=tempdb" Name="CM_OLE" />
</Connections>
<Packages>
<Package Name="dba_74383" ConstraintMode="Linear">
<Variables>
<Variable DataType="Int32" Name="TwoDigitYearCutoff">2014<Annotations>
<Annotation AnnotationType="Description">Use the two digit year cutoff to specify an integer from 1753 to 9999 that represents the cutoff year for interpreting two-digit years as four-digit years</Annotation>
</Annotations>
</Variable>
</Variables>
<Tasks>
<Dataflow Name="DFT Make Dates">
<Transformations>
<OleDbSource ConnectionName="CM_OLE" Name="OLE_SRC Gen data">
<DirectInput>SELECT
D.DrvDOB
FROM
(
VALUES
('470324')
, ('470324')
, ('470209')
, ('140209')
, ('150209')
, ('101')
, ('0')
, (NULL)
) D(DrvDOB);</DirectInput>
</OleDbSource>
<!--
Test whether we have minimally valid data
-->
<DerivedColumns Name="DER IsValidLength">
<Columns>
<Column Name="IsValidLength" DataType="Boolean"><![CDATA[!IsNull([DrvDOB]) && LEN([DrvDOB]) == 6 ]]></Column>
</Columns>
</DerivedColumns>
<!--
Extract the year
-->
<DerivedColumns Name="DER ExtractYear">
<Columns>
<Column Name="der_Year" DataType="String" Length="2">(IsValidLength) ? SUBSTRING([DrvDOB], 1, 2) : "00"</Column>
</Columns>
</DerivedColumns>
<!--
Extract the Month
-->
<DerivedColumns Name="DER ExtractMonth">
<Columns>
<Column Name="der_Month" DataType="String" Length="2">(IsValidLength) ? SUBSTRING([DrvDOB], 3, 2) : "00"</Column>
</Columns>
</DerivedColumns>
<!--
Extract the day
-->
<DerivedColumns Name="DER ExtractDay">
<Columns>
<Column Name="der_Day" DataType="String" Length="2">(IsValidLength) ? SUBSTRING([DrvDOB], 5, 2) : "00"</Column>
</Columns>
</DerivedColumns>
<!--
Guess at year based on cutoff date. Assume historical data and not future dated
since this is DOB. For supplied value of 2014,
if 14 is seen, it will be 2014
if 15 is seen, it will be 1915
http://technet.microsoft.com/en-us/library/ms189577(v=sql.105).aspx
A two-digit year that is less than or equal to the last two digits of the cutoff year is
in the same century as the cutoff year. A two-digit year that is greater than the last
two digits of the cutoff year is in the century that precedes the cutoff year. For example,
if two digit year cutoff is 2049 (the default), the two-digit year 49 is interpreted as 2049
and the two-digit year 50 is interpreted as 1950.
https://stackoverflow.com/questions/2024273/convert-a-two-digit-year-to-a-four-digit-year
this.TwoDigitYearMax / 100 - (year > this.TwoDigitYearMax % 100 ? 1 : 0)) * 100 + year
@[User::TwoDigitYearCutoff]) / 100 - ((Int32) [der_Year] > @[User::TwoDigitYearCutoff]) % 100 ? 1 : 0)) * 100 + (Int32) [der_Year]
-->
<DerivedColumns Name="DER Guess at year">
<Columns>
<Column DataType="Int32" Name="der_ActualYear"><![CDATA[(@[User::TwoDigitYearCutoff] / 100 - ((DT_I4)der_Year > @[User::TwoDigitYearCutoff] % 100 ? 1 : 0)) * 100 + (DT_I4)der_Year]]></Column>
</Columns>
<Annotations>
<Annotation AnnotationType="Description">This might be overkill, but it sure was fun</Annotation>
</Annotations>
</DerivedColumns>
<!--
Put it all together
-->
<DerivedColumns Name="DER Make valid date">
<Columns>
<Column Name="der_DrvDOB" DataType="Date">(IsValidLength) ? (DT_DATE) ((DT_WSTR, 4) [der_ActualYear] + "-" + [der_Month] + "-" + [der_Day]) : NULL(DT_DATE) </Column>
</Columns>
</DerivedColumns>
<DerivedColumns Name="DER bitbucket">
</DerivedColumns>
</Transformations>
</Dataflow>
</Tasks>
</Package>
</Packages>
</Biml>
References
Best Answer
Assuming dates in other months are truncated to 3-char months:
Once you've verified, you can drop the varchar_col, rename newcol, and then re-add any affected indexes.