Below are the strings representing date of birth (yymmdd). I want to convert it into datetime format and load into a database using ssis. How can I do it? In my derived column I have (DT_DATE)(SUBSTRING([Drv DOB],1,2) + "-" + SUBSTRING([Drv DOB],3,2) + "-" + SUBSTRING([Drv DOB],5,2))
, but it's not working.
- 470324
- 470324
- 470209
- 101
- 0
Best Answer
Data Flow
Here's the general approach I'd take to solving your problem.
I started with your source data and added some other conditions - a NULL as well as a 14 and 15 year to ensure my logic later is correct.
This is going to seem like a lot of work, but for me, this is the most maintainable approach. If something's wrong, I can simply inspect between the derived columns and find the failing bit of logic. Many people mistakenly believe the amount of components on a data flow decrease the processing time. The reverse case can actually be true, see the sqlblog post at the end for more details.
Create a column called
IsValidLength
This is going to provide us a boolean to tell us whether we even need to try and parse the year, month and day from the string. I'm ensuring we don't have a NULL value and that the length is 6. Otherwise, we're going to flag as invalidI then am going to generate 3 more string columns (der_Year, der_Month, der_Day) with the parsed year, month and day. Otherwise, I'm going to use 00 as my value. Again, I only attempt to parse the string if it's valid.
Now comes the actual fun part, determining the actual year. Since your field is called date of birth, I'm assuming there are no future dated records - my apologies if you have psychic data. I took the route SQL Server, .NET library and COM use and have a prescribed year to use as the marker between what gets tagged as 19XX vs 20XX. I create an SSIS Variable called
TwoDigitYearCutoff
and assign it a value of 2014. In short, any years that are parsed from 00 to 14 will be tagged with the 21st century, while the rest are left in the 20th century. Adjust this value as needed.Back in my data flow, I create yet another column, this one der_ActualYear which will be the 4 digit year.
At this point, we know the full year, the month and day along with whether this data is any good. The final piece then is to put it all together and make it an actual Date. I build out a YYYY-MM-DD string and cast that as a date. Otherwise, I cast NULL as a date (to get my metadata consistent)
You can now wire der_DrvDOB into your target destination knowing that you have a valid date or a NULL.
Results
As expected, valid dates or null
Biml
Biml, the Business Intelligence Markup Language, describes the platform for business intelligence. Here, we're going to use it to describe the ETL. BIDS Helper, is a free add on for Visual Studio/BIDS/SSDT that addresses a host of shortcomings with it. Specifically, we're going to use the ability to transform a Biml file describing ETL into an SSIS package. This has the added benefit of provding you a mechanism for being able to generate exactly the solution I'm describing versus clicking through many tedious dialogue boxes.
References