Sql-server – SSIS Script to split string into columns

parsescriptingsql-server-2008ssis

I have a dataset (log file) with a number of columns; one of them is "Other-Data" below (unordered string) and need to parse the string to create the derived columns according the u value (U1, U2, U3, etc…). The output columns should be something like:

U1   U2  U3                  U4   U5   etc.
null odw odw : CH : de : hom null null   
EUR  sss DE:de:hom           null null   
EUR  crm crm                 null null   

Other-Data:

u3=odw : CH : de : hom;u2=odw : Product : DSC-HX20V;~oref=http://www.bidl.ch/lang/de/product/dsc-h-series/dsc-hx20v
u1=EUR;u2=sss:Checkout-Step4:Orderacknowledgement;u3=DE:de:hom;u11=1;u12=302338533;u13=SVE1511C5E;u14=575.67;~oref=https://shop.bidl.de/shop/bibit/success.do
u15=1187;u13=SVE14A1C5E~VAIOEWY401;u11=1~1;u10=843.9~121.14;u9=1038~149;u3=crm : FI : fi : hom;u1=EUR;u2=crm : Checkout : Order acknowledgement;~oref=https://shop.bidl.fi/shop/bibit/success.do

Can anyone help with this?

Best Answer

Usually the way you'd go about this is to have a Script component and enter your VB / C# code to handle this type of input. If you have a variable number of columns than you should forget about SSIS being able to handle this since it doesn't work well with non-finite number of columns (being that tables have a finite number of columns on them).

Having said this, if you have a finite number of columns in the input file, than what you can do is specify the delimiter (the semicolon in the above case) and have the SSIS split the data into the columns and then do the ETL process on them, if needed.

If you have an upper limit of columns, but you get files with less columns than placing a script task before the DFT module to append the extra delimiters that are needed.

Since the first row looks like the header row (column names) you can set the flat file source to skip the first row.

Another alternative would be, if you have a small number of file formats, would be to create different names for each of the log files and have individual DFT modules handle each type of file.

Hope this helps.