SSIS import with header record

cssis

I have fixed length text files that I have to import into SQL Server daily. These files contain header records (01) and Detail records (02). I am trying to build an SSIS package to do this. I have a script component that splits the data out but I cant get it to work. I am new to SSIS …please help.

Here is sample data:
01CA0022-0002 I0000000002618
02 AJ -000000000000375626 -0000000000003
02 AJ -000000000000372226 -00000000000037
02 AJ -000000000000738726 -00000000000073

using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
    public override void PreExecute()
    {
        base.PreExecute();
    }

    public override void PostExecute()
    {
        base.PostExecute();
    }

    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        if (Row.Line.Substring(0, 2) == "01")
        {
            HeaderBuffer.AddRow();
            HeaderBuffer.NAME_EDC = Row.Line.Substring(231, 60);
            HeaderBuffer.DUNS_EDC = Row.Line.Substring(291, 13);
        }
        else if (Row.Line.Substring(0, 2) == "02")
        {
            DetailBuffer.AddRow();
            DetailBuffer.REC_TYPE_DTEAIL = Row.Line.Substring(1, 2);
            DetailBuffer.PMT_NUMBER = Row.Line.Substring(3, 6);
            DetailBuffer.PMT_ACTION_CD = Row.Line.Substring(9, 2);
            DetailBuffer.PMT_AMOUNT = Row.Line.Substring(11, 18);
            DetailBuffer.ADJUSTMENT_CD = Row.Line.Substring(29, 2);
            DetailBuffer.ADJ_AMOUNT = Row.Line.Substring(31, 18);
            DetailBuffer.ACCT_EDC = Row.Line.Substring(49, 50);
            DetailBuffer.ACCT_OLD_EDC = Row.Line.Substring(99, 50);
            DetailBuffer.NO_ACCT_EGS = Row.Line.Substring(149, 30);
            DetailBuffer.PMT_POST_DATE = Row.Line.Substring(179, 8);
            DetailBuffer.XREF_NUMBER = Row.Line.Substring(187, 30);
            DetailBuffer.UNDISCOUNTED_AMOUNT = Row.Line.Substring(217, 18);
            DetailBuffer.DISCOUNT_ADJUSTMENT_AMOUNT = Row.Line.Substring(235, 18);
            DetailBuffer.INVOICE_ADVICE_NUMBER = Row.Line.Substring(253, 30);
        }
    }

    public override void CreateNewOutputRows()
    {
    }

}

The error I get says:

Error at Data Flow Task [Script Component [85]]: The binary code for the script is not found. Please open the script in the designer by clicking Edit Script button and make sure it builds successfully. Error at Data Flow Task [SSIS.Pipeline]: "Script Component" failed validation and returned validation status "VS_ISBROKEN". Error at Data Flow Task [SSIS.Pipeline]: One or more component failed validation. Error at Data Flow Task: There were errors during task validation

Best Answer

Instead of using a script component, I recommend using a more database-centric approach.

Consider the following approach:

  1. Read the data from the flat file using a flat file source. Keep your flat file source simple. It should have one column which contains the EntireRow.
  2. (optional) add a script component or derived column to count rows and generate a InputRowNumber field. The InputRowNumber is helfpul for debugging.
  3. Write the two columns to a simple staging table called InputFlatFile containing the InputRowNumber int, and EntireRow varchar(max) fields.
  4. Using ExecuteSQL Tasks, execute queries which insert the data from your InputFlatFile staging table into your real database. Based on your example data, these queries should be fairly straight forward.

INSERT INTO HeaderTable (NAME_EDC, DUNS_EDC)
SELECT SUBSTRING(EntireRow, 231, 60) NAME_EDC
     , SUBSTRING(EntireRow, 291, 13) DUNS_EDC
FROM InputFlatFile
WHERE LEFT(EntireRow,2) = '01'

INSERT INTO DetailTable (columns..)
SELECT SUBSTRING(EntireRow, 1, 2) REC_TYPE_DETAIL
     , SUBSTRING(EntireRow, 3, 6) PMT_NUMBER
     (... more columns)
FROM InputFlatFile
WHERE LEFT(EntireRow,2) = '02'

  1. use an ExecuteSQL task to permanently delete the data in the staging table so it is ready for the next load, or use an OUTPUT clause to move it into an archive table if necessary.

I think the database-centric approach is better for the following reasons:

  1. From this base, you can easily create a queryable data lineage. Want to know which file created which row in the final database? Create a FileLog table and insert a row at the start of the control flow with logging information like file name, modified date, etc. Retain a reference to the InputRowNumber and FileLogID in the destination database and you can trace the data from source file to destination easily.
  2. In my experience, people who work with SSIS tend to have a stronger database background than OO programming background so the Set Theory that I used above comes naturally and is easier to maintain on an ongoing basis (also, there tends to be more helpful ideas for database-centric approaches on this and other forums)