SQL Server – Inserting to a table when everything is parameterized

dynamic-sql, etl, sql server, sql server 2014, ssis

I need to grab data stored on a remote AS/400 file system to which I connect using a linked server connection, so I can send OPENQUERY select statements and receive the data on SQL Server.

I don't know how many files/tables I'm going to end up with yet, but at this point I know enough to see that I'm going to need a flexible and scalable solution… one that ideally also performs well enough.

So I have created a table in my Metadata database that looks something like this:

Id    Filename      FilterColumn    FilterValue    LastTransferred
 1    STDLIB.FOOBAR      TRXDATE           NULL               NULL
 2    STDLIB.SOMETH         NULL           NULL               NULL
...

My control flow looks like this:

   Collect Metadata
         ↓
[ Foreach Loop Start ]

   Execute DeleteStmt
         ↓
   Execute OPENQUERY
         ↓
 Write to destination

[ Foreach Loop End   ]

In the Collect Metadata step I have a data flow task that reads from the above metadata table and a script component that uses it to generate T-SQL statements – one to truncate/delete the existing local tables, and another to select the remote data; then I send everything into a Recordset Destination.
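
To illustrate, the generation logic boils down to something like this (a simplified sketch; the AS400 linked server name, the staging-table naming and the filter formatting are illustrative, not my exact code):

private static string BuildDeleteStatement(string stagingTable)
{
    // staging tables live in dbo and get emptied before each load
    return "DELETE FROM dbo." + stagingTable + ";";
}

private static string BuildSelectStatement(string fileName, string filterColumn, string filterValue)
{
    // pass-through query that runs on the AS/400 side
    var remoteQuery = "SELECT * FROM " + fileName;
    if (!string.IsNullOrEmpty(filterColumn) && !string.IsNullOrEmpty(filterValue))
    {
        // single quotes are doubled because the value sits inside
        // the OPENQUERY pass-through string literal
        remoteQuery += " WHERE " + filterColumn + " >= ''" + filterValue + "''";
    }
    return "SELECT * FROM OPENQUERY(AS400, '" + remoteQuery + "')";
}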

So far, so good.

The Foreach Loop Container iterates the recordset, executes the generated truncate/delete commands, then executes the OPENQUERY select command and dumps the result set into a dedicated User::RemoteData object variable.

And this is where I'm stumped: in the "Write to destination" step, I have another script component (configured as a "source") that takes this variable and uses an OleDbDataAdapter to fill up a DataTable, from which I'm able to pick up the column names and actual row data (every iteration of the Foreach loop container has a different table in User::RemoteData).

So I started implementing the CreateNewOutputRows override like this:

public override void CreateNewOutputRows()
{
    using (var records = new DataTable())
    using (var adapter = new OleDbDataAdapter())
    {
        // fill the DataTable from the ADO recordset held in User::RemoteData
        adapter.Fill(records, Variables.RemoteData);
        var builder = new StringBuilder();
        builder.Append("INSERT INTO dbo.");
        builder.Append(Variables.StagingTableName);

        var columnNames = new string[records.Columns.Count];
        for (var index = 0; index < records.Columns.Count; index++)
        {
            columnNames[index] = records.Columns[index].ColumnName;
        }

        builder.Append("(");
        builder.Append(string.Join(",", columnNames);
        builder.Append("););

        builder.Append(" VALUES (");
        foreach (DataColumn column in records.Columns)
        {
            // ok, stop, think. this can't be right.
        }
    }
}

So I stopped and started thinking… and I can't think of a nice, neat way to do this.

Am I going to have to script these INSERT commands and check column types to properly single-quote-delimit string values and whatnot?

Isn't this going to perform like a snail? I'm not liking that the Foreach container is going to process each table sequentially, and I can already see the above solution taking hours… and that's only the "transfer data over from the remote system" part (I have many more ETL steps coming up that will need to consume this data).

What would be a better way? Do I have to drop the idea of sourcing my data flow from my Metadata database, and have n "statically-defined" workflows, one for each table?

Best Answer

// ok, stop, think. this can't be right.

Indeed, there's a better, cleaner, safer, faster way.

You have a DataTable in memory; you can "dump" it as-is to the destination table using a SqlBulkCopy object:

public override void CreateNewOutputRows()
{
    using (var records = new DataTable())
    using (var adapter = new OleDbDataAdapter())
    {
        adapter.Fill(records, Variables.RemoteData);

        // bulk-load the entire DataTable into the staging table in one shot;
        // no per-row INSERT statements, no type/quoting concerns
        var bulk = new SqlBulkCopy(_connection, SqlBulkCopyOptions.TableLock | SqlBulkCopyOptions.UseInternalTransaction, null);
        bulk.BulkCopyTimeout = 0; // disable the timeout; some loads can be large
        bulk.DestinationTableName = Variables.StagingTableName;
        bulk.WriteToServer(records);

        _transferredRows = records.Rows.Count;
        Output0Buffer.SetEndOfRowset();
    }
}

That fixes the first problem: you don't need to bother with individual rows or with column names and types, let alone build and concatenate an individual INSERT statement for every single row in the data source.
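
One thing worth keeping an eye on: without explicit column mappings, SqlBulkCopy matches source to destination columns by ordinal position. If your staging tables reuse the remote column names (an assumption on my part), a few extra lines just before the WriteToServer call make the copy robust to differences in column order:

// map by column name instead of relying on ordinal position;
// assumes the staging table uses the same column names as the source
foreach (DataColumn column in records.Columns)
{
    bulk.ColumnMappings.Add(column.ColumnName, column.ColumnName);
}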


The other problem to solve is to get each iteration to run concurrently. I believe you could extract the body of the Foreach loop container into its own parameterized SSIS package, and replace it with an out-of-process Execute Package Task.