I wouldn't want to have 200 data flows in a single package. The time it'd take just to open up and validate would make you old before your time.
EzAPI is fun but if you're new to .NET and SSIS, oh hell no, you don't want that. I think you'll spend far more time learning about the SSIS object model and possibly dealing with COM than actually getting work done.
Since I'm lazy, I'll plug BIML as a free option you didn't list. From an answer on SO https://stackoverflow.com/questions/13809491/generating-several-similar-ssis-packages-file-data-source-to-db/13809604#13809604
- Biml is an interesting beast. Varigence will be happy to sell you a license to Mist, but it's not needed. All you would need is BIDSHelper; then browse through BimlScript and look for a recipe that approximates your needs. Once you have that, click the context-sensitive menu button in BIDSHelper and whoosh, it generates packages.
I think it might be an approach for you as well. You define your BIML that describes how your packages should behave and then generate them. In the scenario you describe where you make a change and have to fix N packages, nope, you fix your definition of the problem and regenerate packages.
Or if you've gained sufficient familiarity with the framework then use something like EzAPI to go and fix all the broken stuff. Heck, since you've tagged this as 2005, you could also give PacMan a try if you're in need of making mass modifications to existing packages.
SSIS Design considerations
Generally speaking, I try to make my packages focus on solving a single task (load sales data). If that requires 2 data flows, so be it. What I hate inheriting is a package from the Import/Export Wizard with many unrelated data flows in a single package. Decompose them into something that solves a very specific problem. It makes future enhancements less risky because the surface area is reduced. An additional benefit is that I can be working on loading DimProducts while my minion is dealing with loading the SnowflakeFromHell package.
Then use master package(s) to orchestrate the child workflows. I know you're on 2005, but SQL Server 2012's release of SSIS is the cat's pajamas. I love the project deployment model and the tight integration it allows between packages.
TSQL vs SSIS (my story)
As for the pure TSQL approach: in a previous job, they used a 73-step Agent job to replicate all of their Informix data into SQL Server. It generally took about 9 hours but could stretch to 12 or so. After they bought a new SAN, it went down to about 7+ hours. The same logical process, rewritten in SSIS, ran consistently under 2 hours. Easily the biggest factor in driving down that time was the "free" parallelization we got from SSIS. The Agent job ran all of those tasks in serial. The master package basically divided the tables into processing units (5 parallel sets of serialized "replicate table N" tasks), where I tried to make the buckets quasi-equal-sized units of work. This allowed the 60 or so lookup/reference tables to get populated quickly, and then the processing slowed down as it got into the "real" tables.
Other pluses for me with SSIS are that I get "free" configuration, logging, and access to the .NET libraries for square data I need to bash into a round hole. I think it can be easier to maintain (and hand off maintenance of) an SSIS package than a pure TSQL approach by virtue of the graphical nature of the beast.
As always, your mileage may vary.
Best Answer
Given that:
Then I would just copy and paste the core data flow and make per-table changes as appropriate. For a non-repeating process covering a small number of tables (not 200 or more), you don't want to spend too much time creating a repeatable process that will never be repeated ;-). And you are going to have to manage the per-table differences anyway, so you might as well just put your effort into that part.
And it should be possible in SSIS to define a reusable component (Script Task) to handle some of the "common" manipulations. If not, just create a stored procedure that accepts a table name and handle the common steps there. Then, in SSIS, call that stored procedure for each table; that way you have a single definition for the common stuff, and if you need to make a slight change to that common logic, you only need to make it in one spot :).
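As a rough sketch of the "one stored procedure, called per table" idea from a C# Script Task (the procedure name `dbo.usp_ApplyCommonSteps`, its `@TableName` parameter, the connection string, and the table list are all hypothetical placeholders, not anything from your environment):

```csharp
using System;
using System.Data;
using System.Data.SqlClient;

class CommonStepsCaller
{
    static void Main()
    {
        // Placeholder connection string and table list for illustration only.
        const string destCs = "Data Source=DestServer;Initial Catalog=DestDb;Integrated Security=SSPI;";
        string[] tables = { "dbo.Customer", "dbo.Orders", "dbo.OrderDetail" };

        using (var conn = new SqlConnection(destCs))
        {
            conn.Open();
            foreach (string table in tables)
            {
                // dbo.usp_ApplyCommonSteps is a hypothetical proc holding the shared
                // manipulation logic; it is handed one table name at a time.
                using (var cmd = new SqlCommand("dbo.usp_ApplyCommonSteps", conn))
                {
                    cmd.CommandType = CommandType.StoredProcedure;
                    cmd.Parameters.AddWithValue("@TableName", table);
                    cmd.ExecuteNonQuery();
                }
            }
        }
    }
}
```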
(please see UPDATE section below) Another option for handling data migration when you are not pulling all fields over, and possibly renaming fields, is to use the SqlBulkCopy class in .NET, as it handles those things quite nicely. You can either create a C# task in SSIS to use this class, or you can use a SQLCLR stored procedure such as DB_BulkCopy, which is part of the SQL# SQLCLR library (which I am the author of, but this stored procedure is in the Free version). I describe this in a little more detail in the following answer: Import data from one Database to another script
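For a feel of what that looks like in a C# task, here is a minimal sketch; the server, table, and column names are made up, and the column mappings are where the "skip some fields, rename others" part happens:

```csharp
using System.Data.SqlClient;

class BulkCopyExample
{
    static void Main()
    {
        // Placeholder connection strings for illustration only.
        const string srcCs  = "Data Source=SourceServer;Initial Catalog=SourceDb;Integrated Security=SSPI;";
        const string destCs = "Data Source=DestServer;Initial Catalog=DestDb;Integrated Security=SSPI;";

        using (var srcConn = new SqlConnection(srcCs))
        using (var cmd = new SqlCommand("SELECT CustID, CustName FROM dbo.Customer;", srcConn))
        {
            srcConn.Open();
            using (var reader = cmd.ExecuteReader())
            using (var bulk = new SqlBulkCopy(destCs))
            {
                bulk.DestinationTableName = "dbo.Customer";
                // Map only the columns you actually want, renaming along the way;
                // unmapped source columns simply never make the trip.
                bulk.ColumnMappings.Add("CustID", "CustomerID");
                bulk.ColumnMappings.Add("CustName", "CustomerName");
                bulk.BatchSize = 10000;
                bulk.WriteToServer(reader);
            }
        }
    }
}
```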
UPDATE
The following requirement was added a day after I posted the suggestions above:
In light of this new requirement, I think that using the SqlBulkCopy alternative is less attractive than sticking with SSIS in the manner I described first. The only way to de-dupe the data using SqlBulkCopy / DB_BulkCopy would be to run the process from the destination / target server, load the data from the source server into a temporary table (it would have to be a global temporary table, meaning one starting with ##) using SqlBulkCopy or DB_BulkCopy, and then insert into the real destination table from the global temporary table while filtering out the "duplicates" based on whatever rules you have to determine what a duplicate is.
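A rough sketch of that stage-then-filter pattern from a .NET task follows; the table and column names and the "duplicate means same CustomerID" rule are assumptions for illustration, and a DB_BulkCopy call could stand in for the SqlBulkCopy step if you go the SQLCLR route:

```csharp
using System.Data.SqlClient;

class StageAndDedupe
{
    static void Main()
    {
        // Placeholder connection strings for illustration only.
        const string srcCs  = "Data Source=SourceServer;Initial Catalog=SourceDb;Integrated Security=SSPI;";
        const string destCs = "Data Source=DestServer;Initial Catalog=DestDb;Integrated Security=SSPI;";

        using (var destConn = new SqlConnection(destCs))
        {
            destConn.Open();

            // Global temp table (##) so it is visible across sessions on the destination.
            new SqlCommand("CREATE TABLE ##CustomerStaging (CustomerID int, CustomerName nvarchar(100));",
                           destConn).ExecuteNonQuery();

            // Pull from the source server and bulk copy into the staging table.
            using (var srcConn = new SqlConnection(srcCs))
            using (var cmd = new SqlCommand("SELECT CustomerID, CustomerName FROM dbo.Customer;", srcConn))
            {
                srcConn.Open();
                using (var reader = cmd.ExecuteReader())
                using (var bulk = new SqlBulkCopy(destConn) { DestinationTableName = "##CustomerStaging" })
                {
                    bulk.WriteToServer(reader);
                }
            }

            // Insert only rows not already present; the NOT EXISTS predicate is where
            // your actual "what counts as a duplicate" rule would go.
            const string dedupeSql = @"
INSERT INTO dbo.Customer (CustomerID, CustomerName)
SELECT s.CustomerID, s.CustomerName
FROM ##CustomerStaging AS s
WHERE NOT EXISTS (SELECT 1 FROM dbo.Customer AS c WHERE c.CustomerID = s.CustomerID);

DROP TABLE ##CustomerStaging;";
            new SqlCommand(dedupeSql, destConn).ExecuteNonQuery();
        }
    }
}
```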