I wouldn't want to have 200 data flows in a single package. The time it'd take just to open up and validate would make you old before your time.
EzAPI is fun but if you're new to .NET and SSIS, oh hell no, you don't want that. I think you'll spend far more time learning about the SSIS object model and possibly dealing with COM than actually getting work done.
Since I'm lazy, I'll plug Biml as a free option you didn't list. From an answer on SO: https://stackoverflow.com/questions/13809491/generating-several-similar-ssis-packages-file-data-source-to-db/13809604#13809604
- Biml is an interesting beast. Varigence will be happy to sell you a license to Mist, but it's not needed. All you would need is BIDSHelper; then browse through BimlScript and look for a recipe that approximates your needs. Once you have that, click the context-sensitive menu button in BIDSHelper and whoosh, it generates packages.
I think it might be an approach for you as well. You define the Biml that describes how your packages should behave and then generate them. In the scenario you describe, where you make a change and have to fix N packages: nope, you fix your definition of the problem and regenerate the packages.
Or, if you've gained sufficient familiarity with the framework, use something like EzAPI to go fix all the broken stuff. Heck, since you've tagged this as 2005, you could also give PacMan a try if you need to make mass modifications to existing packages.
SSIS Design considerations
Generally speaking, I try to make my packages focus on solving a single task (load sales data). If that requires 2 data flows, so be it. What I hate inheriting is a package from the Import/Export Wizard with many unrelated data flows in a single package. Decompose them into packages that each solve a very specific problem. It makes future enhancements less risky because the surface area is reduced. An additional benefit is that I can be working on loading DimProducts while my minion deals with loading the SnowflakeFromHell package.
Then use master package(s) to orchestrate the child workflows. I know you're on 2005, but SQL Server 2012's release of SSIS is the cat's pajamas. I love the project deployment model and the tight integration it allows between packages.
TSQL vs SSIS (my story)
As for the pure TSQL approach: at a previous job, they used a 73-step Agent job to replicate all of their Informix data into SQL Server. It generally took about 9 hours but could stretch to 12 or so. After they bought a new SAN, it dropped to just over 7 hours. The same logical process, rewritten in SSIS, consistently ran in under 2 hours. Easily the biggest factor in driving down that time was the "free" parallelization we got using SSIS. The Agent job ran all of those tasks in serial. The master package basically divided the tables into processing units (5 parallel sets of serialized "replicate table N" tasks) where I tried to make each bucket a quasi-equal unit of work. This allowed the 60 or so lookup reference tables to get populated quickly, and then the processing slowed down as it got into the "real" tables.
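To make the bucketing idea concrete, here's a rough sketch (mine, not from the original post) of one way to do that greedy partitioning in C#: sort the tables by estimated size, then drop each one into whichever bucket currently has the least work. The table names and row counts are made up purely for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class BucketPlanner
{
    static void Main()
    {
        // Hypothetical tables with estimated row counts; in practice these would
        // come from your own metadata. The values here are illustrative only.
        var tables = new Dictionary<string, long>
        {
            ["dbo.OrderDetail"] = 48_000_000,
            ["dbo.OrderHeader"] =  9_500_000,
            ["dbo.Customer"]    =  1_200_000,
            ["dbo.Product"]     =     60_000,
            ["dbo.StateLookup"] =         50
        };

        const int bucketCount = 5;
        var bucketWeights = new long[bucketCount];
        var bucketTables = Enumerable.Range(0, bucketCount)
                                     .Select(_ => new List<string>())
                                     .ToArray();

        // Greedy: biggest tables first, each one lands in the lightest bucket so far.
        foreach (var table in tables.OrderByDescending(t => t.Value))
        {
            int lightest = Array.IndexOf(bucketWeights, bucketWeights.Min());
            bucketTables[lightest].Add(table.Key);
            bucketWeights[lightest] += table.Value;
        }

        for (int i = 0; i < bucketCount; i++)
        {
            Console.WriteLine($"Bucket {i + 1} (~{bucketWeights[i]:N0} rows): {string.Join(", ", bucketTables[i])}");
        }
    }
}
```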
Other pluses for me with SSIS are that I get "free" configuration, logging, and access to the .NET libraries for square data I need to bash into a round hole. I think it can be easier to maintain (pass off maintenance of) an SSIS package than a pure TSQL approach, by virtue of the graphical nature of the beast.
As always, your mileage may vary.
For 8 million rows, I don't think there will be a lot of difference between SSIS and the SqlBulkCopy class. In SSIS, you do have to make sure you are using the fast load option in the OLE DB Destination; otherwise it will surely be slower.
If the source is local, you could try a SQL Server Destination; it might be a tad quicker than the OLE DB Destination.
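For reference, a minimal sketch of the SqlBulkCopy route, assuming you already have an IDataReader over the source rows; the connection string, table name, and batch size are placeholders. Setting BatchSize keeps the load set-based, which is the rough equivalent of the OLE DB Destination's fast load option.

```csharp
using System.Data;
using System.Data.SqlClient;

static class Loader
{
    // Minimal sketch: connection string, target table, and batch size are placeholders.
    public static void BulkLoad(IDataReader source, string connectionString)
    {
        using (var bulk = new SqlBulkCopy(connectionString))
        {
            bulk.DestinationTableName = "dbo.TargetTable"; // hypothetical target
            bulk.BatchSize = 10000;       // commit in batches, much like a fast-load commit size
            bulk.BulkCopyTimeout = 0;     // disable the default 30-second timeout for a long load
            bulk.WriteToServer(source);   // streams rows set-based rather than row-by-row
        }
    }
}
```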
Best Answer
The OLE DB Destination component's data access mode comes in two flavors: fast and non-fast.
Fast, either "table or view - fast load" or "table or view name variable - fast load", means that data will be loaded in a set-based fashion.
Slow, either "table or view" or "table or view name variable", results in SSIS issuing singleton insert statements to the database. If you're loading 10, 100, maybe even 10,000 rows, there's probably little appreciable performance difference between the two methods. However, at some point you're going to saturate your SQL Server instance with all those piddly little requests. Additionally, you're going to abuse the heck out of your transaction log.
Why would you ever want the non-fast methods? Bad data. If I sent in 10,000 rows of data and the 9,999th row had a date of 2015-02-29, then with the non-fast mode you'd have 10,000 atomic inserts and commits/rollbacks, so the failure is isolated to the single bad row. If I were using the fast method, that entire batch of 10k rows would either all save or none of them would, and if you want to know which row(s) errored out, the lowest level of granularity you have is that 10k-row batch.
Now, there are approaches to getting as much data loaded as fast as possible while still handling dirty data. It's a cascading failure approach, and it works like this.
The idea is that you find the right size to insert as much as possible in one shot, but if you get bad data, you try resaving the data in successively smaller batches to get down to the bad rows. Here I started with a Maximum insert commit size (FastLoadMaxInsertCommitSize) of 10000. On the error output of that destination, I change the error row disposition from Fail Component to Redirect Row.
The next destination is the same as above, but here I attempt a fast load and save it in batches of 100 rows. Again, test, or make some pretense of coming up with a reasonable size. This will result in 100 batches of 100 rows being sent, because we know somewhere in there is at least one row that violated the integrity constraints for the table.
I then add a third component to the mix; this time I save in batches of 1. Or you can just change the table access mode away from the fast load version, because it will yield the same result. We save each row individually, and that enables us to do "something" with the single bad row(s).
Finally, I have a failsafe destination. Maybe it's the "same" table as the intended destination, but with all the columns declared as nvarchar(4000) NULL. Whatever ends up in that table needs to be researched and cleaned/discarded, or whatever your bad-data resolution process is. Others dump to a flat file, but really, whatever makes sense for how you want to track bad data works.
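Inside SSIS, that cascade is just the error output of each destination wired to the next one. If you ever need the same trick outside SSIS, say with the SqlBulkCopy class mentioned earlier, the logic looks roughly like the sketch below. The batch sizes mirror the 10000 / 100 / 1 cascade above; the target table name and the quarantine helper are hypothetical placeholders, not anything from the original.

```csharp
using System;
using System.Data;
using System.Data.SqlClient;

static class CascadingLoader
{
    // Batch sizes mirror the 10000 -> 100 -> 1 cascade described above.
    static readonly int[] BatchSizes = { 10000, 100, 1 };

    // Bulk load the rows in slices; when a slice fails, retry that slice at the
    // next (smaller) batch size until the individual bad rows are isolated.
    public static void Load(DataTable rows, string connectionString, int level = 0)
    {
        int batchSize = BatchSizes[level];

        for (int offset = 0; offset < rows.Rows.Count; offset += batchSize)
        {
            // Copy the current slice into its own DataTable.
            DataTable slice = rows.Clone();
            int end = Math.Min(offset + batchSize, rows.Rows.Count);
            for (int i = offset; i < end; i++)
            {
                slice.ImportRow(rows.Rows[i]);
            }

            try
            {
                using (var bulk = new SqlBulkCopy(connectionString))
                {
                    bulk.DestinationTableName = "dbo.TargetTable"; // hypothetical target
                    bulk.WriteToServer(slice);                     // set-based attempt
                }
            }
            catch (SqlException)
            {
                if (level + 1 < BatchSizes.Length)
                {
                    Load(slice, connectionString, level + 1);  // cascade to smaller batches
                }
                else
                {
                    Quarantine(slice, connectionString);       // single bad row: failsafe table
                }
            }
        }
    }

    // Hypothetical failsafe: park the bad row(s) somewhere wide and forgiving,
    // e.g. a copy of the target table with every column declared nvarchar(4000) NULL.
    static void Quarantine(DataTable badRows, string connectionString)
    {
        using (var bulk = new SqlBulkCopy(connectionString))
        {
            bulk.DestinationTableName = "dbo.TargetTable_Quarantine"; // hypothetical
            bulk.WriteToServer(badRows);
        }
    }
}
```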