Josh,
This is a very common task for all DBAs, and the right answer is NOT the same for everyone or for every server. As with a lot of other things, it depends on what you need.
Most definitely you don't want to run "Shrink Database" as already suggested. It's EVIL to performance, and the ref below will show you why. It causes disk as well as index fragmentation, and this can lead to performance issues. You are better off pre-allocating a large size for the data and log files so that autogrowth will NOT kick in.
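For example, a minimal sketch of pre-sizing the files up front (database and file names here are placeholders; pick sizes based on your expected growth):

    -- Pre-size the data and log files so autogrowth rarely fires.
    -- Database/file names and sizes are placeholders.
    ALTER DATABASE MyDb MODIFY FILE (NAME = MyDb_Data, SIZE = 50GB);
    ALTER DATABASE MyDb MODIFY FILE (NAME = MyDb_Log,  SIZE = 10GB);
    -- Keep a sensible fixed-size growth increment as a safety net.
    ALTER DATABASE MyDb MODIFY FILE (NAME = MyDb_Data, FILEGROWTH = 1GB);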
I didn't understand your #2, "selected tables full backup". Can you elaborate on this?
Coming to index reorganize, update statistics and index rebuilds: you need to be careful about how you do this, otherwise you will end up using more resources and can also end up with performance issues.
When you rebuild indexes, the statistics on those indexes are updated with a fullscan, but if you run update statistics after that, they get updated again with the default sample (which depends on several factors; often only a small percentage of rows once the table is larger than 8 MB), and that may lead to performance issues. Depending on the edition you have, you may be able to do online index rebuilds. The right way to do this activity is to check the amount of fragmentation and, depending on that, either do an index rebuild or an index reorganize + update statistics - see the sketch below. You may also want to identify which tables need their stats updated more frequently and update stats on those more often.
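A minimal sketch of that decision logic (the 5%/30% thresholds are the commonly cited guideline, not a hard rule; object names are placeholders):

    -- Check fragmentation, then pick rebuild vs. reorganize per index.
    SELECT OBJECT_NAME(ips.object_id) AS table_name,
           i.name                     AS index_name,
           ips.avg_fragmentation_in_percent
    FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
    JOIN sys.indexes AS i
      ON i.object_id = ips.object_id AND i.index_id = ips.index_id
    WHERE ips.avg_fragmentation_in_percent > 5
      AND ips.page_count > 1000;            -- ignore tiny indexes

    -- > 30% fragmented: rebuild (also updates stats with fullscan)
    ALTER INDEX IX_MyIndex ON dbo.MyTable REBUILD;   -- add WITH (ONLINE = ON) on Enterprise

    -- 5-30% fragmented: reorganize, then update stats yourself
    ALTER INDEX IX_MyIndex ON dbo.MyTable REORGANIZE;
    UPDATE STATISTICS dbo.MyTable WITH RESAMPLE;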
Maintenance Plans are OK, but it's hard to get the best out of them for these customizations unless you open the MPs in SSIS and tweak them. That's why I prefer NOT to use them and instead use Ola Hallengren's free scripts, which are more robust than MPs. Also, I would recommend catching up on the referenced article by Paul Randal on this topic.
Ref: http://technet.microsoft.com/en-us/magazine/2008.08.database.aspx
This is NOT a comprehensive answer to your question but a good starting point. HTH and let us know if you have any additional questions/comments.
If you have 200 identical sources then you can parameterise an SSIS package with the data source and kick off multiple threads. These can be controlled within the package by a foreach loop, or from an external source that kicks off the extractors with a parameter.
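For example, a hypothetical control table that the foreach loop (or an external scheduler) could read to get its list of sources - all names here are placeholders:

    -- One row per source system; the package picks up ConnectionString
    -- as a parameter, and Status tracks progress through the run.
    CREATE TABLE dbo.SourceQueue (
        SourceId         int IDENTITY(1,1) PRIMARY KEY,
        SourceName       nvarchar(100) NOT NULL,
        ConnectionString nvarchar(500) NOT NULL,
        Status           nvarchar(20)  NOT NULL DEFAULT N'Pending',
        StartedAt        datetime      NULL
    );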
You could consider a full load for relatively small dimensional sources and an incremental load for transactional data. This would require you to have persistent dimensions, but this is fairly straightforward to do with MERGE operations, or a pre-load area and dimension handler if you need slowly-changing dimensions.
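If you go the MERGE route, a minimal sketch of a persistent-dimension upsert (requires SQL Server 2008+; table and column names are made up):

    -- Upsert staged dimension rows into the persistent dimension.
    MERGE dbo.DimCustomer AS tgt
    USING stage.Customer  AS src
       ON tgt.SourceSystem = src.SourceSystem
      AND tgt.CustomerCode = src.CustomerCode
    WHEN MATCHED AND tgt.CustomerName <> src.CustomerName THEN
        UPDATE SET tgt.CustomerName = src.CustomerName
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (SourceSystem, CustomerCode, CustomerName)
        VALUES (src.SourceSystem, src.CustomerCode, src.CustomerName);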
You may wish to consider giving each source its own staging area (maybe a schema for each source in the staging database). This eliminates locking issues on the staging tables.
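For example (schema names are illustrative):

    -- Each source gets its own schema, so loads never block each other.
    CREATE SCHEMA src001;
    GO
    CREATE TABLE src001.Orders (OrderId int, OrderDate datetime, Amount money);
    -- ...repeat (generate, don't hand-write) for src002 through src200.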
Build a set of views over the staging tables (essentially just a set of unions that correspond to each of the source tables) that includes data source information. These can be generated fairly easily, so you don't have to manually cut and paste 200 different queries into the union. Once you've staged the data, the ETL process can read the whole lot from the view.
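A sketch of one such generated view, with the source tagged onto each branch (names are hypothetical):

    -- Generated view: one UNION ALL branch per source schema,
    -- with a literal identifying where each row came from.
    CREATE VIEW staging.AllOrders AS
    SELECT 'src001' AS SourceSystem, OrderId, OrderDate, Amount FROM src001.Orders
    UNION ALL
    SELECT 'src002' AS SourceSystem, OrderId, OrderDate, Amount FROM src002.Orders
    -- ...branches for src003 through src200, emitted by a generator script.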
This allows the ETL to run in one hit, although you will have to come up with a strategy to deal with extract failures from individual systems. For this, you might want to look into an architecture that deals with late arriving data gracefully, so you can catch up individual feeds that had transient issues.
BCP
For 200 simple extracts, BCP is probably a good way to go. The sources are all identical, so the BCP files will be the same across sources. You can build a load controller with SSIS. Getting multiple threads to read the top off a common list requires synchronised access to the list: the SSIS process has a bunch of loops running in parallel in a sequence container that pop the next item, execute it and update the corresponding status.
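As a sketch, the per-source extract/load pair might look like this from the command line (server, database and path names are invented; -T assumes a trusted connection):

    rem Extract one source's table in native format...
    bcp SourceDb.dbo.Orders out C:\extracts\src042_Orders.dat -n -S SRC042SERVER -T
    rem ...and load it into that source's staging schema.
    bcp StagingDb.src042.Orders in C:\extracts\src042_Orders.dat -n -S STAGESERVER -T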
The 'next' function can be implemented as a sproc running in a serializable transaction that pops the next eligible source off the list and marks it as 'in progress' within the transaction; a sketch follows. This is a 'table as queue' problem, but you don't have to implement synchronised inserts - a whole batch can be pushed into the table at the start of the run.
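A minimal sketch, assuming the hypothetical dbo.SourceQueue table from earlier:

    -- Pop the next pending source and mark it 'in progress' atomically.
    -- UPDLOCK under SERIALIZABLE stops two callers grabbing the same row;
    -- callers queue up rather than skip ahead.
    CREATE PROCEDURE dbo.GetNextSource
    AS
    BEGIN
        SET NOCOUNT ON;
        SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
        BEGIN TRANSACTION;

        DECLARE @SourceId int;

        SELECT TOP (1) @SourceId = SourceId
        FROM dbo.SourceQueue WITH (UPDLOCK)
        WHERE Status = N'Pending'
        ORDER BY SourceId;

        IF @SourceId IS NOT NULL
            UPDATE dbo.SourceQueue
            SET Status = N'InProgress', StartedAt = GETDATE()
            OUTPUT inserted.SourceId, inserted.ConnectionString
            WHERE SourceId = @SourceId;

        COMMIT TRANSACTION;
    END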
Structure the individual extract process so that it retries once or twice if the first attempt fails. This will mitigate a lot of failures caused by transient errors. Fail the task if it still fails after the retries, and structure the ETL so it is resilient to individual extraction failures.
Incremental loads
An incremental loader is probably not worth the bother for dimension tables unless you have a really big dimension that shows real performance issues. For the fact table data sources it probably is worth it. If you can add a row version to the application table with a timestamp (rowversion) column or some such, you can pick up just what's new. However, you will need to track the last version you read locally, per source. If there is an insert or update date on the data you may be able to use that instead.
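A sketch of the extract under those assumptions (the RowVer rowversion column and the dbo.ExtractWatermark table are hypothetical):

    -- @LastRV is the high-water mark recorded after the previous run.
    DECLARE @LastRV binary(8);
    SELECT @LastRV = LastRowVersion
    FROM dbo.ExtractWatermark
    WHERE SourceName = N'src042' AND TableName = N'Orders';

    -- Stage only rows inserted or updated since the last run.
    SELECT OrderId, OrderDate, Amount, RowVer
    INTO #delta
    FROM dbo.Orders
    WHERE RowVer > @LastRV;

    -- ...load #delta into staging, then advance the watermark to the
    -- highest version actually extracted (not the live table max,
    -- which may already have moved on).
    UPDATE dbo.ExtractWatermark
    SET LastRowVersion = (SELECT MAX(RowVer) FROM #delta)
    WHERE SourceName = N'src042' AND TableName = N'Orders';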
Full Loads
What could possibly go wrong?
200 processes kicking off to do a full load places a load spike on the network and possibly on the staging database. This could lead to all sorts of transient issues like timeouts. For small dimension tables it's probably not such a big issue, but for 100GB there is quite a variety of issues - WAN saturation, locking (although the right staging architecture will mitigate that), availability of sources. The longer the extract process has to run, the bigger the influence environmental factors have on the reliability of the process.
There are quite a lot of imponderables here, so YMMV. I'd suggest an incremental load for the larger tables if possible.
In our shop at my previous job, we had a secondary set of servers where we tested our restores. For our busiest customers we would restore tonight's backup, mark it as read_only, and their reporting tomorrow would connect to that copy of the database for all reports from yesterday back. This offloaded about 90% of the reporting workload and doubled as a backup/restore validation method. So if most reports don't need today's data, you could consider alleviating some of the production workload this way with some cheaper hardware - if you're not using Enterprise features you could even use Express for all the databases that are < 10GB. (Well, I see it's 2005, which had a lower DB size limitation, but you could always restore your copy forward into 2008/R2.) This would allow you to really distribute the databases to many low-end commodity servers (VMs or pizza boxes).
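A minimal sketch of that nightly step (paths and names are made up for illustration):

    -- Restore last night's backup onto the reporting server...
    RESTORE DATABASE CustomerDb
    FROM DISK = N'\\backups\CustomerDb_full.bak'
    WITH MOVE N'CustomerDb_Data' TO N'D:\Data\CustomerDb.mdf',
         MOVE N'CustomerDb_Log'  TO N'L:\Log\CustomerDb.ldf',
         REPLACE;

    -- ...then freeze it so reports can't change anything.
    ALTER DATABASE CustomerDb SET READ_ONLY WITH NO_WAIT;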