I'm in the process of planning the live migration of a 2TB database to partitioned tables. The system is, broadly speaking, a document store, with the majority of the space allocated to LOBs of between 50KB and 500KB, and a small percentage in the 500KB to 1MB range. Part of the migration will involve BCPing data from the old database to the new one.
BCP is the preferred approach because the current/historic divide in the data permits extracting the older data in stages (during quieter periods) ahead of a final switch, minimising impact on the live system. The volume of data and the available storage preclude an in-situ rebuild onto a partition scheme.
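For illustration, this is the sort of staged extraction I have in mind — a minimal sketch only, with hypothetical server/database/table/column names (OldServer, DocStore.dbo.Documents, CreatedDate); native format (-n) because the data is going SQL Server to SQL Server and the LOBs should round-trip untouched:

```
rem Extract one historic slice during a quiet period (all names hypothetical)
bcp "SELECT * FROM DocStore.dbo.Documents WHERE CreatedDate < '20120101'" queryout ^
    Documents_pre2012.dat -n -S OldServer -T
```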
I suspect there may be some performance gains to be had by experimenting with KILOBYTES_PER_BATCH rather than ROWS_PER_BATCH, given the BLOB content. The BCP documentation suggests that SQL Server can optimise the operation based on this value.
What I can't find is any guidance on the nature of these optimisations or where to start my testing. In the absence of other suggestions, I'll start with short runs at the 4/8/16/32/64MB boundaries.
There are probably also some gains to be had from changing the packet size (the BCP -a parameter, rather than the server-level setting), but I'm inclined to bump it to the maximum of 65535 unless anyone has a more formulaic approach.
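For reference, this is the kind of test invocation I'm planning — a sketch with hypothetical names again; the KILOBYTES_PER_BATCH value here is the 16MB boundary expressed in KB, and -a sets the network packet size per run:

```
rem Load test at the 16MB boundary; TABLOCK plus the batch-size hint passed via -h
bcp NewDocStore.dbo.Documents in Documents_pre2012.dat -n -S NewServer -T ^
    -a 65535 -h "TABLOCK, KILOBYTES_PER_BATCH=16384"
```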
Best Answer
This is not a direct answer to your question, but there are some articles you would benefit from reading (in case you haven't found them already :-) ). They are about loading large volumes of data using bcp/bulk copy. I've read them all and didn't find anything detailed about KILOBYTES_PER_BATCH; they all use ROWS_PER_BATCH, but I'm sure you will find other useful information.
Load 1TB in less than 1 hour (from the SQL CAT team) - a list of recommendations from it (quoted):
Run as many load processes as you have available CPUs. If you have 32 CPUs, run 32 parallel loads. If you have 8 CPUs, run 8 parallel loads.
If you have control over the creation of your input files, make them of a size that is evenly divisible by the number of load threads you want to run in parallel. Also make sure all records belong to one partition if you want to use the switch partition strategy.
Use BULK insert instead of BCP if you are running the process on the SQL Server machine.
Use table partitioning to gain another 8-10%, but only if your input files are GUARANTEED to match your partitioning function, meaning that all records in one file must be in the same partition.
Use TABLOCK to avoid row at a time locking.
Use ROWS_PER_BATCH = 2500, or something near this, if you are importing multiple streams into one table (see the sketch after this list).
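To make the TABLOCK and ROWS_PER_BATCH advice concrete, here is a minimal sketch with hypothetical table and file names; each parallel stream would run one such statement against its own input file:

```sql
-- One of several parallel streams, each loading its own native-format input file
BULK INSERT dbo.Documents_Stage_2011
FROM 'D:\Extract\Documents_2011.dat'
WITH (
    DATAFILETYPE   = 'native',   -- file produced by bcp ... -n
    TABLOCK,                     -- avoids row-at-a-time locking
    ROWS_PER_BATCH = 2500        -- the value quoted above; tune it in your own tests
);
```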
Top 10 Best Practices for Building a Large Scale Relational Data Warehouse (from the SQL CAT team) - recommendations (quoted):
Use SIMPLE or BULK LOGGED recovery model during the initial data load.
Create the partitioned fact table with the Clustered index.
Create non-indexed staging tables for each partition, and separate source data files for populating each partition.
Populate the staging tables in parallel (Use multiple BULK INSERT, BCP or SSIS tasks)
Build a clustered index on each staging table, then create appropriate CHECK constraints.
SWITCH all partitions into the partitioned table (see the sketch after this list).
Build non-clustered indexes on the partitioned table.
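A minimal sketch of that staging/SWITCH sequence, assuming hypothetical names throughout (database NewDocStore, filegroup FG_2011, target dbo.Documents partitioned on CreatedDate); the staging table has to sit on the same filegroup as the target partition and carry a matching clustered index before the switch:

```sql
-- Initial load only: minimise logging during the bulk load
ALTER DATABASE NewDocStore SET RECOVERY BULK_LOGGED;

-- Staging table on the same filegroup as the target partition
CREATE TABLE dbo.Documents_Stage_2011
(
    DocId       INT            NOT NULL,
    CreatedDate DATETIME       NOT NULL,
    Payload     VARBINARY(MAX) NOT NULL
) ON FG_2011;

-- ... bulk load the staging table here (bcp / BULK INSERT) ...

-- Clustered index matching the target table's clustered index
CREATE CLUSTERED INDEX CIX_Documents_Stage_2011
    ON dbo.Documents_Stage_2011 (CreatedDate, DocId) ON FG_2011;

-- Trusted CHECK constraint proving every row falls in the target partition's range
ALTER TABLE dbo.Documents_Stage_2011 WITH CHECK
    ADD CONSTRAINT CK_Documents_Stage_2011_Range
    CHECK (CreatedDate >= '20110101' AND CreatedDate < '20120101');

-- Metadata-only switch; find the partition number with $PARTITION.<your function>('20110601')
ALTER TABLE dbo.Documents_Stage_2011
    SWITCH TO dbo.Documents PARTITION 3;
```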
The Data Loading Performance Guide (from SQL CAT team)
Loading Bulk Data into a Partitioned Table - SQL Server Best Practices Article (Technet article)
SQL Server 2000 Incremental Bulk Load Case Study (Technet article)
Lessons Learned and Findings from a Large Fast-Track POC (from SQL CAT team)
Performance Tuning Tips for SQL Server BCP (By Brad McGehee)
Performance Impact: Finding the Most Optimal Batch Size (By Linchi Shea)
and the obvious MSDN references for bcp and BULK INSERT.
In my personal experience, I managed to get a fast data load by loading in parallel and testing several batch sizes. I think only your own testing will tell you what suits your system. Hopefully you will find some good advice in the references.