I can watch the database size grow from 2GB to 12GB (max space
available) where it runs out of space and returns the following
message:
The issue is that either you're simply running out of disk space and the physical file cannot grow to complete the transaction, or you have a limit in the SQL Server file growth settings and the physical file cannot grow any further to complete the transaction (whatever "max space available" means exactly, e.g. per settings or disk capacity).
I thought that because I was in the SIMPLE recovery model, bulk loading wasn't logged.
Per the site and quoted text below, it does state that minimal logging is more efficient than full logging, but it does NOT state that no logging occurs at all in the SIMPLE recovery model.
Prerequisites for Minimal Logging in Bulk Import
Under the simple recovery model or bulk-logged recovery model, minimal
logging of bulk-import operations reduces the possibility that a
bulk-import operation will fill the log space. Minimal logging is also
more efficient than full logging.
So this means there IS still SQL transaction logging that occurs when the database is in the SIMPLE recovery model, but it's minimal logging compared to the FULL recovery model.
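You can verify the recovery model and watch log space consumption while the load runs. A minimal sketch (the database name is a placeholder for your own):

```sql
-- Confirm the current recovery model of the database being loaded
SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = N'YourDatabase';

-- Report log file size and percent used for every database;
-- run this periodically during the bulk load to watch log growth
DBCC SQLPERF (LOGSPACE);
```

Even under SIMPLE recovery you will see the log percent-used climb for the duration of an open transaction; it can only be reused after commit.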
Why is it growing so large?
It would seem the explanation is that your committed transaction sizes with the BULK INSERT operations are HUGE. Simply set the commit/batch size parameters in your SSIS logic, or in the T-SQL you execute in the package, to a smaller value. Try testing with 100, 1,000, 10,000, 100,000, and so on to see what gives you the best result and prevents the issue from occurring.
If the transactions are smaller, then once they are committed under the SIMPLE recovery model, the log space they occupied can be reused by subsequent BULK INSERT (or other) transactions.
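If you are issuing the load via T-SQL rather than the SSIS destination component, the batch size can be set directly on the statement. A sketch with hypothetical table and file names:

```sql
-- BATCHSIZE commits every N rows as its own transaction, so under
-- SIMPLE recovery the log space is freed for reuse between batches.
BULK INSERT dbo.YourTable
FROM 'C:\loads\yourfile.dat'
WITH (
    BATCHSIZE = 10000,  -- tune: try 100, 1000, 10000, 100000
    TABLOCK             -- one of the prerequisites for minimal logging
);
```

In an SSIS OLE DB Destination, the equivalent knobs are "Rows per batch" and "Maximum insert commit size".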
Additional Resources
(These articles reference parameters in various SSIS package areas that control transaction/commit sizes. Their titles or steps may target a different ultimate goal, but they contain potentially applicable content on what to change in your SSIS logic to rectify your issue, so they may be worth a quick read.)
After a long and beneficial discussion with Dan Guzman in this MSDN forums thread, the underlying scheduling scheme is finally clear to me. Here is a brief answer borrowing from the thread above:
So what's happening here is that the 2 SSMS connections both initially
use the same scheduler because it has the least load. The BULK INSERTs
then both run on the same scheduler, which is the reason for the high
SOS_SCHEDULER_YIELD waits.
The choice of which scheduler is used is not based on query cost. SQL
Server (actually SOS) basically tries to balance the workload among
schedulers by examining the load_factor column of sys.dm_os_schedulers.
When a new task is created for a query execution, SOS prefers to use the same
scheduler it used for the last request on the session. However, it
will use a different scheduler if the load factor of the preferred
scheduler exceeds a certain percentage of the average load factor of
the other schedulers.
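You can observe the load factors yourself. A minimal query against the DMV mentioned above:

```sql
-- Inspect per-scheduler load; SOS consults load_factor when deciding
-- where to place a new task
SELECT scheduler_id,
       cpu_id,
       load_factor,
       current_tasks_count,
       runnable_tasks_count
FROM sys.dm_os_schedulers
WHERE status = N'VISIBLE ONLINE';
```

Running this while the two BULK INSERTs execute should show one scheduler with a noticeably higher load_factor than its peers.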
An in-depth discussion can be found in the thread linked above.
Thanks, everyone, for your enthusiastic help!
Best Answer
Bulk inserts with small batch sizes can result in a significant increase in unused reserved space, because fast inserts allocate a full extent for each batch, per this Tiger Team post. I see from your comment that the batch size is only 200 rows, and this explains the main reason for the large amount of unused space, although I would expect existing unallocated extents (not to be confused with unused space) to be reused before growing the file.
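You can see the difference between reserved-but-unused space and unallocated space with sp_spaceused (the table name below is a placeholder):

```sql
-- Table level: 'reserved' vs. 'data' shows extents allocated to the
-- table that hold little actual data (the fast-insert symptom)
EXEC sys.sp_spaceused @objname = N'dbo.YourTable';

-- Database level: 'unallocated space' is free extents available for
-- reuse; 'unused' is space reserved by objects but not yet filled
EXEC sys.sp_spaceused;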
If you don't have control over the batch size (e.g. a third-party application), you can turn on trace flag 692 to disable fast inserts, which reduces space usage at the expense of load performance.
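Enabling the trace flag globally looks like this (fast inserts, and therefore trace flag 692, apply to SQL Server 2016 and later):

```sql
-- Disable fast inserts server-wide for the current instance lifetime;
-- -1 makes the flag global rather than session-scoped
DBCC TRACEON (692, -1);
```

To persist the setting across restarts, add -T692 as a startup parameter via SQL Server Configuration Manager instead.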