I am a little out of my league with this scenario and hope this community can help get me on the right direction.
I have recently been tasked with re-designing and implementing a database application where the company I work for processes a large quantity of text files (sizes ranging from a 100MB to 3GB). We currently process approximately 600 of these files per year. This data needs to be accessible after our initial processing.
The current implementation has each file being imported via Application bulk load or SSIS to a "temp" database as individual tables. I would like to keep this process in place as it gives a good entry point for when manual entry into processing is needed.
I am in the process of trying to create a new MS SQL Server installation and architecture, and am looking for advice/articles which would cover a similar scenario as to best practices (partitioning, filegroups, etc.). I have worked in databases for a few years, but not at this level.
So if you have any experience in this type of scenario or know of a good article or book you could point me to, I would be extremely grateful. I just need a nudge in the right direction. Thank you in advance.
Best Answer
Below are general recommendations and your milage may vary depending on you workload running and your IO subsystem, server hardware configuration, etc.
At SSIS level, you can look into below things to speed up data read and data load :
Refer to Speeding Up SSIS Bulk Inserts into SQL Server for more details.
Now coming to SQL Server - optimizing data loading :
Look for optimizing network configuration as well :
Enable support for Jumbo frames. Use TCP
Chimney Offload. Use Receive Side Scaling (RSS).
Also, you should monitor Wait Statistics on the server especially SOS_SCHEDULER_YIELD resulting in scheduler contention on Servers having multiple CPUs running concurrent Bulk load operations and competing for the same CPU Cycles.
Refer to this excellent whitepaper The Data Loading Performance Guide which have everything that I mentioned with diagrams and examples.
Also, to automate the sliding window technique - creating staging tables, loading data into it and then switching the partitions, you can fully automate it using SQL Server Partition Management -- it has command line option as well -- available at CodePlex.