SQL Server – How to query large sets of data using server resources to the max

azure, export, performance, sql-server

I am planning to retrieve a large amount of data (100+ million rows) from SQL Server into Azure Data Lake Store, and I need it done in the shortest time possible. I want to make sure that when I query the SQL Server database, it uses the available CPU, memory, and network resources to the maximum.

I was wondering: how do I ensure that SQL Server gives my task top priority and uses all the power the server has to transfer the data as fast as it can?

  • Are there any hints I can give to the database to tell it that this job needs all of the server's resources?
  • Would it help if, instead of a single SELECT, I issued several SELECT statements in parallel, each retrieving just a chunk of the data (e.g. the first SELECT retrieving rows 1 to 50,000,000 and the second retrieving rows 50,000,001 to 100,000,000)?
  • Is there anything else that can be done to ensure the data is transferred as quickly as possible?

The server is in the cloud, so in theory I can size it so that all resources are matched perfectly; I understand that in reality something will become the bottleneck.

Best Answer

  1. There is no such hint. If your sessions are the only workload, server resources will be distributed evenly among them.

  2. It could definitely help; try roughly one session per core. Make sure the clustered index is built on the ID column, so data retrieval can be split into chunks on that column and each query can use the most efficient operator for this scenario: a clustered index seek (with a range scan). See the sketch after this list.

  3. Consider increasing the network packet size: see Performance Best Practice: Network Packet Size, and the example after this list.

  4. Watch out for antivirus/network security agents on the database server; some companies install them by default. Such middleware can hit both the I/O and network layers, so consider adding appropriate exception rules or even disabling them for the duration of the export.
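
For point 2, here is a minimal sketch of the chunked approach. The table and column names (dbo.BigTable, Id) are hypothetical placeholders; adjust them to your schema. Each statement would be issued from its own session/connection, roughly one per core:

    -- Session 1: first chunk. The range predicate on the clustering key
    -- resolves to a clustered index seek with a range scan.
    SELECT *
    FROM dbo.BigTable
    WHERE Id BETWEEN 1 AND 50000000;

    -- Session 2 (separate connection): second chunk.
    SELECT *
    FROM dbo.BigTable
    WHERE Id BETWEEN 50000001 AND 100000000;

Splitting on the clustering key keeps each session reading a contiguous range of pages, so the sessions do not fight over the same data and the work is spread across cores and network connections.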
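
For point 3, a sketch of raising the server-side default network packet size. This changes a server-wide default, so test the impact rather than applying it blindly, and note that the client must also request a larger packet size in its connection settings (e.g. a "Packet Size" keyword, where the driver supports it):

    -- 'network packet size (B)' is an advanced option, so expose it first.
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;

    -- Raise the default network packet size to the 32767-byte maximum.
    EXEC sp_configure 'network packet size (B)', 32767;
    RECONFIGURE;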