Sql-server – SQL Server data extract with limited performance impact

nolocksql serversql-server-2008

Background:

I have a SQL Server 2008 Enterprise Edition database containing three tables. The tables contain ~3 million, ~14 million, ~14.5 million rows respectively.

Problem:

I need to do a data extract from the database for reporting purposes. This will involved joining the three tables and writing each row to a text file. This database is currently taking production traffic and I want to limit performance impact. This data extract should not prevent reads, writes, or updates from occuring during the course of the process.

From the research I have done, I believe adding the 'NOLOCK' hint to the data extract query will allow the production traffic to behave normally while still accomplishing my primary task of the data extract. I'm assuming the dirty reads associated with the 'NOLOCK' hint will be isolated to the data extract query.

Am I correct to assume that the 'NOLOCK' hint on the data extract query will only affect that one particular query? Does it affect all queries being executed on the database while the data extract query is executing?

Is there a better approach to accomplishing my goal than what I've presented above?

Best Answer

Many ways you can approach this scenario.

I need to do a data extract from the database for reporting purposes.

  • Backup restore approach is best suited if you want to do this on an infrequent basis and do a static reporting of data.
  • Database Snapshot is an enterprise version feature only and this also gives you a static point-in-time view of the database.

Note that database snapshots take up space on disk, and having too many can fill up your disk space, especially in a production environment where data is updated frequently. In addition, using database snapshots will degrade performance a bit because I/O is increased on the database due to the fact that the data pages are copied as the write operation is performed.

Refer to Microsoft SQL Server Database Snapshots and Synonyms to over come some limitations that snapshots have and workarounds.

  • Look into Replication especially T-Rep if you want to do near to real-time reporting and want to offload the reporting work from the primary server. T-Rep will give you the benefits of Transactional consistency, Low latency, High Throughput, Minimal overhead as well as the ability to filter rows.
  • Lastly if none of the options are feasible to you, then with proper testing look into changing the isolation level of the database to RCSI (Read Committed Snapshot Isolation). This will be at the COST of TEMPDB performance.

    -- check if RCSI is enabled
    SELECT name, is_read_committed_snapshot_on
    FROM sys.databases
    WHERE name = '<dbname>'
    -- Enable RCSI
    ALTER DATABASE <dbname> SET READ_COMMITTED_SNAPSHOT ON
    

From The Data Loading Performance Guide :

RCSI basically will prevent queries reading data to block or be blocked by other queries modifying data in the same table. It is a powerful alternative to NOLOCK because it guarantees a complete, transactionally consistent view of the data and does not require a special hint. While RCSI originally targeted scenarios common to OLTP workloads, the feature can be a powerful tool in data warehouse workloads or scenarios involving large-scale bulk insert operations.

RCSI is enabled as a database-wide setting. When enabled, reader queries do not acquire shared locks on rows, pages or tables, and as a result they are not blocked by X or BU-locks taken by others. Instead, new or modified rows in a table carry a 17-byte version identifier, and the before-images of any rows being changed by a transaction (updated or deleted) are copied to tempdb using the row versioning mechanisms within SQL Server. Reader queries consider only those rows that were committed as of the start of the query – by ignoring any later version numbers and referencing tempdb for appropriate earlier versions of rows.

Also read up on : Comparing Different Results with RCSI & Read Committed from SQL CAT team.

LASTLY, to touch on NOLOCK hint, I would suggest you to read up