SQL Server – SQL Server 2005/8 Query Optimisation Hints

sql-server sql-server-2005

I'm looking at educating a team on writing better SQL Server queries and was wondering what people's best hints were for improving performance.

For instance I once had a DBA who insisted that count(*) would perform worse than count(1) (I have no idea whether she was right or whether it's still valid against the latest query optimisers).

What simple things should I be telling the team to always use or avoid? I'm ideally looking for things which (a) might make a reasonable difference and (b) are straightforward, 1 – 2 lines to state.

Best Answer

Query tuning 101

There is no magic silver bullet to query tuning, although I can give you some hints and tips. The first thing to do is to understand what's actually going on behind the scenes. Get a good internals book like the third Guru's Guide book.

Poorly performing queries tend to come in two basic flavours: Transactional queries that take too long, and grinding batch jobs (or reports) that take too long. One good sign of a query with something wrong with it is a single item in the query plan taking 99% of the time.

Transactional queries

On most occasions a poorly performing transactional query is one of a few things:

  • A missing index. You can see this in the query plan - table scans of large tables on a join that should be very selective (i.e. return few rows).

  • Query unable to use an index. If you have OR conditions in the where clause, joins on a calculated value, or some other item in the query that is not sarg-able, then you may need to re-write the query. Briefly, sargs (search arguments) are query predicates that can use indexes to eliminate rows. Logical AND, equality and inequality (>, >=, <, <= and !=) are all sarg-able. OR is traditionally not sarg-able. However, you can often translate ORs into sarg-able predicates by inverting the sense via De Morgan's laws: `a or b` is equivalent to `not (not a and not b)`.

  • Inefficient predicates. For example, if you have a where in referencing a nested subquery, see if it can be re-written as a where exists or as a join. This may result in more efficient query plans, and there are other standard re-writes you can try as well. Again, the Guru's Guide books and others on the subject are a good starting point.
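To make the two re-write bullets above concrete, here is a sketch. The table and column names (`Orders`, `Customers`, `OrderDate`, `Total`) are hypothetical, not from the question:

```sql
-- Non-sargable: the function wrapped around the column defeats an index on OrderDate.
SELECT OrderID FROM Orders
WHERE YEAR(OrderDate) = 2008;

-- Sargable rewrite: a range predicate can seek on an OrderDate index.
SELECT OrderID FROM Orders
WHERE OrderDate >= '20080101' AND OrderDate < '20090101';

-- A where in referencing a nested subquery...
SELECT c.CustomerID, c.Name
FROM Customers c
WHERE c.CustomerID IN (SELECT o.CustomerID FROM Orders o WHERE o.Total > 1000);

-- ...can often be re-written as where exists (or as a join),
-- which may give the optimiser a better plan:
SELECT c.CustomerID, c.Name
FROM Customers c
WHERE EXISTS (SELECT 1 FROM Orders o
              WHERE o.CustomerID = c.CustomerID AND o.Total > 1000);
```

As always, compare the actual query plans before and after; on recent optimisers some of these forms produce identical plans.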

Batch queries

Batch queries are more complicated and have different tuning issues. Some tips are:

  • Indexing. This can make a large difference for the same reason it does with transactional queries. Often a good sign of a missing index is a long, grinding operation (99% of the query plan) that doesn't seem to be thrashing the machine.

  • Temporary tables. You may find it better to break down a query into several queries populating temporary tables. Larger queries give the optimiser more room to screw up, although this is less of an issue than it used to be. Make the temp tables with select into, as this operation is minimally logged (much less log activity), which reduces the I/O load.

    Note that temporary tables in tempdb are the same data structure that the optimiser uses to store intermediate join results, so there is no performance penalty for doing this. You can also create an index (including clustered and covering indexes) on a temp table, which may improve performance of the queries reading it for the same reasons that they improve queries on static tables.

    Don't overdo temp tables though, as they can make things harder to trace back through the query. For smaller tables within a stored procedure, test to see if table variables help. They have lighter logging and don't trigger statistics-based recompiles, so they can be a performance win for small row counts (note that, like temp tables, they are still backed by tempdb).

  • Clustered and covering indexes. These can improve the performance of a query as they force locality of reference on the disk based on some grouping column. A clustered index can make a large difference to the performance of a batch job.

  • Inefficient predicates. These can cause problems with sargs and other sub-optimisation issues in much the same way as they do with transactional queries.

  • Table scan is your friend. Contrary to popular belief, table scans are not inherently evil. Generally they are a sign of something wrong in a transactional query, but they are often the most efficient way to do a large batch operation. If you are doing something with more than a few percent of rows in a table, a table scan is often the most efficient way to cover the table.

  • Nested loops joins. Take a look at what the optimiser is doing on both sides of the join. These can be inefficient if you are, for example, table scanning two large tables on both sides of a nested loops join. Consider using clustered indexes or an order by and trying to change the operation to a merge join, or hinting to promote a hash join if one side is small enough to make that worthwhile.
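The temp-table advice above can be sketched as follows. Table and index names are hypothetical; the point is the staged shape of the batch:

```sql
-- Stage 1: materialise an intermediate result with select into
-- (minimally logged, so much less log I/O than insert...select).
SELECT o.CustomerID, SUM(o.Total) AS TotalSpend
INTO #CustomerTotals
FROM Orders o
WHERE o.OrderDate >= '20080101'
GROUP BY o.CustomerID;

-- Stage 2: an index (here clustered) on the temp table can speed up
-- the queries that read it, just as it would on a permanent table.
CREATE CLUSTERED INDEX IX_CustomerTotals ON #CustomerTotals (CustomerID);

-- Stage 3: subsequent steps work from the much smaller intermediate result.
SELECT c.Name, t.TotalSpend
FROM #CustomerTotals t
JOIN Customers c ON c.CustomerID = t.CustomerID;

DROP TABLE #CustomerTotals;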
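Hinting a join type, as described in the last bullet, looks like this (a sketch with hypothetical tables; only reach for hints after the query plan confirms the optimiser's choice really is poor):

```sql
-- Query-level hint: promote a hash join for the whole statement.
SELECT o.OrderID, d.ProductID
FROM Orders o
JOIN OrderDetails d ON d.OrderID = o.OrderID
OPTION (HASH JOIN);

-- Join-level hint: force a merge join on one specific join,
-- useful when both inputs can be delivered in key order.
SELECT o.OrderID, d.ProductID
FROM Orders o
INNER MERGE JOIN OrderDetails d ON d.OrderID = o.OrderID;
```

Hints pin the optimiser's hands, so re-test them after data volumes change.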

Locking

Locking can also cause performance issues. If your system is performing badly under load, look at the Profiler and perfmon counters related to locks and check whether there is any significant contention. sp_who2 has a 'BlkBy' column in the result set that will show whether a query is blocked and what is blocking it. Also, Profiler traces with 'Deadlock graph' events (if you have queries deadlocking) and lock-related events can be useful for troubleshooting lock issues.
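A quick way to check for blocking from a query window, using the tools mentioned above plus the 2005+ DMVs:

```sql
-- Classic: the BlkBy column shows which session is blocking which.
EXEC sp_who2;

-- On SQL Server 2005 and later, sys.dm_exec_requests exposes the same
-- information directly, with the wait type and how long the wait has lasted.
SELECT session_id, blocking_session_id, wait_type, wait_time
FROM sys.dm_exec_requests
WHERE blocking_session_id <> 0;
```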