Answer
Since you refer to the website use-the-index-luke.com, consider the chapter:
Use The Index, Luke › The Where Clause › Searching For Ranges › Greater, Less and BETWEEN
It has an example that matches your situation exactly (a two-column index, one column tested for equality, the other for a range), explains (with more of those nice index graphics) why @ypercube's advice is accurate, and sums it up:
Rule of thumb: index for equality first — then for ranges.
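To illustrate that rule with a sketch (the sales table and column names here are assumptions, not from your schema):

```sql
-- Query mixing an equality test and a range test:
-- SELECT * FROM sales
-- WHERE  subsidiary_id = 5
-- AND    sale_date >= '2024-01-01';

-- Good: equality column first, range column second.
-- The scan starts at (5, '2024-01-01') and reads one contiguous range.
CREATE INDEX sales_sub_date_idx ON sales (subsidiary_id, sale_date);

-- Less useful: with the range column first, matching rows for
-- subsidiary_id = 5 are scattered across the whole date range,
-- so the equality condition only filters, it cannot narrow the scan.
-- CREATE INDEX sales_date_sub_idx ON sales (sale_date, subsidiary_id);
```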
Also good for just one column?
What to do for queries on just one column seems to be clear. More details and benchmarks on that can be found under this related question:
Less selective column first?
Apart from that, what if you have only equality conditions for both columns?
For the query itself, it hardly matters. Put first the column that is more likely to receive conditions of its own; that consideration actually matters.
Consider this demo, or reproduce it yourself. I create a simple table of two columns with 100k rows: one column with very few distinct values, the other with many:
CREATE TEMP TABLE t AS
SELECT (random() * 10000)::int AS lots
, (random() * 4)::int AS few
FROM generate_series (1, 100000);
DELETE FROM t WHERE random() > 0.9; -- create some dead tuples, more "real-life"
ANALYZE t;
SELECT count(distinct lots) -- 9999
, count(distinct few) -- 5
FROM t;
Query:
SELECT *
FROM t
WHERE lots = 2345
AND few = 2;
EXPLAIN ANALYZE
output (Best of 10 to exclude caching effects):
Seq Scan on t (cost=0.00..5840.84 rows=2 width=8)
(actual time=5.646..15.535 rows=2 loops=1)
Filter: ((lots = 2345) AND (few = 2))
Buffers: local hit=443
Total runtime: 15.557 ms
Add index, retest:
CREATE INDEX t_lf_idx ON t(lots, few);
Index Scan using t_lf_idx on t (cost=0.00..3.76 rows=2 width=8)
(actual time=0.008..0.011 rows=2 loops=1)
Index Cond: ((lots = 2345) AND (few = 2))
Buffers: local hit=4
Total runtime: 0.027 ms
Add other index, retest:
DROP INDEX t_lf_idx;
CREATE INDEX t_fl_idx ON t(few, lots);
Index Scan using t_fl_idx on t (cost=0.00..3.74 rows=2 width=8)
(actual time=0.007..0.011 rows=2 loops=1)
Index Cond: ((few = 2) AND (lots = 2345))
Buffers: local hit=4
Total runtime: 0.027 ms
Best Answer
There are a number of approaches to tuning XML queries in SQL Server. Property promotion is a good one, but I also regularly use the following:
XML Indexes
XML Indexes can transform XML query performance, but at a cost. Prior to SQL Server 2012, they come in two types: primary XML indexes and secondary XML indexes. You always need a primary XML index, and can optionally add PATH, PROPERTY or VALUE secondary indexes, which serve slightly different purposes. For your particular queries, a secondary PATH index gives a step-change performance improvement in my simple rig below, eg:
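A sketch of the index definitions (the table and column names are assumptions, not from your schema; the table must already have a clustered primary key):

```sql
-- A primary XML index is a prerequisite for any secondary XML index.
CREATE PRIMARY XML INDEX PXML_MyTable_MyXmlCol
ON dbo.MyTable (MyXmlCol);

-- A secondary PATH index helps path-based predicates,
-- e.g. those used by .exist() and .value().
CREATE XML INDEX IXML_MyTable_MyXmlCol_Path
ON dbo.MyTable (MyXmlCol)
USING XML INDEX PXML_MyTable_MyXmlCol
FOR PATH;
```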
Now to the cost. XML Indexes (prior to Selective XML Indexes) have a huge storage impact. I have seen tables grow up to 5x in size. In my test rig below, the table with 3 million rows and very simple XML goes from 0.7GB to 2GB with primary XML index, then 2.7GB with the PATH secondary index. Selective XML indexes in SQL Server 2012 onwards can improve on this massively.
Best practice syntax
I use CROSS APPLY when there are multiple levels of XML to drill down left to right. See the use of CROSS APPLY in my rig below. Also, avoid use of the parent axis (..) to drill back up. This can cause performance problems especially with larger pieces of XML as per here.
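For example, drilling down one level at a time with CROSS APPLY and nodes() rather than reaching back up with the parent axis (element names here are assumed):

```sql
SELECT
    p.x.value('(FilePath/text())[1]', 'varchar(255)') AS FilePath,
    c.n.value('(Name/text())[1]',     'varchar(100)') AS ChildName
FROM dbo.MyTable AS t
-- First level: shred the root element...
CROSS APPLY t.MyXmlCol.nodes('/Parent') AS p(x)
-- ...then drill into its children, left to right, no parent axis needed.
CROSS APPLY p.x.nodes('Child') AS c(n);
```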
I also always use the text() accessor with untyped XML. This is mentioned here, and I've seen this technique give up to a 15% performance improvement. YMMV. Moving the ordinal ([1]) to the end of the expression is more efficient and syntactically equivalent to Parent[1]/FilePath[1]/SomeOtherElement[1].
XML Schema Collection
These don't tend to bring performance improvement, but are a good practice, as like a constraint, they force the XML to have a certain structure.
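A minimal sketch of such a collection (the schema content and names are assumptions for illustration):

```sql
CREATE XML SCHEMA COLLECTION dbo.MyXmlSchema AS N'
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Parent">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="FilePath" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>';

-- Retype the column as typed XML constrained by the collection:
ALTER TABLE dbo.MyTable
ALTER COLUMN MyXmlCol xml(dbo.MyXmlSchema);
```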
Full-text Indexing
I occasionally combine full-text indexing with XML, with good results, eg here. It's probably not appropriate in this example, as you don't seem to have any search criteria.
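For completeness, a sketch of the combination (index and column names assumed; a full-text index must already exist on the xml column): the cheap CONTAINS predicate pre-filters rows before the more expensive XML method runs.

```sql
-- Full-text indexing works over the element content of xml columns,
-- so CONTAINS can discard most rows before .exist() is evaluated.
SELECT t.Id
FROM dbo.MyTable AS t
WHERE CONTAINS(t.MyXmlCol, 'SomeWord')
  AND t.MyXmlCol.exist('/Parent/FilePath[contains(., "SomeWord")]') = 1;
```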
Test Rig
In my test rig, I create a table with 3 million rows, each holding a small piece of XML. I then try different combinations of syntax and XML indexes to measure the difference:
My results:
In summary, hopefully you can see that you can get a step-change in performance for your XML queries by using the right features in combination, albeit at a hefty storage cost.
Recommended Reading
Performance Optimizations for the XML Data Type in SQL Server 2005
http://msdn.microsoft.com/en-us/library/ms345118.aspx
XML Indexes in SQL Server 2005
http://msdn.microsoft.com/en-us/library/ms345121(SQL.90).aspx