It is best to use any combination of these techniques, matched to your actual queries. Some queries may only need a single-column index, while others may need a multi-column index. The great thing about indexes is that you can add them as you need them and remove those that no longer serve a purpose, so you have exactly what your queries need.
Answer
Since you refer to the website use-the-index-luke.com, consider the chapter:
Use The Index, Luke › The Where Clause › Searching For Ranges › Greater, Less and BETWEEN
It has an example that matches your situation perfectly (two-column index, one is tested for equality, the other for range), explains (with more of those nice index graphics) why @ypercube's advice is accurate and sums it up:
Rule of thumb: index for equality first — then for ranges.
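To illustrate the rule of thumb, consider a hypothetical `sales` table (table and column names are made up for this sketch, not taken from the question):

```sql
-- Equality column first: all qualifying rows sit in one contiguous
-- range of the index, which can be scanned in a single pass.
CREATE INDEX sales_status_date_idx ON sales (status, sale_date);

SELECT *
FROM   sales
WHERE  status    =  'shipped'          -- equality condition
AND    sale_date >= '2024-01-01'       -- range condition
AND    sale_date <  '2024-02-01';
```

With the reversed order `(sale_date, status)`, the index scan would have to cover the whole date range and filter on `status` row by row.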
Also good for just one column?
What to do for queries on just one column seems clear. More details and benchmarks concerning that under this related question:
Less selective column first?
Apart from that, what if you have only equality conditions for both columns?
It doesn't matter. Put the column first that is more likely to receive conditions of its own, which actually matters.
Consider this demo, or reproduce it yourself. I create a simple table with two columns and 100k rows: one column with very few distinct values, the other with lots of them:
CREATE TEMP TABLE t AS
SELECT (random() * 10000)::int AS lots
, (random() * 4)::int AS few
FROM generate_series (1, 100000);
DELETE FROM t WHERE random() > 0.9; -- create some dead tuples, more "real-life"
ANALYZE t;
SELECT count(distinct lots) -- 9999
, count(distinct few) -- 5
FROM t;
Query:
SELECT *
FROM t
WHERE lots = 2345
AND few = 2;
EXPLAIN ANALYZE output (best of 10 to exclude caching effects):
Seq Scan on t (cost=0.00..5840.84 rows=2 width=8)
(actual time=5.646..15.535 rows=2 loops=1)
Filter: ((lots = 2345) AND (few = 2))
Buffers: local hit=443
Total runtime: 15.557 ms
Add index, retest:
CREATE INDEX t_lf_idx ON t(lots, few);
Index Scan using t_lf_idx on t (cost=0.00..3.76 rows=2 width=8)
(actual time=0.008..0.011 rows=2 loops=1)
Index Cond: ((lots = 2345) AND (few = 2))
Buffers: local hit=4
Total runtime: 0.027 ms
Add other index, retest:
DROP INDEX t_lf_idx;
CREATE INDEX t_fl_idx ON t(few, lots);
Index Scan using t_fl_idx on t (cost=0.00..3.74 rows=2 width=8)
(actual time=0.007..0.011 rows=2 loops=1)
Index Cond: ((few = 2) AND (lots = 2345))
Buffers: local hit=4
Total runtime: 0.027 ms
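Both column orders perform the same here because both columns are tested for equality. Where the order does start to matter is when one column is also queried on its own. As a hypothetical follow-up to the demo (not part of the original measurements), only a condition on the leading column of the remaining index `t_fl_idx` can be used as an index condition by itself:

```sql
-- Condition on the leading column alone: t_fl_idx (few, lots) is usable.
EXPLAIN ANALYZE SELECT * FROM t WHERE few = 2;

-- Condition on the trailing column alone: the planner typically cannot
-- use t_fl_idx as an index condition and falls back to a sequential
-- scan (or, at best, a full scan of the whole index with a filter).
EXPLAIN ANALYZE SELECT * FROM t WHERE lots = 2345;
```

This is why the column more likely to receive conditions of its own should go first.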
Best Answer
If the "cube" needs to process all rows, indexes won't speed up the process as the querying engine would just do a table scan.
I think the best advice I can give is to familiarize yourself with query plan output so that you can see how the database satisfies query requests. This lets you see which access methods are used to retrieve data. It also lets you answer your own question: if you create the proposed index, does SQL Server actually use it to fulfill your query?
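For example, one way to get plan text in SQL Server is the `SET STATISTICS PROFILE` option (a sketch; the table name is a placeholder, and in SSMS you can instead use "Include Actual Execution Plan"):

```sql
-- Emit the executed plan for each statement as an extra result set:
SET STATISTICS PROFILE ON;

SELECT DISTINCT columnB FROM dbo.YourTable;

SET STATISTICS PROFILE OFF;
```

The plan output shows whether a scan or seek was used, and on which index, which answers the "would it use my proposed index?" question directly.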
I think the answer to your question is going to be "No". If your "cube" is materialized by executing "SELECT DISTINCT columnB FROM table", and you currently have indexes on "columnA" and "columnB", then "consolidating" those separate indexes into a single index on "(columnA, columnB)" is unlikely to help your "cube" queries on "columnB". Your mileage may vary; without access to the schema or the query source, these kinds of questions are difficult to answer definitively.
Having said that, I think the correct response is to point you towards the SQL Server data warehousing functionality instead of rolling your own solution.