SQL Server Indexing – Indexing a Subset of Rows

indexsql serversql-server-2008

Is it possible to create an index against a subset of rows fulfilling a certain condition?

If it is possible how does one ensure the index is used?

Best Answer

Yes, what you're talking about is a Filtered Index. It works just like you think, by creating an index off of a subset of data. Below is an excerpt from the above reference:

A filtered index is an optimized nonclustered index, especially suited to cover queries that select from a well-defined subset of data. It uses a filter predicate to index a portion of rows in the table. A well-designed filtered index can improve query performance, reduce index maintenance costs, and reduce index storage costs compared with full-table indexes.

To your other question:

how does one ensure the index is used?

A few things will dictate this index being used:

A well-designed filtered index correlated with
A well-designed query

Obviously things like data distribution and many other factors play a role, but that goes with both filtered and non-filtered indexes.

Related Solutions

How to Create a Conditional Index in MySQL

MySQL currently doesn't support conditional indexes.

To achive what you are asking (not that you should do it ;) ) you can start creating an auxiliary table:

CREATE TABLE  `my_schema`.`auxiliary_table` (
   `id` int unsigned NOT NULL,
   `name` varchar(250), /* specify the same way as in your main table */
   PRIMARY KEY (`id`),
   KEY `name` (`name`)
);

Then you add three triggers in the main table:

delimiter //

CREATE TRIGGER example_insert AFTER INSERT ON main_table
FOR EACH ROW
BEGIN
   IF NEW.status = 'ACTIVE' THEN
      REPLACE auxiliary_table SET
         auxiliary_table.id = NEW.id,
         auxiliary_table.name = NEW.name;
   END IF;
END;//

CREATE TRIGGER example_update AFTER UPDATE ON main_table
FOR EACH ROW
BEGIN
   IF NEW.status = 'ACTIVE' THEN
      REPLACE auxiliary_table SET
         auxiliary_table.id = NEW.id,
         auxiliary_table.name = NEW.name;
   ELSE
      DELETE FROM auxiliary_table WHERE auxiliary_table.id = OLD.id;
   END IF;
END;//

CREATE TRIGGER example_delete AFTER DELETE ON main_table
FOR EACH ROW
BEGIN
   DELETE FROM auxiliary_table WHERE auxiliary_table.id = OLD.id;
END;//

delimiter ;

We need delimiter // because we want to use ; inside the triggers.

That way, the auxiliary table will contain exactly the IDs corresponding to the main table rows that contain the string "ACTIVE", being updated by the triggers.

To use that on a select, you can use the usual join:

SELECT main_table.* FROM auxiliary_table LEFT JOIN main_table
   ON auxiliary_table.id = main_table.id
   ORDER BY auxiliary_table.name;

If the main table already contains data, or in case you make some external operation that changes data in an unusual way (E.G.: outside MySQL), you can fix the auxiliary table with this:

INSERT INTO auxiliary_table SET
   id = main_table.id,
   name = main_table.name,
   WHERE main_table.status="ACTIVE";

About the performance, probably you'll have slower inserts, updates and deletes. This can make some sense only if you really deal with few cases where the condition desired is positive. Even that way, probably only testing you can see if the space saved really justifies this aproach (and if you are really saving any space at all).

SQL Server Index – How Index Works with Two Columns in WHERE Clause

I want to know how the query execution works here

The general execution model is a pipeline, where each iterator returns a row at a time. Execution starts at the root iterator (on the far left, labelled SELECT in your example).

After initialization, the root iterator requests a row from its immediate child, and so on down the chain until an iterator that can return a row is found. This passes back up the chain to the root where it is queued for despatch to the client. That is a very simplified overview, for more details see:

Showplan Logical and Physical Operators Reference (Product Documentation)
Iterators, Query Plans, and Why They Run Backwards (me)
A Second in the Life of a Query Operator by Brad Schulz

Is it going to fetch all the rows that matches the 'col2=val2' condition and check for the other condition?

The nonclustered index seek will locate a row that matches col2=val2. It will be able to return col2 and col1 (see its output list) because col1 is present in the index (since the primary key is clustered in this case).

This row is passed up to the nested loops join, which then passes control to the key lookup. The lookup uses the col1 value to seek into the clustered index b-tree to find the value of col3 in this row. The value is tested against the predicate col3=val3 and only returned if it matches.

If there is a match, the row (c1, c2, c3) is passed up the chain and queued for transmission to the client. As control descends the tree again, any new match for col2 in the nonclustered index will result in a repeat of the nested loops join -> lookup -> return row cycle. As soon as the nonclustered index seek runs out of rows, the process completes when control next returns to the root iterator.

Why these two -Index seek and Key-Lookup, are shown parallel in the execution plan?

That's just the way the graphical plan is laid out. See the links and discussion above for the correct way to understand the execution process in detail.

Will it always use the available index on 'col2', considering a large dataset and assuming almost all entries in 'col2' are unique?

Most likely yes. The optimizer makes a cost-based choice between the available strategies. With very few matches expected, the optimizer will usually assess a nonclustered index seek with a lookup as being the cheapest option. An exception occurs when the table is very small. In that case, scanning the clustered index and applying both predicates to each row will likely look cheapest.

As a final note, a covering nonclustered index that would avoid the lookup is:

CREATE [UNIQUE] NONCLUSTERED INDEX [dbo.table1 col2,col3 (col1)]
ON dbo.table1 (col2, col3)
INCLUDE (col1);

It should be specified UNIQUE if that is true for (col2, col3).

Best Answer

Related Solutions

How to Create a Conditional Index in MySQL

SQL Server Index – How Index Works with Two Columns in WHERE Clause

Related Question