Not sure that this is simpler than a cursor solution, but here is what I would do:
Before we begin we need a few tables in a SQL Fiddle:
MS SQL Server 2012 Schema Setup
CREATE TABLE dbo.tbl1(c1 INT, c2 INT, c3 INT);
CREATE TABLE dbo.tbl2(c4 INT, c5 INT, c6 INT);
CREATE TABLE dbo.tbl3(c7 INT, c8 INT, c9 INT);
First we need the list of all user tables. For that we can use the sys.tables
catalog view. The OBJECT_SCHEMA_NAME()
function gets us the schema name, the OBJECT_NAME()
function the table name. The QUOTENAME()
function quotes the names correctly, in case some of those contain special characters, key words or spaces. (In this example the use of OBJECT_NAME()
is not strictly necessary as sys.tables
has a name column, but I left it in as you can use this pattern with any catalog view that has an object_id
column.)
Query 1:
SELECT
QUOTENAME(OBJECT_SCHEMA_NAME(T.object_id))+'.'+
QUOTENAME(OBJECT_NAME(T.object_id)) AS quoted_table_name,
T.object_id
FROM sys.tables AS T;
Results:
| QUOTED_TABLE_NAME | OBJECT_ID |
|-------------------|-----------|
| [dbo].[tbl1] | 245575913 |
| [dbo].[tbl2] | 261575970 |
| [dbo].[tbl3] | 277576027 |
The next step is to get the list of column names, again quoted. We can use the sys.columns
catalog view for that.
Query 2:
SELECT C.name,C.column_id
FROM sys.columns AS C
WHERE C.object_id = OBJECT_ID('dbo.tbl1');
Results:
| NAME | COLUMN_ID |
|------|-----------|
| c1 | 1 |
| c2 | 2 |
| c3 | 3 |
The next hurdle is to get those columns in a comma separated list. There is no string concatenation aggregate function build in so we have to use a trick:
Query 3:
SELECT STUFF((
SELECT ','+QUOTENAME(name)
FROM sys.columns AS C
WHERE C.object_id = OBJECT_ID('dbo.tbl1')
ORDER BY C.column_id
FOR XML PATH(''),TYPE
).value('.','NVARCHAR(MAX)'),1,1,'') AS clomun_list;
Results:
| CLOMUN_LIST |
|----------------|
| [c1],[c2],[c3] |
With that all pieces are in place and we just have to put them all together:
Query 4:
SELECT 'SELECT ' +
CL.column_list +
' FROM ' +
QUOTENAME(OBJECT_SCHEMA_NAME(T.object_id)) + '.' +
QUOTENAME(OBJECT_NAME(T.object_id)) +
';' AS select_statement
FROM sys.tables AS T
CROSS APPLY (
SELECT STUFF((
SELECT ','+QUOTENAME(name)
FROM sys.columns AS C
WHERE C.object_id = T.object_id
ORDER BY C.column_id
FOR XML PATH(''),TYPE
).value('.','NVARCHAR(MAX)'),1,1,'') AS column_list
)CL;
Results:
| SELECT_STATEMENT |
|------------------------------------------|
| SELECT [c1],[c2],[c3] FROM [dbo].[tbl1]; |
| SELECT [c4],[c5],[c6] FROM [dbo].[tbl2]; |
| SELECT [c7],[c8],[c9] FROM [dbo].[tbl3]; |
This example will work in SQL 2005 and later, assuming you are on the latest service pack.
There is no clean solution to achieve this with SQL Server 2000. In that case you need to go back to your cursor solution.
And so enters the art of performance tuning and indexing strategies...
It seems logical to me to amend the existing index definition to include the suggested columns
I'm going to take your quote and write a third index definition:
create index [idx_index3]
on [table1] (col1, col2, col3)
include (col4, col5, col6....);
That should be the CREATE INDEX
statement that corresponds to your quoted statement.
That very well may be a prudent solution, but it depends. Here are a couple of examples when I say that it depends.
If you have a common workload that mostly consists of queries like this:
select col1, col2, col3
from table1
where col1 = 1
and col2 = 2
and col3 = 3;
Then your idx_index1
index would be solid. Perfectly narrow, it's an index that satisfies that query with no extraneous data in it (not taking into account the clustered index definition, if one at all).
But if you have workload that consists of queries mainly like the following:
select co11, col2, col3, col4, col5
from table1
where col1 = 1
and col2 = 2;
Then idx_index2
would be wise, as it is what's called a covering index preventing the need for a key lookup back to the clustered index (or a RID lookup back to the heap). That nonclustered index definition would solely encompass all of the data that query needs.
With your recommendation, it would be well suited for a query like the following:
select co11, col2, col3, col4, col5
from table1
where col1 = 1
and col2 = 2
and col3 = 3;
Your idx_index3
recommendation would be a covering index that satisfies the search criteria for the above query.
The point I'm trying to get at, is in an isolated question like this we can't answer this definitively. It all depends on what the common and frequent workload is. Of course you could always define all three of these indexes to handle each sample query type, but then comes into question the maintenance that'll be required to keep these indexes updated (think: INSERTs, UPDATEs, DELETEs). That's the overhead of indexes.
You need to dissect and evaluate the workload, and determine where the advantages will be best in place. If the first sample query is the most common by far being executed dozens of times a second, and there is a very infrequent query like the third sample query, then it wouldn't make sense to bloat the leaf level pages of the index with the INCLUDE
nonkey columns. It all depends on your workload.
If you understand prudent indexing strategies, and you understand your common workload, then by applying both of those you will be able to come up with what is the best route to take.
Best Answer
The included field order doesn't matter since there's no sorting involved - those fields are available at the leaf level of the index for every row.
There's not really a disadvantage to consolidating them like in your second example, except for page density on that index (more page reads per number of rows since there's more data stored for each row).
There ARE measurable advantages, though - for updates/inserts on
Col1
you only need to update one index instead of two, and you will save a lot of space by not keeping duplicate key entries forcol1
in two separate indexes.The only scenario I can think of that would make sense to have the two narrower indexes is if one of those was designed specifically for a mission critical procedure or view. If you have an index that includes only what you need for a certain query that gets run with high frequency it would make sense to keep it as streamlined as possible.