Not sure that this is simpler than a cursor solution, but here is what I would do:
Before we begin we need a few tables in a SQL Fiddle:
MS SQL Server 2012 Schema Setup
CREATE TABLE dbo.tbl1(c1 INT, c2 INT, c3 INT);
CREATE TABLE dbo.tbl2(c4 INT, c5 INT, c6 INT);
CREATE TABLE dbo.tbl3(c7 INT, c8 INT, c9 INT);
First we need the list of all user tables. For that we can use the sys.tables
catalog view. The OBJECT_SCHEMA_NAME()
function gets us the schema name, the OBJECT_NAME()
function the table name. The QUOTENAME()
function quotes the names correctly, in case some of those contain special characters, key words or spaces. (In this example the use of OBJECT_NAME()
is not strictly necessary as sys.tables
has a name column, but I left it in as you can use this pattern with any catalog view that has an object_id
column.)
Query 1:
SELECT
QUOTENAME(OBJECT_SCHEMA_NAME(T.object_id))+'.'+
QUOTENAME(OBJECT_NAME(T.object_id)) AS quoted_table_name,
T.object_id
FROM sys.tables AS T;
Results:
| QUOTED_TABLE_NAME | OBJECT_ID |
|-------------------|-----------|
| [dbo].[tbl1] | 245575913 |
| [dbo].[tbl2] | 261575970 |
| [dbo].[tbl3] | 277576027 |
The next step is to get the list of column names, again quoted. We can use the sys.columns
catalog view for that.
Query 2:
SELECT C.name,C.column_id
FROM sys.columns AS C
WHERE C.object_id = OBJECT_ID('dbo.tbl1');
Results:
| NAME | COLUMN_ID |
|------|-----------|
| c1 | 1 |
| c2 | 2 |
| c3 | 3 |
The next hurdle is to get those columns in a comma separated list. There is no string concatenation aggregate function build in so we have to use a trick:
Query 3:
SELECT STUFF((
SELECT ','+QUOTENAME(name)
FROM sys.columns AS C
WHERE C.object_id = OBJECT_ID('dbo.tbl1')
ORDER BY C.column_id
FOR XML PATH(''),TYPE
).value('.','NVARCHAR(MAX)'),1,1,'') AS clomun_list;
Results:
| CLOMUN_LIST |
|----------------|
| [c1],[c2],[c3] |
With that all pieces are in place and we just have to put them all together:
Query 4:
SELECT 'SELECT ' +
CL.column_list +
' FROM ' +
QUOTENAME(OBJECT_SCHEMA_NAME(T.object_id)) + '.' +
QUOTENAME(OBJECT_NAME(T.object_id)) +
';' AS select_statement
FROM sys.tables AS T
CROSS APPLY (
SELECT STUFF((
SELECT ','+QUOTENAME(name)
FROM sys.columns AS C
WHERE C.object_id = T.object_id
ORDER BY C.column_id
FOR XML PATH(''),TYPE
).value('.','NVARCHAR(MAX)'),1,1,'') AS column_list
)CL;
Results:
| SELECT_STATEMENT |
|------------------------------------------|
| SELECT [c1],[c2],[c3] FROM [dbo].[tbl1]; |
| SELECT [c4],[c5],[c6] FROM [dbo].[tbl2]; |
| SELECT [c7],[c8],[c9] FROM [dbo].[tbl3]; |
This example will work in SQL 2005 and later, assuming you are on the latest service pack.
There is no clean solution to achieve this with SQL Server 2000. In that case you need to go back to your cursor solution.
The percentage costs on an execution plan are from the optimizer estimates, even when an actual execution plan produced. The actual execution plan does use the exact plan and include both the estimated rows and the actual row counts. Discrepancies between the row counts can be useful to determine how accurate the estimate was.
Somewhere between the subquery and comparing it to the column in derived
, the optimizer wasn't able to correctly estimate how many rows would match. It guessed that there would be 18 rows from derived
when there were actually over 220,000. An additional clue is the warning message Cardnality Estimate: CONVERT(nvarchar(35),[mssqlsystemresource].[sys].[spt_values].[name],0)
on the SELECT
node.
If you were to check the query run length with something else, such as STATISTICS TIME
, I would expect them to be much closer, and likely the second query running faster.
Here's another plan analysis with a somewhat similar situation. (SQL Server Plan Explorer) (with hat tip to Kendra Little on how to fool the optimizer)
The estimate shows a 93%/7% cost split, but by looking at the actual CPU, time, or IO, the difference is not that extreme. IO is about 75%/25% and CPU is roughly 60%/40%. (I tried to come up with something more even, but wasn't able to.)
Best Answer
There's not necessarily any downsides to your approach, it's completely dependent on how frequently you're writing to the table vs updating and reading from it, which is why SQL Server let's the developer choose what and when to index things instead of trying to guess what they want. (It doesn't know your future intentions with a particular table any better than you do.)
I'm sure you understand how indexing works so I won't go into too much detail, but generally speaking, indexing stores the data sorted in a B-Tree data structure. So it's very efficient when having to look up a specific set of data that is covered by that index.
Because of how B-Trees work and the algorithm used to building one with the indexed data, it's generally fastest to have all the data upfront and then index it into a B-Tree. When you already have an index (B-Tree) in place and new data is added or deleted, then that can cause additional "shuffling" events to reorganize the B-Tree which can be less efficient. (The "shuffling" that occurs makes me think of MergeSort algorithm by the way. That helps me visualize the difference of having all the data up front vs adding new data after it's been sorted already.)
Obviously usually it's not the case that
Tables
generally stay the same size with the exact same records, which is why more times than not it's recommended to create your indexes on theTable
upfront and SQL Server does it's best to efficiently update the underlying B-Tree as changes occur in theTable
(and it does a great job at it up to a certain point).In certain cases though, yours maybe being one, if the
Table
has a very high frequency ofINSERTs
andDELETEs
and low frequency ofUPDATEs
andSELECTs
against it, then creating and dropping the indexes on thatTable
ad-hoc (just before and after reading from the Table) might make sense.At the end of the day you'd have to test both ways to see what works best for your environment. The size of the
Table
doesn't matter so much when it comes to choosing when to index it, as it makes no difference to SQL Server if you want to store a billion records in a single index or ten records, but moreso it depends how frequently you're inserting and deleting from thatTable
vs updating and selecting from it. (E.g. if you insert 100,000 records every minute into thatTable
but onlySELECT
from it once a day then it may be better to create your index ad-hoc.)