Sql-server – Clustered Index improvemens for particular queries

sql-server-2005

I found a table that looks like this:

 CREATE TABLE [dbo].Table1 (  
  id INT primary key IDENTITY (1, 1),  
  [idUser] INT  NOT NULL ,  
  [Amount] INT NOT NULL ,  
  [Attempts] INT NOT NULL ,  
  [date] [datetime] NOT NULL ,  
  [SUM_Amount] INT NOT NULL   
 ) ON [PRIMARY]  

This table is created and populated with aggregated data for a particular period by a job.

Particularities:

  • This table will hold up to a million rows
  • idUser is unique
  • sum_Amount is running total of previous rows of amount.

This table will last as is, no update or delete or insert operation.
Just this type of queries:

 select top (@n) * from table1
 order by [SUM_Amount] desc, [Attempts] desc

 select top (@n) * from table1
 where [SUM_Amount] >=@m order by [SUM_Amount] asc 

I think it will improve preformance to change to a clustered index like this:

 CREATE TABLE [dbo].Table2 (  
  id INT IDENTITY (1, 1),  
  [idUser] INT  NOT NULL ,  
  [Amount] INT NOT NULL ,  
  [Attempts] INT NOT NULL ,  
  [date] [datetime] NOT NULL ,  
  [SUM_Amount] INT NOT NULL  

 CONSTRAINT [PK_Nueva]
   PRIMARY KEY CLUSTERED ([SUM_Amount] desc, [Attempts] desc, id asc)   

 ) ON [PRIMARY]  

I read that using a no unique clustered index will add a 4 bytes hidden column (http://msdn.microsoft.com/en-us/library/ms190639(v=sql.90).aspx), so I decide to add Identity to cluster index (not sure if it is the right approach)

I want to ask (at the risk of sound ridiculous, but need to be sure):

  • How could it be improved?
  • Will I have an impact on disk size?
  • Should I rebuild the index once all data have been inserted?

EDIT:

About id, I think is there just as a bad habit. I'll kept it, not sure how previous job calculate running total (I've no access to it)

There are a lot of tables like this, like hundreds for each day(don't ask me why). That is why DBA team ask me to not create a new index because of size issues. That is why I thinking of rearrange table structure via clustered index. Also changing data types which exceeds normal ranges.

Best Answer

Yes, placing a clustered index like you suggest will give you excellent query performance for those two particular types of queries, but it may spell disaster for most other SELECT queries on that table. I would offer an alternative solution:

I would suggest adding a basic clustered index on a single, unique column like idUser:

ALTER TABLE dbo.Table1 ADD
    PRIMARY KEY CLUSTERED (idUser)
        WITH (FILLFACTOR=100);

... and then building a separate, non-clustered indexe specifically for your queries:

CREATE INDEX IX_Table1_SUM_Amount
    ON dbo.Table1 (SUM_Amount DESC, Attempts DESC)
    WITH (FILLFACTOR=100);

The clustered index (the primary key) will not add any space to your table - it's just another way of organizing the storage by presorting the rows in a binary tree, which will make access to the table a lot quicker.

The non-clustered index will take up some space, but it's comparably small: Both SUM_Amount and Attempts are integer values, so they're 4 bytes large each, meaning that your non-clustered index should end up at roughly 12 MB with a million rows (including the "clustering key", which is also an integer).

Your two example queries will use the non-clustered index to find the largest/smallest values, and this will generate a Key lookup operator in the execution plan, but because you're using TOP (@n), I recon you won't even notice the performance cost of that. If you absolutely need even better performance and want to eliminate the Key lookup, you could INCLUDE () all the relevant columns in the non-clustered index to form a so-called covering index, effectively making the index a sorted copy of the original table:

CREATE INDEX IX_Table1_SUM_Amount
    ON dbo.Table1 (SUM_Amount DESC, Attempts DESC)
    INCLUDE (id, idUser, Amount, date)
    WITH (FILLFACTOR=100);

There is no need to rebuild an index if the data hasn't changed, particularly if the entire table is populated in a single batch. If the table has been populated over time, however, you may see some fragmentation.