I have table with Orders with unique incrementing ID and (order creation) Date. This table is quite large and wide (5mn rows, 50 columns, 10 columns of which are FK IDs). Example:
CREATE TABLE Orders As (
ID int UNIQUE AUTOINCREMENT (1,1),
DATE datetime, --This is SQL Server 2005, so no [date] data type
OrderStatus char (2),
ClientID int
...
)
I get this table as copy of yesterday's production table. Table (as whole DB) is used for reporting purposes and therefore read-only.
Part 1
I have 2 very common use cases:
- 80% (or more) of queries have
DATE
column inWHERE
clause, as users want data for specific business date.
Here I want to create clustered index on DATE.
- 40%-60% of queries use
OrderID
toJOIN
Orders table to other tables with information about Order details (Product, Supplpier, Payments, Reservations, etc).
Here I want to create clustered index on OrderID.
Part 2
Can I have one CI for both cases?
Index (ID, DATE) will not work with WHERE
clause.
Index (DATE,ID) will not work for joins on ID
only or ID
in WHERE
Catch. We know that DATE is incremental as well as ID. ID with higher value cannot appear yesterday under any circumstances as it is AUTOINCREMENT
.
Question. Is there a way to tell SQL Server that CI (DATE, ID) will have ID sequentially ordered for all dates?
My only solution at the moment is crating non clustered covering index (ID, DATE), but it's suboptimal.
I've searched for some time, but could not find anything. If there is a solutions for later versions of SQL Server I would be interested in it too.
Update
I know clustered index basics. Please note, database is put in read-only state for users.
Logically you can just ignore Date column in (Date, ID) index without any harm at all. Possibly this is a very specific use case that is not yet covered by SQL Server functionality.
Best Answer
The main characteristic of the clustered index is that the data is all in that order. That allows SQL to "read ahead". So for example if you created your CI on the Date column and ran queries pulling summary info for a week or a month then SQL could pull the data more quickly.
If on the other hand you are having to do seeks (a single Id for example) then the CI is no different than a covering NCI. Note the
covering
part of that.Here is something you could try. Place your CI on the
Date, OrderId
combination. Do this because you are more likely to pull range data on a date than an Id. Also because you said that 80% of your queries use the date. Some portion of these will presumably also use the OrderId.Then see what columns the queries that just use OrderId use and add those columns to an OrderId NCI using the
INCLUDE
clause. And just to save time if the answer is "they are using all of the columns on these queries" then a) you need to look at those queries and see if they really need all that data and b) yes you couldINCLUDE
all of the columns in the table.I realize that would be a HUGE
INCLUDE
. In fact it would double the size of your table and you'll want to carefully test that you aren't going to have a serious negative impact on write operations. However for reads it should work just fine.