Where should one put indexes in a time dimension table

data-warehousedatabase-designindex

After reading the Questions and Answers from this website about indexes, a question came to my mind.

What if, one is using a time dimension table with the lower level of granularity being the day. Where should one put the indexes ?

Randy Melder in the question : What does “index” means on RDBMS ? said :

Think of an index as "table of
contents"… that is an ordered list
of pointers to positions in a file,
aka offsets

In the case of the time dimension, most data research might be done either for a specific day, a specific week, a specific month or a specific quarter if the time table stores all the day for a unique year.

My question is : Should one put indexes for all those fields ?

Day is suppose to be unique so for this one I understand perfectly the use of indexes.
But a week id will have 7 occurences, a month id will have 30/31 occurences, a quarter id will have more or less 120 occurences.

Should one still put indexes for those fields ?
Will it still be useful?

I am asking you that because in the same question, David Spillett said :

Adding too many indexes can be a bad
optimisation of course, as the extra
space used to store the indexes (and
the IO-load to maintain them if your
DB sees many write operations) may be
a worse problem than the slightly less
optimal read queries, so don't over-do
it.

So what would be the best considerations for the time dimension case ?

Best Answer

You likely won't run into the issues of write problems, as I assume this would be something created once (or once per year), and then not touched.

But using an index will likely be a hinderance if you're searching by week ... The problem is, if the index is used, it might scan that first, and then grab each record out of the table individually, which when you're pulling out more than about 5-20% of the records, it's typically faster to do a full table scan, and then drop the records you don't care about.

I don't know of any major RDBMSes that don't optimize for this when it's well-distributed data. If it's not well distributed (eg, one of the values in a column occurs 95% of the time, but there's also other possible values), you may have to compute histograms on the table and not use a placeholder for the value when searching, so that the query optimizer has the value being searched for when generating the execution plan.

I'd likely not index day of week. I'd check my database's documentation to see what their tradeoff is for indexed reads vs. full table scans to see if I'd index the day of the month or month of the year. I'd likely index DOY/day of year if present (which sounds like it's your unique index, anyway)

Best Answer

Related Solutions

Sql-server – Datawarehouse Design: Combined Date Time dimension vs. Separate Day and Time dimensions and timezones

In star schemas, what should I do when a dimension has a date attribute

Related Question