I recently posted these questions in order to try and improve the performance for my queries.
One of the suggestions was that I should use a DATE()
column rather than columns such as yr
, yrmth
(e.g. 201502), mth
, day
(which was how the data was given to me). Given the built in functions. However when I changed over to this, it slowed down a couple of my queries to the point where they aren't suitable for using on a web page. Following this, I decided to change back to the original way of using a year column, and a yearmonth column (theyre the granularities the page looks at), and my queries run in less than a few seconds.
I often see posts which 'condemn' this method of searching a table, however I can execute queries in a tenth of the time the 'less recommended way'.
If it is bad practice, why is it?
Is it a case of, if it works better that way, then use it? Or should I really not be seeing a 10x difference in query duration between the two?
Edit: My use case is write once, read lots, and the date values in rows will never change.
Best Answer
It is a bad practice in a transactional/operational database and a good practice in an analytic/data warehouse database.
In a data warehouse I recommend using an integer "smart key"† for your date dimension (like 20160328), but also including an actual date-type column, as well as the broken-down date parts.
The broken down date parts are more for ease of analysis, than speed of querying.
† This makes it easy to partition your fact tables.