Sql-server – partitioning by datetime vs date vs int performance

datetimepartitioningsql server

I am trying to determine which partitioning strategy will be most efficient for my queries on SQL Server.

We know we want daily partitions. The data has a natural DATETIME field in it, so my gut instinct is to partition on that value. However, we have discussed the possibility of adding a DATE column (with just the time information stripped out), or possibly coding the date as an integer (YYYYMMDD, e.g., 20130930) and using that as the partitioning column.

All queries on this data will be on a specific date (WHERE ItemTime = '2013-09-30') or range of dates (WHERE ItemTime BETWEEN '2013-09-15' AND '2013-09-30'). For now, we always query on a date, but future requirements may include time details (WHERE ItemTime BETWEEN '2013-09-29 20:30:00' AND '2013-09-30 10:15:45').

I have tested performance of each strategy on a few hundred thousand rows of data, and seen no real difference. The production deployment, however, will be in the hundreds of millions up to maybe a couple billion rows.

Is one of these strategies going to lead to more efficient queries than the others? Why or why not?

Thanks for your help.

[EDIT] The queries will be formatted by application code, so I'm not concerned about how to translate between DATETIME, DATE, and INT. I can assume the query will be properly formatted based on whichever partitioning scheme is chosen. I'm just trying to find out if partition elimination will be faster using one of these data types.

Best Answer

You shouldn't get any additional performance gain by adding the extra column. If any, I doubt it will outweigh the additional storage cost required for the extra column.

There is one perk that you'll get from having a date only column and that is that you can have a date dimension. If this is for a data warehouse, I'd definitely recommend including.

If you decide to add the additional column (and assuming you're on SQL 2008 +), use the Date datatype. Int (formatted in YYYYMMDD) used to be the recommended format for partitioning, as it was cheaper (@ 4 bytes/row) than datetime (@ 8 bytes/row). Date is 3 bytes/row and is in a natural date format. Also, INT is a pain in the butt to query on as you have to add conversions in your' search arguments:

WHERE DateID = (CONVERT([int],CONVERT([char](8),getdate(),(112)),(0)))