Sql-server – Does table size matter

I have a table like this (with more columns though):

CREATE TABLE [dbo].[MyTable](
    [SnapKey] [int] NOT NULL,
    [SnapDt] [smalldatetime] NOT NULL,
    [Company] [varchar](4) NOT NULL,
    [ProfitCenter] [varchar](10) NOT NULL,
    [CostCenter] [varchar](10) NOT NULL,
) ON [MyPartition]([SnapKey])

CREATE CLUSTERED INDEX [IDX1] ON [dbo].[MyTable]
(
    [SnapKey] ASC
)

Table is partitioned on SnapKey. SnapKey is the date part of SnapDt, stored as an integer, for example 20160131 and 20160229.

Each partition contains only 1 SnapKey. For each SnapKey partition, I have about 5 million rows. Currently I'm keeping only the last day of each month in my table.

I always use SnapKey for querying. No updates happen to the data. For each day, data is populated into the table and then we run some reports on it during the month.

Question: If I keep data for 6 days per month, instead of 1 day per month, will my queries perform slower?

I couldn't find any clear answer, so I tried to populate the table with data but I ran out of storage, so I decided to ask you, to see if there is any theoretical explanation.

Clarification

By storing 5 more days, we will store 6 times the data (for historical reporting). We will keep the last 6 days of the month instead of only the last day.

Our queries will not change and our reports are still over 1 single day (one SnapKey).

We have one SnapKey per month. For now we have

20160131
20160229
20160330

…and so on. One SnapKey for each month-end.

By having 5 days more, SnapKey will look like:

20160126, 20160127, 20160128, 20160129, 20160130, 20160131 
20160224, 20160225, 20160226, 20160227, 20160228, 20160229 
20160325, 20160326, 20160327, 20160328, 20160329, 20160330 ...and so on

You'll see that we store 6 times more data, but still work on only one SnapKey in our queries. This means we always have:

WHERE SnapKey = xxxxxxxx

in all of our queries.

Best Answer

If I keep data for 6 days per month, instead of 1 day per month, will my queries perform slower?

It depends.

No - if you run exactly the same queries as before (no access to the new data at all).

SQL Server's partitioning implementation creates a separate rowset for each partition, so when you create a partitioned index, it creates a separate b-tree structure* for each partition (note that partitioned heaps also exist).

Simply adding more partitions therefore changes nothing from the point of view of the existing partitions - the indexes are exactly the same. Your queries access only a single partition, so nothing changes.

Maybe - if you query the new data at any stage. Bringing the new data into memory may displace data needed by the original queries, depending on the amount of memory you have. If the change results in new physical I/O, you will see an impact on performance, with the severity depending on the capability of the storage subsystem.

* This is mentioned in many places in the product documentation, for example:

From those links:

When a clustered index has multiple partitions, each partition has a B-tree structure that contains the data for that specific partition.
When a nonclustered index has multiple partitions, each partition has a B-tree structure that contains the index rows for that specific partition.
When a heap has multiple partitions, each partition has a heap structure that contains the data for that specific partition.

You can also explore this for yourself by looking at the system catalog views like sys.partitions, which shows the hobt_id (heap or b-tree id) of the structure that contains the rows for a particular partition.

Clarification

Best Answer

Related Solutions

Sql-server – tsql: table data size won’t shrink after large row deletion

MySQL table architecture

Related Question