SQL Server 2008 Partitioning Evaluation

partitioning, performance, sql server

I'm trying to do a performance comparison of a table with partitioning and the same table without partitioning.

So we have two tables, Subscriber and SubscriberPartitioned.

The table structure is:

SubscriberId | Name | Email | Telephone | UserId (partition column)

Data: I'm using SQL Data Generator to fill the tables:

  • 10 million rows in each table
  • UserId values range from 1 to 200

Partitioning: I have split the table SubscriberPartitioned into 2 partitions on UserId, which ranges from 1 to 200. So we have approximately 5 million rows in each partition.
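
For illustration, a two-partition setup along these lines could be declared roughly as below. This is only a sketch: the names `pfUserId` and `psUserId`, the boundary value 100, and the column types are assumptions, not the DDL actually used for the test.

```sql
-- Hypothetical names; the boundary value 100 is assumed from the 1-200 UserId range.
-- RANGE LEFT: partition 1 holds UserId <= 100, partition 2 holds UserId > 100.
CREATE PARTITION FUNCTION pfUserId (int)
AS RANGE LEFT FOR VALUES (100);

-- Map both partitions to the PRIMARY filegroup.
CREATE PARTITION SCHEME psUserId
AS PARTITION pfUserId ALL TO ([PRIMARY]);

-- Column types are guesses. The partition column must be part of
-- any unique index (here the clustered primary key) on a partitioned table.
CREATE TABLE dbo.SubscriberPartitioned
(
    SubscriberId int           NOT NULL,
    Name         nvarchar(100) NOT NULL,
    Email        nvarchar(255) NULL,
    Telephone    varchar(30)   NULL,
    UserId       int           NOT NULL,
    CONSTRAINT PK_SubscriberPartitioned
        PRIMARY KEY CLUSTERED (SubscriberId, UserId)
) ON psUserId (UserId);
```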

Performance measure: I'm using SQL Server Profiler to measure the query times.

  1. Why do SQL Server query times differ so much?

    For example the query

    select * from Subscriber where SubscriberId = 1000
    

    … takes about 40 seconds the first time, but less than a second when I re-execute it. If I then try another Id, that also takes less than a second. Only the first run was slow.

    So, is there any optimization or caching being done automatically in the background by SQL Server?

    I was expecting the same query to take the same time if repeated over time…

  2. Which queries would be best to run, and how should I measure them, in order to test the partitioned table and see what the partitioning strategy gains over the normal table?

EDIT:

  1. After the answers on this post I ran some cold-start tests, and found that the non-partitioned table performed faster than the partitioned table.

This query especially caught my attention:

select * from [table] where IdUser = 100

Why doesn't the partitioned table improve performance here? The query above filters on the partition column, so shouldn't the engine only have to scan half of the records with the partitioned approach?

EDIT 2:

I ran the tests again after assigning the second partition to a different filegroup, and saw a performance improvement of 50% on the query mentioned above.
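
A rough sketch of what placing the second partition on its own filegroup might involve. The database name `TestDb`, the filegroup `FG2`, the file path, and the function and scheme names are all hypothetical, and the table (or its clustered index) would then need to be created or rebuilt on the new scheme.

```sql
-- Hypothetical database, filegroup, and file names.
ALTER DATABASE TestDb ADD FILEGROUP FG2;

ALTER DATABASE TestDb
ADD FILE
(
    NAME = TestDb_FG2,
    FILENAME = 'D:\SqlData\TestDb_FG2.ndf',  -- assumed path
    SIZE = 1GB
)
TO FILEGROUP FG2;

-- Assumes a partition function pfUserId with a single boundary value,
-- which yields two partitions.
-- Partition 1 stays on PRIMARY, partition 2 goes to FG2.
CREATE PARTITION SCHEME psUserIdTwoFilegroups
AS PARTITION pfUserId TO ([PRIMARY], FG2);
```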

Best Answer

What do you expect your real life volume of data to be?

For 10 million rows, I wouldn't bother with partitioning. The overhead far outweighs the benefits: partitioning isn't a silver bullet to cure performance issues.

To answer,

Point 1: on the first run, the data needs to be loaded into memory (the "buffer pool"), where it stays cached until it is evicted based on memory pressure and usage. Personally, I'd test with the cache filled, because you'd expect your app to need that data very often, especially if you think partitioning is the solution to some problem.
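
If you do want to compare cold and warm runs explicitly, something like the following is one way to do it on a test box. It is a sketch only: the table name `dbo.Subscriber` is assumed from the question, and `DBCC DROPCLEANBUFFERS` affects the whole instance, so don't run it on a production server.

```sql
-- Test server only: this empties the buffer pool for the entire instance.
CHECKPOINT;              -- flush dirty pages to disk so they become clean
DBCC DROPCLEANBUFFERS;   -- drop clean pages from the buffer pool (cold cache)

SET STATISTICS IO ON;    -- report logical and physical reads per query
SET STATISTICS TIME ON;  -- report CPU and elapsed time per query

-- First run after the flush: pages have to be read from disk.
SELECT * FROM dbo.Subscriber WHERE SubscriberId = 1000;

-- Second run: the same pages should now be served from the buffer pool.
SELECT * FROM dbo.Subscriber WHERE SubscriberId = 1000;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Comparing the physical reads and elapsed time reported in the Messages output for the two runs should show how much of the first-run cost is disk I/O rather than query work.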

For point 2, what queries do you expect to run in production? Your test queries should be representative of that production load. At a minimum, they should cover different realistic filter combinations, both with and without the partition key.
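
As a starting point, a test script could look roughly like this. The partition function name `pfUserId`, the `dbo` schema, and the sample filter values are assumptions for illustration. Substitute the filters your application actually uses.

```sql
-- How many rows ended up in each partition?
-- (index_id 0 = heap, 1 = clustered index)
SELECT partition_number, rows
FROM sys.partitions
WHERE object_id = OBJECT_ID('dbo.SubscriberPartitioned')
  AND index_id IN (0, 1);

-- Which partition does a given key value map to?
-- pfUserId is a hypothetical partition function name.
SELECT $PARTITION.pfUserId(100) AS partition_number;

SET STATISTICS IO ON;

-- Filter on the partition key: partition elimination is possible.
SELECT * FROM dbo.SubscriberPartitioned WHERE UserId = 100;
SELECT * FROM dbo.Subscriber            WHERE UserId = 100;

-- Filter without the partition key: every partition may be touched.
SELECT * FROM dbo.SubscriberPartitioned WHERE SubscriberId = 1000;
SELECT * FROM dbo.Subscriber            WHERE SubscriberId = 1000;

SET STATISTICS IO OFF;
```

Comparing logical reads between each pair of queries, along with the actual execution plans, gives a more repeatable measure than wall-clock time alone.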

Edit, some reading, after comments below: