Sql-server – Will creating partitions reduce locking and how do we implement this in sql-server

locking, partitioning, performance, sql-server, sql-server-2005

A client wants our application to process more data faster, so we arranged a meeting with their DBA to discuss options.

This application generates quite a lot of data that is used for reporting. Before each run the old data for that item is deleted, the calculations are performed and then the new data is inserted. In busy periods the users queue up hundreds of these generation tasks and we run up to 30 of them concurrently. Each run might create 60K rows.

The DBA has suggested we could change the application to use 30 partitions (e.g. one per thread) to reduce locking between threads during inserts and deletes. They suggested that in standard SQL we could do something like:

INSERT INTO schema.table.partition (...) VALUES (...)

I do not see this syntax in the MSDN docs, and it would mean changing the application, which is a pain. Is it even possible to do this? As I understand it, we would instead partition on a column of the table using a partition function?

I've read the CREATE PARTITION FUNCTION docs but am not completely sure how to create a function to meet our needs. To make matters worse, I don't yet have Enterprise Edition to try this out on, so my apologies for any incorrect syntax.

I am thinking that, for example, if we have an items table and an itemdata table holding the data for each item, we might partition the itemdata table by splitting the data with a function like itemid mod 30. This would put item 1 in partition 1, item 2 in partition 2, etc. I'm not sure whether we could do this in the partition function, the scheme or the table declaration, or whether we would need to create a calculated (computed) column and use a VALUES clause. I'm also not sure whether we would see any performance improvement.

This is how I think we could implement this:

CREATE PARTITION FUNCTION SplittingItemIds_PFunc(decimal(18,0)) AS
RANGE LEFT FOR VALUES
(0,1,2,3, ... ,29)

CREATE PARTITION SCHEME SplittingItemIds_Scheme 
AS PARTITION SplittingItemIds_PFunc
ALL TO ([PRIMARY]);

CREATE TABLE ItemData  
(
    Id decimal(18,0),
    ItemId decimal(18,0),
    ...
)
ON PartitionSplittingItemIds_Scheme(ItemId % 30)

CREATE INDEX ItemData_ItemId_Idx ON ItemData(ItemId);

Is this kind of right?
From what I've read, the index will be automatically partitioned. Is that correct?
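
For reference, here is a minimal sketch of how the idea above might be expressed in valid T-SQL. It is not a tested implementation. A partition scheme is applied to a column rather than to an expression such as ItemId % 30, so the sketch partitions on a persisted computed column (a computed column used for partitioning must be marked PERSISTED). The column name ItemBucket is hypothetical, and the sketch uses 5 buckets instead of 30 to stay short. With RANGE LEFT, N boundary values produce N + 1 partitions, so 30 buckets (values 0 to 29) would need only the boundaries 0 to 28.

```sql
-- Sketch only: 5 buckets (0..4) for brevity; ItemBucket is a hypothetical name.
-- RANGE LEFT with 4 boundaries gives 5 partitions:
--   partition 1: value <= 0, partition 2: 1, partition 3: 2,
--   partition 4: 3,          partition 5: value > 3 (i.e. 4)
CREATE PARTITION FUNCTION SplittingItemIds_PFunc (int)
AS RANGE LEFT FOR VALUES (0, 1, 2, 3);

CREATE PARTITION SCHEME SplittingItemIds_Scheme
AS PARTITION SplittingItemIds_PFunc
ALL TO ([PRIMARY]);

CREATE TABLE ItemData
(
    Id         decimal(18,0) NOT NULL,
    ItemId     decimal(18,0) NOT NULL,
    -- The partitioning column must be a real column; a computed one must be PERSISTED.
    ItemBucket AS CAST(ItemId % 5 AS int) PERSISTED
)
ON SplittingItemIds_Scheme (ItemBucket);

-- No ON clause: by default the index is placed on the table's partition scheme.
CREATE INDEX ItemData_ItemId_Idx ON ItemData (ItemId);

-- Check which partition a given bucket value maps to.
SELECT $PARTITION.SplittingItemIds_PFunc(3) AS PartitionNumber;
```

Note that with this mapping bucket 0 lands in partition 1, so item 1 lands in partition 2 rather than partition 1.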

Best Answer

It sounds like the DBA is talking about horizontal partitioning across separate tables rather than SQL Server's built-in table partitioning: breaking up the troublesome tables using rules, such as all customers that start with the letter a go in tableA, b in tableB, etc. This can be helpful in some circumstances, and can be done with any edition of SQL Server, but it has many of the same issues already mentioned, i.e. I/O.
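
A rough sketch of that rule-based split, using the ItemData table from the question, might look like the following. The table and view names (ItemData_0, ItemData_1, ItemData_All) are hypothetical, only two tables are shown, and the rule assumes ItemId is never negative. The application would have to route each insert and delete to the right table itself.

```sql
-- Sketch only: one physical table per rule, enforced with CHECK constraints.
CREATE TABLE ItemData_0
(
    Id     decimal(18,0) NOT NULL PRIMARY KEY,
    ItemId decimal(18,0) NOT NULL CHECK (ItemId % 2 = 0)  -- even items only
);

CREATE TABLE ItemData_1
(
    Id     decimal(18,0) NOT NULL PRIMARY KEY,
    ItemId decimal(18,0) NOT NULL CHECK (ItemId % 2 = 1)  -- odd items only
);
GO  -- batch separator (SSMS/sqlcmd); CREATE VIEW must start its own batch

-- Reporting can read everything through one view.
CREATE VIEW ItemData_All
AS
SELECT Id, ItemId FROM ItemData_0
UNION ALL
SELECT Id, ItemId FROM ItemData_1;
GO
```

Because each generation task only touches the table for its own items, tasks working on different tables do not compete for locks on the same object, though they still share the same disks.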

Related Solutions

Sql-server – Partitioning in SQL Server 2008

For what you want to do, I would recommend the following (which is pretty much what you were thinking).

1> Create history tables for the historic data you have, keeping the schemas as similar as possible. Split the data up by some logical grouping (such as year/month) based on how it is going to be queried (say you need to report within a month/year as well as across all data). Do not worry about the size of the split tables unless they are getting into the TB range (your DBMS should handle it); just make sure that they are appropriately indexed for the queries that need to be run. You should consider putting these onto a different disk from the active data if performance is an issue.

2> Create a routine to move data from the active table to the relevant historic table. Run this periodically. As a matter of practice, rebuild the indexes on the table that has had the data removed from it, and maybe update the table statistics. The easiest way to do this is to write a SQL script.

3> Consider the reporting you want to do. If you want to deal with only one table when writing queries, create a view that combines the archived tables. Create indexes on all the tables to suit the view. This way, if you want all the data, select from the view. If you want data from a specific year/month, query that table. The view will look something like:

create view view_all_data as
select 'Jan12' as [month], a.* from data_Jan12 a
union all  -- the monthly tables do not overlap, so no de-duplication is needed
select 'Feb12' as [month], b.* from data_Feb12 b ....
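
The move routine in step 2 could be sketched roughly as follows. The names data_active and created_at are hypothetical, and the sketch assumes that the archive table has the same columns in the same order as the active table, that it has no identity column, and that no new rows for the closed month arrive while the script runs.

```sql
-- Sketch only: move one closed month from the active table to its archive table.
SET XACT_ABORT ON;  -- roll the whole transaction back if any statement fails

DECLARE @from datetime = '20120101',
        @to   datetime = '20120201';

BEGIN TRANSACTION;

-- Copy the month's rows to the archive table.
INSERT INTO data_Jan12
SELECT *
FROM data_active
WHERE created_at >= @from AND created_at < @to;

-- Remove the same rows from the active table.
DELETE FROM data_active
WHERE created_at >= @from AND created_at < @to;

COMMIT TRANSACTION;

-- Housekeeping on the table that lost rows, as suggested in step 2.
ALTER INDEX ALL ON data_active REBUILD;
UPDATE STATISTICS data_active;
```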

I am assuming here that the system is not a highly used transactional system and that you have windows of low usage to run the analysis queries. If you need to maintain high levels of performance, you may like to do the above in a separate database (separate hardware) and port across the new data that you get from backups.

Sql-server – Expanding a partitioned table

Here are two best practices for partitioning that pertain to the question:

Keep an empty staging partition at the leftmost and rightmost ends of the partition range to ensure that the partitions split when loading in new data, and merge, after unloading old data, do not cause data movement.
Do not split or merge a partition already populated with data because this can cause severe locking and explosive log growth.

http://www.informit.com/articles/article.aspx?p=1946159&seqNum=5

If the leftmost end of your partition is empty, use ALTER PARTITION FUNCTION SPLIT RANGE to add new ranges to the partition function.
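
As a hedged illustration, a split usually takes two statements: the scheme is first told which filegroup the new partition will use, then the function gets the new boundary. The scheme name, the filegroup and the boundary value 100 below are placeholders, not values from the original question.

```sql
-- Sketch only: names, filegroup and boundary value are placeholders.
-- 1. Mark the filegroup that will hold the partition created by the split.
ALTER PARTITION SCHEME YourPartitionSchemeNameHere
NEXT USED [PRIMARY];

-- 2. Add the new boundary value to the partition function.
ALTER PARTITION FUNCTION YourPartitionFunctionNameHere()
SPLIT RANGE (100);
```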

To check if the leftmost partition is empty, use a query like the following:

DECLARE @PartitionFunctionName sysname = 'YourPartitionFunctionNameHere';

-- Row count per partition for objects built on the given partition function.
SELECT
    p.partition_number,
    SUM(pst.row_count) AS RowCountInPartition,
    pf.name AS PartitionFunction,
    ps.name AS PartitionScheme
FROM sys.dm_db_partition_stats pst
INNER JOIN sys.partitions p ON pst.partition_id = p.partition_id
INNER JOIN sys.indexes i ON p.object_id = i.object_id AND p.index_id = i.index_id
INNER JOIN sys.partition_schemes ps ON ps.data_space_id = i.data_space_id
INNER JOIN sys.partition_functions pf ON ps.function_id = pf.function_id
WHERE pf.name = @PartitionFunctionName
  AND i.index_id IN (0, 1)  -- heap or clustered index only, so rows are not counted once per index
GROUP BY p.partition_number, pf.name, ps.name
ORDER BY p.partition_number;

If the first partition is not empty, the best practices recommend that you create a new function with all values, create a new table on that function, then insert the data into the new table.

Also, if the leftmost partition has just a few records, a split may be fine. I'm not sure about that as I've never tried it.

Whatever you do, make sure to leave empty partitions at the leftmost and rightmost ends when you're finished. I might even go so far as to create partition ranges for 0 and 1, then add a check constraint to prevent the first partition from getting data in it. Do the same thing for the end.
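
The guard-partition idea could look roughly like this. It is only a sketch under assumptions that are not in the original: an int key, RANGE RIGHT boundaries, placeholder boundary values, and hypothetical names (GuardedRange_PFunc, GuardedRange_Scheme, GuardedData, KeyCol).

```sql
-- Sketch only: all names and boundary values are hypothetical.
CREATE PARTITION FUNCTION GuardedRange_PFunc (int)
AS RANGE RIGHT FOR VALUES (1, 1000, 2000);
-- Partition 1: KeyCol < 1              (guard, kept empty)
-- Partition 2: 1    <= KeyCol < 1000
-- Partition 3: 1000 <= KeyCol < 2000
-- Partition 4: KeyCol >= 2000          (guard, kept empty)

CREATE PARTITION SCHEME GuardedRange_Scheme
AS PARTITION GuardedRange_PFunc
ALL TO ([PRIMARY]);

CREATE TABLE GuardedData
(
    KeyCol int NOT NULL,
    -- Rejects any row that would land in the first or last partition.
    CONSTRAINT CK_GuardedData_KeyCol CHECK (KeyCol >= 1 AND KeyCol < 2000)
)
ON GuardedRange_Scheme (KeyCol);
```

When a new range is later added at either end, the check constraint has to be widened to match the new boundaries.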
