Sql-server – how to align data with an specific partition? plus 2 other questions related to the same table partition

optimizationpartitioningsql serversql server 2014

I am creating a table for testing purposes:

begin try
drop table countries
end try
begin catch
end catch
go


create table countries(
name nvarchar(100) not null,
visit date not null,
constraint pk__countries primary key clustered (visit,name))


set nocount on

insert into countries VALUES ('Portugal-Lisbon','20111101')
insert into countries VALUES ('Spain-Madrid','20111101')
insert into countries VALUES ('Italia-Milano','20120101')
insert into countries VALUES ('Norway-Thromso','20121201')
insert into countries VALUES ('USA-California','20160110')
insert into countries VALUES ('Brasil-Porto Alegre','20131101')
insert into countries VALUES ('Lithuania-Vilnius','20131101')
insert into countries VALUES ('France-Paris','20131101')
insert into countries VALUES ('Russia-Moscow','20141101')
insert into countries VALUES ('Germany-Liepzig','20140901')
insert into countries VALUES ('Germany-Hamburg','20100901')
insert into countries VALUES ('Estonia-Viru','20151101')
insert into countries VALUES ('Sweden-Gotemborg','20151101')
insert into countries VALUES ('Latvia-Riga','20141101')
insert into countries VALUES ('India-Vrindavana','20151101')
insert into countries VALUES ('Switzerland-Lugano','20161101')
insert into countries VALUES ('USA-New York','20160110')
insert into countries VALUES ('Russia-Cheliabinsky','20170110')
insert into countries VALUES ('Italia-Bergamo','20170101')
insert into countries VALUES ('UK-Telford','20170405')
insert into countries VALUES ('Spain-Barcelona','20101101')
insert into countries VALUES ('Spain-Fuerteventura','20101001')

then I create my partition function and partition schema:

---------------------------------------------------------------------------------------
--- create a partition function
-- https://technet.microsoft.com/en-us/library/ms186307(v=sql.110).aspx
---------------------------------------------------------------------------------------

use radhe
go


IF EXISTS (SELECT * FROM sys.partition_functions
    WHERE name = 'PF_year')
DROP PARTITION FUNCTION PF_year ;
GO



CREATE PARTITION FUNCTION PF_year (date)
AS RANGE RIGHT 
               FOR VALUES ( '20100101',
                            '20110101',
                            '20120101',
                            '20130101',
                            '20140101',
                            '20150101',
                            '20160101');




---------------------------------------------------------------------------------------
--- Create a Partition Scheme
---------------------------------------------------------------------------------------
use radhe
go
DROP PARTITION SCHEME PSC_YEAR 
GO


CREATE PARTITION SCHEME PSC_YEAR AS 
PARTITION PF_year   TO 
([PRIMARY],[F0],[F1], [F2], [F3], [F4], [F5], [F6]) 
GO

After that I make my table to use the partition schema:

--================================================================================
--================================================================================
use radhe
go

ALTER TABLE countries
DROP CONSTRAINT [pk__countries] 
go

CREATE UNIQUE CLUSTERED INDEX pk__countries   
ON dbo.countries (visit desc ,name asc)
WITH (DROP_EXISTING=off,ONLINE=Off,FILLFACTOR=100,DATA_COMPRESSION=PAGE) ON PSC_YEAR (visit)
go

All that done – and to verify that it is all done alright I have the following script to check how many rows in each of the partitions:

SELECT [table_name] = SCHEMA_NAME([schema_id]) + '.' + t.[name]
      ,i.[name] AS [index_name]
      ,i.[type_desc] AS [index_type]
      ,ps.[name] AS [partition_scheme]
      ,pf.[name] AS [partition_function]
      ,p.[partition_number]
      ,r.[value] AS [current_partition_range_boundary_value]
      ,p.[rows] AS [partition_rows]
      ,p.[data_compression_desc]
FROM sys.tables t
INNER JOIN sys.partitions p ON p.[object_id] = t.[object_id]
INNER JOIN sys.indexes i ON p.[object_id] = i.[object_id]
                           AND p.[index_id] = i.[index_id]
INNER JOIN sys.data_spaces ds ON i.[data_space_id] = ds.[data_space_id]
INNER JOIN sys.partition_schemes ps ON ds.[data_space_id] = ps.[data_space_id]
INNER JOIN sys.partition_functions pf ON ps.[function_id] = pf.[function_id]
LEFT JOIN sys.partition_range_values AS r ON pf.[function_id] = r.[function_id]
    AND r.[boundary_id] = p.[partition_number]
where t.object_id = object_id('dbo.countries')
GROUP BY SCHEMA_NAME([schema_id])
        ,t.[name]
        ,i.[name]
        ,i.[type_desc]
        ,ps.[name]
        ,pf.[name]
        ,p.[partition_number]
        ,r.[value]
        ,p.[rows]
        ,p.[data_compression_desc]
ORDER BY SCHEMA_NAME([schema_id])
        ,t.[name]
        ,i.[name]
        ,p.[partition_number];

and the above script gives me the following result:

but when I run the following script, to see how many rows I have in each year (because I partitioned per year) I get the following results:

select c.name,
       [year of visit] = year(visit),
       [number of visits] = count(*) over (partition by year(visit))
from countries c
order by visit desc

how is this that my records of 2010 are not on partition 1?

can I find out exactly on which partition a record is?

how do I add another partition, let's say next year (2018) I will add the partition for 2017, without having to rebuild the clustered index?

last but not least, let's say in a few year time down the line, I will not be interested anymore in the rows of 2010. how can I get rid of all that partition?

here below is how this database is created, basically my goal was to put each partition on a different file.

CREATE DATABASE [radhe]
 CONTAINMENT = NONE
 ON  PRIMARY 
( NAME = N'radhe', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL12.SQL2014\MSSQL\DATA\radhe.mdf' , SIZE = 70254592KB , MAXSIZE = UNLIMITED, FILEGROWTH = 1048576KB ), 
 FILEGROUP [F0] 
( NAME = N'F0', FILENAME = N'C:\sql2014_data\RADHE0.ndf' , SIZE = 1048576KB , MAXSIZE = UNLIMITED, FILEGROWTH = 1048576KB ), 
 FILEGROUP [F1] 
( NAME = N'F1', FILENAME = N'C:\sql2014_data\RADHE1.ndf' , SIZE = 1048576KB , MAXSIZE = UNLIMITED, FILEGROWTH = 1048576KB ), 
 FILEGROUP [F2] 
( NAME = N'F2', FILENAME = N'C:\sql2014_data\RADHE2.ndf' , SIZE = 3145728KB , MAXSIZE = UNLIMITED, FILEGROWTH = 1048576KB ), 
 FILEGROUP [F3] 
( NAME = N'F3', FILENAME = N'C:\sql2014_data\RADHE3.ndf' , SIZE = 3145728KB , MAXSIZE = UNLIMITED, FILEGROWTH = 1048576KB ), 
 FILEGROUP [F4] 
( NAME = N'F4', FILENAME = N'C:\sql2014_data\RADHE4.ndf' , SIZE = 3145728KB , MAXSIZE = UNLIMITED, FILEGROWTH = 1048576KB ), 
 FILEGROUP [F5] 
( NAME = N'F5', FILENAME = N'C:\sql2014_data\RADHE5.ndf' , SIZE = 3145728KB , MAXSIZE = UNLIMITED, FILEGROWTH = 1048576KB ), 
 FILEGROUP [F6] 
( NAME = N'F6', FILENAME = N'C:\sql2014_data\RADHE6.ndf' , SIZE = 3145728KB , MAXSIZE = UNLIMITED, FILEGROWTH = 1048576KB ), 
 FILEGROUP [F7] 
( NAME = N'F7', FILENAME = N'C:\sql2014_data\RADHE7.ndf' , SIZE = 1048576KB , MAXSIZE = UNLIMITED, FILEGROWTH = 1048576KB ), 
 FILEGROUP [Facts ] 
( NAME = N'Facts', FILENAME = N'C:\sql2014_data\facts.ndf' , SIZE = 45613056KB , MAXSIZE = UNLIMITED, FILEGROWTH = 1048576KB ), 
 FILEGROUP [Facts2] 
( NAME = N'Facts2', FILENAME = N'C:\sql2014_data\facts2.ndf' , SIZE = 59244544KB , MAXSIZE = UNLIMITED, FILEGROWTH = 1048576KB )
 LOG ON 
( NAME = N'radhe_log', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL12.SQL2014\MSSQL\DATA\radhe_log.ldf' , SIZE = 2097152KB , MAXSIZE = 2048GB , FILEGROWTH = 1048576KB )
END

these links below are interesting references regarding table partitioning:

How to partition an existing non-partitioned table?

What's the best way to archive all but current year and partition the table at the same time?

Partition Key questions in SQL Server 2008

Best Answer

how is this that my records of 2010 are not on partition 1?

With a RANGE RIGHT function, values greater than or equal to the first boundary ('20100101') and less than the second boundary ('20110101') are stored in partition 2. Only rows less than first '20100101' boundary are stored in partition 1. No rows in your test data meet this criteria.

can I find out exactly on which partition a record is?

Invoke the partition function using $PARTITION along with the partitioning column value:

SELECT 
      visit
    , $PARTITION.PF_Year(visit) AS PartitionNumber
FROM dbo.countries;

how do I add another partition, let's say next year (2018) I will add the partition for 2017, without having to rebuild the clustered index?

Set the partition scheme NEXT USED filegroup to the desired filegroup and SPLIT the function with the new boundary. This should be done before 2018 data exists, ideally with the existing partition (2017) empty, to avoid costly data movement.

ALTER PARTITION SCHEME PSC_YEAR
    NEXT USED [F8];
ALTER PARTITION FUNCTION PF_Year()
    SPLIT RANGE('20180101');

last but not least, let's say in a few year time down the line, I will not be interested anymore in the rows of 2010. how can I get rid of all that partition?

SWITCH the partition into an aligned staging table of the same schema and remove the partition boundary with MERGE. I suggest partitioning the staging table using the same scheme as the source to ensure alignment and truncating the staging table immediately after the SWITCH.

CREATE TABLE dbo.countries_staging(
      name nvarchar(100) not null
    , visit date not null
);

CREATE UNIQUE CLUSTERED INDEX pk__countries_staging   
ON dbo.countries_staging (visit desc ,name asc)
WITH(DATA_COMPRESSION=PAGE)
ON PSC_YEAR (visit);
GO

ALTER TABLE dbo.countries
    SWITCH PARTITION $PARTITION.PF_Year('20100101')
    TO dbo.countries_staging PARTITION $PARTITION.PF_Year('20100101');
TRUNCATE TABLE dbo.countries_staging;
ALTER PARTITION FUNCTION PF_Year()
    MERGE RANGE('20100101');
GO

Related Solutions

Sql-server – Switching Data In Fails with “allows values that are not allowed by check constraints or partition function on target table”

The thing about CHECK constraints is they only disallow rows for which the predicate returns FALSE. If the check returns UNKNOWN, that is not FALSE, so the row passes the check:

CREATE TABLE dbo.T1 (id int NULL CHECK (id = 1));

INSERT dbo.T1 VALUES (1); -- Ok
INSERT dbo.T1 VALUES (2); -- Error
INSERT dbo.T1 VALUES (NULL); -- Ok!

Your check constraint does not disallow NULL values, which is the out-of-range 'value' the SWITCH statement is objecting to. Your switch-in table might contain nulls, which do not belong in partition 2.

Add AND source_id IS NOT NULL to your CHECK constraint, when the destination partition is not partition 1 (where the nulls go).

Postgresql – How to partition existing table in postgres

Since #1 requires copying data from the master to the child while it is in an active production environment, I personally went with #2 (creating a new master). This prevents disruptions to the original table while it is actively in use and if there are any issues, I can easily delete the new master without issue and continue using the original table. Here are the steps to do it:

Create new master table.

CREATE TABLE new_master (
    id          serial,
    counter     integer,
    dt_created  DATE DEFAULT CURRENT_DATE NOT NULL
);

Create children that inherit from master.

CREATE TABLE child_2014 (
    CONSTRAINT pk_2014 PRIMARY KEY (id),
    CONSTRAINT ck_2014 CHECK ( dt_created < DATE '2015-01-01' )
) INHERITS (new_master);
CREATE INDEX idx_2014 ON child_2014 (dt_created);

CREATE TABLE child_2015 (
    CONSTRAINT pk_2015 PRIMARY KEY (id),
    CONSTRAINT ck_2015 CHECK ( dt_created >= DATE '2015-01-01' AND dt_created < DATE '2016-01-01' )
) INHERITS (new_master);
CREATE INDEX idx_2015 ON child_2015 (dt_created);

...

Copy all historical data to new master table

INSERT INTO child_2014 (id,counter,dt_created)
SELECT id,counter,dt_created
from old_master
where dt_created < '01/01/2015'::date;

Temporarily pause new inserts/updates to production database

Copy most recent data to new master table

INSERT INTO child_2015 (id,counter,dt_created)
SELECT id,counter,dt_created
from old_master
where dt_created >= '01/01/2015'::date AND dt_created < '01/01/2016'::date;

Rename tables so that new_master becomes the production database.

ALTER TABLE old_master RENAME TO old_master_backup;
ALTER TABLE new_master RENAME TO old_master;

Add function for INSERT statements to old_master so that data gets passed to correct partition.

CREATE OR REPLACE FUNCTION fn_insert() RETURNS TRIGGER AS $$
BEGIN
    IF ( NEW.dt_created >= DATE '2015-01-01' AND
         NEW.dt_created < DATE '2016-01-01' ) THEN
        INSERT INTO child_2015 VALUES (NEW.*);
    ELSIF ( NEW.dt_created < DATE '2015-01-01' ) THEN
        INSERT INTO child_2014 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'Date out of range';
    END IF;
    RETURN NULL;
END;
$$
LANGUAGE plpgsql;

Add trigger so that function is called on INSERTS

CREATE TRIGGER tr_insert BEFORE INSERT ON old_master
FOR EACH ROW EXECUTE PROCEDURE fn_insert();

Set constraint exclusion to ON
```
SET constraint_exclusion = on;
```
Re-enable UPDATES and INSERTS on production database
Set up trigger or cron so that new partitions get created and function gets updated to assign new data to correct partition. Reference this article for code examples
Delete old_master_backup

Best Answer

Related Solutions

Sql-server – Switching Data In Fails with “allows values that are not allowed by check constraints or partition function on target table”

Postgresql – How to partition existing table in postgres

Related Question