Sql-server – how to align data with an specific partition? plus 2 other questions related to the same table partition

optimizationpartitioningsql serversql server 2014

I am creating a table for testing purposes:

begin try
drop table countries
end try
begin catch
end catch
go


create table countries(
name nvarchar(100) not null,
visit date not null,
constraint pk__countries primary key clustered (visit,name))


set nocount on

insert into countries VALUES ('Portugal-Lisbon','20111101')
insert into countries VALUES ('Spain-Madrid','20111101')
insert into countries VALUES ('Italia-Milano','20120101')
insert into countries VALUES ('Norway-Thromso','20121201')
insert into countries VALUES ('USA-California','20160110')
insert into countries VALUES ('Brasil-Porto Alegre','20131101')
insert into countries VALUES ('Lithuania-Vilnius','20131101')
insert into countries VALUES ('France-Paris','20131101')
insert into countries VALUES ('Russia-Moscow','20141101')
insert into countries VALUES ('Germany-Liepzig','20140901')
insert into countries VALUES ('Germany-Hamburg','20100901')
insert into countries VALUES ('Estonia-Viru','20151101')
insert into countries VALUES ('Sweden-Gotemborg','20151101')
insert into countries VALUES ('Latvia-Riga','20141101')
insert into countries VALUES ('India-Vrindavana','20151101')
insert into countries VALUES ('Switzerland-Lugano','20161101')
insert into countries VALUES ('USA-New York','20160110')
insert into countries VALUES ('Russia-Cheliabinsky','20170110')
insert into countries VALUES ('Italia-Bergamo','20170101')
insert into countries VALUES ('UK-Telford','20170405')
insert into countries VALUES ('Spain-Barcelona','20101101')
insert into countries VALUES ('Spain-Fuerteventura','20101001')

then I create my partition function and partition schema:

---------------------------------------------------------------------------------------
--- create a partition function
-- https://technet.microsoft.com/en-us/library/ms186307(v=sql.110).aspx
---------------------------------------------------------------------------------------

use radhe
go


IF EXISTS (SELECT * FROM sys.partition_functions
    WHERE name = 'PF_year')
DROP PARTITION FUNCTION PF_year ;
GO



CREATE PARTITION FUNCTION PF_year (date)
AS RANGE RIGHT 
               FOR VALUES ( '20100101',
                            '20110101',
                            '20120101',
                            '20130101',
                            '20140101',
                            '20150101',
                            '20160101');




---------------------------------------------------------------------------------------
--- Create a Partition Scheme
---------------------------------------------------------------------------------------
use radhe
go
DROP PARTITION SCHEME PSC_YEAR 
GO


CREATE PARTITION SCHEME PSC_YEAR AS 
PARTITION PF_year   TO 
([PRIMARY],[F0],[F1], [F2], [F3], [F4], [F5], [F6]) 
GO

After that I make my table to use the partition schema:

--================================================================================
--================================================================================
use radhe
go

ALTER TABLE countries
DROP CONSTRAINT [pk__countries] 
go

CREATE UNIQUE CLUSTERED INDEX pk__countries   
ON dbo.countries (visit desc ,name asc)
WITH (DROP_EXISTING=off,ONLINE=Off,FILLFACTOR=100,DATA_COMPRESSION=PAGE) ON PSC_YEAR (visit)
go

All that done – and to verify that it is all done alright I have the following script to check how many rows in each of the partitions:

SELECT [table_name] = SCHEMA_NAME([schema_id]) + '.' + t.[name]
      ,i.[name] AS [index_name]
      ,i.[type_desc] AS [index_type]
      ,ps.[name] AS [partition_scheme]
      ,pf.[name] AS [partition_function]
      ,p.[partition_number]
      ,r.[value] AS [current_partition_range_boundary_value]
      ,p.[rows] AS [partition_rows]
      ,p.[data_compression_desc]
FROM sys.tables t
INNER JOIN sys.partitions p ON p.[object_id] = t.[object_id]
INNER JOIN sys.indexes i ON p.[object_id] = i.[object_id]
                           AND p.[index_id] = i.[index_id]
INNER JOIN sys.data_spaces ds ON i.[data_space_id] = ds.[data_space_id]
INNER JOIN sys.partition_schemes ps ON ds.[data_space_id] = ps.[data_space_id]
INNER JOIN sys.partition_functions pf ON ps.[function_id] = pf.[function_id]
LEFT JOIN sys.partition_range_values AS r ON pf.[function_id] = r.[function_id]
    AND r.[boundary_id] = p.[partition_number]
where t.object_id = object_id('dbo.countries')
GROUP BY SCHEMA_NAME([schema_id])
        ,t.[name]
        ,i.[name]
        ,i.[type_desc]
        ,ps.[name]
        ,pf.[name]
        ,p.[partition_number]
        ,r.[value]
        ,p.[rows]
        ,p.[data_compression_desc]
ORDER BY SCHEMA_NAME([schema_id])
        ,t.[name]
        ,i.[name]
        ,p.[partition_number];

and the above script gives me the following result:
enter image description here

but when I run the following script, to see how many rows I have in each year (because I partitioned per year) I get the following results:

select c.name,
       [year of visit] = year(visit),
       [number of visits] = count(*) over (partition by year(visit))
from countries c
order by visit desc

enter image description here

how is this that my records of 2010 are not on partition 1?

can I find out exactly on which partition a record is?

how do I add another partition, let's say next year (2018) I will add the partition for 2017, without having to rebuild the clustered index?

last but not least, let's say in a few year time down the line, I will not be interested anymore in the rows of 2010. how can I get rid of all that partition?

here below is how this database is created, basically my goal was to put each partition on a different file.

CREATE DATABASE [radhe]
 CONTAINMENT = NONE
 ON  PRIMARY 
( NAME = N'radhe', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL12.SQL2014\MSSQL\DATA\radhe.mdf' , SIZE = 70254592KB , MAXSIZE = UNLIMITED, FILEGROWTH = 1048576KB ), 
 FILEGROUP [F0] 
( NAME = N'F0', FILENAME = N'C:\sql2014_data\RADHE0.ndf' , SIZE = 1048576KB , MAXSIZE = UNLIMITED, FILEGROWTH = 1048576KB ), 
 FILEGROUP [F1] 
( NAME = N'F1', FILENAME = N'C:\sql2014_data\RADHE1.ndf' , SIZE = 1048576KB , MAXSIZE = UNLIMITED, FILEGROWTH = 1048576KB ), 
 FILEGROUP [F2] 
( NAME = N'F2', FILENAME = N'C:\sql2014_data\RADHE2.ndf' , SIZE = 3145728KB , MAXSIZE = UNLIMITED, FILEGROWTH = 1048576KB ), 
 FILEGROUP [F3] 
( NAME = N'F3', FILENAME = N'C:\sql2014_data\RADHE3.ndf' , SIZE = 3145728KB , MAXSIZE = UNLIMITED, FILEGROWTH = 1048576KB ), 
 FILEGROUP [F4] 
( NAME = N'F4', FILENAME = N'C:\sql2014_data\RADHE4.ndf' , SIZE = 3145728KB , MAXSIZE = UNLIMITED, FILEGROWTH = 1048576KB ), 
 FILEGROUP [F5] 
( NAME = N'F5', FILENAME = N'C:\sql2014_data\RADHE5.ndf' , SIZE = 3145728KB , MAXSIZE = UNLIMITED, FILEGROWTH = 1048576KB ), 
 FILEGROUP [F6] 
( NAME = N'F6', FILENAME = N'C:\sql2014_data\RADHE6.ndf' , SIZE = 3145728KB , MAXSIZE = UNLIMITED, FILEGROWTH = 1048576KB ), 
 FILEGROUP [F7] 
( NAME = N'F7', FILENAME = N'C:\sql2014_data\RADHE7.ndf' , SIZE = 1048576KB , MAXSIZE = UNLIMITED, FILEGROWTH = 1048576KB ), 
 FILEGROUP [Facts ] 
( NAME = N'Facts', FILENAME = N'C:\sql2014_data\facts.ndf' , SIZE = 45613056KB , MAXSIZE = UNLIMITED, FILEGROWTH = 1048576KB ), 
 FILEGROUP [Facts2] 
( NAME = N'Facts2', FILENAME = N'C:\sql2014_data\facts2.ndf' , SIZE = 59244544KB , MAXSIZE = UNLIMITED, FILEGROWTH = 1048576KB )
 LOG ON 
( NAME = N'radhe_log', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL12.SQL2014\MSSQL\DATA\radhe_log.ldf' , SIZE = 2097152KB , MAXSIZE = 2048GB , FILEGROWTH = 1048576KB )
END

these links below are interesting references regarding table partitioning:

How to partition an existing non-partitioned table?

What's the best way to archive all but current year and partition the table at the same time?

Partition Key questions in SQL Server 2008

Best Answer

how is this that my records of 2010 are not on partition 1?

With a RANGE RIGHT function, values greater than or equal to the first boundary ('20100101') and less than the second boundary ('20110101') are stored in partition 2. Only rows less than first '20100101' boundary are stored in partition 1. No rows in your test data meet this criteria.

can I find out exactly on which partition a record is?

Invoke the partition function using $PARTITION along with the partitioning column value:

SELECT 
      visit
    , $PARTITION.PF_Year(visit) AS PartitionNumber
FROM dbo.countries;

how do I add another partition, let's say next year (2018) I will add the partition for 2017, without having to rebuild the clustered index?

Set the partition scheme NEXT USED filegroup to the desired filegroup and SPLIT the function with the new boundary. This should be done before 2018 data exists, ideally with the existing partition (2017) empty, to avoid costly data movement.

ALTER PARTITION SCHEME PSC_YEAR
    NEXT USED [F8];
ALTER PARTITION FUNCTION PF_Year()
    SPLIT RANGE('20180101');

last but not least, let's say in a few year time down the line, I will not be interested anymore in the rows of 2010. how can I get rid of all that partition?

SWITCH the partition into an aligned staging table of the same schema and remove the partition boundary with MERGE. I suggest partitioning the staging table using the same scheme as the source to ensure alignment and truncating the staging table immediately after the SWITCH.

CREATE TABLE dbo.countries_staging(
      name nvarchar(100) not null
    , visit date not null
);

CREATE UNIQUE CLUSTERED INDEX pk__countries_staging   
ON dbo.countries_staging (visit desc ,name asc)
WITH(DATA_COMPRESSION=PAGE)
ON PSC_YEAR (visit);
GO

ALTER TABLE dbo.countries
    SWITCH PARTITION $PARTITION.PF_Year('20100101')
    TO dbo.countries_staging PARTITION $PARTITION.PF_Year('20100101');
TRUNCATE TABLE dbo.countries_staging;
ALTER PARTITION FUNCTION PF_Year()
    MERGE RANGE('20100101');
GO