SQL Server – Does Partitioning Help Without Querying on Partition Value

partitioningsql serversql-server-2012

Say I have an orders.Order table that is partitioned on CreatedWhen (a datetime column set to GetDate() when the row is created).

The table has over 10 million rows and the partition is quarterly (or some such regular interval).

If I run a query like:

SELECT ord.OrderId, ord.ClientId, ord.ReceptId, ord.Cost, agt.State
FROM   orders.Order ord
       join personnel.Agent agt
          on ord.AgentId = agt.AgentId
WHERE  agt.FirstName = 'Bob'
       and LastName = 'Whiely'
       and ord.Cost > 135

Am I going to see any benefit from the partition for this query?

I don't see how there could be a benefit because I am not using the partition value in my query and all of the table still needs to be searched.

Note: The results of the query will all be in the most recent partition. But I don't see how that would help.

Second Note: The partitions are all on the same SAN drive (Sys Admins won't let it be any other way), so the disk IO will not be parallel.

Is there a better way to partition so that random queries, like the one above, will perform better? (Maybe on the primary key?)

Third Note: The query above is made up. I am using it to show that I have many queries that hit many random columns (which are properly indexed).

Best Answer

Partitioning is not typically a performance enhancer. Normally, partitioning is employed to enable efficient data management. For instance, you can employ partitioning to easily swap old rows out of a table into an archive table, one partition at a time.

Querying against a partitioned table where your query does not include the partitioning key forces SQL Server to look at the entire table. This is less efficient than simply scanning or seeking into a non-partitioned table because each partition is essentially a table, necessitating a UNION ALL style query against all the partitions. This is not particularly obvious when looking at a query plan for a partitioned table, although you can see it if you look carefully at the pop-up properties for an index or table seek or scan:

enter image description here

To make this more obvious, lets set up a simple test rig in tempdb, with a date-based partitioning key:

USE tempdb;
GO
IF OBJECT_ID(N'dbo.Tab', N'U') IS NOT NULL
BEGIN
    DROP TABLE dbo.Tab;
    DROP PARTITION SCHEME PartScheme;
    DROP PARTITION FUNCTION PartFun;
END
GO

CREATE PARTITION FUNCTION PartFun (datetime)
AS RANGE LEFT
FOR VALUES (
      N'2012-01-01T00:00:00'
    , N'2012-04-01T00:00:00'
    , N'2012-07-01T00:00:00'
    , N'2012-10-01T00:00:00'
    , N'2013-01-01T00:00:00'
    , N'2013-04-01T00:00:00'
    , N'2013-07-01T00:00:00'
    , N'2013-10-01T00:00:00'
    , N'2014-01-01T00:00:00'
    , N'2014-04-01T00:00:00'
    , N'2014-07-01T00:00:00'
    , N'2014-10-01T00:00:00'
);

CREATE PARTITION SCHEME PartScheme
AS PARTITION PartFun
ALL TO ([PRIMARY]);

IF OBJECT_ID(N'dbo.Tab', N'U') IS NOT NULL
DROP TABLE dbo.Tab;
CREATE TABLE dbo.Tab
(
    TabID int NOT NULL
    , CreateDate datetime NOT NULL
    , Data1 varchar(100) NOT NULL
    , Data2 varchar(10) NOT NULL
    , Data3 varchar(1000) NOT NULL
    , CONSTRAINT PK_Tab
        PRIMARY KEY CLUSTERED
        (CreateDate, TabID)
) ON [PartScheme](CreateDate);

This will populate the table with 1,000,000 rows with randomly generated data evenly spread over all partitions:

;WITH Ten AS
(
    SELECT v.Num
    FROM (VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9))v(Num)
)
, Million AS 
(
    SELECT Num = (t6.Num * POWER(10, 5)) 
        + (t5.Num * POWER(10, 4)) 
        + (t4.Num * POWER(10, 3)) 
        + (t3.Num * POWER(10, 2)) 
        + (t2.Num * POWER(10, 1)) 
        + (t1.Num) 
    FROM Ten t1
        CROSS JOIN Ten t2
        CROSS JOIN Ten t3
        CROSS JOIN Ten t4
        CROSS JOIN Ten t5
        CROSS JOIN Ten t6
)
INSERT INTO dbo.Tab (TabID, CreateDate, Data1, Data2, Data3)
SELECT m.Num, DATEADD(DAY, m.Num % 1000, N'2012-01-01T00:00:00')
    , CONVERT(varchar(100), CRYPT_GEN_RANDOM(100))
    , CONVERT(varchar(10), CRYPT_GEN_RANDOM(10))
    , CONVERT(varchar(1000), CRYPT_GEN_RANDOM(1000))
FROM Million m;

Let's go ahead and create a non-clustered index using our partitioning scheme:

CREATE NONCLUSTERED INDEX IX_Tab
ON dbo.Tab(Data1)
ON [PartScheme](CreateDate);

Now, let's look at three "typical" queries you might execute (the first query just gets an actual value out of the table, which is then used in the WHERE clause for the three queries):

DECLARE @d varchar(100);
SELECT TOP(1) @d = t.Data1
FROM dbo.Tab t
WHERE t.CreateDate > N'2013-08-24T00:00:00';

SELECT Data1 
FROM dbo.Tab
WHERE Tab.Data1 = @d;

SELECT Data1 
FROM dbo.Tab
WHERE Tab.Data1 = @d
    AND tab.CreateDate >= N'2013-08-23T00:00:00'

SELECT Data1 
FROM dbo.Tab
WHERE Tab.Data1 = @d
    AND tab.CreateDate >= N'2013-08-23T00:00:00'
    AND tab.CreateDate <= N'2013-08-29T00:00:00';

Here is a quick screenshot of the execution plans so you can see the difference adding the partitioning key to the WHERE clause can make:

enter image description here

The "query costs relative to the batch" numbers are instructive here. The first query consumes 50% of the batch cost since it performed a scan of all 13 partitions. It does that since it has no idea where in the table the particular value of Data1 exists. The second query consumes only 23% by virtue of the "Start Date" parameter, which can eliminate half the partitions. Finally, the third query only consumes 4% since it only needs to look at a single partition, because we included a "start date" and an "end date".

The "statistics" for the 3 queries are:

Table 'Tab'. Scan count 13, logical reads 36, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Tab'. Scan count 6, logical reads 16, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Tab'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

What if we add one of the other columns to these queries:

DECLARE @d varchar(100);
SELECT TOP(1) @d = t.Data1
FROM dbo.Tab t
WHERE t.CreateDate > N'2013-08-24T00:00:00';

SELECT Data1 
    , Data2
FROM dbo.Tab
WHERE Tab.Data1 = @d;

SELECT Data1 
    , Data2
FROM dbo.Tab
WHERE Tab.Data1 = @d
    AND tab.CreateDate >= N'2013-08-23T00:00:00'

SELECT Data1 
    , Data2
FROM dbo.Tab
WHERE Tab.Data1 = @d
    AND tab.CreateDate >= N'2013-08-23T00:00:00'
    AND tab.CreateDate <= N'2013-08-29T00:00:00';

The execution plans:

enter image description here

The statistics for those:

Table 'Tab'. Scan count 13, logical reads 39, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Tab'. Scan count 6, logical reads 19, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Tab'. Scan count 1, logical reads 7, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

So, lets just do away with the partitioning, and re-run all those tests:

IF OBJECT_ID(N'dbo.TabU', N'U') IS NOT NULL
DROP TABLE dbo.TabU;
CREATE TABLE dbo.TabU
(
    TabID int NOT NULL
    , CreateDate datetime NOT NULL
    , Data1 varchar(100) NOT NULL
    , Data2 varchar(10) NOT NULL
    , Data3 varchar(1000) NOT NULL
    , CONSTRAINT PK_TabU
        PRIMARY KEY CLUSTERED
        (CreateDate, TabID)
) ON [PRIMARY];

;WITH Ten AS
(
    SELECT v.Num
    FROM (VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9))v(Num)
)
, Million AS 
(
    SELECT Num = (t6.Num * POWER(10, 5)) 
        + (t5.Num * POWER(10, 4)) 
        + (t4.Num * POWER(10, 3)) 
        + (t3.Num * POWER(10, 2)) 
        + (t2.Num * POWER(10, 1)) 
        + (t1.Num) 
    FROM Ten t1
        CROSS JOIN Ten t2
        CROSS JOIN Ten t3
        CROSS JOIN Ten t4
        CROSS JOIN Ten t5
        CROSS JOIN Ten t6
)
INSERT INTO dbo.TabU (TabID, CreateDate, Data1, Data2, Data3)
SELECT m.Num, DATEADD(DAY, m.Num % 1000, N'2012-01-01T00:00:00')
    , CONVERT(varchar(100), CRYPT_GEN_RANDOM(100))
    , CONVERT(varchar(10), CRYPT_GEN_RANDOM(10))
    , CONVERT(varchar(1000), CRYPT_GEN_RANDOM(1000))
FROM Million m;

CREATE NONCLUSTERED INDEX IX_TabU
ON dbo.TabU(Data1)
ON [PRIMARY];

Now, the first set of queries:

DECLARE @d varchar(100);
SELECT TOP(1) @d = t.Data1
FROM dbo.TabU t
WHERE t.CreateDate > N'2013-08-24T00:00:00';

SELECT Data1 
FROM dbo.TabU
WHERE TabU.Data1 = @d;

SELECT Data1 
FROM dbo.TabU
WHERE TabU.Data1 = @d
    AND TabU.CreateDate >= N'2013-08-23T00:00:00'

SELECT Data1 
FROM dbo.TabU
WHERE TabU.Data1 = @d
    AND TabU.CreateDate >= N'2013-08-23T00:00:00'
    AND TabU.CreateDate <= N'2013-08-29T00:00:00';

And the plans:

enter image description here

And the stats:

Table 'TabU'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'TabU'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'TabU'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

All three queries now have the same "cost", which is substantially lower than the partitioned version.

The queries with Data1 and Data2 columns:

DECLARE @d varchar(100);
SELECT TOP(1) @d = t.Data1
FROM dbo.TabU t
WHERE t.CreateDate > N'2013-08-24T00:00:00';

SELECT Data1 
    , Data2
FROM dbo.TabU
WHERE TabU.Data1 = @d;

SELECT Data1 
    , Data2
FROM dbo.TabU
WHERE TabU.Data1 = @d
    AND TabU.CreateDate >= N'2013-08-23T00:00:00'

SELECT Data1 
    , Data2
FROM dbo.TabU
WHERE TabU.Data1 = @d
    AND TabU.CreateDate >= N'2013-08-23T00:00:00'
    AND TabU.CreateDate <= N'2013-08-29T00:00:00';

enter image description here

and the stats:

Table 'TabU'. Scan count 1, logical reads 8, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'TabU'. Scan count 1, logical reads 8, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'TabU'. Scan count 1, logical reads 8, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Clearly, if you can't do the queries the correct way, by including the partitioning key, then you are going to see worse performance with partitioning than without.