Sql-server – sql server partitioned view wrong execution plan on update a temporary table

partitioningsql serversql server 2014view

i need your help because we have a performance issues. I would like to update a temporary table that contains a join to the partitioned view.

Schema:

create schema SCHEMA1
go
create schema SCHEMA2
go
create schema DATA
go

Partitioned Tables:

CREATE TABLE SCHEMA1.Positions (
    Month [smallint] NOT NULL,
    Client [tinyint] NOT NULL,
    Type [char](3) NOT NULL,
    Amount [int] NOT NULL,
 CONSTRAINT [PK_Positions] PRIMARY KEY CLUSTERED 
(
    Month ASC,
    Client ASC,
    Type ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
) on psDATA(Month)
ALTER TABLE SCHEMA1.Positions  WITH CHECK ADD  CONSTRAINT [CK_Positions_Client] CHECK  ((Client=(1)))
GO
ALTER TABLE SCHEMA1.Positions CHECK CONSTRAINT [CK_Positions_Client]
GO

CREATE TABLE SCHEMA2.Positions (
    Month [smallint] NOT NULL,
    Client [tinyint] NOT NULL,
    Type [char](3) NOT NULL,
    Amount [int] NOT NULL,
 CONSTRAINT [PK_Positions] PRIMARY KEY CLUSTERED 
(
    Month ASC,
    Client ASC,
    Type ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
) on psDATA(Month)
ALTER TABLE SCHEMA2.Positions  WITH CHECK ADD  CONSTRAINT [CK_Positions_Client] CHECK  ((Client=(2)))
GO
ALTER TABLE SCHEMA2.Positions CHECK CONSTRAINT [CK_Positions_Client]
GO

Partitioned View:

create view DATA.Positions as
    select * from SCHEMA1.Positions
    union all
    select * from SCHEMA2.Positions
go

Sample Data:

insert   SCHEMA1.Positions (Month, Client, Type, Amount)
values  (1804, 1, '001', 1000),
        (1802, 1, '003', 10000),
        (1705, 1, '004', 104),
        (1601, 1, '001', 12),
        (1512, 1, '005', 7),
        (1703, 1, '001', 12897),
        (1605, 1, '007', 3245),
        (1401, 1, 'ZUR', 45),
        (1201, 1, 'NOR', 175000)

insert  SCHEMA2.Positions (Month, Client, Type, Amount)
values  (1804, 2, '001', 1000),
        (1802, 2, '003', 10000),
        (1705, 2, '004', 104),
        (1601, 2, '001', 12),
        (1512, 2, '005', 7),
        (1703, 2, '001', 12897),
        (1605, 2, '007', 3245),
        (1401, 2, 'ZUR', 45),
        (1201, 2, 'NOR', 175000)

expected behaviour:

select   *
from    DATA.Positions
where   Client = 1

expected behaviour

non expected behaviour:

create table #t (
    Month smallint,
    Client tinyint,
    Type char(3),
    Amount int
)

insert  #t (Month, Client, Type)
values  (1605, 2, '007')

update  t
set     Amount = p.Amount
from    #t t
join    DATA.Positions p
        on  t.Month = p.Month
        and t.Client = p.Client
        and t.Type = p.Type

non expected behaviour

example with fixed client number and right execution plan:

update   t
set     Amount = p.Amount
from    #t t
join    DATA.Positions p
        on  t.Month = p.Month
        and 2 = p.Client
        and t.Type = p.Type

expected behaviour

How can I achieve that only one partitioned table (SCHEMA2.Positions) is queried?

Thanks in advance and best regards,
Andreas

Edit #1:

Thanks for your fast answer, Martin Smith. To clarifiy the question I will add some primary keys fields (like our production environment have) and some more data.

New Partitioned Tables with more primary key fields:

CREATE TABLE SCHEMA1.Positions (
    Month [smallint] NOT NULL,
    Client [tinyint] NOT NULL,
    SNr int not null,
    RNr int not null,
    VNr int not null,
    PNr int not null,
    Type [char](3) NOT NULL,
    Amount [int] NOT NULL,
 CONSTRAINT [PK_Positions] PRIMARY KEY CLUSTERED 
(
    Month ASC,
    Client ASC,
    SNr ASC,
    RNr ASC,
    VNr ASC,
    PNr ASC,
    Type ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
) on psTP3DATA(Month)
ALTER TABLE SCHEMA1.Positions  WITH CHECK ADD  CONSTRAINT [CK_Positions_Client] CHECK  ((Client=(1)))
GO
ALTER TABLE SCHEMA1.Positions CHECK CONSTRAINT [CK_Positions_Client]
GO

CREATE TABLE SCHEMA2.Positions (
    Month [smallint] NOT NULL,
    Client [tinyint] NOT NULL,
    SNr int not null,
    RNr int not null,
    VNr int not null,
    PNr int not null,
    Type [char](3) NOT NULL,
    Amount [int] NOT NULL,
 CONSTRAINT [PK_Positions] PRIMARY KEY CLUSTERED 
(
    Month ASC,
    Client ASC,
    SNr ASC,
    RNr ASC,
    VNr ASC,
    PNr ASC,
    Type ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
) on psTP3DATA(Month)
ALTER TABLE SCHEMA2.Positions  WITH CHECK ADD  CONSTRAINT [CK_Positions_Client] CHECK  ((Client=(2)))
GO
ALTER TABLE SCHEMA2.Positions CHECK CONSTRAINT [CK_Positions_Client]
GO

New sample data:
SCHEMA1.Positions: 78.105.993
SCHEMA2.Positions: 5.951.206

New query:

create table #t (
    Month smallint,
    Client tinyint,
    SNr int,
    RNr int,
    VNr int,
    PNr int,
    Amount007 int
)

insert  #t (Month, Client, SNr, RNr, VNr, PNr)
select  top 750000 Month, Client, SNr, RNr, VNr, PNr
from    schema2.Positions

update  t
set     Amount007 = p.Amount
from    #t t
join    DATA.Positions p
        on  t.Month = p.Month
        and t.Client = p.Client
        and t.SNr = p.SNr
        and t.RNr = p.RNr
        and t.VNr = p.VNr
        and t.PNr = p.PNr
where   p.Type = '007'

new query plan

new query plan

details clustered index scan SCHEMA1

index scan SCHEMA1

details clustered index scan SCHEMA2

index scan SCHEMA2

Best Answer

How can I achieve that only one partitioned table (SCHEMA2.Positions) is queried?

You already have (in the initial plan you showed).

The cached execution plan needs to be able to deal with any contents in #t but at runtime it will only access the needed table(s).

The filters both have start up predicates and won't run the subtree underneath unless this is true. For your example data the number of executions of the seek against SCHEMA1.Positions is zero.

For the revised plan this is using a hash join so the ability to selectivity apply a filter on a row by row basis is lost.

You can use

DECLARE @Client INT = (SELECT MAX(Client)
   FROM   #t
   HAVING COUNT(DISTINCT Client) = 1)

UPDATE t
SET    Amount = p.Amount
FROM   #t t
       INNER JOIN DATA.Positions p
               ON t.Month = p.Month
                  AND t.Client = p.Client
                  AND t.Type = p.Type
                  AND ( t.Client = @Client
                         OR @Client IS NULL )
OPTION (RECOMPILE);

To provide information to the optimiser in the event that the temp table contains only rows for a specific client and one branch in the view can be ignored.

Related Solutions

Sql-server – Parent-Child Tree Hierarchical ORDER

OK, enough brain cells are dead.

SQL Fiddle

WITH cte AS
(
  SELECT 
    [ICFilterID], 
    [ParentID],
    [FilterDesc],
    [Active],
    CAST(0 AS varbinary(max)) AS Level
  FROM [dbo].[ICFilters]
  WHERE [ParentID] = 0
  UNION ALL
  SELECT 
    i.[ICFilterID], 
    i.[ParentID],
    i.[FilterDesc],
    i.[Active],  
    Level + CAST(i.[ICFilterID] AS varbinary(max)) AS Level
  FROM [dbo].[ICFilters] i
  INNER JOIN cte c
    ON c.[ICFilterID] = i.[ParentID]
)

SELECT 
  [ICFilterID], 
  [ParentID],
  [FilterDesc],
  [Active]
FROM cte
ORDER BY [Level];

Sql-server – Database design for handling 1 billion rows and counting

5000 inserts per minute are about 83 inserts per second. With 5 indexes that's 400 physical rows inserted per second. If the workload was in-memory this would not pose a problem even to the smallest of servers. Even if this was a row-by-row insert using the most inefficient way I can think of. 83 trivial queries per second are just not interesting from a CPU standpoint.

Probably, you are disk-bound. You can verify this by looking at wait stats or STATISTICS IO.

Your queries probably touch a lot of different pages so that the buffer pool does not have space for all of them. This causes frequent page reads and probably random disk writes as well.

Imagine a table where you only physically insert at the end because of an ever-increasing key. The working set would be one page: the last one. This would generate sequential IO as well wen the lazy writer or checkpoint process writes the "end" of the table to disk.

Imagine a table with randomly-placed inserts (classic example: a guid key). Here, all pages are the working set because a random page will be touched for each insert. IOs are random. This is the worst case when it comes to working set.

You're in the middle. Your indexes are of the structure (SomeValue, SequentialDateTime). The first component partially randomizes the sequentiality provided by the second. I guess there are quite a few possible values for "SomeValue" so that you have many randomly-placed insert-points in your indexes.

You say that data is split into 10GB tables per week. That's a good starting point because the working set is now bounded by 10GB (disregarding any reads you might do). With 12GB of server memory it is unlikely, though, that all relevant pages can stay in memory.

If you could reduce the size of the weekly "partitions" or increase server memory by a bit you are probably fine.

I'd expect that inserts at the beginning of the week are faster then at the end. You can test this theory on a dev server by running a benchmark with a certain data size and gradually reducing server memory until you see performance tank.

Now even if all reads and writes fit into memory you might still have random dirty page flushing IO. The only way to get rid of that is to write into co-located positions in your indexes. If you can at all convert your indexes to use (more) sequential keys that would help a lot.

As a quick solution I'd add a buffering layer between the clients and the main table. Maybe accumulate 15min of writes into a staging table and periodically flush it. That takes away the load spikes and uses a more efficient plan to write to the big table.

Best Answer

Related Solutions

Sql-server – Parent-Child Tree Hierarchical ORDER

Sql-server – Database design for handling 1 billion rows and counting

Related Question