We have SQL Server 2012 Enterprise Edition running following query via a stored procedure:
declare @TopX int = 1000
declare @stores Table (Store varchar(5), LastDate datetime, LastId int, RangeEnd datetime)
insert into @stores
select *
from (select SourceStore, '2014-01-01' as i, null as ii, '2014-01-08' as iii
from StoreConfig.dbo.Version
group by SourceStore
) t
where (ABS(CAST((BINARY_CHECKSUM(*) * RAND()) as int)) % 100) < 50
IF OBJECT_ID('tempdb..#agreements') IS NOT NULL
DROP TABLE #agreements
IF OBJECT_ID('tempdb..#stores') IS NOT NULL
DROP TABLE #stores
select Store,
isnull(LastDate, '1899-01-01') StartDate,
isnull(LastId, -1) LastId,
isnull(RangeEnd, getdate()) RangeEnd
into #stores
from @stores
update #stores set StartDate = '2015-07-01', RangeEnd='2015-07-08'
-- THIS IS NOT FAST.. :(
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
--grab the @TopX agreements from primary customers
--note: only grabbing the columns in our index to prevent RID lookups
--on every agreement before we sort and take a relatively tiny subset
select top 1000
a.SourceStore
,a.AgreementId
,isnull(a.ModifiedDate, a.CreatedDate) as ModifiedDate
,ca.CustomerId
from #stores s
inner join StoreOps.POSREPL3Agreement.Agreement a on a.SourceStore = s.Store
inner join StoreOps.Customer.CustomerAgreement ca on ca.SourceStore = a.SourceStore
and ca.AgreementId = a.AgreementId
and ca.IsPrimary = 1
where ( (a.ModifiedDate between s.StartDate and s.RangeEnd)
or ( a.ModifiedDate is null
and a.CreatedDate between s.StartDate and s.RangeEnd)
)
and ( isnull(a.ModifiedDate, a.CreatedDate) > s.StartDate
or a.AgreementId > s.LastId
)
order by isnull(a.ModifiedDate, a.CreatedDate), a.AgreementId
We have following indexes for [Customer].[CustomerAgreement]
table:
CREATE NONCLUSTERED INDEX [IX_CustomerAgreement_SourceStore_AgreementId]
ON [Customer].[CustomerAgreement]
([SourceStore] ASC, [AgreementId] ASC, [IsPrimary] ASC)
INCLUDE ([CustomerId])
ON [PRIMARY]
GO
And this is index for [POSREPL3Agreement].[Agreement]
table:
CREATE NONCLUSTERED INDEX [IX_Agreement_SourceStore_ModifiedDate]
ON [POSREPL3Agreement].[Agreement]
([SourceStore] ASC, [ModifiedDate] ASC, [CreatedDate] ASC)
INCLUDE ([AgreementId])
ON [PRIMARY]
GO
If we remove the WHERE
clause index works as expected and we see 1000 records from both the tables, but when we add the listed WHERE
clause [Customer].[CustomerAgreement]
estimates all the records instead of 1000.
How can we improve the WHERE
clause or index to get [Agreement]
table line up with [CustomerAgreement]
table so that Estimated Row under [CustomerAgreement]
is not ALL the records?
Table definitions
CREATE TABLE [Customer].[CustomerAgreement](
[CustomerAgreementId] [int] NOT NULL,
[CustomerId] [int] NOT NULL,
[AgreementId] [int] NOT NULL,
[IsPrimary] [bit] NOT NULL,
[Store] [varchar](5) NOT NULL,
[SourceStore] [varchar](5) NOT NULL,
[RowGUID] [uniqueidentifier] ROWGUIDCOL NOT NULL,
[Repl_ID] [tinyint] NOT NULL,
CONSTRAINT [PK_Customer_CustomerAgreementID_SourceStore] PRIMARY KEY NONCLUSTERED
(
[CustomerAgreementId] ASC,
[SourceStore] ASC,
[Repl_ID] ASC,
[RowGUID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
CREATE TABLE [POSREPL3Agreement].[Agreement](
[AgreementId] [int] NOT NULL,
[QuoteId] [int] NULL,
[AgreementNumber] [varchar](11) NOT NULL,
[AgreementStatusId] [tinyint] NOT NULL,
[AgreementPrinted] [bit] NOT NULL,
[IsNewPOSCreated] [bit] NOT NULL,
[LeaseFrequencyId] [tinyint] NOT NULL,
[DeferredLateFeeAmount] [decimal](8, 2) NOT NULL,
[InHomeVisitFeeAmount] [decimal](8, 2) NOT NULL,
[IsASPTaxable] [bit] NOT NULL,
[ServicePlusRate] [decimal](7, 5) NOT NULL,
[ServicePlusFloor] [decimal](5, 2) NOT NULL,
[TaxRatePercentage] [decimal](7, 5) NOT NULL,
[IgnoreTaxRateChange] [bit] NOT NULL,
[Balance] [decimal](8, 2) NOT NULL,
[AmountPaidToDate] [decimal](10, 2) NOT NULL,
[Deposit] [decimal](8, 2) NOT NULL,
[DeliveryFee] [decimal](8, 2) NOT NULL,
[StartDate] [datetime] NOT NULL,
[DueDay] [int] NOT NULL,
[DueDayTypeId] [tinyint] NULL,
[PaidThroughDate] [datetime] NOT NULL,
[PayOutDate] [datetime] NOT NULL,
[FinalDate] [datetime] NULL,
[IsNSFOutstanding] [bit] NOT NULL,
[LeadSourceId] [tinyint] NOT NULL,
[AgreementTypeId] [tinyint] NOT NULL,
[AcquisitionAgreementTypeId] [tinyint] NULL,
[SameAsCashDuration] [int] NOT NULL,
[SameAsCashDurationType] [int] NOT NULL,
[MinimumPercentageOfCashPriceForFinalPayment] [decimal](3, 2) NOT NULL,
[EarlyPayoutLeaseAmountRate] [decimal](3, 2) NOT NULL,
[EarlyPayout] [decimal](10, 2) NULL,
[FinalPaymentAdditionalFee] [decimal](8, 2) NOT NULL,
[FinalPaymentProrateAmount] [decimal](10, 2) NOT NULL,
[CreditedAssociateId] [int] NULL,
[NonRenewalGracePeriod] [int] NOT NULL,
[LastAgreementTransactionId] [int] NULL,
[CanPayout] [bit] NOT NULL,
[IsServicePlusIncludedInPayout] [bit] NOT NULL,
[IsProrateEnabled] [bit] NOT NULL,
[AgreementDocument] [varbinary](max) NULL,
[CreatedDate] [datetime] NOT NULL,
[CreatedBy] [int] NULL,
[ModifiedDate] [datetime] NULL,
[ModifiedBy] [int] NULL,
[Store] [varchar](5) NOT NULL,
[SourceStore] [varchar](5) NOT NULL,
[RowGUID] [uniqueidentifier] ROWGUIDCOL NOT NULL,
[Repl_ID] [tinyint] NOT NULL,
[ECommerceDeliveredDate] [datetime] NULL,
[DDEStatusId] [tinyint] NULL,
[DDEAmount] [decimal](8, 2) NULL,
[OrderMethodTypeId] [int] NULL,
[SemiMonthlyUpcharge] [decimal](8, 2) NULL,
[DefaultNonRenewalFee] [decimal](8, 2) NULL,
[DefaultInHomeVisitFee] [decimal](8, 2) NULL,
[NonRenewalSemiMonthlyFee] [decimal](8, 2) NULL,
[NonRenewalSemiMonthlyFeeGracePeriod] [int] NULL,
[NonRenewalFeeTypeId] [int] NULL,
[NonRenewalSemiMonthlyRate] [decimal](8, 4) NULL,
[NonRenewalMonthlyRate] [decimal](8, 4) NULL,
[NonRenewalWeeklyFeeGracePeriod] [int] NULL,
[NonRenewalWeeklyRate] [decimal](8, 4) NULL,
[NonRenewalWeeklyFee] [decimal](8, 2) NULL,
[DefaultNSFFee] [decimal](8, 2) NULL,
[WeeklyUpcharge] [decimal](8, 2) NULL,
[IsInHomeFeeEnabled] [bit] NULL,
[CanChargeInHomeFeeAndNonRenewalFeeInSamePeriod] [bit] NULL,
[ExtensionBalance] [decimal](8, 2) NOT NULL,
CONSTRAINT [PK_Agreement_AgreementID_SourceStore] PRIMARY KEY NONCLUSTERED
(
[AgreementId] ASC,
[SourceStore] ASC,
[Repl_ID] ASC,
[RowGUID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
Execution plan
Showplan XML download here (pastebin)
Best Answer
The short answer is that without any filtering of rows in the
WHERE
clause, the query optimizer estimates that it only needs to read 1000 rows from each table to produce the desired result of 1000 rows.You did not provide an execution plan for this query, so I can't say much more than that. It is likely your query also omitted the
ORDER BY
clause in this case, otherwise a Sort would likely have been necessary, which would typically entail reading all rows from its subtree.With the
WHERE
clause, the optimizer expects many more rows to be read (due to the expecting filtering effect) before the 1000th row is available to return to the client. The difference between the estimated and actual number of rows read is due to the problem of estimating the selectivity of the complex predicates from the available statistics. This is true even assuming the statistics are representative of the data. Fundamentally: it's too hard, and the optimizer estimates cardinality incorrectly.1. Indexed computed column solution
You may find the following changes worthwhile:
Index the #stores temporary table:
Add a computed column to the Agreements table. This uses no storage, and is a very fast metadata-only operation:
Create (or modify the existing) index to use the computed column. This will satisfy the
ORDER BY
clause.Simplify the query to reference the computed column directly:
If you are unable to rewrite the query as shown, the computed column index will need to include two extra column for technical reasons:
The expected execution plan will still show inaccurate estimates (because the optimizer is over-optimistic about how quickly it can stop scanning the ordered computed column index) but the (Top N and Distinct) Sorts are eliminated, and ought to still perform better:
2. Comprehensive indexed view solution
If you unable to add the computed column and index, you could investigate using an indexed view instead:
The query then becomes:
3. Simpler indexed view solution
It is also possible to more directly reflect the computed column solution in an indexed view, though this idea does not eliminate the join:
This time the query becomes:
I had to guess at which columns would be unique in the view. If you implement either solution, take your wider workload into account, and add unique columns from the base tables if necessary to make the view clustered index unique. The
[RowGUID]
columns look favourite for this.You should test carefully to assess the impact of the indexed view on data changes to the base tables, and the amount of storage required.
With any of these solutions, performance may not be as good as expected if the temporary table contains very early date ranges, or if there is a store that doesn't have a recent created or modified date. You should test it and see how it goes with your real data and requirements.
You should remove the statement that sets the isolation level to
READ UNCOMMITTED
if that is only there in a desperate attempt to increase performance. Please see my article for details.Also, I notice that both tables are currently heaps. You should be aware that most tables benefit from having a clustered index, for space management reasons if nothing else. If the heaps experience deletions, you may need to rebuild them from time to time to reclaim empty pages that have done been automatically deallocated.
By the way, the
RAND
in your @stores query doesn't have the effect you probably think. It generates the same value for every row (a runtime constant).