SQL Server – Is having Estimated Number of Rows > Actual Number of Rows an issue to worry about?

execution-plan index performance sql-server t-sql

I have a table [order] where a simple SELECT shows a large difference between
Estimated Number of Rows and Actual Number of Rows, and in an unexpected direction.

Estimates are generally skewed relative to the actual counts to some degree, but in this scenario the result looks completely backwards: the estimate is higher than the actual. Moreover, "Number of Executions" is 1.

Query:

SELECT 
ord.paymentmode AS [HowTheyPaid],
ord.id AS [OrderId],
ord.ISBackEndSystemMigrated,
ord.ISDuplicate,
ord.ISMerged
FROM 

DedicatedDentalPlans_Stage.dbo.[order] ord
WHERE 
ISBackEndSystemMigrated=0 
AND ISDuplicate = 1

Table:

USE [DedicatedDentalPlans_Stage]
GO

IF  EXISTS (SELECT * FROM dbo.sysobjects WHERE id = OBJECT_ID(N'[DF_Order_ISBackEnd]') AND type = 'D')
BEGIN
ALTER TABLE [dbo].[Order] DROP CONSTRAINT [DF_Order_ISBackEnd]
END

GO

IF  EXISTS (SELECT * FROM dbo.sysobjects WHERE id = OBJECT_ID(N'[DF__Order__ISDuplica__382F5661]') AND type = 'D')
BEGIN
ALTER TABLE [dbo].[Order] DROP CONSTRAINT [DF__Order__ISDuplica__382F5661]
END

GO

IF  EXISTS (SELECT * FROM dbo.sysobjects WHERE id = OBJECT_ID(N'[DF__Order__ISMerged__39237A9A]') AND type = 'D')
BEGIN
ALTER TABLE [dbo].[Order] DROP CONSTRAINT [DF__Order__ISMerged__39237A9A]
END

GO



/****** Object:  Table [dbo].[Order]    Script Date: 03/09/2016 02:11:57 ******/
IF  EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Order]') AND type in (N'U'))
DROP TABLE [dbo].[Order]
GO

/****** Object:  Table [dbo].[Order]    Script Date: 03/09/2016 02:11:57 ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

SET ANSI_PADDING ON
GO

CREATE TABLE [dbo].[Order](
    [Id] [int] IDENTITY(1,1) NOT NULL,
    [SubscriberId] [int] NOT NULL,
    [BillingAddressId] [int] NOT NULL,
    [ShippingAddressId] [int] NULL,
    [OrderStatusId] [int] NOT NULL,
    [ShippingStatusId] [int] NOT NULL,
    [PaymentStatusId] [int] NOT NULL,
    [PaymentMethodSystemName] [nvarchar](max) NULL,
    [OrderDiscount] [decimal](18, 4) NOT NULL,
    [OrderTotal] [decimal](18, 4) NOT NULL,
    [SystemUserId] [int] NULL,
    [UserType] [int] NULL,
    [PaymentMode] [varchar](20) NULL,
    [Deleted] [bit] NOT NULL,
    [AutomaticRenewal] [bit] NULL,
    [CreatedOnUtc] [datetime] NOT NULL,
    [ETLControlId] [int] NULL,
    [ETLGenerationId] [int] NULL,
    [ISBackEndSystemMigrated] [bit] NULL,
    [CheckNumber] [varchar](20) NULL,
    [ISDuplicate] [bit] NULL,
    [ISMerged] [bit] NULL,
 CONSTRAINT [PK__Order__3214EC0722AA2996] PRIMARY KEY CLUSTERED 
(
    [Id] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

GO

SET ANSI_PADDING OFF
GO

ALTER TABLE [dbo].[Order] ADD  CONSTRAINT [DF_Order_ISBackEnd]  DEFAULT ((0)) FOR [ISBackEndSystemMigrated]
GO

ALTER TABLE [dbo].[Order] ADD  DEFAULT ((0)) FOR [ISDuplicate]
GO

ALTER TABLE [dbo].[Order] ADD  DEFAULT ((0)) FOR [ISMerged]
GO

What I have tried so far, without success:
Updated the statistics on the index used by this query and on the clustered index, WITH FULLSCAN.

Best Answer

In the particular scenario presented:

No, the difference between estimated and actual number of rows is not important. There are two key pieces of information to support that statement:

The query has qualified for a trivial plan; and
The query has further qualified for simple parameterization.

This is clear from the Seek Predicate text in the execution plan, where the literal values for ISBackEndSystemMigrated and ISDuplicate have been replaced with the parameter markers @1 and @2.

This means that the plan selected is the same one SQL Server would choose for all possible literal values. Trivial plan reduces compilation time to a minimum for simple queries; simple parameterization promotes execution plan reuse.
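To illustrate why parameterization promotes plan reuse, here is a toy model in Python, not SQL Server's actual implementation: integer literals are swapped for markers @1, @2, ... and the resulting text is used as the plan cache key. The function and variable names are hypothetical, and SQL Server's real rules (which queries qualify, how parameter data types are chosen) are more involved than a regular expression.

```python
import re

# Toy stand-in for simple parameterization: only integer literals are replaced.
_INT_LITERAL = re.compile(r"\b\d+\b")

def simple_parameterize(sql):
    """Replace each integer literal with @1, @2, ... in order of appearance."""
    counter = 0

    def next_marker(_match):
        nonlocal counter
        counter += 1
        return f"@{counter}"

    return _INT_LITERAL.sub(next_marker, sql)

# Hypothetical plan cache keyed by the parameterized text.
plan_cache = {}

def get_plan(sql):
    key = simple_parameterize(sql)
    if key not in plan_cache:
        # "Compile" only when this query shape has not been seen before.
        plan_cache[key] = f"compiled plan for: {key}"
    return plan_cache[key]

q1 = "SELECT ord.id FROM dbo.[order] ord WHERE ISBackEndSystemMigrated=0 AND ISDuplicate=1"
q2 = "SELECT ord.id FROM dbo.[order] ord WHERE ISBackEndSystemMigrated=1 AND ISDuplicate=0"
print(simple_parameterize(q1))
print(get_plan(q1) == get_plan(q2), len(plan_cache))  # True 1
```

Both queries map to the same cache entry, so the second one reuses the plan compiled for the first, whatever literal values it carries.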

In addition, the apparent cardinality mismatch is due to SQL Server producing an estimate for the average case (over all possible literal values).
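A minimal Python sketch of what "an estimate for the average case" can mean, under a simplifying assumption: each equality predicate's selectivity is taken as the column's density (1 / number of distinct values), and the two predicates are treated as independent. This is not SQL Server's exact estimator (which works from statistics histograms and density vectors and may combine predicates differently), and the data below is hypothetical skewed data, not the poster's table.

```python
def average_case_estimate(rows, columns):
    """Estimated rows for `c1 = @1 AND c2 = @2 ...` when the values are unknown.

    Assumed model: selectivity per column = 1 / distinct values, predicates independent.
    """
    selectivity = 1.0
    for column in columns:
        distinct_values = {row[column] for row in rows}
        selectivity *= 1.0 / len(distinct_values)
    return len(rows) * selectivity

def actual_count(rows, predicate):
    """Number of rows that really match every column = value pair."""
    return sum(all(row[c] == v for c, v in predicate.items()) for row in rows)

# Hypothetical skewed data: 1000 orders, 100 not migrated, 5 flagged as duplicates.
orders = [
    {"ISBackEndSystemMigrated": 0 if i < 100 else 1, "ISDuplicate": 1 if i < 5 else 0}
    for i in range(1000)
]
cols = ["ISBackEndSystemMigrated", "ISDuplicate"]

estimate = average_case_estimate(orders, cols)
actual = actual_count(orders, {"ISBackEndSystemMigrated": 0, "ISDuplicate": 1})
print(estimate, actual)  # 250.0 5

# Averaged over every possible literal pair, the actual counts equal the estimate.
combos = [{cols[0]: a, cols[1]: b} for a in (0, 1) for b in (0, 1)]
average_actual = sum(actual_count(orders, p) for p in combos) / len(combos)
print(average_actual)  # 250.0
```

For the specific literals in the question the estimate (250) is far above the actual (5), yet it is exactly right on average across all four literal combinations, which is the trade-off a reusable parameterized plan makes.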

In other cases, of course, inaccurate cardinality estimates can matter a great deal. The trick is knowing when they matter and when they don't. There is no safe general rule of thumb here; much of it comes down to experience. Query tuning is not all science; there is room for art too :)

Related Solutions

MySQL looking up more rows than needed (indexing issue)

Your indexes are fine for the two types of queries you mentioned.

This query will be satisfied by traversing the clustered index on the primary key...

[...] WHERE participant_id = x AND question_id = y AND given_answer_id = z;

...and this one is satisfied by the index on 'question_id':

[...] WHERE question_id = x;

The output of EXPLAIN SELECT is not telling you what you think it is telling you, because the value shown in rows is an estimate of the number of rows the server will need to consider, not the actual number of rows it will examine. For InnoDB, these estimates are based on index statistics.

rows

The rows column indicates the number of rows MySQL believes it must examine to execute the query.

For InnoDB tables, this number is an estimate, and may not always be exact.

Source: http://dev.mysql.com/doc/refman/5.5/en/explain-output.html#explain_rows

The optimizer gathers information about different possible query plans, and chooses the one with the lowest cost. The information shown in EXPLAIN is the information the optimizer gathered about the plan it selected.

When type is ref and key is not NULL, this means that the name listed in the key column is the name of the index that the optimizer has chosen to use to find the desired rows, so your query plan looks exactly as it should.

Note that you will sometimes see Using index in the Extra column. A lot of people assume this means an index is being used, or that no index is being used when it doesn't appear, but neither assumption is correct. Using index describes a special case called a "covering index": it does not indicate whether an index is being used to locate the rows of interest.
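The covering-index idea can be sketched as a simple set check in Python. This is a toy model, not MySQL's optimizer: it relies on the fact that each InnoDB secondary-index entry also stores the primary-key columns. The primary key (participant_id, question_id, given_answer_id) is an assumption about the poster's table, and answered_at is a hypothetical column.

```python
def is_covering(index_columns, primary_key, needed_columns):
    """True if every column the query needs is stored in the index entries.

    InnoDB secondary-index entries hold the indexed columns plus the
    primary-key columns, so those can be read without visiting the table row.
    """
    available = set(index_columns) | set(primary_key)
    return set(needed_columns) <= available

PRIMARY_KEY = ("participant_id", "question_id", "given_answer_id")  # assumed
QUESTION_IDX = ("question_id",)

# SELECT participant_id, given_answer_id ... WHERE question_id = x
print(is_covering(QUESTION_IDX, PRIMARY_KEY,
                  ["participant_id", "given_answer_id", "question_id"]))  # True

# SELECT answered_at ... WHERE question_id = x  (answered_at is hypothetical)
print(is_covering(QUESTION_IDX, PRIMARY_KEY, ["answered_at", "question_id"]))  # False
```

In both cases the index on question_id is what locates the rows; only the first query could be answered from the index alone, which is the situation Using index reports.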

It's possible that running ANALYZE [LOCAL] TABLE would cause the numbers in rows shown by EXPLAIN to differ, but this is a simple query and selecting this index is an obvious choice for the optimizer to make, so ANALYZE TABLE is unlikely to make any actual difference in performance.

It is possible, however, that your overall performance might see some marginal improvement with an occasional OPTIMIZE [LOCAL] TABLE, because you are not inserting rows in primary key order (as would be the case with an auto_increment primary key). On large tables, though, this can be time-consuming, because it rebuilds a new copy of the table, and again, I wouldn't expect any significant change.
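Why insertion order matters can be sketched with a toy model of a B-tree's leaf level in Python. The assumptions are hypothetical and not InnoDB's exact algorithm: an overflowing page is split 50/50, except when the new key is a new maximum at the right edge, in which case only that key moves to a fresh page (as an engine might do for ascending inserts). Page capacity and row counts are illustrative.

```python
import bisect
import random

def leaf_pages_after_inserts(keys, capacity):
    """Insert unique keys into a toy leaf level and return the list of sorted pages."""
    pages = [[]]
    for key in keys:
        # Locate the page whose key range should contain this key.
        first_keys = [page[0] for page in pages[1:]]
        idx = bisect.bisect_right(first_keys, key)
        page = pages[idx]
        bisect.insort(page, key)
        if len(page) <= capacity:
            continue
        if idx == len(pages) - 1 and page[-1] == key:
            # Ascending insert at the right edge: leave this page full, start a new one.
            pages.append([page.pop()])
        else:
            # Mid-index overflow: split the page in half.
            mid = len(page) // 2
            pages.insert(idx + 1, page[mid:])
            del page[mid:]
    return pages

def fill_ratio(pages, capacity):
    """Fraction of page slots actually used."""
    return sum(len(page) for page in pages) / (len(pages) * capacity)

n, capacity = 10_000, 100
sequential = leaf_pages_after_inserts(range(n), capacity)
shuffled = list(range(n))
random.Random(42).shuffle(shuffled)
scattered = leaf_pages_after_inserts(shuffled, capacity)

print(f"ascending: {len(sequential)} pages, fill {fill_ratio(sequential, capacity):.0%}")
print(f"random:    {len(scattered)} pages, fill {fill_ratio(scattered, capacity):.0%}")
```

In this model ascending keys leave every page full, while random-order keys leave pages noticeably emptier, so the same rows occupy more pages. Rebuilding the table repacks them, which is the kind of marginal gain an occasional OPTIMIZE TABLE might bring.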

SQL Server – Oracle GoldenGate add trandata errors

I found out what the problem is: it seems that GoldenGate doesn't work with SQL Server Express. The server I was connecting to runs SQL Express, so I'll need to use Enterprise Edition.
