Sql-server – Indexing strategy for a query involving three views, multiple columns and several JOINs

indexoptimizationsql server

The scenario

I am having issues with a SELECT query on a view named V3, which takes ages (I had to cancel in SSMS, it timeouts in the application). V3 combines two other views, V1 and V2, to get a one-row result in the end (WHERE clause with idx value).

V3 is defined the following way:

SELECT Live_DataIdx, DTLive1, DTLive2, DTLive3, DTLive4, DTDelivery, DTData2, DTLive5, DTData3, DTData4, Desc, Counter, DataFormat, DataType, Live_QueueIdx, Live_QueueGUID
FROM V1

UNION

SELECT History_DataIdx, DTLive1, DTLive2, DTLive3, DTLive4, DTDelivery, DTData2, DTLive5, DTData3, DTData4, Desc, Counter, DataFormat, DataType, live_QueueIdx, Live_QueueGUID
FROM V2

V1 definition:

SELECT DISTINCT TOP (100) PERCENT dbo.Live_Data.Live_DataIdx,dbo.Live_Queue.DTLive1,dbo.Live_Queue.DTLive2, dbo.Live_Queue.DTLive3,dbo.Live_Queue.DTLive4, dbo.Live_Data.DTData1 AS DTDelivery,dbo.Live_Data.DTData2, dbo.Live_Queue.DTLive5,dbo.Live_Data.DTData3,dbo.Live_Data.DTData4, dbo.MsgType.Desc, dbo.Live_Data.Counter, dbo.Live_Queue.DataFormat, dbo.Live_Data.DataType, dbo.Live_Queue.Live_QueueIdx, dbo.Live_Queue.Live_QueueGUID
FROM dbo.Data 
INNER JOIN dbo.Live_Data ON dbo.Data.DataGUID = dbo.Live_Data.DataGUID 
INNER JOIN dbo.MsgType ON dbo.Live_Data.MsgTypeGUID = dbo.MsgType.MsgTypeGUID 
LEFT OUTER JOIN dbo.Live_Queue ON dbo.Live_Data.Live_QueueGUID = dbo.Live_Queue.Live_QueueGUID
ORDER BY dbo.Live_Data.Live_DataIdx DESC

V2 is declared as follows:

SELECT DISTINCT TOP (100) PERCENT dbo.History_Data.History_DataIdx, dbo.History_Queue.DTLive1, dbo.History_Queue.DTLive2, dbo.History_Queue.DTLive3, 
                  dbo.History_Queue.DTLive4, dbo.History_Data.DTData1 AS DTDelivery, dbo.History_Data.DTData2, dbo.History_Queue.DTLive5, 
                  dbo.History_Data.DTData3, dbo.History_Data.DTData4, dbo.MsgType.Desc, dbo.History_Data.Counter, dbo.History_Queue.DataFormat, dbo.History_Data.DataType, 
                  dbo.History_Queue.History_QueueIdx AS live_QueueIdx, dbo.History_Queue.Live_QueueGUID
FROM dbo.Data 
INNER JOIN dbo.History_Data ON dbo.Data.DataGUID = dbo.History_Data.DataGUID 
INNER JOIN dbo.MsgType ON dbo.History_Data.MsgTypeGUID = dbo.MsgType.MsgTypeGUID 
LEFT OUTER JOIN dbo.History_Queue ON dbo.History_Data.Live_QueueGUID = dbo.History_Queue.Live_QueueGUID
ORDER BY dbo.History_Data.History_DataIdx DESC

I am calling view V3 via Linq2SQL (but I did also try in SSMS) with the following query:

SELECT [Desc],Counter,DataType,DataFormat,DTLive1,DTData3,DTData4,
       DTDelivery,DTData2,DTLive2,DTLive3,DTLive5,DTLive4
FROM V3
WHERE Live_QueueIdx = 4325324

Table definitions

Data:

CREATE TABLE [dbo].[Data](
[DataIdx] [int] IDENTITY(1,1) NOT NULL,
[DataGUID] [uniqueidentifier] ROWGUIDCOL  NOT NULL CONSTRAINT [DF_Data_DataGUID]  DEFAULT (newid()),
[ColDataX1] [varchar](255) NULL,
[ColDataX2] [int] NOT NULL,
[ColDataX3] [int] NOT NULL,
[ColDataX4] [varchar](255) NULL,
[ColDataX5] [uniqueidentifier] NULL,
[ColDataX6] [uniqueidentifier] NULL,    
[ColDataX7] [int] NOT NULL,
CONSTRAINT [PK_Data] PRIMARY KEY CLUSTERED 
(
  [DataGUID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY =   
 OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

Live_Data:

CREATE TABLE [dbo].[Live_Data](
[Live_DataIdx] [int] IDENTITY(1,1) NOT NULL,
[ColLiveDataX1] [uniqueidentifier] ROWGUIDCOL  NOT NULL CONSTRAINT  
    [DF_Live_Data_Live_DataGUID]  DEFAULT (newid()),
[Live_QueueGUID] [uniqueidentifier] NOT NULL,
[DataGUID] [uniqueidentifier] NOT NULL,
[MsgTypeGUID] [uniqueidentifier] NOT NULL,
[ColLiveDataX2] [int] NOT NULL,
[ColLiveDataX3] [int] NOT NULL,
[ColLiveDataX4] [int] NOT NULL,
[DataType] [int] NOT NULL CONSTRAINT [DF_Live_Data_DataType]  DEFAULT 
        ((0)),
[ColLiveDataX5] [varchar](255) NULL,
[ColLiveDataX6] [datetime] NULL,
[ColLiveDataX7] [uniqueidentifier] NULL,
[Counter] [int] NULL,
[DTData3] [datetime] NULL,
[DTData2] [datetime] NULL,
[ColLiveDataX8] [datetime] NULL,
[DTData4] [datetime] NULL,
[DTData1] [datetime] NULL,
CONSTRAINT [PK_Live_Data] PRIMARY KEY CLUSTERED 
(
    [Live_DataGUID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = 
 OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

MsgType:

CREATE TABLE [dbo].[MsgType](
[ColMsgTypeX1] [int] IDENTITY(1,1) NOT NULL,
[MsgTypeGUID] [uniqueidentifier] ROWGUIDCOL  NOT NULL CONSTRAINT   
 [DF_MsgType_MsgTypeGUID]  DEFAULT (newid()),
[Desc] [varchar](255) NULL, 
[ColMsgTypeX2] [varchar](255) NULL,
[ColMsgTypeX3] [int] NULL,
[ColMsgTypeX4] [uniqueidentifier] NULL,
[ColMsgTypeX5] [varchar](max) NULL,
[ColMsgTypeX6] [int] NULL,
[ColMsgTypeX7] [int] NULL,
CONSTRAINT [PK_MsgType] PRIMARY KEY CLUSTERED 
(
   [MsgTypeGUID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY =  
 OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

GO

SET ANSI_PADDING OFF
GO

ALTER TABLE [dbo].[MsgType]  WITH NOCHECK ADD  CONSTRAINT    
   [FK_MsgType_Table1] FOREIGN KEY([Table1GUID])
REFERENCES [dbo].[Table1] ([Table1GUID])
NOT FOR REPLICATION 
GO

ALTER TABLE [dbo].[MsgType] NOCHECK CONSTRAINT [FK_MsgType_Table1]

Live_Queue:

CREATE TABLE [dbo].[Live_Queue](
[Live_QueueIdx] [int] IDENTITY(1,1) NOT NULL,
[Live_QueueGUID] [uniqueidentifier] ROWGUIDCOL  NOT NULL CONSTRAINT [DF_Live_Queue_Live_QueueGUID]  DEFAULT (newid()),
[DTLive3] [datetime] NULL,
[DTLive5] [datetime] NULL,
[DataFormat] [int] NULL,
[ColLiveQueueX1] [varchar](44) NULL,
[ColLiveQueueX2] [int] NULL,
[ColLiveQueueX3] [int] NULL,
[DTLive2] [datetime] NULL,
[ColLiveQueueX4] [datetime] NULL,
[ColLiveQueueX5] [int] NULL,
[ColLiveQueueX6] [uniqueidentifier] NULL,
[ColLiveQueueX7] [int] NULL,
[ColLiveQueueX8] [varchar](max) NULL,
[ColLiveQueueX9] [varchar](55) NULL,
[ColLiveQueueX10] [int] NULL,
[ColLiveQueueX11] [int] NULL,
[ColLiveQueueX12] [varchar](10) NULL,
[ColLiveQueueX13] [int] NULL,
[ColLiveQueueX14] [int] NULL,
[ColLiveQueueX15] [int] NULL,
[ColLiveQueueX16] [datetime] NULL,
[ColLiveQueueX17] [int] NULL,
[ColLiveQueueX18] [int] NULL,   
[ColLiveQueueX19] [geography] NULL,
[ColLiveQueueX20] [varchar](255) NULL,
[ColLiveQueueX21] [varchar](30) NULL,
[ColLiveQueueX22] [varchar](650) NULL,
[ColLiveQueueX23] [varchar](300) NULL,
[DTLive1] [datetime] NULL,
[ColLiveQueueX24] [varchar](255) NULL,
[ColLiveQueueX25] [varchar](255) NULL,  
[ColLiveQueueX26] [varchar](255) NULL,
[ColLiveQueueX27] [varchar](255) NULL,
[ColLiveQueueX28] [varchar](max) NULL,
[ColLiveQueueX29] [int] NULL,
[ColLiveQueueX30] [varchar](255) NULL,
[ColLiveQueueX31] [int] NULL,
[ColLiveQueueX32] [varchar](255) NULL,
[ColLiveQueueX33] [uniqueidentifier] NULL,
[ColLiveQueueX34] [varchar](255) NULL,
[ColLiveQueueX35] [varchar](255) NULL,
[ColLiveQueueX36] [varchar](255) NULL,
[ColLiveQueueX37] [varchar](255) NULL,
[ColLiveQueueX38] [int] NULL,
[ColLiveQueueX39] [int] NULL,
[ColLiveQueueX40] [datetime] NULL,
[ColLiveQueueX41] [int] NULL,
[DTLive4] [datetime] NULL,
CONSTRAINT [Live_Queue_PK] PRIMARY KEY CLUSTERED 
(
   [Live_QueueGUID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = 
 OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

Note:

  • The tables History_Data and History_Queue nearly equal to Live_Data and Live_Queue, except that some columns are set as foreign key.

My request

Thus, I am asking for hints about creating some indices (if recommended: how should those be structed) or reconsiderations about the query. I don't think this use-case is of anything unique, so there has to be plans outside for this kind of a query. Another issue is that I am not able to do much experimentation with nonclustered indices, because indices on the history tables can't be created (runs endlessly). I know that the tables are very large, thus requires some space on disk, so NCIs would probably be needed, but I would need a strategy and idea how to actually create the indices on the critical tables.

Note:

  • As far as I could analyse the problem, the queries on the history tables are a central issue. If I do a simple select on the specific columns with a where clause on the history tables, the result is only returned in about 2 minutes on History_Data, while and it took ~2 minutes on History_Queue, with my hard drive being very busy.

I did also look at the database size:

  • CurrentSizeMB: ~ 2000
  • FreeSpaceMB: ~ 400
  • Log: ~ 1500/1500

Considerations

These views are just overwhelming to start digging into the solution of the right index composition. I did already try some nonclustered indices, but I am not sure about it, because the query just don't stop (waiting for more than 10 minutes), my feeling is that there's some other problem. The NCIs also didn't have any effect. Many different factors need to be considered:

  • Multiple columns are requested. Which column set might be responsible for the delay? Only one, many, all but one?
  • Multiple joins do exist
  • Where clause is used on an idx value

The problem lies within V2 that contains the history data (~1.5 million entries in each history table), V1 can be called in an instance. But because V3 is an UNION of both, V3 suffers from the slowmotion effect of V2. The idea behind the V3 is to combine live and history data to get one specific entry with the selected idx value (here: 4325324) in the end.

The V1 and V2 views are nearly equal, the only difference is that V2 is built with the history tables (History_Data and History_Queue) instead of the live tables (Live_Data and Live_Queue), while both have the same structure.

The select query runs like endlessly, I stopped after a period of more than 10 minutes.

Execution plan

The execution plan can be regarded at the following PasteThePlan URI: Estimated Execution Plan

Note: I removed the nonclustered indices. I tried to add to start from the beginning again to get an idea how to begin systematically on this kind of slowmotion issues.

Any help is kindly appreciated, not only indexwise, but conceptually, too. Thank you.

Best Answer

Thanks for all your great suggestions!

Thanks to EzLo's index suggestion and the lack of anyone's concreate answers, I finally felt inspired to give it a try: Learning by doing.

Combining what I had learned in theory with the first overwhelming use-case of the complex view, I created the following two nonclustered indices:

NC_History_Data:

CREATE INDEX NC_History_Data 
ON History_Data (DataGUID, MsgTypeGUID, Live_QueueGUID) 
INCLUDE (History_DataIdx, DTData1, DTData2, DTData3, DTData4, Counter, DataType)

NC_History_Queue:

CREATE INDEX NC_History_Queue 
ON History_Queue (Live_QueueGUID, History_QueueIdx) 
INCLUDE (DTLive1, DTLive2, DTLive3, DTLive4, DTLive5, DataFormat)

Both NCIs represent a composite index on the join and where columns with the select columns having been included. Et voilĂ , the view V3 does get called in 6s now (instead of infinity minutes), I did even try from the application, same effect.

So let's reconsider: I started with a question of a first overwhelming problem, having to extend my question with more and more data, until I finally felt the courage to just do it. Having got this instant query results felt like surreal, I started to query the history tables first and got to see that a nonclustered Index scan was chosen with my newly created index.

The estimated execution plan for the view V3 looks like this now: estimated execution plan with new indices

As seen, the new indices get used in the bottom-right corner.

Thanks again for all your suggestions, I nearly lost hope this will start working fast.