Sql-server – Improving the query standards and performance run time

execution-planperformancequeryquery-performancesql serversql-server-2012

For one of our Vendor based SQL server database, they have created certain views for users to query. Since i am new to SQL querying, please help me understand the below format of SQL and how can i tune it up further as it runs over hours. I will try to get a repro, but for now, let me know if below can help:-

Query

SELECT TABLE1.ID   ABC
       TABLE1.COL1 XYZ
       TABLE1.COL2 ASD
       TABLE1.COL3
       TABLE1.COL4
FROM 
       TABLE1,TABLE2,TABLE3

WHERE TABLE2.ID=TABLE1.ID
AND   TABLE2.COLUMN1='Data'
AND   TABLE3.ID=TABLE1.ID
AND   TABLE3.COLUMN2='Value'
AND   TABLE1.COL1=CAST('20190914' as DATE)
AND   DATEPART(hour,TABLE1.COL1) BETWEEEN 1 AND 2

For Table1 there is no CI, we just have Unique NON CI as PK

TABLE 2 and TABLE 3 also does not have any CI, they both have Unique non CI on both ID and datetime column
TABLE1.COL1 is a datetime column

Above query is run multiple times in 12 batches a day for different date range in between clause to collect data between 24 hours, like 0-1, 1-2, 2-3 , 3-4 …. 21-22,23-24

Is there a better way to perform above run?

Thanks

Best Answer

Clustered Index is must.Unless we do not have table structure and their description we cannot say which column should be CI.

Provide volume of data in each table and expected volume of output.

DATEPART(hour,TABLE1.COL1) BETWEEEN 1 AND 2 is not SARGAble.

So you can rewrite the query as below.Always check the performance with parameter instead of HARD coded value.

I mean that in real life scenario, we use local variable or Proc's parameter.So test query performance using using same. Check that Parameter Sniffing problem is tolerable or not and how it can be minimized.If Parameter Sniffing is tolerable then ignore it. Option Recompile should be rarely use .

declare @FromDate Datetime=cast(getdate() as date)
declare @FromDatetime=dateadd(hour,1,@FromDate)
declare @ToDatetime=dateadd(hour,2,@FromDate)

SELECT TABLE1.ID   ABC
       TABLE1.COL1 XYZ
       TABLE1.COL2 ASD
       TABLE1.COL3
       TABLE1.COL4
FROM 
       TABLE1 T1
     inner join  TABLE2 T2 on T2.ID=T1.ID
     inner join  TABLE3 T3 on  T3.ID=T1.ID -- try once T3.ID=T2.ID
WHERE 
   T2.COLUMN1='Data'
AND   T3.COLUMN2='Value'
AND   T1.COL1>=@FromDatetime
AND   T1.COL1<=@ToDatetime

Alternatively,

It appear that you don't need Table2 and Table3 column in your result set,

So you can rewrite in this manner,

SELECT TABLE1.ID   ABC
       TABLE1.COL1 XYZ
       TABLE1.COL2 ASD
       TABLE1.COL3
       TABLE1.COL4
FROM dbo.TABLE1 T1   
WHERE 
exists(select 1 from dbo.TABLE2 T2 
     inner join  dbo.TABLE3 T3 on T3.ID=T2.ID
     where T3.ID=T1.ID and T2.COLUMN1='Data' AND   T3.COLUMN2='Value')
AND   T1.COL1>=@FromDatetime
AND   T1.COL1<=@ToDatetime

Related Solutions

Sql-server – How to create implicit select statements

Not sure that this is simpler than a cursor solution, but here is what I would do:

Before we begin we need a few tables in a SQL Fiddle:

MS SQL Server 2012 Schema Setup

CREATE TABLE dbo.tbl1(c1 INT, c2 INT, c3 INT);
CREATE TABLE dbo.tbl2(c4 INT, c5 INT, c6 INT);
CREATE TABLE dbo.tbl3(c7 INT, c8 INT, c9 INT);

First we need the list of all user tables. For that we can use the sys.tables catalog view. The OBJECT_SCHEMA_NAME() function gets us the schema name, the OBJECT_NAME() function the table name. The QUOTENAME() function quotes the names correctly, in case some of those contain special characters, key words or spaces. (In this example the use of OBJECT_NAME() is not strictly necessary as sys.tables has a name column, but I left it in as you can use this pattern with any catalog view that has an object_id column.)

Query 1:

SELECT 
  QUOTENAME(OBJECT_SCHEMA_NAME(T.object_id))+'.'+
  QUOTENAME(OBJECT_NAME(T.object_id)) AS quoted_table_name,
  T.object_id
FROM sys.tables AS T;

Results:

| QUOTED_TABLE_NAME | OBJECT_ID |
|-------------------|-----------|
|      [dbo].[tbl1] | 245575913 |
|      [dbo].[tbl2] | 261575970 |
|      [dbo].[tbl3] | 277576027 |

The next step is to get the list of column names, again quoted. We can use the sys.columns catalog view for that.

Query 2:

SELECT C.name,C.column_id 
  FROM sys.columns AS C
 WHERE C.object_id = OBJECT_ID('dbo.tbl1');

Results:

| NAME | COLUMN_ID |
|------|-----------|
|   c1 |         1 |
|   c2 |         2 |
|   c3 |         3 |

The next hurdle is to get those columns in a comma separated list. There is no string concatenation aggregate function build in so we have to use a trick:

Query 3:

SELECT STUFF((
  SELECT ','+QUOTENAME(name)
    FROM sys.columns AS C
   WHERE C.object_id = OBJECT_ID('dbo.tbl1')
   ORDER BY C.column_id
     FOR XML PATH(''),TYPE
  ).value('.','NVARCHAR(MAX)'),1,1,'') AS clomun_list;

Results:

|    CLOMUN_LIST |
|----------------|
| [c1],[c2],[c3] |

With that all pieces are in place and we just have to put them all together:

Query 4:

SELECT 'SELECT ' +
       CL.column_list +
       ' FROM ' +
       QUOTENAME(OBJECT_SCHEMA_NAME(T.object_id)) + '.' +
       QUOTENAME(OBJECT_NAME(T.object_id)) +
       ';' AS select_statement
FROM sys.tables AS T
CROSS APPLY (
  SELECT STUFF((
    SELECT ','+QUOTENAME(name)
      FROM sys.columns AS C
     WHERE C.object_id = T.object_id
     ORDER BY C.column_id
       FOR XML PATH(''),TYPE
    ).value('.','NVARCHAR(MAX)'),1,1,'') AS column_list
  )CL;

Results:

|                         SELECT_STATEMENT |
|------------------------------------------|
| SELECT [c1],[c2],[c3] FROM [dbo].[tbl1]; |
| SELECT [c4],[c5],[c6] FROM [dbo].[tbl2]; |
| SELECT [c7],[c8],[c9] FROM [dbo].[tbl3]; |

This example will work in SQL 2005 and later, assuming you are on the latest service pack. There is no clean solution to achieve this with SQL Server 2000. In that case you need to go back to your cursor solution.

SQL Server Index – How Index Works with Two Columns in WHERE Clause

I want to know how the query execution works here

The general execution model is a pipeline, where each iterator returns a row at a time. Execution starts at the root iterator (on the far left, labelled SELECT in your example).

After initialization, the root iterator requests a row from its immediate child, and so on down the chain until an iterator that can return a row is found. This passes back up the chain to the root where it is queued for despatch to the client. That is a very simplified overview, for more details see:

Showplan Logical and Physical Operators Reference (Product Documentation)
Iterators, Query Plans, and Why They Run Backwards (me)
A Second in the Life of a Query Operator by Brad Schulz

Is it going to fetch all the rows that matches the 'col2=val2' condition and check for the other condition?

The nonclustered index seek will locate a row that matches col2=val2. It will be able to return col2 and col1 (see its output list) because col1 is present in the index (since the primary key is clustered in this case).

This row is passed up to the nested loops join, which then passes control to the key lookup. The lookup uses the col1 value to seek into the clustered index b-tree to find the value of col3 in this row. The value is tested against the predicate col3=val3 and only returned if it matches.

If there is a match, the row (c1, c2, c3) is passed up the chain and queued for transmission to the client. As control descends the tree again, any new match for col2 in the nonclustered index will result in a repeat of the nested loops join -> lookup -> return row cycle. As soon as the nonclustered index seek runs out of rows, the process completes when control next returns to the root iterator.

Why these two -Index seek and Key-Lookup, are shown parallel in the execution plan?

That's just the way the graphical plan is laid out. See the links and discussion above for the correct way to understand the execution process in detail.

Will it always use the available index on 'col2', considering a large dataset and assuming almost all entries in 'col2' are unique?

Most likely yes. The optimizer makes a cost-based choice between the available strategies. With very few matches expected, the optimizer will usually assess a nonclustered index seek with a lookup as being the cheapest option. An exception occurs when the table is very small. In that case, scanning the clustered index and applying both predicates to each row will likely look cheapest.

As a final note, a covering nonclustered index that would avoid the lookup is:

CREATE [UNIQUE] NONCLUSTERED INDEX [dbo.table1 col2,col3 (col1)]
ON dbo.table1 (col2, col3)
INCLUDE (col1);

It should be specified UNIQUE if that is true for (col2, col3).

Best Answer

Related Solutions

Sql-server – How to create implicit select statements

SQL Server Index – How Index Works with Two Columns in WHERE Clause

Related Question