Sql-server – Help with optimising a sql query on two partitioned tables

join;optimizationpartitioningsql server

I have a SQL query that I am running on two very large tables in SQL Server 2012 Enterprise.

The two tables are ACKs and Logs. Both are partitioned on their time by day column with the appropriate indexes.

The tables are as follows:

Ack Table

and

Log Table

Here is the query:

DECLARE @day as varchar(max) = N'20141007'
DECLARE @dayStart as varchar(MAX) = CONCAT(@day,N' 00:00:00.000')
DECLARE @dayEnd as varchar(MAX) = CONCAT(@day,N' 23:59:59.999')

SELECT 
    messageId, time, direction, hasRouting, deviceType, 
    unitId, accountCode, clientId, data
FROM 
    Logs AS A
WHERE 
    EXISTS (SELECT * FROM Acks AS B 
            WHERE (A.messageId = B.messageId) 
              AND (B.direction = 0) AND (B.isNack = 0)) 
    AND (A.unitId LIKE 'TEST') 
    AND (A.direction = 1) AND (A.time BETWEEN @dayStart AND @dayEnd) 
ORDER BY 
    time ASC 

I must admit that I have not had a lot of experience with queries from two tables i.e. joins. The way I interpret this query is that it is going to first filter out all of the entries for the log table, and then go and do a "where" query on the entire log table for each row in the log table.

Now, each table is about 50GB and you can imagine if it's not utilising the partitioning scheme correctly this query is going to iterate through the entire table.

My concerns with the query are:

  • Performance. The query is taking way too long to execute
  • It is not fully utilising the partition function when searching

I have noticed that it looks in every single file group when executing the query, even when the query only spans 1 day. I'm not sure if this is due to the Exists.

Some advice and tips on how to structure this query would be greatly appreciated. And possibly also explain what is wrong with my current one so I can learn from my mistakes.

{EDIT}

Reworked version:

DECLARE @day as datetime2(2) = N'20141007'
DECLARE @dayStart as datetime2(2) = CONCAT(@day,N' 00:00:00.000')
DECLARE @dayEnd as datetime2(2) = CONCAT(@day,N' 23:59:59.999')

SELECT 
   messageId, time, direction, hasRouting, deviceType, 
   unitId, accountCode, clientId, data
FROM 
   Logs 
INNER JOIN 
   Acks ON Logs.messageId = Acks.messageId
WHERE 
   (Acks.direction = 0) 
   AND (Acks.isNack = 0)
   AND (Logs.unitId LIKE 'DEVMB1') 
   AND (Logs.direction = 1) 
   AND (Logs.time BETWEEN @dayStart AND @dayEnd) 
   AND (Acks.time BETWEEN @dayStart AND @dayEnd) 
ORDER BY 
   Logs.time ASC 

Added execution plan for reworked query:

Execution Plan

Best Answer

Without query execution plan, my first thoughts are:

I. Use datetime variables to remove implicit convertion and use index.

DECLARE @day as datetime2(2) = '20141007'
DECLARE @dayStart as datetime2(2) = CONCAT(@day,' 00:00:00.000')
DECLARE @dayEnd as datetime2(2) = CONCAT(@day,' 23:59:59.999')

II. Change EXISTS for INNER JOIN

EXISTS (Select * FROM Acks AS B where (A.messageId = B.messageId)

to

INNER JOIN ON A.messageId = B.messageId

III. Consider limiting table B on date - at the moment it looks through whole table.

UPDATE. 1. Do parts of your join work much faster as single queries? 2. Please include both Execution Plans on your post. Links may stop working in future.