Is a Clustered Index Scan bad for performance

azure-sql-databaseperformancequery-performance

I have a complex query (see below) which performs badly on a large database.
I'm now analyzing the query by removing the joins, look at the actual plan and step by step add the joins back.
I see clustered index scan with high I/O costs and a 100% CPU cost for a sort.
Can this query be more optimized?

WITH currentPeriod AS
    (SELECT ROW_NUMBER() OVER (PARTITION BY tp.[StartDate]
      ORDER BY m.[Created] DESC) AS rowIndex
            , m.[Created], m.[MessageId]
            , tp.[StartDate], tp.[EndDate], tp.[IsCorrection]
       FROM [TimePeriods] tp
       JOIN [Messages] m on m.Id = tp.[MessageId]
     )
    SELECT [Created], [StartDate], [EndDate], [IsCorrection]
      FROM CurrentPeriod cur
     WHERE rowIndex = 1
     OPTION (RECOMPILE)

The plan (using ApexSQL):
Actual execution plan

I already changed the PK index on the Message table from Clustered to Non-clustered and added a Clustered index on Created.

This is the 'original' plan without the Clustered index on Created:
Original plan

Edit
Here's a link to the plan of the above query: https://www.brentozar.com/pastetheplan/?id=Hkw-4TxPB
Here's the plan of the full query: https://www.brentozar.com/pastetheplan/?id=SyYrDplPH

Background (can be skipped)
I'm not a DBA, I'm just the guy that knows most about databases not meaning I know a lot about databases.
The tables are filled using a custom upload application. The files uploaded are XML-files. Each file is a Message and a Message can have multiple TimePeriods. The Message can also have corrected TimePeriods of earlier Messages. I'm only interested in the last updated period. That's why I use ROW_NUMBER OVER and PARTITION BY. The resulting data of the query is correct. That has been verified.

The full query:

WITH currentPeriod AS
    (SELECT ROW_NUMBER() OVER (PARTITION BY ce.[ReferenceCode], tp.[StartDate], e.[Id], ep.[ContractNumber]
        ORDER BY m.[Created] DESC, ep.[ContractNumber] DESC, IIF(fp.[LbTab] = N'010', 1, 0), fp.[DatAanv] DESC) AS rowIndex
        , ce.[ReferenceCode], ce.[Name] as [EntityName]
        , e.[SocialSecurityNumber], e.[EmployeeNumber], e.[Initials], e.[Firstname], e.[Prefix], e.[Surname], e.[BirthDate]
        , CAST((DATEDIFF(DAY, e.[BirthDate], ep.[HireDate]) / 365.24) as FLOAT) [AgeHired], e.[Gender], e.[PhoneNumber], e.[PhoneNumber2], e.[Email]
        , TRIM(CONCAT_WS(' ', a.[Street], COALESCE(a.[Number], COALESCE(a.[NumberString], '')), COALESCE(a.[NumberExtension], ''))) [StreetLine]
        , a.[ZipCode], a.[City], a.[CountryCode]
        , ep.[HireDate], ep.[DepartureDate]
        , f.[AantVerlU], f.[LnSv], f.[AantSV]
        , fp.[IndAvrLkvOudrWn], fp.[IndAvrLkvAgWn], fp.[IndAvrLkvDgBafSb], fp.[IndAvrLkvHpAgWn], fp.[DatAanv]
        , fp.[LbTab], fp.[IndWAO], fp.[IndWW], fp.[IndZW]
        , tp.[StartDate], tp.[EndDate], tp.[IsCorrection]
        FROM [TimePeriods] tp
        JOIN [Messages] m on m.Id = tp.[MessageId]
        JOIN [CorporateEntities] ce on m.[CorporateEntityId] = ce.Id
        JOIN [Financials] f on f.[TimePeriodId] = tp.Id
        JOIN [FinancialPeriods] fp on fp.[FinancialId] = f.Id
        JOIN [EmployementPeriods] ep on ep.Id = f.[EmployementPeriodId]
        JOIN [Employees] e on e.Id = ep.[EmployeeId]  
        JOIN [Addresses] a on a.Id = e.Id
        WHERE NOT EXISTS (SELECT 1 
                            FROM [dbo].[Withdrawals] w 
                            JOIN [dbo].[TimePeriods] tp2 ON tp2.Id = w.[TimePeriodId] 
                            JOIN [dbo].[Messages] m2 ON m2.id = tp2.MessageId AND m2.[CorporateEntityId] = m.[CorporateEntityId]
                            WHERE w.SofiNr is not null AND w.SofiNr = e.[SocialSecurityNumber] AND tp2.StartDate = tp.[StartDate] 
                            AND w.[NumIv] = ep.[ContractNumber] AND m2.[Created] > m.[Created] AND CONVERT(date, m.[Created]) <= CONVERT(date,@messageDate)) 
        AND CONVERT(date, tp.StartDate) >= CONVERT(date, @date26Param) 
        AND CONVERT(date, tp.EndDate) <= CONVERT(date, @endDate) AND CONVERT(date, m.[Created]) <= CONVERT(date, @messageDate)) 

        SELECT *
        FROM CurrentPeriod cur
       WHERE rowIndex = 1 AND [AantVerlU] > 0 AND [LbTab] != N'010'
         AND (SELECT count(1) 
                FROM CurrentPeriod cur2 
                WHERE cur2.[StartDate] > DATEADD(week, -26, cur.[StartDate]) AND cur2.[StartDate] < cur.[StartDate]
                    AND COALESCE(cur2.[SocialSecurityNumber],  LEFT(cur2.[ReferenceCode], 9) + '-' + cur2.[EmployeeNumber]) = COALESCE(cur.[SocialSecurityNumber],  LEFT(cur.[ReferenceCode], 9) + '-' + cur.Employeenumber)
                    AND cur2.[AantVerlU] > 0 AND cur2.[LbTab] != N'010') = 0
         AND StartDate >= @startDate 
    ORDER BY [SocialSecurityNumber], [StartDate];

Best Answer

The optimiser might be having a hard time to optimise; spool the CTE part into a #Table and query that instead.

Question: Clustered Index Scan bad for performance? Well, not necessarily. Means you don't have an index the query can use that is better (more selective) though. Also performs worse for larger tables... but it could be worse - a heap (no order to the data at all).

AND (SELECT 

    count(1) 
                FROM CurrentPeriod cur2 
                WHERE cur2.[StartDate] > DATEADD(week, -26, cur.[StartDate]) AND cur2.[StartDate] < cur.[StartDate]
                    AND COALESCE(cur2.[SocialSecurityNumber],  LEFT(cur2.[ReferenceCode], 9) + '-' + cur2.[EmployeeNumber]) = COALESCE(cur.[SocialSecurityNumber],  LEFT(cur.[ReferenceCode], 9) + '-' + cur.Employeenumber)
                    AND cur2.[AantVerlU] > 0 AND cur2.[LbTab] != N'010') = 0

Your logic here dictates for each row it's got to compute the validity of whether the dates are applicable... You might want to store the result of the join criteria in the #table also

            AND COALESCE(cur2.[SocialSecurityNumber],  LEFT(cur2.[ReferenceCode], 9) + '-' + cur2.[EmployeeNumber]) = COALESCE(cur.[SocialSecurityNumber],  LEFT(cur.[ReferenceCode], 9) + '-' + cur.Employeenumber)

SQL is set orientated and super efficient within big bunches of information; but details on row level basis in massive tables... spells for RBAR hell - row by agonising row!