Sql-server – SQL CTE Expression with Cross Join – Can it be simplified

cteinsertjoin;query-performancesql server

I am looking to simplify the below SQL Query. Currently it's taking 44 mins to run.
The main issue seems to be the WHERE [Versions] = (SELECT max([Versions]) from BPC_CTE) statement.
What I'd like to do is filter out the [d_bpc] table data by the max([Version]), which would give me a smaller data set to work with, then CROSS JOIN it with the [d_bpcyearmonth] data.


With BPC_CTE ([id] ,[bpc_desc] ,[bpc_active_ingredient] ,[bpc_ai_content] ,[bpc_business_nature]
      ,[bpc_brand] ,[bpc_product_category] ,[bpc_final_fp_code] ,[bpc_formulation_type] ,[bpc_min_pack]
      ,[bpc_npl] ,[bpc_conversion_ratio] ,[bpc_segment] ,[bpc_status] ,[bpc_product_type] ,[bpc_launch_year]
      ,[bpc_scaling] ,[bpc_nbc] ,[bpc_sku] ,[bpc_reporting] ,[bpc_last_update] ,[bpc_latest], [Versions])
AS
(
SELECT DISTINCT [idbpc] ,[bpc_desc] ,[bpc_active_ingredient] ,[bpc_ai_content] ,[bpc_business_nature]
      ,[bpc_brand] ,[bpc_product_category] ,[bpc_final_fp_code] ,[bpc_formulation_type] ,[bpc_min_pack]
      ,[bpc_npl] ,[bpc_conversion_ratio] ,[bpc_segment] ,[bpc_status] ,[bpc_product_type] ,[bpc_launch_year]
      ,[bpc_scaling] ,[bpc_nbc] ,[bpc_sku] ,[bpc_reporting] ,[bpc_last_update] ,[bpc_latest], a.[Version]
FROM [dbo].[d_bpc] a
LEFT JOIN [d_bpchilo] b ON b.idhl = a.idbpc
WHERE PATINDEX('%TOTAL%', bpc_reporting) = 0 
      AND bpc_reporting<>'' 
      AND PATINDEX('%_NBC%', idbpc) = 0 
      AND b.idhl IS NULL
)
SELECT * 
FROM BPC_CTE AS BPC_CTE1
CROSS JOIN [dbo].[d_bpcyearmonth]
WHERE [Versions] = (SELECT max([Versions]) from BPC_CTE);

I tried running the base query, which took 14 mins without adding the WHERE [Versions] = (SELECT max([Versions]) from BPC_CTE); part.

The base query resulted in about 4,000 rows of data. Adding the cross join resulted in about 470,000 rows of data.

In addition to producing this query as a SELECT DISTINCT statement, I have now added INSERT functionality. Here it is:


With BPC_CTE ([idhl] ,[hl_desc] ,[hl_active_ingredient] ,[hl_ai_content] ,[hl_business_nature]
      ,[hl_brand] ,[hl_product_category] ,[hl_final_fp_code] ,[hl_formulation_type] ,[hl_min_pack]
      ,[hl_npl] ,[hl_conversion_ratio] ,[hl_segment] ,[hl_status] ,[hl_product_type] ,[hl_launch_year]
      ,[hl_scaling] ,[hl_nbc] ,[hl_sku] ,[hl_reporting] ,[hl_last_update] ,[hl_latest], [Version])
AS
(
SELECT DISTINCT [idbpc] ,[bpc_desc] ,[bpc_active_ingredient] ,[bpc_ai_content] ,[bpc_business_nature]
      ,[bpc_brand] ,[bpc_product_category] ,[bpc_final_fp_code] ,[bpc_formulation_type] ,[bpc_min_pack]
      ,[bpc_npl] ,[bpc_conversion_ratio] ,[bpc_segment] ,[bpc_status] ,[bpc_product_type] ,[bpc_launch_year]
      ,[bpc_scaling] ,[bpc_nbc] ,[bpc_sku] ,[bpc_reporting] ,[bpc_last_update] ,[bpc_latest], a.[Version]
FROM [dbo].[d_bpc] a
LEFT JOIN [d_bpchilo2] b ON b.idhl = a.idbpc
WHERE PATINDEX('%TOTAL%', bpc_reporting) = 0 
      AND bpc_reporting<>'' 
      AND PATINDEX('%_NBC%', idbpc) = 0 
      AND b.idhl IS NULL
)
INSERT INTO [dbo].[d_bpchilo2] ([idhl] ,[hl_desc] ,[hl_active_ingredient] ,[hl_ai_content] ,[hl_business_nature]
      ,[hl_brand] ,[hl_product_category] ,[hl_formulation_type] ,[hl_min_pack] ,[hl_conversion_ratio] ,[hl_segment]
      ,[hl_product_type] ,[hl_launch_year] ,[hl_scaling] ,[hl_nbc] ,[hl_sku] ,[hl_reporting] ,[hl_year_month]
      ,[hl_latest] ,[Version] ,[hl_last_update] ,[hl_modified_by])
SELECT [idhl] ,[hl_desc] ,[hl_active_ingredient] ,[hl_ai_content] ,[hl_business_nature] ,[hl_brand] ,[hl_product_category]
      ,[hl_formulation_type] ,[hl_min_pack] ,[hl_conversion_ratio] ,[hl_segment] ,[hl_product_type] ,[hl_launch_year]
      ,[hl_scaling] ,[hl_nbc] ,[hl_sku] ,[hl_reporting] ,[YearMonth] ,[hl_latest] ,[Version] ,[hl_last_update]
      ,'Auto-Update' as [hl_modified_by]
FROM BPC_CTE AS BPC_CTE1
CROSS JOIN [dbo].[d_bpcyearmonth]
WHERE [Version] = (SELECT max([Version]) from BPC_CTE)
;

Some people asked about the table structure. Here's the d_bpc table:
The table structure is as follows:

USE [DWH]
GO

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [dbo].[d_bpc](
    [idbpc] [nchar](25) NOT NULL,
    [bpc_desc] [nvarchar](120) NULL,
    [bpc_active_ingredient] [nvarchar](120) NULL,
    [bpc_ai_content] [nchar](23) NULL,
    [bpc_business_nature] [nchar](10) NULL,
    [bpc_brand] [nvarchar](80) NULL,
    [bpc_product_category] [nchar](25) NULL,
    [bpc_final_fp_code] [nchar](25) NULL,
    [bpc_formulation_type] [nchar](3) NULL,
    [bpc_min_pack] [nchar](38) NULL,
    [bpc_npl] [nchar](20) NULL,
    [bpc_conversion_ratio] [nchar](6) NULL,
    [bpc_segment] [nchar](3) NULL,
    [bpc_status] [nchar](50) NULL,
    [bpc_product_type] [nchar](3) NULL,
    [bpc_launch_year] [nchar](4) NULL,
    [bpc_scaling] [nchar](3) NULL,
    [bpc_nbc] [nchar](19) NULL,
    [bpc_sku] [nchar](19) NULL,
    [bpc_reporting] [nchar](19) NULL,
    [bpc_last_update] [datetime] NULL,
    [bpc_latest] [int] NULL,
    [Version] [datetime] NULL
) ON [PRIMARY]
GO

Many thanks for your help in advance.
BB.

Best Answer

Try this; I added a DENSE_RANK() and another CTE to get just the latest version of the data (BEFORE the cross-join). Although with such a prolific explosion of data (from 4k to 470K rows), I'd be curious if your wait wasn't on tempdb to make enough space.

Still, this should at least prevent querying the BPC_CTE more than once, so might be faster.

;With BPC_CTE ([id] ,[bpc_desc] ,[bpc_active_ingredient] ,[bpc_ai_content] ,[bpc_business_nature]
          ,[bpc_brand] ,[bpc_product_category] ,[bpc_final_fp_code] ,[bpc_formulation_type] ,[bpc_min_pack]
          ,[bpc_npl] ,[bpc_conversion_ratio] ,[bpc_segment] ,[bpc_status] ,[bpc_product_type] ,[bpc_launch_year]
          ,[bpc_scaling] ,[bpc_nbc] ,[bpc_sku] ,[bpc_reporting] ,[bpc_last_update] ,[bpc_latest], [Versions], [RN])
    AS
    (
    SELECT DISTINCT [idbpc] ,[bpc_desc] ,[bpc_active_ingredient] ,[bpc_ai_content] ,[bpc_business_nature]
          ,[bpc_brand] ,[bpc_product_category] ,[bpc_final_fp_code] ,[bpc_formulation_type] ,[bpc_min_pack]
          ,[bpc_npl] ,[bpc_conversion_ratio] ,[bpc_segment] ,[bpc_status] ,[bpc_product_type] ,[bpc_launch_year]
          ,[bpc_scaling] ,[bpc_nbc] ,[bpc_sku] ,[bpc_reporting] ,[bpc_last_update] ,[bpc_latest], a.[Version]
        , RN = ROW_NUMBER() OVER (PARTITION BY a.[Version] ORDER BY a.[Version] DESC)
    FROM [dbo].[d_bpc] a
    LEFT JOIN [d_bpchilo] b ON b.idhl = a.idbpc
    WHERE PATINDEX('%TOTAL%', bpc_reporting) = 0 
          AND bpc_reporting<>'' 
          AND PATINDEX('%_NBC%', idbpc) = 0 
          AND b.idhl IS NULL
    )
    , CTE_LatestBPC AS  
        (
        SELECT [id] ,[bpc_desc] ,[bpc_active_ingredient] ,[bpc_ai_content] ,[bpc_business_nature]
          ,[bpc_brand] ,[bpc_product_category] ,[bpc_final_fp_code] ,[bpc_formulation_type] ,[bpc_min_pack]
          ,[bpc_npl] ,[bpc_conversion_ratio] ,[bpc_segment] ,[bpc_status] ,[bpc_product_type] ,[bpc_launch_year]
          ,[bpc_scaling] ,[bpc_nbc] ,[bpc_sku] ,[bpc_reporting] ,[bpc_last_update] ,[bpc_latest], [Versions]
         FROM BPC_CTE AS B 
         WHERE B.RN = 1
         )
    SELECT * 
    FROM CTE_LatestBPC AS BPC_CTE1
    CROSS JOIN [dbo].[d_bpcyearmonth]