I am looking to simplify the below SQL Query. Currently it's taking 44 mins to run.
The main issue seems to be the WHERE [Versions] = (SELECT max([Versions]) from BPC_CTE)
statement.
What I'd like to do is filter out the [d_bpc]
table data by the max([Version])
, which would give me a smaller data set to work with, then CROSS JOIN
it with the [d_bpcyearmonth]
data.
With BPC_CTE ([id] ,[bpc_desc] ,[bpc_active_ingredient] ,[bpc_ai_content] ,[bpc_business_nature]
,[bpc_brand] ,[bpc_product_category] ,[bpc_final_fp_code] ,[bpc_formulation_type] ,[bpc_min_pack]
,[bpc_npl] ,[bpc_conversion_ratio] ,[bpc_segment] ,[bpc_status] ,[bpc_product_type] ,[bpc_launch_year]
,[bpc_scaling] ,[bpc_nbc] ,[bpc_sku] ,[bpc_reporting] ,[bpc_last_update] ,[bpc_latest], [Versions])
AS
(
SELECT DISTINCT [idbpc] ,[bpc_desc] ,[bpc_active_ingredient] ,[bpc_ai_content] ,[bpc_business_nature]
,[bpc_brand] ,[bpc_product_category] ,[bpc_final_fp_code] ,[bpc_formulation_type] ,[bpc_min_pack]
,[bpc_npl] ,[bpc_conversion_ratio] ,[bpc_segment] ,[bpc_status] ,[bpc_product_type] ,[bpc_launch_year]
,[bpc_scaling] ,[bpc_nbc] ,[bpc_sku] ,[bpc_reporting] ,[bpc_last_update] ,[bpc_latest], a.[Version]
FROM [dbo].[d_bpc] a
LEFT JOIN [d_bpchilo] b ON b.idhl = a.idbpc
WHERE PATINDEX('%TOTAL%', bpc_reporting) = 0
AND bpc_reporting<>''
AND PATINDEX('%_NBC%', idbpc) = 0
AND b.idhl IS NULL
)
SELECT *
FROM BPC_CTE AS BPC_CTE1
CROSS JOIN [dbo].[d_bpcyearmonth]
WHERE [Versions] = (SELECT max([Versions]) from BPC_CTE);
I tried running the base query, which took 14 mins without adding the WHERE [Versions] = (SELECT max([Versions]) from BPC_CTE);
part.
The base query resulted in about 4,000 rows of data. Adding the cross join resulted in about 470,000 rows of data.
In addition to producing this query as a SELECT DISTINCT
statement, I have now added INSERT
functionality. Here it is:
With BPC_CTE ([idhl] ,[hl_desc] ,[hl_active_ingredient] ,[hl_ai_content] ,[hl_business_nature]
,[hl_brand] ,[hl_product_category] ,[hl_final_fp_code] ,[hl_formulation_type] ,[hl_min_pack]
,[hl_npl] ,[hl_conversion_ratio] ,[hl_segment] ,[hl_status] ,[hl_product_type] ,[hl_launch_year]
,[hl_scaling] ,[hl_nbc] ,[hl_sku] ,[hl_reporting] ,[hl_last_update] ,[hl_latest], [Version])
AS
(
SELECT DISTINCT [idbpc] ,[bpc_desc] ,[bpc_active_ingredient] ,[bpc_ai_content] ,[bpc_business_nature]
,[bpc_brand] ,[bpc_product_category] ,[bpc_final_fp_code] ,[bpc_formulation_type] ,[bpc_min_pack]
,[bpc_npl] ,[bpc_conversion_ratio] ,[bpc_segment] ,[bpc_status] ,[bpc_product_type] ,[bpc_launch_year]
,[bpc_scaling] ,[bpc_nbc] ,[bpc_sku] ,[bpc_reporting] ,[bpc_last_update] ,[bpc_latest], a.[Version]
FROM [dbo].[d_bpc] a
LEFT JOIN [d_bpchilo2] b ON b.idhl = a.idbpc
WHERE PATINDEX('%TOTAL%', bpc_reporting) = 0
AND bpc_reporting<>''
AND PATINDEX('%_NBC%', idbpc) = 0
AND b.idhl IS NULL
)
INSERT INTO [dbo].[d_bpchilo2] ([idhl] ,[hl_desc] ,[hl_active_ingredient] ,[hl_ai_content] ,[hl_business_nature]
,[hl_brand] ,[hl_product_category] ,[hl_formulation_type] ,[hl_min_pack] ,[hl_conversion_ratio] ,[hl_segment]
,[hl_product_type] ,[hl_launch_year] ,[hl_scaling] ,[hl_nbc] ,[hl_sku] ,[hl_reporting] ,[hl_year_month]
,[hl_latest] ,[Version] ,[hl_last_update] ,[hl_modified_by])
SELECT [idhl] ,[hl_desc] ,[hl_active_ingredient] ,[hl_ai_content] ,[hl_business_nature] ,[hl_brand] ,[hl_product_category]
,[hl_formulation_type] ,[hl_min_pack] ,[hl_conversion_ratio] ,[hl_segment] ,[hl_product_type] ,[hl_launch_year]
,[hl_scaling] ,[hl_nbc] ,[hl_sku] ,[hl_reporting] ,[YearMonth] ,[hl_latest] ,[Version] ,[hl_last_update]
,'Auto-Update' as [hl_modified_by]
FROM BPC_CTE AS BPC_CTE1
CROSS JOIN [dbo].[d_bpcyearmonth]
WHERE [Version] = (SELECT max([Version]) from BPC_CTE)
;
Some people asked about the table structure. Here's the d_bpc table:
The table structure is as follows:
USE [DWH]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[d_bpc](
[idbpc] [nchar](25) NOT NULL,
[bpc_desc] [nvarchar](120) NULL,
[bpc_active_ingredient] [nvarchar](120) NULL,
[bpc_ai_content] [nchar](23) NULL,
[bpc_business_nature] [nchar](10) NULL,
[bpc_brand] [nvarchar](80) NULL,
[bpc_product_category] [nchar](25) NULL,
[bpc_final_fp_code] [nchar](25) NULL,
[bpc_formulation_type] [nchar](3) NULL,
[bpc_min_pack] [nchar](38) NULL,
[bpc_npl] [nchar](20) NULL,
[bpc_conversion_ratio] [nchar](6) NULL,
[bpc_segment] [nchar](3) NULL,
[bpc_status] [nchar](50) NULL,
[bpc_product_type] [nchar](3) NULL,
[bpc_launch_year] [nchar](4) NULL,
[bpc_scaling] [nchar](3) NULL,
[bpc_nbc] [nchar](19) NULL,
[bpc_sku] [nchar](19) NULL,
[bpc_reporting] [nchar](19) NULL,
[bpc_last_update] [datetime] NULL,
[bpc_latest] [int] NULL,
[Version] [datetime] NULL
) ON [PRIMARY]
GO
Many thanks for your help in advance.
BB.
Best Answer
Try this; I added a DENSE_RANK() and another CTE to get just the latest version of the data (BEFORE the cross-join). Although with such a prolific explosion of data (from 4k to 470K rows), I'd be curious if your wait wasn't on tempdb to make enough space.
Still, this should at least prevent querying the BPC_CTE more than once, so might be faster.