I have been given a job of taking two tables (trimmed down create scripts below) and merging their data into a third table.
CREATE TABLE dbo.Ztest_nominal
(
BMCode varchar(20) NOT NULL,
nomcode varchar(500) NOT NULL,
Description nvarchar(4000) NULL,
LanguageID int NOT NULL,
CONSTRAINT PK_Ztest_nominal PRIMARY KEY CLUSTERED (BMCode ASC)
)
CREATE TABLE dbo.Ztest_Participation
(
ParticipationID int IDENTITY(1,1) NOT NULL,
BMCode varchar(20) NOT NULL,
LatestVersionNo varchar(5) NOT NULL,
LastVersionSubmitted varchar(5) NOT NULL,
CONSTRAINT PK_Ztest_Participation PRIMARY KEY CLUSTERED (BMCode ASC)
)
The third tables columns to take the contents of above (basically the above two tables):
[ParticipationID] [int] IDENTITY(1,1) NOT NULL,
[BMCode] [varchar](20) NOT NULL,
[LatestVersionNo] [varchar](5) NOT NULL,
[LastVersionSubmitted] [varchar](5) NOT NULL,
[nomcode] [varchar](500) NOT NULL,
[Description] [nvarchar](4000) NULL,
[LanguageID] [int] NOT NULL
Ztest_nominal
has about 63,000,000 rows and Ztest_Participation
about 62,000
The sql I was going to use (with an insert into) is (I did have the inner select outside before I gave up and posted here!):
SELECT
p.ParticipationId,
p.LatestVersionNo,
n.*
FROM
Ztest_nominal n
INNER JOIN
(
SELECT
ParticipationId,
LatestVersionNo,
BMCode
FROM
Ztest_participation
WHERE
LatestVersionNo = LastVersionSubmitted
) p
ON p.BMCode = n.BMCode
However i feel its going need some optimisation so its not slow, also I've already had out of memory errors when running it.
Any help would be appreciated
Best Answer
So I re-wrote your query (under a couple of assumptions) to hopefully set it up for "great success". The three main things the below queries do are:
1: Pre-filter the data from Ztest_Participation into a temp table. By doing this filtering upfront you simplify your final
INSERT
query for the query optimizer by pre-materializing just the data you need toJOIN
to first. Also materializing it into a temp table can leverage some of the performance benefits in TempDB later on (such as the potential of the data being materialized in memory).2: The final
INSERT
query leverages theTABLOCK
hint which results in less log generation and removes the potential of many row lock escalations, leading to optimized performance onINSERT
. (Additional information on the benefits ofTABLOCK
here.)3: I simplified your
JOIN
query between #Ztest_Participation_Filtered and Ztest_nominal by using a better relational solution. This should also help the query optimizer when generating a performant execution plan.*Note one assumption my above query makes is that you want to use the
ParticipationID
fromZtest_Participation
in your new 3rd table. You won't be able toINSERT
this field with your current table schema where it definesParticipationID
as anIDENTITY
in the new table. You'll either need to remove that table attribute before theINSERT
and then add in after theINSERT
or you can temporarily turn onIDENTITY_INSERT
for your new table. (But you should turn it back off after theINSERT
.)On my humble laptop, with the schemas you provided above, and 63,000,000 and 62,000 semi-randomly generated records to fill the tables respectively, the above batch of queries runs in a total runtime of 0 seconds. You can find the execution plan it generated for the FinalResults
INSERT
query on my end here.