I have a linked server to Azure SQL database and I am trying to run the below SQL query which takes hours to run and at one point, I then have to cancel the execution. I have researched everywhere and have tried everything but no result. This query takes about 3-4 minutes when I run it on local, the reason being the DealerMediaRates table has about 7.8 million records. I could probably optimize this query, maybe that probably solves the Azure issue but I find it hard to believe that this query would run for hours when I use Azure linked server that I eventually have to stop. Our SQL server tier in Azure is S2, we even tried switching it to S3 but no luck.
UPDATE dmr
SET Quantity = t.[One50K],
LastModifierRoutineName = @LastUpdatedRoutineName,
LastModifiedUtc = GETDATE()
FROM AZUREDB.dbname.dbo.DealerMediaRates dmr
INNER JOIN AZUREDB.dbname.dbo.DealerMedias dm ON dm.DealerMediaId = dmr.DealerMediaId
INNER JOIN AZUREDB.dbname.dbo.Dealers d ON d.DealerId = dm.DealerId
INNER JOIN AZUREDB.dbname.dbo.DealerGroups dg ON dg.DealerGroupId = d.DealerGroupId
INNER JOIN AZUREDB.dbname.dbo.DealerMetadataValues dmdv ON dmdv.DealerId = d.DealerId
INNER JOIN AZUREDB.dbname.dbo.MetadataFields mf ON mf.MetadataFieldId = dmdv.MetadataFieldId AND mf.ClientId = @MetisClientId AND mf.MetadataCode = 'Real Green #'
INNER JOIN SAExMark.dbo.RealGreenZipCodeData t ON t.ParentCompanyID = dmdv.[Value] AND t.DealerID = dm.SubDescription AND t.Zip = dm.[Description] AND dm.MediaCode = 'DM' AND t.[SegmentCode] = 'RA'
WHERE dmr.MediaRateClassificationId = 13
AND dmr.MediaSegmentId = 5
THIS TAKES ABOUT 7 MINS to RUN
UPDATE dm
SET IsDeleted = 1,
LastModifierRoutineName = @LastUpdatedRoutineName,
LastModifiedUtc = GETDATE()
FROM AZUREDB.dbname.dbo.DealerMedias dm
INNER JOIN AZUREDB.dbname.dbo.Dealers d ON d.DealerId = dm.DealerId
INNER JOIN AZUREDB.dbname.dbo.DealerGroups dg ON dg.DealerGroupId = d.DealerGroupId AND dg.ClientId = @MetisClientId
INNER JOIN AZUREDB.dbname.dbo.DealerMetadataValues dmdv ON dmdv.DealerId = d.DealerId
INNER JOIN AZUREDB.dbname.dbo.MetadataFields mf ON mf.MetadataFieldId = dmdv.MetadataFieldId AND mf.ClientId = @MetisClientId AND mf.MetadataCode = 'Real Green #'
LEFT OUTER JOIN SAExMark.dbo.RealGreenZipCodeData t ON t.ParentCompanyID = dmdv.[Value] AND t.DealerID = dm.SubDescription AND t.Zip = dm.[Description] AND dm.MediaCode = 'DM'
WHERE t.DealerID IS NULL
AND dm.MediaCode = 'DM'
AND d.Name NOT LIKE '%Test%'
]3
Best Answer
I think you're running into a key limitation of distributed queries (e.g. four part notation). As per this MS blog post:
Basically the source tables in their entirety are trying to get pulled over the linked server and all the joins and filtering are happening locally. When you're not working with millions of records this is often transparent, but when you include large tables you can immediately feel the pain.
In your situation, I suggest you push a copy of the
SAExMark.dbo.RealGreenZipCodeData
table up to your Azure DB and then run this UPDATE statement from a connection made directly to the AzureDB. If you need to initiate the process from a different server, stuff the update into a Stored Procedure in the Azure DB and then execute the SP remotely.Alternatively, if you don't want/can't push that table to Azure, you could convert the query to utilize OPENQUERY for all the remote join operations. This will quickly get ugly, but it's possible. OPENQUERY will force execution of whatever query you pass to it remotely, therefore allowing the join logic against that large table to be done remotely so only whatever records are returned from that query travel back over the linked server. These would then be joined against the local copy of the
SAExMark.dbo.RealGreenZipCodeData
to identify what records inAZUREDB.dbname.dbo.DealerMedias
would need to be updated. I really don't like this approach just to update a table on a remote database, but it's an option.