Azure SQL Database – Executing Query Using Linked Server Takes Hours

azure-sql-databasesql server

I have a linked server to Azure SQL database and I am trying to run the below SQL query which takes hours to run and at one point, I then have to cancel the execution. I have researched everywhere and have tried everything but no result. This query takes about 3-4 minutes when I run it on local, the reason being the DealerMediaRates table has about 7.8 million records. I could probably optimize this query, maybe that probably solves the Azure issue but I find it hard to believe that this query would run for hours when I use Azure linked server that I eventually have to stop. Our SQL server tier in Azure is S2, we even tried switching it to S3 but no luck.

UPDATE  dmr 
SET     Quantity = t.[One50K],
        LastModifierRoutineName = @LastUpdatedRoutineName,
        LastModifiedUtc = GETDATE()
FROM    AZUREDB.dbname.dbo.DealerMediaRates dmr 
INNER JOIN AZUREDB.dbname.dbo.DealerMedias dm ON dm.DealerMediaId = dmr.DealerMediaId
INNER JOIN AZUREDB.dbname.dbo.Dealers d ON d.DealerId = dm.DealerId 
INNER JOIN AZUREDB.dbname.dbo.DealerGroups dg ON dg.DealerGroupId = d.DealerGroupId 
INNER JOIN AZUREDB.dbname.dbo.DealerMetadataValues dmdv ON dmdv.DealerId = d.DealerId 
INNER JOIN AZUREDB.dbname.dbo.MetadataFields mf ON mf.MetadataFieldId = dmdv.MetadataFieldId AND mf.ClientId = @MetisClientId AND mf.MetadataCode = 'Real Green #'
INNER JOIN SAExMark.dbo.RealGreenZipCodeData t ON t.ParentCompanyID = dmdv.[Value] AND t.DealerID = dm.SubDescription AND t.Zip = dm.[Description] AND dm.MediaCode = 'DM' AND t.[SegmentCode] = 'RA'
WHERE   dmr.MediaRateClassificationId = 13
AND     dmr.MediaSegmentId = 5

THIS TAKES ABOUT 7 MINS to RUN

UPDATE  dm 
SET     IsDeleted = 1,
        LastModifierRoutineName = @LastUpdatedRoutineName,
        LastModifiedUtc = GETDATE()
FROM    AZUREDB.dbname.dbo.DealerMedias dm 
INNER JOIN AZUREDB.dbname.dbo.Dealers d ON d.DealerId = dm.DealerId 
INNER JOIN AZUREDB.dbname.dbo.DealerGroups dg ON dg.DealerGroupId = d.DealerGroupId AND dg.ClientId = @MetisClientId
INNER JOIN AZUREDB.dbname.dbo.DealerMetadataValues dmdv ON dmdv.DealerId = d.DealerId 
INNER JOIN AZUREDB.dbname.dbo.MetadataFields mf ON mf.MetadataFieldId = dmdv.MetadataFieldId AND mf.ClientId = @MetisClientId AND mf.MetadataCode = 'Real Green #' 
LEFT OUTER JOIN SAExMark.dbo.RealGreenZipCodeData t ON t.ParentCompanyID = dmdv.[Value] AND t.DealerID = dm.SubDescription AND t.Zip = dm.[Description] AND dm.MediaCode = 'DM'
WHERE   t.DealerID IS NULL
AND     dm.MediaCode = 'DM'
AND     d.Name NOT LIKE '%Test%'

enter image description here
enter image description here]3

Best Answer

I think you're running into a key limitation of distributed queries (e.g. four part notation). As per this MS blog post:

data for remote queries are collected from the remote servers, scrubbed locally, combined together and presented to end user as single record set

Basically the source tables in their entirety are trying to get pulled over the linked server and all the joins and filtering are happening locally. When you're not working with millions of records this is often transparent, but when you include large tables you can immediately feel the pain.

In your situation, I suggest you push a copy of the SAExMark.dbo.RealGreenZipCodeData table up to your Azure DB and then run this UPDATE statement from a connection made directly to the AzureDB. If you need to initiate the process from a different server, stuff the update into a Stored Procedure in the Azure DB and then execute the SP remotely.

Alternatively, if you don't want/can't push that table to Azure, you could convert the query to utilize OPENQUERY for all the remote join operations. This will quickly get ugly, but it's possible. OPENQUERY will force execution of whatever query you pass to it remotely, therefore allowing the join logic against that large table to be done remotely so only whatever records are returned from that query travel back over the linked server. These would then be joined against the local copy of the SAExMark.dbo.RealGreenZipCodeData to identify what records in AZUREDB.dbname.dbo.DealerMedias would need to be updated. I really don't like this approach just to update a table on a remote database, but it's an option.