Azure SQL Database – How to Limit DTU Percentage Consumption for a Session?

azureazure-sql-databasefragmentationnonclustered-index

We have an Azure SQL Database tier S4 (200 DTU), and there is one big table inside
When I run a "alter index rebuild…" statement, it takes about 18 mins to complete

This particular nonclustered index is 5 Gb in size, and during those 18 minutes of rebuilding, it takes all 100% of DTU available to this database
Of course, this is impacting other applications that are trying to work with this database

Question:

How can we limit available resources for a given session that runs a rebuild ?
Say, we want to make it use only 50%-60% of Database's DTUs, but not 100%

Is this possible on Azure SQL Database ?

Regards,

Best Answer

You just thought of a great enhancement request. Sort of an OPTION(MAXDTU 5000) sort of deal.

Not there now. Best bet would either be to scale up for the operation you need programmatically or try and limit the MAXDOP for that operation either an OPTION (MAXDOP 1) hint. The index operation will run longer in serial but likely consume less DTUs.

Related Solutions

Azure SQL Database – Executing Query Using Linked Server Takes Hours

I think you're running into a key limitation of distributed queries (e.g. four part notation). As per this MS blog post:

data for remote queries are collected from the remote servers, scrubbed locally, combined together and presented to end user as single record set

Basically the source tables in their entirety are trying to get pulled over the linked server and all the joins and filtering are happening locally. When you're not working with millions of records this is often transparent, but when you include large tables you can immediately feel the pain.

In your situation, I suggest you push a copy of the SAExMark.dbo.RealGreenZipCodeData table up to your Azure DB and then run this UPDATE statement from a connection made directly to the AzureDB. If you need to initiate the process from a different server, stuff the update into a Stored Procedure in the Azure DB and then execute the SP remotely.

Alternatively, if you don't want/can't push that table to Azure, you could convert the query to utilize OPENQUERY for all the remote join operations. This will quickly get ugly, but it's possible. OPENQUERY will force execution of whatever query you pass to it remotely, therefore allowing the join logic against that large table to be done remotely so only whatever records are returned from that query travel back over the linked server. These would then be joined against the local copy of the SAExMark.dbo.RealGreenZipCodeData to identify what records in AZUREDB.dbname.dbo.DealerMedias would need to be updated. I really don't like this approach just to update a table on a remote database, but it's an option.

SQL Server Performance – Resolving Slow ORDER BY Queries

Analysis

For the query with no order preference, SQL Server can stream grouped rows using a Hash Match Flow Distinct. If it encounters the required number of distinct entries quickly, the execution time is short.

When a specific order is required, SQL Server must test every row. For example, to place rows in name order, it must sort all rows by name. This will be slow if there are a lot of rows, and no index to provide that order without sorting.

There are a number of fundamental complications in your case, most notably the partitioning, and the disjunction on [year]. The partitioning means your indexes cannot deliver the order you might expect. For example an index on name is actually sorted first by partition number, then by name. It cannot deliver rows sorted on name alone.

You also have FORCED PARAMETERIZATION set. This may be beneficial overall, but it comes with impacts you should fully understand. That combined with the partitioning and multi-column indexes means your statistics are largely useless.

The disjunction on year also messes with ordering, and means SQL Server can only seek year >= 0 and year <= 2013 in your plan. This is much less selective than seeking on year = 0 and year = 2013 separately.

Recommendations

So, in light of all the above:

A good index for the ORDER BY name query is:

CREATE INDEX [IX dbo.files cid, year, name : grapado IS NULL AND masterversion IS NULL] 
ON dbo.files (cid, [year], [name])
INCLUDE (grapado, masterversion)
WHERE grapado IS NULL AND masterversion IS NULL;

A better index for the value_number table is:

CREATE INDEX [IX dbo.value_number id_file, id_field, value] 
ON dbo.value_number (id_file, id_field, [value]);

The query can then be written to fetch at most 50 rows for each year and partition. We then take the first 50 in order from the combined set:

WITH PartitionNumbers AS
(
    -- Each partition of the table
    SELECT P.partition_number
    FROM sys.partitions AS P
    WHERE P.[object_id] = OBJECT_ID(N'dbo.files', N'U')
    AND P.index_id = 1
)
SELECT
    FF.id,
    FF.[name],
    FF.[year],
    FF.cid,
    FF.created,
    vnVE0.keywordValueCol0_numeric
FROM PartitionNumbers AS PN
CROSS APPLY
(
    SELECT
        F100.*
    FROM 
    (
        -- 50 rows in order for year 2013
        SELECT
            F.id,
            F.[name],
            F.[year],
            F.cid,
            F.created
        FROM dbo.files AS F
        WHERE
            F.grapado IS NULL
            AND F.masterversion IS NULL
            AND F.[year] = 2013
            AND F.cid = 19
            AND $PARTITION.PF_files_partitioning(F.created) = PN.partition_number
        ORDER BY
            F.[name]
            OFFSET 0 ROWS
            FETCH FIRST 50 ROWS ONLY

        UNION ALL

        -- 50 rows in order for year 0
        SELECT
            F.id,
            F.[name],
            F.[year],
            F.cid,
            F.created
        FROM dbo.files AS F
        WHERE
            F.grapado IS NULL
            AND F.masterversion IS NULL
            AND F.[year] = 0
            AND F.cid = 19
            AND $PARTITION.PF_files_partitioning(F.created) = PN.partition_number
        ORDER BY
            F.[name]
            OFFSET 0 ROWS
            FETCH FIRST 50 ROWS ONLY
    ) AS F100
) AS FF
OUTER APPLY
(
    -- Lookup distinct values
    SELECT
        keywordValueCol0_numeric = 
            CASE
                WHEN VN.[value] IS NOT NULL AND VN.[value] <> ''
                THEN CONVERT(decimal(28, 2), VN.[value])
                ELSE CONVERT(decimal(28, 2), 0)
            END
    FROM dbo.value_number AS VN
    WHERE
        VN.id_file = FF.id
        AND VN.id_field = 260
    GROUP BY
        VN.[value]
) AS vnVE0
ORDER BY
    FF.[name]
    OFFSET 0 ROWS
    FETCH FIRST 50 ROWS ONLY;

The execution plan will sort at most 100 rows from the files table:

A different index would generally be required for each different ordering.

If you need to count the total number of results, use:

SELECT COUNT_BIG(*) 
FROM dbo.files AS F
OUTER APPLY
(
    SELECT DISTINCT VN.[value]
    FROM dbo.value_number AS VN
    WHERE
        VN.id_file = F.id
        AND VN.id_field = 260
) AS vnVE0
WHERE
    F.grapado IS NULL
    AND F.masterversion IS NULL
    AND F.[year] IN (0, 2013)
    AND F.cid = 19;

db<>fiddle