SQL Server 2008 R2 – How to Speed Up Looped INSERT INTO Statements

hints, insert, performance, sql-server-2008-r2

I currently use the following statement; for 10,000 rows it takes about 150 seconds. I tried removing the index on the target table, but this didn't help. Running the loop without the INSERT INTO takes less than 50 ms. I need it to update about 300 million rows, and I can't really wait 52 days (!) for it to complete.

The bottom line is that the following update query needs to go over each row, perform calculations on a VARBINARY, extract the proper values from it (we need to get rid of the packed VARBINARY fields), and store them in a new table.

FETCH NEXT FROM LocCities INTO @LocCity 
WHILE (@@FETCH_STATUS = 0)
BEGIN
    -- several sets, removed calculations for clarity
    SET @LocationId = Calculation1()
    SET @CityId = Calculation2()

    IF(@LocCity <> 0)
    BEGIN
        -- left out an inner loop here on the VARBINARY based on its length
        INSERT INTO LocationCities (LocationId, CityId)
        VALUES (@LocationId, @CityId)
    END
    FETCH NEXT FROM LocCities INTO @LocCity
END

I understand that I can use the WITH keyword with table hints, but I am not sure what to use. I expect the final update query to run in several hours, and hope there's a way to do that. I really can't wait almost two months ;).

Isn't there something similar to BULK INSERT that I can use?

Best Answer

I really don't think table hints or BULK INSERT are going to help you here. Your approach is still to process each varbinary value one at a time, and this will be your downfall regardless, especially when you discard the idea of set-based queries because you "don't think it's possible."

Here's a set-based approach with no awful loops or cursors. This assumes that the pattern is always the same (LocationID is the first byte, and CityID is the next two).

DECLARE @x TABLE(x VARBINARY(32));

INSERT @x VALUES(0x010734),(0x030735040736),(0x030742050743060712);

;WITH n(n) AS 
(
  SELECT TOP (300) (number*3)+1 
  FROM [master].dbo.spt_values -- your own Numbers table is better
  WHERE [type] = N'P' ORDER BY number
)
-- INSERT dbo.LocationCities(LocationId, CityId)
SELECT 
  x.x,      -- comment this out before insert 
  LocationID = CONVERT(INT, SUBSTRING(x.x, n, 1)),
  CityID     = CONVERT(INT, SUBSTRING(x.x, n+1, 2))
FROM @x AS x INNER JOIN n ON LEN(x.x) > n.n;

Results:

x                        LocationID    CityID
---------------------    ----------    ------
0x010734                 1             1844
0x030735040736           3             1845
0x030735040736           4             1846
0x030742050743060712     3             1858
0x030742050743060712     5             1859
0x030742050743060712     6             1810
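To see where a single result row comes from, here is a small sketch that decodes one 3-byte group by hand. The variable name @v is only for illustration; the value is the second sample row from the example above, and the offsets 4 and 5 pick out its second group.

```sql
-- Illustration only: decode the second 3-byte group of one sample value.
DECLARE @v VARBINARY(32) = 0x030735040736;

SELECT
  -- byte 4 is 0x04, so LocationID = 4
  CONVERT(INT, SUBSTRING(@v, 4, 1)) AS LocationID,
  -- bytes 5-6 are 0x0736, so CityID = 7 * 256 + 54 = 1846
  CONVERT(INT, SUBSTRING(@v, 5, 2)) AS CityID;
```

This matches the third row of the results (LocationID 4, CityID 1846). The set-based query does the same thing for every group of every row at once by joining to the list of start offsets 1, 4, 7, and so on.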

Articles on numbers tables will help you understand them, and why generating sets in SQL Server is far superior to even the most efficient loop you can devise.
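The comment in the query says your own Numbers table is better than spt_values. A minimal sketch of building one follows; the table name dbo.Numbers, the column alias start_pos, and the row count of 10,000 are assumptions for illustration, not part of the original answer. It also assumes sys.all_objects cross-joined with itself yields at least that many rows, which is normally the case.

```sql
-- Hypothetical helper table; the name and size are assumptions.
CREATE TABLE dbo.Numbers (n INT NOT NULL PRIMARY KEY CLUSTERED);

-- The cross join is only a convenient row source;
-- ROW_NUMBER() turns those rows into 1, 2, 3, ...
INSERT dbo.Numbers (n)
SELECT TOP (10000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_objects AS a
CROSS JOIN sys.all_objects AS b;

-- Start offsets of each 3-byte group (1, 4, 7, ...),
-- equivalent to what the CTE derives from spt_values.
SELECT TOP (300) ((num.n - 1) * 3) + 1 AS start_pos
FROM dbo.Numbers AS num
ORDER BY num.n;
```

Because dbo.Numbers starts at 1 while spt_values starts at 0, the offset formula here subtracts 1 before multiplying.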

Related Solutions

Sql-server – How to use merge hints to isolate complex queries in SQL Server

If you use a multi-statement UDF, then your inner select is executed exactly once for each outer row. The multi-statement UDF is treated as a black box: the execution plan will not show access to the objects used in your complex view.

On the other hand, a subquery and/or an inline UDF is flattened out by the optimizer. When this is the case, the execution plan will include access to the objects used in your complex view.
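To make the distinction concrete, here is a sketch of the two kinds of function. The view dbo.ComplexView, its columns Id and Amount, and both function names are hypothetical placeholders.

```sql
-- Hypothetical objects: dbo.ComplexView, Id, Amount are placeholders.

-- Inline table-valued function: a single RETURN (SELECT ...).
-- The optimizer can expand it into the calling query.
CREATE FUNCTION dbo.GetRowsInline (@Id INT)
RETURNS TABLE
AS
RETURN
(
    SELECT v.Id, v.Amount
    FROM dbo.ComplexView AS v
    WHERE v.Id = @Id
);
GO

-- Multi-statement table-valued function: fills a table variable
-- inside a BEGIN ... END body, which the caller's plan treats as opaque.
CREATE FUNCTION dbo.GetRowsMulti (@Id INT)
RETURNS @result TABLE (Id INT, Amount MONEY)
AS
BEGIN
    INSERT @result (Id, Amount)
    SELECT v.Id, v.Amount
    FROM dbo.ComplexView AS v
    WHERE v.Id = @Id;
    RETURN;
END;
GO
```

Both return the same rows; the difference described above lies in whether the objects behind the view appear in the calling query's execution plan.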

Innodb – Fastest way to copy data from MyISAM to InnoDB

There are a few solutions. First, however, I'd like to understand the process itself. Is this a one-time copy operation? A recurring copy operation? Do you want to migrate from MyISAM to InnoDB?

What is the main reason for your desire for a quick operation?

If you're looking to migrate, then why don't you use an online table alter tool, such as oak-online-alter-table (disclaimer: I'm the author of this tool) or pt-online-schema-change? Both will allow you to change your schema live and online with very little disturbance.

If you're looking to copy and paste your data, then I would suggest chunking: copying the data in small packets. This way you avoid one huge lock and unexpected timeouts. You can use either oak-chunk-update or pt-archiver for this. This may actually make the total runtime shorter because of reduced locking, but it may also take longer. Also consider that it is not an atomic operation: changes made to the original table while the copy is running may not get caught, so you may end up with an inconsistent copy.
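Done by hand, chunking amounts to copying one key range per statement. The sketch below assumes hypothetical tables src_myisam and dst_innodb with identical columns and an integer primary key id; the chunk size of 10,000 is arbitrary.

```sql
-- Hypothetical table and column names; chunk size chosen arbitrarily.
-- With autocommit on (the default), each statement commits on its own.
INSERT INTO dst_innodb
SELECT * FROM src_myisam
WHERE id >= 1 AND id < 10001;

INSERT INTO dst_innodb
SELECT * FROM src_myisam
WHERE id >= 10001 AND id < 20001;

-- ...and so on, advancing the range until the maximum id is covered.
```

The tools named above automate this loop; the caveat about changes made to the source table during the copy applies equally here.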

Otherwise (or in addition) you can use all the usual tweaks, such as

SET GLOBAL innodb_flush_log_at_trx_commit := 2;

or set

[mysqld]
innodb_doublewrite = 0

or perhaps, depending on OS and disks,

[mysqld]
innodb_flush_method = O_DIRECT

Each of the above may reduce disk I/O. The first two will also make your server less crash-safe, but if that is only for a limited time, it may be acceptable.
