SQL Server – Best way to re-import a large amount of data with minimal downtime

Tags: bulk-import, sql-server

I need to import about 500,000 records of IP lookup (read-only reference) data about once a week (only three int/bigint columns).

I don't really want to worry about merging the data with the existing table, I'd prefer to clear down the old and reimport.

Ideally, queries running over the data would continue to work during the import. We don't get many of these queries, and it's acceptable for them to run a little slower while the import occurs, but they need to be available 24/7, so running the import "out of hours" is not an option.

Things Tried So Far

SSIS: I have created an SSIS package that truncates the table and imports; it takes about 30 seconds to run (too long, really).

Temp table: Importing into a temp table, then truncating the real table and copying across, also takes about 30 seconds.

BCP: Bulk import is also rather too slow – for some reason it's slower than SSIS, even with no indexes to maintain. I'm guessing it's something to do with the char → int/bigint conversions.

Mirror table? At the moment I'm wondering about reading the table through a view, importing the data into a mirror table, and then altering the view to point to that table… this seems like it would be quick, but it also feels a tiny bit hacky to me.
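For what it's worth, a minimal sketch of that view-switching idea might look like this. The two staging tables and the column names here are hypothetical (the question only says three int/bigint columns):

```sql
-- Two identically shaped tables; the view points at whichever one is "live".
CREATE TABLE dbo.IPLookup_A (RangeStart bigint, RangeEnd bigint, LocationId int);
CREATE TABLE dbo.IPLookup_B (RangeStart bigint, RangeEnd bigint, LocationId int);
GO
CREATE VIEW dbo.IPLookup AS
    SELECT RangeStart, RangeEnd, LocationId FROM dbo.IPLookup_A;
GO

-- Weekly refresh: load the offline table, then repoint the view.
TRUNCATE TABLE dbo.IPLookup_B;
-- BULK INSERT dbo.IPLookup_B FROM ... ;
GO
ALTER VIEW dbo.IPLookup AS
    SELECT RangeStart, RangeEnd, LocationId FROM dbo.IPLookup_B;
GO
```

The ALTER VIEW is a metadata change, so readers only wait for it briefly; the downside is you have to track which table is live between runs.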

This seems like it should be a common problem, but I can't find any recommended practices – any ideas would be most appreciated!

Thanks

Best Answer

A solution I've used in the past (and have recommended here and on StackOverflow before) is to create two additional schemas:

CREATE SCHEMA shadow AUTHORIZATION dbo;
CREATE SCHEMA cache  AUTHORIZATION dbo;

Now create a mimic of your table in the cache schema:

CREATE TABLE cache.IPLookup(...columns...);
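For example, with the three int/bigint columns described in the question (column names here are hypothetical):

```sql
-- A mimic of the live dbo table, in the cache schema.
-- Column names are assumptions; the question only specifies
-- three int/bigint columns of IP lookup data.
CREATE TABLE cache.IPLookup
(
    RangeStart bigint NOT NULL,
    RangeEnd   bigint NOT NULL,
    LocationId int    NOT NULL
);
```

The table should match dbo.IPLookup exactly (including any indexes), since after the swap it *becomes* the live table.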

Now when you are doing your switch operation:

TRUNCATE TABLE cache.IPLookup;
BULK INSERT cache.IPLookup FROM ...;

-- the nice thing about the above is that it doesn't really
-- matter if it takes one minute or ten - you're not messing
-- with a table that anyone is using, so you aren't going to
-- interfere with active users.


-- this is a metadata operation so extremely fast - it will wait
-- for existing locks to be released, but won't block new locks
-- for very long at all:

BEGIN TRANSACTION;
  ALTER SCHEMA shadow TRANSFER    dbo.IPLookup;
  ALTER SCHEMA dbo    TRANSFER  cache.IPLookup;
COMMIT TRANSACTION;


-- now let's move the shadow table back over to
-- the cache schema so it's ready for next load:

ALTER SCHEMA cache TRANSFER shadow.IPLookup;
TRUNCATE TABLE cache.IPLookup; 

-- truncate is optional - I usually keep the data
-- around for debugging, but that's probably not
-- necessary in this case.

This will be more cumbersome if you have foreign keys and other dependencies, since you may have to drop and re-create them. It also completely invalidates statistics and so on, which in turn can affect plans. But if the most important thing is getting accurate data in front of your users with minimal interruption, this is an approach to consider.
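If stale statistics on the swapped-in table turn out to matter, a refresh right after the transfer is cheap at ~500,000 rows (a sketch, using the table name from above):

```sql
-- Rebuild statistics on the now-live table so the optimizer
-- isn't working from estimates based on last week's data.
UPDATE STATISTICS dbo.IPLookup WITH FULLSCAN;
```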