SQL Server – Best way to re-import a large amount of data with minimal downtime

Tags: bulk-import, sql-server

I need to import about 500,000 records of IP lookup (read-only reference) data about once a week (only three int/bigint columns).

I don't really want to worry about merging the data with the existing table, I'd prefer to clear down the old and reimport.

Ideally, queries running over the data would continue to work during the import. We don't get many of these queries, and it's acceptable for them to run a little slower while the import occurs, but they need to be available 24/7, so running the import "out of hours" is not an option.

Things Tried So Far

SSIS: I have created an SSIS package that truncates the table and imports; it takes about 30 seconds to run (too long, really).

Temp table: Importing into a temp table, then truncating the real table and copying across, also takes about 30 seconds.

BCP: Bulk import is also rather too slow – for some reason it's slower than SSIS, even with no indexes to maintain. I'm guessing it's something to do with the char → int/bigint conversions.

Mirror table? At the moment I'm wondering about reading the table through a view, importing the data into a mirror table, and then altering the view to point to that table… this seems like it would be quick, but it also feels a tiny bit hacky to me.
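For what it's worth, a minimal sketch of that view-switching idea might look like this. The two staging tables and the column names here are hypothetical (the question only says three int/bigint columns):

```sql
-- Two identically shaped tables; the view points at whichever one is "live".
CREATE TABLE dbo.IPLookup_A (RangeStart bigint, RangeEnd bigint, LocationId int);
CREATE TABLE dbo.IPLookup_B (RangeStart bigint, RangeEnd bigint, LocationId int);
GO
CREATE VIEW dbo.IPLookup AS
    SELECT RangeStart, RangeEnd, LocationId FROM dbo.IPLookup_A;
GO

-- Weekly refresh: load the offline table, then repoint the view.
TRUNCATE TABLE dbo.IPLookup_B;
-- BULK INSERT dbo.IPLookup_B FROM ... ;
GO
ALTER VIEW dbo.IPLookup AS
    SELECT RangeStart, RangeEnd, LocationId FROM dbo.IPLookup_B;
GO
```

The ALTER VIEW is a metadata change, so readers only wait for it briefly; the downside is you have to track which table is live between runs.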

This seems like it should be a common problem, but I can't find any recommended practices – any ideas would be most appreciated!

Thanks

Best Answer

A solution I've used in the past (and have recommended here and on StackOverflow before) is to create two additional schemas:

CREATE SCHEMA shadow AUTHORIZATION dbo;
CREATE SCHEMA cache  AUTHORIZATION dbo;

Now create a mimic of your table in the cache schema:

CREATE TABLE cache.IPLookup(...columns...);
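For example, with the three int/bigint columns described in the question (column names here are hypothetical):

```sql
-- A mimic of the live dbo table, in the cache schema.
-- Column names are assumptions; the question only specifies
-- three int/bigint columns of IP lookup data.
CREATE TABLE cache.IPLookup
(
    RangeStart bigint NOT NULL,
    RangeEnd   bigint NOT NULL,
    LocationId int    NOT NULL
);
```

The table should match dbo.IPLookup exactly (including any indexes), since after the swap it *becomes* the live table.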

Now when you are doing your switch operation:

TRUNCATE TABLE cache.IPLookup;
BULK INSERT cache.IPLookup FROM ...;

-- the nice thing about the above is that it doesn't really
-- matter if it takes one minute or ten - you're not messing
-- with a table that anyone is using, so you aren't going to
-- interfere with active users.


-- this is a metadata operation so extremely fast - it will wait
-- for existing locks to be released, but won't block new locks
-- for very long at all:

BEGIN TRANSACTION;
  ALTER SCHEMA shadow TRANSFER    dbo.IPLookup;
  ALTER SCHEMA dbo    TRANSFER  cache.IPLookup;
COMMIT TRANSACTION;


-- now let's move the shadow table back over to
-- the cache schema so it's ready for next load:

ALTER SCHEMA cache TRANSFER shadow.IPLookup;
TRUNCATE TABLE cache.IPLookup; 

-- truncate is optional - I usually keep the data
-- around for debugging, but that's probably not
-- necessary in this case.

This will be more cumbersome if you have foreign keys and other dependencies, since you may have to drop and re-create them. It also completely invalidates statistics and so on, which in turn can affect plans. But if the most important thing is getting accurate data in front of your users with minimal interruption, this is an approach to consider.
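If stale statistics on the swapped-in table turn out to matter, a refresh right after the transfer is cheap at ~500,000 rows (a sketch, using the table name from above):

```sql
-- Rebuild statistics on the now-live table so the optimizer
-- isn't working from estimates based on last week's data.
UPDATE STATISTICS dbo.IPLookup WITH FULLSCAN;
```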