I need to import about 500,000 records containing IP lookup (read-only reference) data about once a week (only three int/bigint cols).
I don't really want to worry about merging the data with the existing table, I'd prefer to clear down the old and reimport.
Ideally queries running over the data would continue to run (we don't get a lot of these and it is acceptable for them to run a little bit slower whilst the import occurs, but need to be up 24/7, so running this "out of hours" is not an option).
Things Tried So far
SSIS: I have created an SSIS package that truncates the table and imports – it takes about 30 seconds to run (too long really).
Temp Table: Importing into a temp table, truncating and copying across also takes about 30 seconds.
BCP: Bulk Import also is rather too slow (for some reason it's slower than SSIS (even with no indices to maintain) – I'm guessing it's something to do with the char->int/bigint transactions :/
Mirror table? So, at the moment, I'm wondering about reading the table through a view, importing the data into a mirror table, and altering the view to point to this table… this seems like it will be quick, but it seems tiny bit hacky to me.
This seems like it should be a common problem, but I can't find recommended practises – any ideas would be most appreciated!
Thanks
Best Answer
A solution I've used in the past (and have recommended here and on StackOverflow before) is to create two additional schemas:
Now create a mimic of your table in the
cache
schema:Now when you are doing your switch operation:
This will be more cumbersome if you have foreign keys and other dependencies (since you may have to drop those and re-create them), and of course it completely invalidates statistics etc. and this, in turn, can affect plans, but if the most important thing is getting accurate data in front of your users with minimal interruption, this can be an approach to consider.