A solution I've used in the past (and have recommended here and on StackOverflow before) is to create two additional schemas:
CREATE SCHEMA shadow AUTHORIZATION dbo;
CREATE SCHEMA cache AUTHORIZATION dbo;
Now create a mimic of your table in the cache schema:
CREATE TABLE cache.IPLookup(...columns...);
Now when you are doing your switch operation:
TRUNCATE TABLE cache.IPLookup;
BULK INSERT cache.IPLookup FROM ...;
-- the nice thing about the above is that it doesn't really
-- matter if it takes one minute or ten - you're not messing
-- with a table that anyone is using, so you aren't going to
-- interfere with active users.
-- this is a metadata-only operation, so it's extremely fast - it will
-- wait for existing locks to be released, but won't block new requests
-- for very long at all:
BEGIN TRANSACTION;
ALTER SCHEMA shadow TRANSFER dbo.IPLookup;
ALTER SCHEMA dbo TRANSFER cache.IPLookup;
COMMIT TRANSACTION;
-- now let's move the shadow table back over to
-- the cache schema so it's ready for next load:
ALTER SCHEMA cache TRANSFER shadow.IPLookup;
TRUNCATE TABLE cache.IPLookup;
-- truncate is optional - I usually keep the data
-- around for debugging, but that's probably not
-- necessary in this case.
This will be more cumbersome if you have foreign keys and other dependencies (you may have to drop and re-create those), and of course it completely invalidates statistics etc., which in turn can affect plans. But if the most important thing is getting accurate data in front of your users with minimal interruption, this is an approach to consider.
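Since the swap invalidates statistics on the newly promoted table, one mitigation is to refresh them right after the transfer - a sketch, reusing the table name from the example above:

```sql
-- refresh statistics on the newly promoted table so the optimizer
-- isn't left with whatever was gathered during the cache load
UPDATE STATISTICS dbo.IPLookup WITH FULLSCAN;

-- optionally mark cached plans referencing the table for recompile
EXEC sp_recompile N'dbo.IPLookup';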
My first comment is that you are doing an ELT (Extract, Load, Transform) rather than an ETL (Extract, Transform, Load). While ELTs leverage set-based relational advantages and can be very fast, they are sometimes very write-intensive (hard on storage) - specifically, on the t-log - because the transform is done on disk (typically an update or insert). I prefer ETL when possible, as the transform is done in the buffer and, when done correctly, requires minimal t-log writes. Buffer (memory) is cheap; fast storage is not. For some bulk operations, the t-log is a non-value-adding bottleneck.
Here are a few things that you're doing that I wouldn't recommend:
- Bulk loading to tempdb. I'd recommend doing the bulk load into a real table in the destination database. Then you can size your files accordingly and not worry about impacting tempdb.
- Bundling independent procedures together. Split these two procedures up. The bulk load and the merge are independent of each other. Splitting them into individual procedures makes them more modular / unit testable.
It looks like you have the minimal logging rules covered pretty well: you're loading to an empty B-tree with no nonclustered indexes, using trace flag 610, with the ordering key specified, in bulk-logged recovery. Outside of the temp table, everything looks OK here. As long as the file is actually ordered by the key, you should be good. Are you popping the log on tempdb or the user database?
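For reference, a load that meets those prerequisites might look like the following - a sketch only; the table name, file path, terminators, and key column are placeholders, not from the original post:

```sql
-- sketch: TABLOCK plus an ORDER hint matching the clustered key, in
-- bulk-logged (or simple) recovery, allows a minimally logged load;
-- trace flag 610 extends minimal logging to non-empty B-trees
BULK INSERT dbo.StageTable
FROM 'C:\loads\data.csv'
WITH (
    TABLOCK,
    ORDER (SomeKey ASC),       -- must match the clustered index key order
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n'
);
```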
On the merge statement:
UPDATEs will always be fully logged. Are you changing a pretty significant portion of your table? If so, you might consider doing the merge in memory (an SSIS data flow task or .NET) and then bulk loading into a new table. This is more work, but most of it is done in the buffer and minimal t-log is used. A minimally logged insert can be faster than a fully logged update if the portion of change is significant.
Since you're using trace flag 610, the insert can be minimally logged when using a TABLOCK hint. See here for more info on MERGE with TABLOCK: http://blogs.msdn.com/b/sqlserverstorageengine/archive/2010/06/03/minimal-logging-and-merge-statement.aspx Note that the update will still be fully logged if you go this route.
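A hedged sketch of that pattern - the table and column names here are placeholders, not from the original post:

```sql
MERGE dbo.TargetTable WITH (TABLOCK) AS tgt
USING dbo.StageTable AS src
    ON tgt.KeyCol = src.KeyCol
WHEN MATCHED THEN
    UPDATE SET tgt.ValueCol = src.ValueCol   -- still fully logged
WHEN NOT MATCHED BY TARGET THEN
    INSERT (KeyCol, ValueCol)
    VALUES (src.KeyCol, src.ValueCol);       -- can be minimally logged
                                             -- with TF 610 + TABLOCK
```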
Why this is a bad idea.
Your dump comes from a Paradox database. Paradox supports a lot of types - about 18 of them. CSV is typeless; it's even worse than JSON. Everything is a string.
All you can do is infer the types with heuristics; that metadata has been lost, and inferring it is not going to be easy. Doing it right is possible, but you're still guessing, and it's a laborious task. Some languages give you tools to start you down this path, like Perl's looks_like_number, and some type systems provide methods like is_type that will help you get there faster.
Good ideas
If you still have the database, however, you can plug into it via ODBC, or use other code that can read Paradox files directly. As a side note, pxlib has a CLI tool called pxview which dumps to SQL. That's the best starting point.
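If you do end up loading the CSV as all-varchar and inferring types inside SQL Server, TRY_CONVERT (SQL Server 2012+) can drive the heuristic - a sketch, with the table and column names assumed for illustration:

```sql
-- count non-null values in a raw varchar column that fail conversion
-- to a candidate type; zero failures makes that type a candidate
SELECT
    COUNT(SomeCol) AS non_null_rows,
    SUM(CASE WHEN TRY_CONVERT(int,  SomeCol) IS NULL
              AND SomeCol IS NOT NULL THEN 1 ELSE 0 END) AS fails_int,
    SUM(CASE WHEN TRY_CONVERT(date, SomeCol) IS NULL
              AND SomeCol IS NOT NULL THEN 1 ELSE 0 END) AS fails_date
FROM dbo.RawImport;
```

Run the probe per column, widening the candidate list (int, decimal, date, etc.) as needed; you're still guessing, but at least the guess is checked against every value.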