SQL Server – Pitfalls of Renaming Tables


I've thought of an approach for a process that builds a fairly large table (2 million rows), and now I'm canvassing opinion on it. What's going to bite me with this renaming approach to making a table 'live'?

Background:
The table (data.CatExt) is an overlaid version of our product catalogue. It is built daily by a complex procedure (getting data from lots of different areas and deciding which source trumps when there are conflicts – the complexity is in the business logic, not the technology). All the work and any intermediate tables required to get the final output live in a 'build' schema, and the final table lives in the 'data' schema. The two schemas have different permissions.

The table is pretty wide, about 100 fields. We've taken the approach of building it from scratch each day, rather than using a 'changes/additions/deletions' approach, because of the nature of the data sources (it's horrible to work out what is going to require a change to the final table). So we build a new version of the table each day and then, after it passes some automated checks, allow that to become the new live version.

New approach:
So here's where the new approach comes in. What's the best way to replace the live version? The simplest option is to truncate the live table and reload it with the new data, but that seems like a lot of unnecessary writing of data. We could do an update over each field, but with 100 fields that's a lot of nasty coding and comparison to find the differences. So I thought, why not rename the table? It would go like this:

  1. Live version data.CatExt exists and is usable throughout steps 2 and 3.
  2. Build the new version of the table, build.CatExt.
  3. Checks, indexing etc. over build.CatExt all complete – i.e., it's perfect.
  4. Drop (well, I'm too chicken – probably rename) data.CatExt, then rename build.CatExt to data.CatExt (sketched below).
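
Roughly, I picture step 4 going something like this (just a sketch of what I have in mind – CatExt_old is a placeholder name, and since sp_rename can't move a table between schemas, the 'rename' of build.CatExt into the data schema would really be an ALTER SCHEMA ... TRANSFER):

BEGIN TRANSACTION;

  -- keep the old live copy around under a new (placeholder) name instead of dropping it
  EXEC sp_rename 'data.CatExt', 'CatExt_old';

  -- move the freshly built table into the live schema
  ALTER SCHEMA data TRANSFER build.CatExt;

COMMIT TRANSACTION;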

Voila: a new version, no unnecessary moving of data, and no long period where the live version is locked while the data is written.

Please point out the flaws. (Be gentle, I'm new! 🙂 )

Best Answer

I do something similar to Jason - which doesn't require partitioning, by the way - but I use schema transfers instead of switching. It requires two additional schemas to support multiple transfers - in my case I call them fake and shadow.

Basically you empty the shadow table, then populate it (you could also use an upsert methodology, but I find truncate/insert to be faster and less complicated, especially for a background process). You can optionally update stats on the shadow copy at this point - again, it's a background process, so it won't interrupt your users except under a really overloaded workload.

Then you start a transaction, move the production (dbo) copy of the table into the fake schema, move the newly populated shadow copy into production, and commit. This is a metadata-only operation and will require schema locks, but it should happen almost instantaneously. Finally, move the old production copy of the table into the shadow schema (and optionally truncate it now to save space - though I often keep it populated so I have a backup copy of the last version of the table, just in case).

TRUNCATE TABLE shadow.Lookup;

INSERT shadow.Lookup([cols]) 
  SELECT [cols] FROM [source];

-- perhaps an explicit statistics update here

BEGIN TRANSACTION;
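  -- metadata-only swap: park the live copy in fake, promote the shadow copy to dbo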

  ALTER SCHEMA fake TRANSFER     dbo.Lookup;
  ALTER SCHEMA dbo  TRANSFER  shadow.Lookup;

COMMIT TRANSACTION;

ALTER SCHEMA shadow TRANSFER fake.Lookup;

-- optionally truncate shadow.Lookup now (or keep it populated as a backup of the previous version)

There's not really any advantage of one over the other (transfer vs. switch), but you should read a few of my other posts for more background, some of my reasoning, gotchas, etc. For example, foreign keys will likely not be your friend here - in my scenario this worked best for aggregated, denormalized data, or copies being used for read-only workloads (and thus didn't need referential constraints - not that I'm advocating skipping them or anything).
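
For completeness, the one-time setup this pattern relies on might look roughly like the following (a sketch using the same hypothetical dbo.Lookup example; any indexes the shadow copy needs would still have to be scripted separately to match the live table):

-- create the two extra schemas used for the swap
CREATE SCHEMA fake AUTHORIZATION dbo;
GO
CREATE SCHEMA shadow AUTHORIZATION dbo;
GO

-- create an empty, columns-only copy of the live table in the shadow schema
-- (indexes and constraints are not copied and must be added separately)
SELECT * INTO shadow.Lookup FROM dbo.Lookup WHERE 1 = 0;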