You should only need to drop and recreate the affected NC index.
That said, on a test server you can compare this approach against your strategy of dropping all the indexes.
I reckon dropping and recreating the single index would be quicker overall, because otherwise the data will be shifted around twice: once to convert char to varchar, and again to rebuild the clustered index. On top of that you have the NC index creation overhead.
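As a rough sketch of that single-index approach (the table, column, and index names below are placeholders, not from your schema):

```sql
-- Placeholder names; adjust to your actual table, column, and index.
DROP INDEX IX_BigTable_SomeCol ON dbo.BigTable;

ALTER TABLE dbo.BigTable
    ALTER COLUMN SomeCol varchar(50) NOT NULL;  -- the char -> varchar change

CREATE NONCLUSTERED INDEX IX_BigTable_SomeCol
    ON dbo.BigTable (SomeCol);
```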
You will want to load your data into a new table, doing this in small batches, then drop the existing table. I put together a quick example using the Sales.Customer table in AdventureWorks, something similar should work for you also.
First, create your new table, complete with the new datatype you want to use:
CREATE TABLE [Sales].[Currency_New](
[CurrencyCode] [nchar](4) NOT NULL,
[Name] [varchar](128) NOT NULL,
[ModifiedDate] [datetime] NOT NULL,
CONSTRAINT [PK_Currency_New_CurrencyCode] PRIMARY KEY CLUSTERED
(
[CurrencyCode] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
)
GO
Then, insert your records and define your batch size. I am using 10 here, but you will likely want something larger, say 10,000 rows at a time. For 30 million rows I'd even suggest a batch size of 100,000 rows; that's the upper limit I typically use with larger tables:
DECLARE @RowsInserted INT, @InsertVolume INT
SET @RowsInserted = 1
SET @InsertVolume = 10 --Set to the number of rows per batch

WHILE @RowsInserted > 0
BEGIN
    INSERT INTO [Sales].[Currency_New] ([CurrencyCode]
        ,[Name]
        ,[ModifiedDate])
    SELECT TOP (@InsertVolume)
        SC.[CurrencyCode]
        ,SC.[Name]
        ,SC.[ModifiedDate]
    FROM [Sales].[Currency] AS SC
    LEFT JOIN [Sales].[Currency_New] AS SCN
        ON SC.[CurrencyCode] = SCN.[CurrencyCode]
    WHERE SCN.[CurrencyCode] IS NULL --only rows not yet copied

    SET @RowsInserted = @@ROWCOUNT --loop ends when no rows remain
END
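Depending on your recovery model, you may also want to give the transaction log a chance to clear between batches. A hedged addition at the end of each loop iteration (values are illustrative):

```sql
-- Optional additions at the end of each loop iteration:
CHECKPOINT;               -- in SIMPLE recovery, lets log space be reused
WAITFOR DELAY '00:00:01'; -- brief pause to reduce blocking of other sessions
```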
I usually do a sanity check and verify the rowcounts are the same before cleaning up:
SELECT COUNT(*) FROM [Sales].[Currency]
SELECT COUNT(*) FROM [Sales].[Currency_New]
Once you are confident you have migrated your data, you can drop the original table:
DROP TABLE [Sales].[Currency]
Last step, rename the new table so that users don't have to change any code. Note that sp_rename's second argument must be the bare new name, not schema-qualified; otherwise the schema and brackets become part of the table name:
EXEC sp_rename 'Sales.Currency_New', 'Currency';
GO
I don't know how long this will take. I'd suggest you try doing this when you have a clear maintenance window and users aren't connected.
HTH
Best Answer
There are certain circumstances where dropping a column can be a metadata-only operation. The column definitions for a table are not stored in each page where rows are stored; they exist only in the database metadata (sys.sysrowsets, sys.sysrscols, etc.).
When dropping a column that is not referenced by any other object, the storage engine simply marks the column definition as no longer present by deleting the pertinent details from various system tables. The action of deleting the meta-data invalidates the procedure cache, necessitating a recompile whenever a query subsequently references that table. Since the recompile only returns columns that currently exist in the table, the column details for the dropped column are never even asked for; the storage engine skips the bytes stored in each page for that column, as if the column no longer exists.
When a subsequent DML operation occurs against the table, the pages that are affected are re-written without the data for the dropped column. If you rebuild a clustered index or a heap, all the bytes for the dropped column are naturally not written back to the page on disk. This effectively spreads the load of dropping the column over time, making it less noticeable.
There are circumstances where you cannot drop a column, such as when the column is included in an index, or when you've manually created a statistics object for the column. I wrote a blog post showing the error that is presented when attempting to alter a column with a manually created statistics object. The same semantics apply when dropping a column - if the column is referenced by any other object, it cannot simply be dropped. The referencing object must be altered first, then the column can be dropped.
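To illustrate the indexed-column case (the table and index names here are made up for the demo, not from the original post):

```sql
-- Illustrative names only.
CREATE TABLE dbo.IndexedColumnDemo (id int NOT NULL, val int NOT NULL);
CREATE NONCLUSTERED INDEX IX_IndexedColumnDemo_val ON dbo.IndexedColumnDemo (val);

-- This fails with an error stating the index is dependent on the column:
--ALTER TABLE dbo.IndexedColumnDemo DROP COLUMN val;

-- Drop the referencing object first, then the column can be dropped:
DROP INDEX IX_IndexedColumnDemo_val ON dbo.IndexedColumnDemo;
ALTER TABLE dbo.IndexedColumnDemo DROP COLUMN val;
```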
This is fairly easy to show by looking at the contents of the transaction log after dropping a column. The code below creates a table with a single char(8000) column. It adds a row, then drops the column, and displays the contents of the transaction log applicable to the drop operation. The log records show modifications to various system tables where the table and column definitions are stored. If the column data were actually being deleted from the pages allocated to the table, you'd see log records recording the actual page data; there are no such records.
(The output is too big to show here, and dbfiddle.uk won't allow me to access fn_dblog)
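Since the demo code itself isn't reproduced here, a minimal sketch along the lines described might look like this (the table and column names are assumptions inferred from the surrounding text - dbo.DropColumnTest, rid, and REPLICATE('Z', 8000)):

```sql
-- Hedged reconstruction of the demo described above; names are assumptions.
CREATE TABLE dbo.DropColumnTest
(
    rid     int        NOT NULL,
    big_col char(8000) NOT NULL
);
INSERT INTO dbo.DropColumnTest (rid, big_col)
VALUES (1, REPLICATE('Z', 8000));

-- First set of output: log records generated by the DDL
ALTER TABLE dbo.DropColumnTest DROP COLUMN big_col;
SELECT [Current LSN], Operation, AllocUnitName, [Log Record Length]
FROM fn_dblog(NULL, NULL);

-- Second set of output: log records after a DML touch of the row
UPDATE dbo.DropColumnTest SET rid = rid + 1;
SELECT [Current LSN], Operation, AllocUnitName, [Log Record Length]
FROM fn_dblog(NULL, NULL);
```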
The first set of output shows the log generated by the DDL statement dropping the column. The second set of output shows the log after running the DML statement where we update the rid column. In the second result set, we see log records indicating a delete against dbo.DropColumnTest, followed by an insert into dbo.DropColumnTest. Each Log Record Length is 8116, indicating the actual page was updated.

As you can see from the output of the fn_dblog command in the test above, the entire operation is fully logged. This goes for simple recovery as well as full recovery. The terminology "fully logged" may be misinterpreted to mean the data modification is not logged. That is not what happens - the modification is logged, and can be fully rolled back. The log simply records only the pages that were touched, and since none of the table's data pages were logged by the DDL operation, both the DROP COLUMN and any rollback that might occur will happen extremely quickly, regardless of the size of the table.

For science, the following code will dump the data pages for the table included in the code above, using DBCC PAGE, style "3". Style "3" indicates we want the page header plus detailed per-row interpretation. The code uses a cursor to display the details for every page in the table, so you may want to make sure you don't run this on a large table.

Looking at the output for the first page from my demo (after the column is dropped, but before the column is updated), I see this:
I've removed most of the raw page dump from the output shown above for brevity. At the end of the output, you'll see this for the rid column:

The last line above, rid = 1, returns the name of the column, and the current value stored in the column on the page.

Next, you'll see this:
The output shows that Slot 0 contains a deleted column, by virtue of the DELETED text where the column name would normally be. The value of the column is returned as NULL since the column has been deleted. However, as you can see in the raw data, the 8,000-character value, REPLICATE('Z', 8000), for that column still exists on the page. This is a sample of that part of the DBCC PAGE output:
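For completeness, the page-dump cursor mentioned earlier might be sketched like this (a sketch, not the original code; it assumes SQL Server 2012+ for sys.dm_db_database_page_allocations, an undocumented DMF):

```sql
-- Dump every data page of the demo table with per-row detail (style 3).
DBCC TRACEON(3604);  -- send DBCC PAGE output to the client

DECLARE @FileId int, @PageId int;
DECLARE PageCursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT allocated_page_file_id, allocated_page_page_id
    FROM sys.dm_db_database_page_allocations(
        DB_ID(), OBJECT_ID('dbo.DropColumnTest'), NULL, NULL, 'DETAILED')
    WHERE is_allocated = 1 AND page_type = 1;  -- data pages only

OPEN PageCursor;
FETCH NEXT FROM PageCursor INTO @FileId, @PageId;
WHILE @@FETCH_STATUS = 0
BEGIN
    DBCC PAGE(0, @FileId, @PageId, 3);  -- 0 = current database
    FETCH NEXT FROM PageCursor INTO @FileId, @PageId;
END
CLOSE PageCursor;
DEALLOCATE PageCursor;
```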