SQL Server – Altering Column Size on a Super Large Table

azure-sql-database, sql-server

I have a table with a column of type NVarChar(Max).

I didn't take good precautions before allowing users to add data, so right now I have more than 2.5 billion rows and the database is becoming super large.

I'm trying to alter the column to a fixed length, where rows longer than that width should be deleted.

This is not an indexed column.

I tried LEN(Text) > width, but this is very slow since it has to scan billions of rows.

We also previously experimented with changing an INT to a BIGINT by creating a new table and moving the data there, but it took a week or so.

ALTER COLUMN would freeze the system.

Can you kindly suggest any other way or suggest the method you feel is best?

The database is hosted on Azure SQL Database.

Thank you so much 🙂

Best Answer

The best method is always specific to your requirements and situation. Some approaches might be faster than others but require more temporary space or prevent end users from querying the table. Based on what you've said in your question I'm going to assume the following:

  • Your primary concern is reducing space and you don't want to have to grow your database to perform this operation.
  • Azure SQL limits your options in moving the data to a new database or filegroup.
  • You want the table to remain online for end user queries during the maintenance operations.
  • The table has a clustered index.

One approach to fix it:

  1. Immediately make changes in the application to prevent end users from loading new data that's above your desired maximum length. No reason to let the problem get worse while you figure out what to do.
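
    If you also want a guard at the database level while the application fix rolls out, one option (a sketch, not from the original answer; the table, column, and constraint names are hypothetical) is an untrusted CHECK constraint, which rejects new oversized values without validating the billions of existing rows:

    -- Hypothetical names; adjust the byte limit to your target width.
    -- WITH NOCHECK skips validating existing rows, so creating it is near-instant,
    -- but the constraint is enforced for all new inserts and updates.
    ALTER TABLE dbo.BigTable WITH NOCHECK
    ADD CONSTRAINT CK_BigTable_TextMaxLength
    CHECK (DATALENGTH([Text]) <= 8000);
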
  2. Verify that your desired maximum length will free up enough space. If a simple COUNT_BIG(*) is too slow, you can use TABLESAMPLE and a bit of math to figure out how many rows you'll delete. I created a 100 GB table on my local machine, and running a 1% sample query took less than a second with a cold cache. Example code below:

    DROP TABLE IF EXISTS #BIG_TABLE;
    
    CREATE TABLE #BIG_TABLE (
        ID BIGINT NOT NULL,
        FOR_U NVARCHAR(MAX),
        PRIMARY KEY (ID)
    );
    
    DECLARE @big_string NVARCHAR(MAX) = REPLICATE(CAST(N'Z' AS NVARCHAR(MAX)), 200000);
    
    INSERT INTO #BIG_TABLE WITH (TABLOCK)
    SELECT TOP (250000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) RN, @big_string -- 250k rows of ~400 KB each ≈ 100 GB
    FROM master..spt_values t1
    CROSS JOIN master..spt_values t2
    OPTION (MAXDOP 1);
    
    CHECKPOINT;
    DBCC DROPCLEANBUFFERS;
    
    
    SELECT COUNT_BIG(*) total_sampled_rows
    , COUNT_BIG(CASE WHEN DATALENGTH(FOR_U) > 8000 THEN 1 ELSE NULL END) sampled_rows_too_long
    FROM #BIG_TABLE TABLESAMPLE (1 PERCENT);
    

    Notice the use of DATALENGTH instead of LEN. See Datalength optimizations for LOB data types… for the reasons it can perform better.
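
    As a unit sanity check (my note, not from the linked post): DATALENGTH returns bytes, and NVARCHAR stores two bytes per character, so the 8000-byte filter above corresponds to roughly 4000 characters. And since the query samples 1 percent of the table, scale the counts by about 100; TABLESAMPLE works at the page level rather than the row level, so treat the result as an estimate:

    -- DATALENGTH counts bytes; LEN counts characters (6 vs. 3 here).
    SELECT DATALENGTH(N'abc') AS bytes_used, LEN(N'abc') AS char_count;

    -- Rough extrapolation from a 1 percent sample to the full table.
    SELECT 100 * COUNT_BIG(CASE WHEN DATALENGTH(FOR_U) > 8000 THEN 1 END) AS estimated_rows_to_delete
    FROM #BIG_TABLE TABLESAMPLE (1 PERCENT);
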

  3. Delete rows from the table in small batches. Use the clustered key to loop over all rows exactly once. If you'd like more detail here, try Take Care When Scripting Batches. Otherwise, the core idea is to increment local variables as you process the table:

    DECLARE @start_id BIGINT = 1;
    DECLARE @end_id BIGINT = 500;
    
    -- Delete only the oversized rows within the current clustered-key range.
    DELETE FROM #BIG_TABLE
    WHERE ID BETWEEN @start_id AND @end_id
    AND DATALENGTH(FOR_U) > 8000;
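
    Wrapped in a loop, the whole pass looks something like this (a sketch assuming the same #BIG_TABLE schema and 8000-byte limit as above; tune the batch size to what your log and workload tolerate):

    DECLARE @batch_size BIGINT = 500;
    DECLARE @start_id BIGINT = 1;
    DECLARE @max_id BIGINT;
    
    SELECT @max_id = MAX(ID) FROM #BIG_TABLE;
    
    WHILE @start_id <= @max_id
    BEGIN
        -- Each iteration touches one small clustered-key range exactly once.
        DELETE FROM #BIG_TABLE
        WHERE ID BETWEEN @start_id AND @start_id + @batch_size - 1
        AND DATALENGTH(FOR_U) > 8000;
    
        SET @start_id = @start_id + @batch_size;
    END;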
    
  4. Once all of the deletes are done, stop and measure your progress. Is the table small enough now? You may need to run a REORGANIZE or REBUILD to fully reclaim space. I recommend REORGANIZE because it's not an all-or-nothing operation (if you cancel it you keep your progress), it won't grow your log file, and it's an online operation.
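
    A reorganize that also compacts LOB pages (where NVARCHAR(MAX) values often live out of row) looks like the sketch below; the table name is a placeholder:

    -- Online; cancelling it keeps whatever compaction has already happened.
    ALTER INDEX ALL ON dbo.BigTable
    REORGANIZE WITH (LOB_COMPACTION = ON);
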

  5. Stop here. If the problem is that the table takes up too much space then you've solved that problem by removing unneeded rows from the table. You probably do not need to change the data type in order to reclaim space. Having an unnecessary NVARCHAR(MAX) isn't ideal, but the downsides may not justify the effort of changing the data type. If you do need to change the data type then it will be easier to do so after you've freed up some space in your database.
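
    If you do eventually change the type, the statement itself is simple (hypothetical names below), but converting NVARCHAR(MAX) to a fixed length is not a metadata-only change: SQL Server has to validate every remaining value, so expect it to run long on a table this size:

    -- Fails if any remaining value is longer than 4000 characters;
    -- keep NULL/NOT NULL matching the column's current nullability.
    ALTER TABLE dbo.BigTable
    ALTER COLUMN [Text] NVARCHAR(4000) NULL;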