I have a column on an Oracle db containing HTML data with embedded images that result in some records being over 2GB in size for that column. I have successfully migrated the rest of the database to SQL Server except for this CLOB column. With SQL Server varchar(max) unable to handle anything over 2GB in size, what are my options?
Sql-server – Migrating 2+ GB Oracle CLOB column to SQL Server VARCHAR(MAX)
Tags: clob, oracle, sql-server
Related Solutions
You're seeing the cost overhead of using MAX types. While NVARCHAR(MAX) is identical to NVARCHAR(n) in T-SQL and can be stored in-row, it is handled separately by the storage engine because it can be pushed off-row. When pushed off-row it is stored in a LOB_DATA allocation unit rather than a ROW_OVERFLOW_DATA allocation unit, and we can assume from your observations that this carries an overhead.
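If you want to see where a column's data actually lives, one quick check is to look at the allocation units behind the table. A minimal sketch, assuming `dbo.YourTable` stands in for the table in question:

```sql
-- Show which allocation units (IN_ROW_DATA, ROW_OVERFLOW_DATA, LOB_DATA)
-- a table is using, with page counts per unit.
SELECT OBJECT_NAME(p.object_id) AS table_name,
       p.index_id,
       au.type_desc             AS allocation_unit_type,
       au.total_pages,
       au.used_pages
FROM sys.partitions p
JOIN sys.allocation_units au
  ON au.container_id = p.partition_id
WHERE p.object_id = OBJECT_ID(N'dbo.YourTable');  -- placeholder table name
```

A table whose MAX column has been pushed off-row will show pages under the LOB_DATA unit rather than ROW_OVERFLOW_DATA.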
You can see the two types are internally stored differently with a little DBCC PAGE spelunking. Mark Rasmussen posted example page dumps that show the differences in What is the Size of the LOB Pointer for (MAX) Types Like Varchar, Varbinary, Etc?
We can probably assume it's the GROUP BY on the MAX column that causes the performance difference in your case. I've not tested other operations on a MAX type, but it might be interesting to do so and see whether similar results appear.
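If you want to run that test yourself, a rough harness might look like the following. The table, column names and row count are made up purely for illustration; the point is to compare the same values declared once as NVARCHAR(100) and once as NVARCHAR(MAX):

```sql
-- Hypothetical test: identical-shaped data in an NVARCHAR(100) and an NVARCHAR(MAX) column.
CREATE TABLE dbo.MaxTest
(
    id      INT IDENTITY PRIMARY KEY,
    val_n   NVARCHAR(100),
    val_max NVARCHAR(MAX)
);

-- Load some repetitive sample data (adjust the row count as needed).
INSERT INTO dbo.MaxTest (val_n, val_max)
SELECT TOP (100000)
       N'value ' + CAST(ABS(CHECKSUM(NEWID())) % 50 AS NVARCHAR(10)),
       N'value ' + CAST(ABS(CHECKSUM(NEWID())) % 50 AS NVARCHAR(10))
FROM sys.all_columns a
CROSS JOIN sys.all_columns b;

SET STATISTICS TIME ON;

-- Compare the two aggregations; the MAX column is the one expected to show the overhead.
SELECT val_n,   COUNT(*) FROM dbo.MaxTest GROUP BY val_n;
SELECT val_max, COUNT(*) FROM dbo.MaxTest GROUP BY val_max;

SET STATISTICS TIME OFF;
```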
In SQL Server 2000, if you want to identify corrupt pages, the database option TORN_PAGE_DETECTION should be set to TRUE.
In SQL Server 2005 and up, a new setting, PAGE_VERIFY, replaced the old TORN_PAGE_DETECTION and allows you to choose between two types of page verification: TORN_PAGE_DETECTION and CHECKSUM.
Now the question is which one to set - TORN_PAGE_DETECTION or CHECKSUM?
TORN_PAGE_DETECTION - writes a bit for every 512 bytes in a page, allowing you to detect when a page was not successfully written to disk. The catch is that it won't tell you whether the data stored in those 512 bytes is actually correct, because a couple of bytes may have been written incorrectly.
CHECKSUM - calculates a checksum of the page both when a page is written and when a page is read, assuming it has a checksum on it.
SQL Server computes the checksum based on the bit pattern on the page, stores it in the page header and then issues an I/O to write the page. When SQL Server reads the page, it re-computes the checksum using the same logic and compares it with the value stored in the page header. If the checksum values match, it is assumed the page did not get corrupted during the write-read cycle.
Since the cost of computing the checksum is incurred on each page read and write, it can add CPU overhead and can possibly impact the throughput of your workload. Another thing to keep in mind is that the checksum is not unique for a specific bit pattern on the page: two pages can possibly map to the same checksum value, so there is a remote possibility that page corruption may go undetected.
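For reference, switching a database between the two verification modes is a single ALTER DATABASE statement; the database name below is a placeholder:

```sql
-- Switch a database to CHECKSUM page verification (the usual recommendation).
ALTER DATABASE [YourDatabase] SET PAGE_VERIFY CHECKSUM;

-- Or, if you want the older behaviour instead:
ALTER DATABASE [YourDatabase] SET PAGE_VERIFY TORN_PAGE_DETECTION;
```

Keep in mind that existing pages only pick up a checksum the next time they are written, so enabling CHECKSUM does not retroactively protect pages that have not been modified since.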
Reference : Checksum in SQL2005
To specifically answer your questions:
I believe Checksum was introduced in SQL 2005 and that upgrading or restoring a DB from a prior version would maintain its previous page verify method, i.e. there was no implicit upgrade.
Yes, CHECKSUM was introduced in SQL Server 2005 and is the DEFAULT. When you upgrade from 2000 to 2005, you have to explicitly change the database option PAGE_VERIFY to use CHECKSUM.
If you restore a database already created on SQL 2005 to another server running SQL 2005, you don't have to set it; it will persist as whatever you have set the PAGE_VERIFY option to.
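A quick way to confirm what each database is currently using is to query sys.databases:

```sql
-- Current page verification setting for every database on the instance.
SELECT name,
       page_verify_option_desc   -- NONE, TORN_PAGE_DETECTION or CHECKSUM
FROM sys.databases
ORDER BY name;
```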
I've not succeeded in researching when Torn Page Detection came in
From: http://support.microsoft.com/kb/230785
Versions of SQL Server earlier than 7.0
Versions of SQL Server earlier than 7.0 did not provide log parity or torn bit detection facilities. In fact, those versions can write the same log page multiple times until the log records fill the 2-KB log page. This can expose transactions that have successfully committed. If the log page is being rewritten during a failure, a sector with the committed transaction may not get rewritten properly.
Thus, TORN_PAGE_DETECTION has been around since SQL Server 7.0. Even then, the default was that it was not enabled (same link).
Note Torn page detection is not enabled by default in SQL Server 7.0. See sp_dboption for how to enable the detection on your system.
Therefore, if the database was developed against a 7.0 instance and was subsequently upgraded, it would have been upgraded with the extant PAGE_VERIFY option of NONE (as @ThomasStringer noted in his answer).
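For completeness, on those older versions the option was toggled per database with sp_dboption (long since superseded by ALTER DATABASE); the database name here is a placeholder:

```sql
-- SQL Server 7.0 / 2000 syntax: enable torn page detection for a database.
EXEC sp_dboption 'YourDatabase', 'torn page detection', 'true';
```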
Edit (09/24/2013), to improve the answer:
Referring to my SQL Server Internals notes from SQLSkills, I found that using a page dump you can verify whether torn-bit detection - TORN_PAGE_DETECTION or CHECKSUM - is enabled:
use database_name   -- change here for your database !!
go
checkpoint          -- flush dirty pages so the page on disk is current
go
dbcc traceon (3604) -- send dbcc output to the client instead of the error log
go
dbcc page (database_name, 1, 1, 0)  -- file 1, page 1; use the same database name as above
go
dbcc traceoff (3604) -- turn off the trace flag
go
m_tornBits: This holds either the page checksum or the bits that were displaced by the torn-page protection bits, depending on which form of page protection is turned on for the database.
Note: I don't have any older SQL Server versions running; the above is confirmed on SQL Server 2000 and up. If you have a 7.0 or 6.5 instance running around, you can confirm it as well :-)
Best Answer
Normally I would recommend using the XML data type to store HTML, but it too has the 2GB limit. This leaves you with two options: FILESTREAM and FileTables. Be aware, though, that both of these options require some additional configuration within SQL Server and your database before you are able to use them.
FileTables are built on top of FILESTREAM and offer some benefits if you need to access the files using standard Windows API calls. FileTables are generally transparent to the consumer and easier to integrate into your application, but they lack transactional access. This means that the files can be removed from outside of the SQL Server context, which may be undesirable. FILESTREAM access, by contrast, must be done under a transaction context and therefore prevents anyone (with the exception of administrators/people granted explicit permission to the FILESTREAM folder) from removing them outside of SQL Server. Neither has the 2GB limit, which will allow you to get around the problem that you are currently facing; however, accessing the data can be a little more complicated. Luckily, the documentation does a pretty decent job of explaining how to perform CRUD operations with FILESTREAM data.
If I had to pick one, based on what you have provided above, I'd go with FILESTREAM, as this will offer you better consistency and control over your data.
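As a rough sketch of what the FILESTREAM route involves (database, filegroup, path and table names below are illustrative, not prescriptive): enable FILESTREAM at the instance level (also via SQL Server Configuration Manager), add a FILESTREAM filegroup to the database, and declare the column as VARBINARY(MAX) FILESTREAM.

```sql
-- Instance-level access level (FILESTREAM must also be enabled in SQL Server Configuration Manager).
EXEC sp_configure 'filestream access level', 2;  -- 2 = T-SQL and Win32 streaming access
RECONFIGURE;

-- Add a FILESTREAM filegroup and container to the target database (path is illustrative).
ALTER DATABASE MigratedDb ADD FILEGROUP HtmlDocsFS CONTAINS FILESTREAM;
ALTER DATABASE MigratedDb ADD FILE
    (NAME = HtmlDocsFSFile, FILENAME = 'D:\FileStreamData\HtmlDocs')
    TO FILEGROUP HtmlDocsFS;

-- A table to hold the migrated CLOB data; FILESTREAM requires a UNIQUE ROWGUIDCOL column.
CREATE TABLE dbo.HtmlDocuments
(
    DocId   UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWID(),
    DocHtml VARBINARY(MAX) FILESTREAM NULL   -- no 2 GB ceiling
);
```

Since FILESTREAM columns are VARBINARY(MAX), the CLOB text would be stored as bytes (e.g. UTF-8 or UTF-16 encoded) and decoded on the way back out by your application.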