I have a column on an Oracle db containing HTML data with embedded images that result in some records being over 2GB in size for that column. I have successfully migrated the rest of the database to SQL Server except for this CLOB column. With SQL Server varchar(max) unable to handle anything over 2GB in size, what are my options?
Sql-server – Migrating 2+ GB Oracle CLOB column to SQL Server VARCHAR(MAX)
Tags: clob, oracle, sql-server
Related Solutions
You're seeing the cost overhead of using MAX types. While NVARCHAR(MAX) is identical to NVARCHAR(n) in T-SQL and can be stored in-row, it is handled separately by the storage engine because it can be pushed off-row. When pushed off-row it is stored in a LOB_DATA allocation unit rather than a ROW_OVERFLOW_DATA allocation unit, and we can assume from your observations that this carries an overhead.
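If you want to see where a column's data actually lives, one quick check is to look at the allocation units behind the table. A minimal sketch, assuming `dbo.YourTable` stands in for the table in question:

```sql
-- Show which allocation units (IN_ROW_DATA, ROW_OVERFLOW_DATA, LOB_DATA)
-- a table is using, with page counts per unit.
SELECT OBJECT_NAME(p.object_id) AS table_name,
       p.index_id,
       au.type_desc             AS allocation_unit_type,
       au.total_pages,
       au.used_pages
FROM sys.partitions p
JOIN sys.allocation_units au
  ON au.container_id = p.partition_id
WHERE p.object_id = OBJECT_ID(N'dbo.YourTable');  -- placeholder table name
```

A table whose MAX column has been pushed off-row will show pages under the LOB_DATA unit rather than ROW_OVERFLOW_DATA.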
You can see the two types are internally stored differently with a little DBCC PAGE spelunking. Mark Rasmussen posted example page dumps that show the differences in What is the Size of the LOB Pointer for (MAX) Types Like Varchar, Varbinary, Etc?
We can probably assume it's the GROUP BY on the MAX column that causes the performance difference in your case. I've not tested other operations on a MAX type, but it might be interesting to do so and see whether similar results appear.
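If you want to run that test yourself, a rough harness might look like the following. The table, column names and row count are made up purely for illustration; the point is to compare the same values declared once as NVARCHAR(100) and once as NVARCHAR(MAX):

```sql
-- Hypothetical test: identical-shaped data in an NVARCHAR(100) and an NVARCHAR(MAX) column.
CREATE TABLE dbo.MaxTest
(
    id      INT IDENTITY PRIMARY KEY,
    val_n   NVARCHAR(100),
    val_max NVARCHAR(MAX)
);

-- Load some repetitive sample data (adjust the row count as needed).
INSERT INTO dbo.MaxTest (val_n, val_max)
SELECT TOP (100000)
       N'value ' + CAST(ABS(CHECKSUM(NEWID())) % 50 AS NVARCHAR(10)),
       N'value ' + CAST(ABS(CHECKSUM(NEWID())) % 50 AS NVARCHAR(10))
FROM sys.all_columns a
CROSS JOIN sys.all_columns b;

SET STATISTICS TIME ON;

-- Compare the two aggregations; the MAX column is the one expected to show the overhead.
SELECT val_n,   COUNT(*) FROM dbo.MaxTest GROUP BY val_n;
SELECT val_max, COUNT(*) FROM dbo.MaxTest GROUP BY val_max;

SET STATISTICS TIME OFF;
```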
In SQL Server 2000, if you want to identify corrupt pages, the database option TORN_PAGE_DETECTION should be set to TRUE.
In SQL Server 2005 and up, a new setting, PAGE_VERIFY, replaced the old TORN_PAGE_DETECTION and allows you to choose between two types of page verification: TORN_PAGE_DETECTION and CHECKSUM.
Now the question is which one to set - TORN_PAGE_DETECTION or CHECKSUM?
TORN_PAGE_DETECTION - writes a bit for every 512 bytes in a page, allowing you to detect when a page was not successfully written to disk. The catch is that it won't tell you whether the data stored in those 512 bytes is actually correct, because a couple of bytes may have been written incorrectly.
CHECKSUM - calculates a checksum of the page both when a page is written and when a page is read, assuming it has a checksum on it.
SQL Server computes the checksum based on the bit pattern on the page, stores it in the page header and then issues an I/O to write the page. When SQL Server reads the page, it re-computes the checksum using the same logic and compares it with the value stored in the page header. If the checksum values match, it is assumed the page did not get corrupted during the write-read cycle.
Since the cost of computing the checksum is incurred on each page read and write, it can add CPU overhead and can possibly impact the throughput of your workload. Another thing to keep in mind is that the checksum is not unique for a specific bit pattern on the page: two pages can possibly map to the same checksum value, so there is a remote possibility that page corruption may go undetected.
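For reference, switching a database between the two verification modes is a single ALTER DATABASE statement; the database name below is a placeholder:

```sql
-- Switch a database to CHECKSUM page verification (the usual recommendation).
ALTER DATABASE [YourDatabase] SET PAGE_VERIFY CHECKSUM;

-- Or, if you want the older behaviour instead:
ALTER DATABASE [YourDatabase] SET PAGE_VERIFY TORN_PAGE_DETECTION;
```

Keep in mind that existing pages only pick up a checksum the next time they are written, so enabling CHECKSUM does not retroactively protect pages that have not been modified since.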
Reference : Checksum in SQL2005
To specifically answer your questions:
I believe Checksum was introduced in SQL 2005 and that upgrading or restoring a DB from a prior version would maintain its previous page verify method, i.e. there was no implicit upgrade.
Yes, CHECKSUM was introduced in SQL Server 2005 and is the DEFAULT. When you upgrade from 2000 to 2005, you have to explicitly change the database option PAGE_VERIFY to use CHECKSUM.
If you restore a database already created on SQL 2005 to another server running SQL 2005, you don't have to set it; it will persist as whatever you have set the PAGE_VERIFY option to.
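A quick way to confirm what each database is currently using is to query sys.databases:

```sql
-- Current page verification setting for every database on the instance.
SELECT name,
       page_verify_option_desc   -- NONE, TORN_PAGE_DETECTION or CHECKSUM
FROM sys.databases
ORDER BY name;
```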
I've not succeeded in researching when Torn Page Detection came in
From: http://support.microsoft.com/kb/230785
Versions of SQL Server earlier than 7.0
Versions of SQL Server earlier than 7.0 did not provide log parity or torn bit detection facilities. In fact, those versions can write the same log page multiple times until the log records fill the 2-KB log page. This can expose transactions that have successfully committed. If the log page is being rewritten during a failure, a sector with the committed transaction may not get rewritten properly.
Thus, TORN_PAGE_DETECTION has been around since SQL Server 7.0. Even then, the default was that it was not enabled (same link).
Note Torn page detection is not enabled by default in SQL Server 7.0. See sp_dboption for how to enable the detection on your system.
Therefore, if the database was developed against a 7.0 instance and was subsequently upgraded, it would have been upgraded with the extant PAGE_VERIFY option of NONE (as @ThomasStringer noted in his answer).
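For completeness, on those older versions the option was toggled per database with sp_dboption (long since superseded by ALTER DATABASE); the database name here is a placeholder:

```sql
-- SQL Server 7.0 / 2000 syntax: enable torn page detection for a database.
EXEC sp_dboption 'YourDatabase', 'torn page detection', 'true';
```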
Edit (09/24/2013), to improve the answer:
Referring to my SQL Server Internals notes from SQLSkills, I found that using a page dump you can verify whether torn-bit detection - TORN_PAGE_DETECTION or CHECKSUM - is enabled:
use database_name   -- change here for your database !!
go
checkpoint          -- flush dirty pages so the page on disk is current
go
dbcc traceon (3604) -- send dbcc output to the client instead of the error log
go
dbcc page (database_name, 1, 1, 0)  -- file 1, page 1; use the same database name as above
go
dbcc traceoff (3604) -- turn off the trace flag
go
m_tornBits: This holds either the page checksum or the bits that were displaced by the torn-page protection bits, depending on which form of page protection is turned on for the database.
Note: I don't have any older SQL Server versions running; the above is confirmed on SQL Server 2000 and up. If you have a 7.0 or 6.5 instance running around, you can confirm it as well :-)
Best Answer
Normally I would recommend using the XML data type to store HTML, but it too has the 2GB limit. This leaves you with two options: FILESTREAM and FileTables. Be aware, though, that both of these options require some additional configuration within SQL Server and your database before you are able to use them.
FileTables are built on top of FILESTREAM and offer some benefits if you need to access the files using standard Windows API calls. FileTables are generally transparent to the consumer and easier to integrate into your application, but they lack transactional access. This means that the files can be removed from outside of the SQL Server context, which may be undesirable. FILESTREAM access, by contrast, must be done under a transaction context and therefore prevents anyone (with the exception of administrators/people granted explicit permission to the FILESTREAM folder) from removing them outside of SQL Server. Neither has the 2GB limit, which will allow you to get around the problem that you are currently facing; however, accessing the data can be a little more complicated. Luckily, the documentation does a pretty decent job of explaining how to perform CRUD operations with FILESTREAM data.
If I had to pick one, based on what you have provided above, I'd go with FILESTREAM, as this will offer you better consistency and control over your data.
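As a rough sketch of what the FILESTREAM route involves (database, filegroup, path and table names below are illustrative, not prescriptive): enable FILESTREAM at the instance level (also via SQL Server Configuration Manager), add a FILESTREAM filegroup to the database, and declare the column as VARBINARY(MAX) FILESTREAM.

```sql
-- Instance-level access level (FILESTREAM must also be enabled in SQL Server Configuration Manager).
EXEC sp_configure 'filestream access level', 2;  -- 2 = T-SQL and Win32 streaming access
RECONFIGURE;

-- Add a FILESTREAM filegroup and container to the target database (path is illustrative).
ALTER DATABASE MigratedDb ADD FILEGROUP HtmlDocsFS CONTAINS FILESTREAM;
ALTER DATABASE MigratedDb ADD FILE
    (NAME = HtmlDocsFSFile, FILENAME = 'D:\FileStreamData\HtmlDocs')
    TO FILEGROUP HtmlDocsFS;

-- A table to hold the migrated CLOB data; FILESTREAM requires a UNIQUE ROWGUIDCOL column.
CREATE TABLE dbo.HtmlDocuments
(
    DocId   UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWID(),
    DocHtml VARBINARY(MAX) FILESTREAM NULL   -- no 2 GB ceiling
);
```

Since FILESTREAM columns are VARBINARY(MAX), the CLOB text would be stored as bytes (e.g. UTF-8 or UTF-16 encoded) and decoded on the way back out by your application.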