Sql-server – What happens when I try to store more 8019 bytes in a SQL Server table row

sql serversql-server-2008

I read –

total fixed and variable length data are still limited to 8019 bytes total

But my data can possibly be more than this. What happens if it's more. Does it still store the data correctly?

Also can someone tell me how I can store for example over 4000 bytes of text data in a column. Is that possible?

Best Answer

Data exceeding the total of 8060 bytes will be pushed over to an 'overflow page', increasing the amount of pages required to be read from the buffer pool.

Related Solutions

Sql-server – SQL Server 2008 R2 Dirty reads – how non-atomic

EDITED after reading the MSDN forum link from the comment, very interesting.

Regardless of isolation level, two users cannot update a single page simultaneously, nor can any user read a partially updated page. Just imagine how SQL Server would deal with a page where the header says Col3 starts at byte 17. But it really starts at byte 25, because that part of the row hasn't been updated yet. There's no way a database could handle that.

But for rows larger than 8k, multiple pages are used, and that makes a half-updated column possible. Copied from the MSDN link (in case the link breaks), start this query in one window:

if object_id('TestTable') is not null
    drop table TestTable
create table TestTable (txt nvarchar(max) not null)
go
insert into TestTable select replicate(convert(varchar(max),
    char(65+abs(checksum(newid()))%26)),100000)
go 10
update TestTable set txt=replicate(convert(varchar(max),
    char(65+abs(checksum(newid()))%26)),100000)
go 100000

This creates a table and then updates it with a string of 100.000x the same character. While the first query is running, start this query in another window:

while 1=1 begin
 if exists (select * from TestTable (nolock) where left(Txt,1) <> right(Txt,1))
    break
end

The second query stops when it reads a column that is half updated. That is, when the first character is different from the last. It will finish quickly, proving that it is possible to read half-updated columns. If you remove the nolock hint, the second query will never finish.

Surprising result! A half-updated XML column might break a (nolock) report, because the XML would be malformed.

C# SQL Server 2008 – NVARCHAR(MAX) Performance for Data Under 4000 Characters

An nvarchar(max) value will be stored "in-row" if it is short enough.

The default behaviour can be modified using sp_tableoption, "large value types out of row" option. I wouldn't bother. The DB engine will manage this efficiently by itself.

As for design, there are several ways of doing this based on your model:

Will you always have both English and Arabic?
Can one be optional? If so, will one always be mandatory?
Do you expect more languages later?

1. Separate tables

That is, you can split off the separate languages into different tables.
This allows table level collations rather than column level ones

It allows allows more rows per page and more chance of in-row LOB storage

PageParent

PageID int,
PageOtherInfo...

PageEnglish (note varchar may be OK here)

PageID int,
PageTitleEnglish varchar(200),
PageDescEnglish varchar(500),
PageBodyEnglish varchar(max)

PageArabic

PageID int,
PageTitleArabic nvarchar(200),
PageDescArabic nvarchar(500),
PageBodyArabic nvarchar(max)

2. Separate rows

Or have a languageID column to support several languages.
This has the drawback that collation will be fixed for all languages which means poor sorting/filtering

PageParent

PageID int,
PageOtherInfo..

Page

PageID int,
LanguageCode,
PageTitle nvarchar(200),
PageDesc nvarchar(500),
PageBody nvarchar(max)

Best Answer

Related Solutions

Sql-server – SQL Server 2008 R2 Dirty reads – how non-atomic

C# SQL Server 2008 – NVARCHAR(MAX) Performance for Data Under 4000 Characters

Related Question