C# SQL Server 2008 – NVARCHAR(MAX) Performance for Data Under 4000 Characters

csql-server-2008

I have to develop a CMS which will support two Language English, Arabic. This CMS will be a sort of Article Publishing site. While designing & analysis i found that some articles are more than 8000 characters in length. My table has some column as

PageID int,
PageTitleEnglish nvarchar(200),
PageTitleArabic nvarchar(200),
PageDescEnglish nvarchar(500),
PageDescArabic nvarchar(500),
PageBodyEnglish nvarchar(max)
PageBodyArabic nvarchar(max)

If i keep PageBody as nvarchar(4000) then i a limited to 4000 characters and if i have to store Arabic version then i need 16000 bytes (As Arabic is Unicode and take 3 time more space then ASCII).

So i am only left with option of defining PageBody as nVarchar(max), This will have it downside from performance point of view. My actual question is if some data in PageBody column is less than 4000 characters will it MS SQL Store than data in inline column or separately in the database.

I looked for this on Google also but didn't find any relevant answer and how i can improve performance in such scenario.

Any suggestions for best practice for such design of multilingual CMS are welcome.

I need to Support Only two languages Arabic & English

Best Answer

An nvarchar(max) value will be stored "in-row" if it is short enough.

The default behaviour can be modified using sp_tableoption, "large value types out of row" option. I wouldn't bother. The DB engine will manage this efficiently by itself.

As for design, there are several ways of doing this based on your model:

  • Will you always have both English and Arabic?
  • Can one be optional? If so, will one always be mandatory?
  • Do you expect more languages later?

1. Separate tables

That is, you can split off the separate languages into different tables.
This allows table level collations rather than column level ones

It allows allows more rows per page and more chance of in-row LOB storage

PageParent

  • PageID int,
  • PageOtherInfo...

PageEnglish (note varchar may be OK here)

  • PageID int,
  • PageTitleEnglish varchar(200),
  • PageDescEnglish varchar(500),
  • PageBodyEnglish varchar(max)

PageArabic

  • PageID int,
  • PageTitleArabic nvarchar(200),
  • PageDescArabic nvarchar(500),
  • PageBodyArabic nvarchar(max)

2. Separate rows

Or have a languageID column to support several languages.
This has the drawback that collation will be fixed for all languages which means poor sorting/filtering

PageParent

  • PageID int,
  • PageOtherInfo..

Page

  • PageID int,
  • LanguageCode,
  • PageTitle nvarchar(200),
  • PageDesc nvarchar(500),
  • PageBody nvarchar(max)