This question has been asked before ( Speed impact of using varchar in SQL Server 2008 ), but the answers are not detailed enough to my satisfaction. Admitedly this is not a current issue and mainly for my education.
However, I will try and make this a concrete example from some current DB schema work I am doing;
Option A
CREATE TABLE [VMware].[TblHostSystem](
[ID] [int] IDENTITY(1,1) NOT NULL,
<snip/>
[PowerStateID] [int] NOT NULL
)
ALTER TABLE [VMware].[TblHostSystem] WITH
CHECK ADD CONSTRAINT [FK_TblHostSystem_TblPowerState]
FOREIGN KEY([PowerStateID])
REFERENCES [VMware].[TblPowerState] ([ID])
GO
CREATE TABLE [VMware].[TblPowerState](
[ID] [int] IDENTITY(1,1) NOT NULL,
[PowerState] [varchar](50) NOT NULL
)
Option B
CREATE TABLE [VMware].[TblHostSystem](
[ID] [int] IDENTITY(1,1) NOT NULL,
<snip/>
[PowerState] [varchar](50) NOT NULL
)
If I've chosen option B (as the previous author of the schema has), as the power of a host is something that will frequently change, what sort of churn will this create?
Some statements that I believe are true,
- Varchars are stored in the same row/page on MSSQL 2008.
- Varchars are not padded?
So my question is, when updating PowerState in the second example from 'poweredOn' (20 bytes) to 'poweredOff' (22 bytes) what does the DB need to do?
- Move the row to a new page because there is now room for the extra 2 bytes?
- Move everything after the current row 2 bytes forward because there is a cluster index on ID and the order on disk has to be maintained?
- None of the above, the database actually allocates 102 bytes (maximum possible size of varchar(50)) and has loads of free space most of the time to accomodate such changes without massive churn?
- Breakout the varchar(50) column to another page to avoid the massive churn caused by 1, and 2, and it doesn't pad out as 3, but it means locking potentially multiple pages per row and two reads off the disk (maybe) rather than one.
Would appreciate a detailed of what happens under the covers!
Best Answer
First questions -
(MAX)
values)1 - IT DEPENDS on if the page is full or not. Very likely it will fit on the current page. The header record for the row needs to updated as well to update the length of the field.
2 - Data will only be rearranged at the page level for an insert or update, so nothing else gets shifted. If it fits on your current page it will, if not there will be a page split.
3 - Never as far as I know for
varchar
- if you want this behavior you can make it aCHAR
instead.4 - Only if the value exceeds the space on the page.
As to the design itself...
You will definitely get better performance by using the
id
instead of avarchar
.PK/FK
relationships because the engine can infer some metadata about duplicates/distinct values, etc.