Sql-server – varchar performance impact

performancesql-server-2008varchar

This question has been asked before ( Speed impact of using varchar in SQL Server 2008 ), but the answers are not detailed enough to my satisfaction. Admitedly this is not a current issue and mainly for my education.

However, I will try and make this a concrete example from some current DB schema work I am doing;

Option A

CREATE TABLE [VMware].[TblHostSystem](
    [ID] [int] IDENTITY(1,1) NOT NULL,
    <snip/>
    [PowerStateID] [int] NOT NULL
)

ALTER TABLE [VMware].[TblHostSystem]  WITH
CHECK ADD  CONSTRAINT [FK_TblHostSystem_TblPowerState]
FOREIGN KEY([PowerStateID])
REFERENCES [VMware].[TblPowerState] ([ID])
GO

CREATE TABLE [VMware].[TblPowerState](
    [ID] [int] IDENTITY(1,1) NOT NULL,
    [PowerState] [varchar](50) NOT NULL
)

Option B

CREATE TABLE [VMware].[TblHostSystem](
    [ID] [int] IDENTITY(1,1) NOT NULL,
    <snip/>
    [PowerState] [varchar](50) NOT NULL
)

If I've chosen option B (as the previous author of the schema has), as the power of a host is something that will frequently change, what sort of churn will this create?

Some statements that I believe are true,

  1. Varchars are stored in the same row/page on MSSQL 2008.
  2. Varchars are not padded?

So my question is, when updating PowerState in the second example from 'poweredOn' (20 bytes) to 'poweredOff' (22 bytes) what does the DB need to do?

  1. Move the row to a new page because there is now room for the extra 2 bytes?
  2. Move everything after the current row 2 bytes forward because there is a cluster index on ID and the order on disk has to be maintained?
  3. None of the above, the database actually allocates 102 bytes (maximum possible size of varchar(50)) and has loads of free space most of the time to accomodate such changes without massive churn?
  4. Breakout the varchar(50) column to another page to avoid the massive churn caused by 1, and 2, and it doesn't pad out as 3, but it means locking potentially multiple pages per row and two reads off the disk (maybe) rather than one.

Would appreciate a detailed of what happens under the covers!

Best Answer

First questions -

  • Yes they are stored on page normally (for non-(MAX) values)
  • They are not padded, but there is a 2 byte per row overhead to store the length of the field

1 - IT DEPENDS on if the page is full or not. Very likely it will fit on the current page. The header record for the row needs to updated as well to update the length of the field.

2 - Data will only be rearranged at the page level for an insert or update, so nothing else gets shifted. If it fits on your current page it will, if not there will be a page split.

3 - Never as far as I know for varchar - if you want this behavior you can make it a CHAR instead.

4 - Only if the value exceeds the space on the page.

As to the design itself...

You will definitely get better performance by using the id instead of a varchar.

  • An int or smallint is orders of magnitude smaller than your strings.
  • Index lookups are a lot more efficient on integers than on strings
  • Data stored on the page will be consistent and much smaller
  • SQL likes keys! A lot of optimizations can take place when you use PK/FK relationships because the engine can infer some metadata about duplicates/distinct values, etc.