Sql-server – Storing NULL versus storing ” in a varchar column

sql serversql-server-2005

I realize this may be marked as duplicate, but I'm specifically asking in relation to SQL Server 2005

I have read conflicting advice on the internet so I'm asking here. Specifically in SQL Server 2005 does a NULL in a varchar column take up the same space as an empty string?

I built a 'holding' table on another drive, and populated it with the data from the source table, and wherever the fields were blank I used nullif([field],'') to insert nulls in place of the blanks.

Then I built a new table with exactly the same structure as the holding table, but instead of replacing blanks with null I just inserted the blanks, and so far it seems to be taking up more space (I'm not yet done populating it and I can't be sure it is taking up more data yet)

So before I fill it further and end up with a table bigger than I thought it would be, am I better off inserting nulls or blanks?

Edit:

After migrating the data from the holding table to the new table, the new table is approximately 4gb larger.

Table sizes differences

There are only two small differences in the table design – The 'serial_number' field is char(15) in the holding table but varchar(15) in the destination table. (The maximum length of a serial number is 14 and there are many empty values – I think around 30 million if I recall), and the clustered index for the holding table has an extra column – program_name..

Holding table

USE [Temp_holding_EWS]
GO
/****** Object:  Table [dbo].[AmtoteAccountActivity_holding]    
 Script Date: 02/17/2017 20:41:32 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[AmtoteAccountActivity_holding](
    [_Date] [char](8) NULL,[Community] [varchar](10) NULL,
    [AccountNumber] [varchar](50) NULL,
    [Branch] [varchar](10) NULL,
    [Window] [varchar](3) NULL,
    [Time] [char](8) NULL,[Balance_Forward] [varchar](10) NULL,
    [Transaction_Type] [varchar](10) NULL,
    [Program_Name] [varchar](10) NULL,
    [Race] [varchar](10) NULL,[Pool_Type] [varchar](10) NULL,
    [Amount] [money] NULL,[Runners] [varchar](60) NULL,
    [Total_Bet_Amount] [varchar](10) NULL,
    [Debit_Amount] [varchar](10) NULL,
    [Credit_Amount] [varchar](10) NULL,
    [Tx_Date] [char](8) NULL,
    [Check_Clear_Date] [varchar](10) NULL,
    [Refund_Amt] [varchar](10) NULL,
    [Bet_Pool_Modifier] [varchar](5) NULL,
    [RecordID] [int] IDENTITY(1,1) NOT NULL,
    [serial_number] [char](15) NULL,
    [handle]  AS 
       (CONVERT([money],[total_bet_amount],(0))-CONVERT([money],[refund_amt],(0))),
    [txdatetime]  AS (CONVERT([datetime],([tx_date]+' ')+[time],(11))),
    [dbdate]  AS (CONVERT([datetime],[_date],(11))),
    [Audit_Trail] [varchar](20) NULL,
 CONSTRAINT [PK_AmtoteAccountActivity_holding] PRIMARY KEY NONCLUSTERED 
(
    [RecordID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, 
ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

GO
SET ANSI_PADDING OFF

(Clustered index)

USE [Temp_holding_EWS]
GO
/****** Object:  Index [IX_AmtoteAccountActivity_holding] 
    Script Date: 02/17/2017 21:08:44 ******/
CREATE CLUSTERED INDEX [IX_AmtoteAccountActivity_holding] ON 
    [dbo].[AmtoteAccountActivity_holding] 
(
    [AccountNumber] ASC,
    [_Date] ASC,
    [Program_Name] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, 
    SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF,
ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]

Destination table

USE [EWS]
GO
/****** Object:  Table [dbo].[AmtoteAccountActivity]    
Script Date: 02/17/2017 20:48:16 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[AmtoteAccountActivity](
    [_Date] [char](8) NULL,     [Community] [varchar](10) NULL,
    [AccountNumber] [varchar](50) NULL,
    [Branch] [varchar](10) NULL,[Window] [varchar](3) NULL,
    [Time] [char](8) NULL,  [Balance_Forward] [varchar](10) NULL,
    [Transaction_Type] [varchar](10) NULL,
    [Program_Name] [varchar](10) NULL,
    [Race] [varchar](10) NULL,
    [Pool_Type] [varchar](10) NULL,
    [Amount] [money] NULL,[Runners] [varchar](60) NULL,
    [Total_Bet_Amount] [varchar](10) NULL,
    [Debit_Amount] [varchar](10) NULL,
    [Credit_Amount] [varchar](10) NULL,
    [Tx_Date] [char](8) NULL,
    [Check_Clear_Date] [varchar](10) NULL,
    [Refund_Amt] [varchar](10) NULL,
    [Bet_Pool_Modifier] [varchar](5) NULL,
    [RecordID] [int] IDENTITY(1,1) NOT NULL,
    [serial_number] [varchar](15) NULL,
    [handle]  AS 
       (CONVERT([money],[total_bet_amount],(0))-CONVERT([money],[refund_amt],(0))),
    [txdatetime]  AS (CONVERT([datetime],([tx_date]+' ')+[time],(11))),
    [dbdate]  AS (CONVERT([datetime],[_date],(11))),
    [Audit_Trail] [varchar](20) NULL,
 CONSTRAINT [PK_AmtoteAccountActivity2] PRIMARY KEY NONCLUSTERED 
(
    [RecordID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, 
ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

GO
SET ANSI_PADDING OFF

(Clustered index)

USE [EWS]
GO
/****** Object:  Index [IX_AmtoteAccountActivity2]  Script Date: 02/17/2017 21:06:29 ******/
CREATE CLUSTERED INDEX [IX_AmtoteAccountActivity2] ON [dbo].[AmtoteAccountActivity] 
(
    [AccountNumber] ASC,
    [_Date] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, 
SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, 
ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]

(Note: For anyone wondering why it has apparently financial and numerical values stored in character fields: That was the original table design 17 years ago (not by me) and there are now hundreds of sql queries that run on this database, it is less work to keep them as varchar and the queries keep their casting, than it is to change them to money,int, or decimal and change hundreds of queries)

Best Answer

Let's create three tables with a varchar column, two of them allowing NULL, one not.

CREATE TABLE dbo.x1(id int IDENTITY(1,1) PRIMARY KEY, field varchar(5) null);
CREATE TABLE dbo.x2(id int IDENTITY(1,1) PRIMARY KEY, field varchar(5) null);
CREATE TABLE dbo.x3(id int IDENTITY(1,1) PRIMARY KEY, field varchar(5) not null);

Populate them with 1,000,000 rows:

;WITH x(x) AS (SELECT 0 UNION ALL SELECT x+1 FROM x WHERE x < 1000000)
INSERT dbo.x1(field) SELECT NULL FROM x OPTION (MAXRECURSION 0);
INSERT dbo.x2(field) SELECT '' FROM dbo.x1;
INSERT dbo.x3(field) SELECT '' FROM dbo.x1;

Let's check the size:

SELECT COUNT(*)*8192/1024. FROM sys.dm_db_database_page_allocations(DB_ID(), 
  OBJECT_ID(N'dbo.x1'), 1, NULL, 'DETAILED');
SELECT COUNT(*)*8192/1024. FROM sys.dm_db_database_page_allocations(DB_ID(), 
  OBJECT_ID(N'dbo.x2'), 1, NULL, 'DETAILED');
SELECT COUNT(*)*8192/1024. FROM sys.dm_db_database_page_allocations(DB_ID(), 
  OBJECT_ID(N'dbo.x3'), 1, NULL, 'DETAILED');

Results:

12,928 KB
12,936 KB
12,936 KB

So it looks like for 1,000,000 rows, choosing NULL over '' saves a whopping 8 KB (and this isn't even reflected in sp_spaceused, because that one page you saved is still reserved, just not allocated).

Repeated for a heap (again, have to do multiple tests since we're guessing about your actual table structure):

12,872 KB
12,872 KB
12,928 KB

So, negligible, as I suggested, even extrapolating over 120,000,000 rows, the biggest possible difference (once more, depending on your schema) would be 960KB on a proper table, and 6.7MB on a heap. If your server is so tight on disk space that 6.7MB is going to drive decisions, you might consider how much an additional disk would cost when compared to the time you're spending investigating this.

IMHO, there are far more important reasons between deciding to use NULLs or not to represent "no data." A good question with lots of opinions and commentary is here: