I have two tables in my SQLServer database (azure managed), and the structure of two tables as below (for easy reference renamed them)
dbo.TABLE1
CREATE TABLE [dbo].[TABLE1] (
[ID] VARCHAR (50) NOT NULL, -- PK index
[S_NUMBER] VARCHAR (50) NOT NULL,
[MAKE] VARCHAR (12) NULL,
[MODEL] VARCHAR (40) NULL,
[C_ID] VARCHAR (50) NOT NULL,
[C_CODE] REAL NULL,
[C_DESCRIPTION] VARCHAR (50) NULL,
[S_CODE] REAL NULL,
[S_DESCRIPTION] VARCHAR (50) NULL,
[LS_TIME] DATETIME2 (7) NOT NULL,
[LE_TIME] DATETIME2 (7) NOT NULL,
[YEAR] DATE NULL,
[MONTH] DATE NULL,
[WEEK] DATE NULL,
[DAY] DATE NULL,
[HOUR] DATETIME2 (7) NULL,
[WEEK_DAY] VARCHAR (12) NULL,
[F_USED] REAL NULL,
[F_USED_UOM] VARCHAR (12) NULL,
[DIST_TRAVE] REAL NULL,
[DIST_TRAV_UOM] VARCHAR (12) NULL,
[DURATION] REAL NULL,
[DURATION_UOM] VARCHAR (12) NULL,
[WEIGHT_MOVED] REAL NULL,
[WEIGHT_STORED] REAL NULL,
[WEIGHT_UOM] VARCHAR (12) NULL,
[CREATED_TIME] DATETIME NULL,
[MODIFIED_TIME] DATETIME NULL,
[OPER_ID] VARCHAR (12) NULL,
[OPER_TYPE] VARCHAR (12) NULL,
[START_TIME] DATETIME2 (7) NULL,
[END_TIME] DATETIME2 (7) NULL,
[UTC_DAY] DATE NULL,
[UTC_WEEK] DATE NULL,
[UTC_MONTH] DATE NULL,
[UTC_YEAR] SMALLINT NULL,
[UTC_HOUR] TINYINT NULL,
[LOCAL_HOUR] TINYINT NULL,
[EFFICIENCY] DECIMAL (5) NULL,
[JS_ID] BIGINT NOT NULL,
[L_ZONE] NVARCHAR (50) NULL,
[D_ZONE] NVARCHAR (50) NULL,
[SHIFT_TYPE] NVARCHAR (12) NULL,
[SHIFT_DATE] DATE NULL,
[D_CATEGORY] NVARCHAR (50) NULL,
[REC_CATEGORY] NVARCHAR (12) NULL,
[LSERIAL_NUMBER] VARCHAR (50) NULL,
[GPS_TIME] DATETIME NULL,
[GPS_STATUS] VARCHAR (12) NULL,
[A_ID] VARCHAR (50) NULL,
[E_ID] VARCHAR (50) NULL,
[TAG_GUID] VARCHAR (64) NULL,
[LZ_ID] VARCHAR (64) NULL,
[DZ_ID] VARCHAR (64) NULL,
[LZ_VERSION] INT NULL,
[DZ_VERSION] INT NULL,
[Z_ID] BIGINT NULL,
[SHIFT_ID] NVARCHAR (64) NULL,
[LI_TIME] DATETIME2 (3) CONSTRAINT [S_CREATED_TIMESTAMP] DEFAULT (sysdatetime()) NULL,
[PC_ID] INT NOT NULL DEFAULT 0,
[L_MAKE] VARCHAR (32) NULL,
[TV_MOVED] DECIMAL (18, 2) NULL
);
dbo.TABLE2
CREATE TABLE [dbo].[TABLE2](
[ID] NVARCHAR(32) NOT NULL, -- PK index
[C_ID] NVARCHAR(64),
[DURATION] DECIMAL(18,2),
[P_COUNT] BIGINT,
[LS_TIME] DATETIME,
[LE_TIME] DATETIME,
[S_AGG_ID] NVARCHAR(32) -- FK index
)
dbo.TABLE2 is subset of dbo.TABLE1 with only 5 columns, and the row count is same. We capture only 5 columns from dbo.TABLE1 and insert in dbo.TABLE2. But the size(space used) of dbo.TABLE2 is almost 3 times the dbo.TABLE1.
Indexes: Apart from PK, FK indexes we have one additional index on dbo.TABLE1 with include columns [JS_ID, MAKE, S_NUMBER, TAG_GUID, LE_TIME]
Why the space used is too high for smaller table even though row count is same as its source table?
Best Answer
Using bigger data types
These columns in
dbo.Table1
arevarchar(50)
:[ID]
[C_ID]
In
dbo.Table2
, they arenvarchar(32)
andnvarchar(64)
.The added "n" makes each character take up twice as much space as the
varchar
version. It's possible the same thing is going on with the FK column andP_COUNT
, but I don't see those defined on the original table, so I can't confirm.There are a couple of
datetime2(7)
columns changed todatetime
, but these have the same storage cost, so that shouldn't make a difference.If you get your data types back to what they were in the original table, you should see a pretty big reduction in space used. Note that you will need to
REBUILD
the clustered index (or just drop and reload the table) after making the data types changes.Fragmentation?
It's possible that fragmentation is contributing to the additional size of
dbo.Table2
. If the size is still significantly different after fixing the data types and recreating the table, you could try loading the table withORDER BY [ID] OPTION (MAXDOP 1)
to limit fragmentation in the target table.