Motivation:
In my company there exists the (probably strange) idea of storing pure time-series data in a normal SQL table (instead of using BLOBs or plain binary files, which would probably be a better idea). I have to prove that this is a misconception, at least for our embedded device, where any kind of metadata overhead is unaffordable because data storage is limited.
What I tried:
I'm using SQLite. Here is the test table's DDL:
CREATE TABLE DataTable(signal_0 FLOAT,
                       signal_1 FLOAT,
                       signal_2 FLOAT,
                       signal_3 FLOAT);
I believe there is no PK overhead and no index overhead. However, there is the B-Tree "overhead" and some (probably negligible) metadata "overhead".
I made some tests with the database schema above, and the overhead seemed to be about 40% of the user data (e.g. 200 MB of pure data results in a 276 MB file). I tried different amounts of data; insert time and data overhead seemed to stay relatively constant.
40% seems very high – but I have nothing other than the DLL which creates the table and a simple insert loop that generates the data. Running select count(*) against the table gives me 6253010, which means 6253010 * 4 * 8 bytes ≈ 200 MB of user data.
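For reference, the check boils down to something like this (using the DataTable from the DDL above):
select count(*) as rows, count(*) * 4 * 8 as raw_bytes from DataTable;
-- 6253010 rows, 200096320 bytes ≈ 200 MB of raw float data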
I didn't examine speed yet, since if the data storage overhead is a show stopper, my "proof of misconception" is already successful.
My question:
Is 40% (for my example table structure) the inevitable overhead, or is it possible to reduce it considerably, to less than 5%? Which components actually cause this overhead?
I'm afraid this is already answered in the SQLite file format documentation; however, I couldn't find the relevant information there.
Output (extract) of sqlite3_analyzer.exe for the table with 4 float64 columns per row
*** All tables ****************************************************************
Percentage of total database...................... 100.0%
Number of entries................................. 6253011
Bytes of storage consumed......................... 276189184
Bytes of payload.................................. 231361512 83.8%
Bytes of metadata................................. 42847878 15.5%
Average payload per entry......................... 37.00
Average unused bytes per entry.................... 0.32
Average metadata per entry........................ 6.85
Average fanout.................................... 368.00
Maximum payload per entry......................... 142
Entries that use overflow......................... 0 0.0%
Index pages used.................................. 183
Primary pages used................................ 67246
Overflow pages used............................... 0
Total pages used.................................. 67429
Unused bytes on index pages....................... 97172 13.0%
Unused bytes on primary pages..................... 1882622 0.68%
Unused bytes on overflow pages.................... 0
Unused bytes on all pages......................... 1979794 0.72%
I'm really confused as to why "Bytes of payload" is 231,361,370 and not 200,000,000 (I just rechecked the code and count(*); both show 6253010 rows with 4*8 bytes each). But since the statistics say the metadata is (at least) above 15%, it's definitely too high anyway.
Output (extract) of sqlite3_analyzer.exe for the table with 12 float64 columns per row
*** All tables ****************************************************************
Percentage of total database...................... 100.0%
Number of entries................................. 2084263
Bytes of storage consumed......................... 244547584
Bytes of payload.................................. 229269045 93.8%
Bytes of metadata................................. 13502878 5.5%
Average payload per entry......................... 110.00
Average unused bytes per entry.................... 0.85
Average metadata per entry........................ 6.48
Average fanout.................................... 395.00
Maximum payload per entry......................... 346
Entries that use overflow......................... 0 0.0%
Index pages used.................................. 151
Primary pages used................................ 59553
Overflow pages used............................... 0
Total pages used.................................. 59704
Unused bytes on index pages....................... 81205 13.1%
Unused bytes on primary pages..................... 1694456 0.69%
Unused bytes on overflow pages.................... 0
Unused bytes on all pages......................... 1775661 0.73%
Summary
What I managed to find out (thanks to Max Vernon) is that the data overhead mainly consists of:
- the SQLite B-tree indexing structure (according to sqlite3_analyzer.exe this is about 6.5 bytes per row)
- a 64-bit indexing key (rowid) per row, which apparently cannot be avoided
Unfortunately I wasn't able to verify this quantitatively with my tests. For example, it is still not clear why the payload is higher than 200 MB. Even including the auto-generated index in the payload doesn't reproduce the payload value reported by sqlite3_analyzer.exe. So my "model of overhead-generating components" is probably slightly wrong or incomplete. Hence there may still be room for improving the data storage size – I am not sure about that.
Anyway, I can say that without optimizing the way data is stored in SQLite, the metadata overhead is roughly 40% for rows with a payload of about 32 bytes and roughly 25% for rows with a payload of about 96 bytes (at least if the payload consists of floats). And (which is probably obvious) the metadata overhead is mostly incurred per row in a database table, not per cell.
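As a plausibility check (not a real verification), the per-entry averages reported by sqlite3_analyzer for the 4-column case do add up to the observed file size:
-- 37.00 bytes payload + 6.85 bytes metadata + 0.32 bytes unused per entry
select 6253011 * (37.00 + 6.85 + 0.32);   -- ≈ 276195496, vs. 276189184 bytes reported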
Best Answer
Thanks for posting the output from sqlite3_analyzer; the pertinent bits are the metadata figures. The report says metadata accounts for 15.5% of the total storage consumed, or 6.85 bytes per entry on average. Every row of data consists of 4 floating-point values which consume 8 bytes each, for a total of 32 bytes, so SQLite is spending roughly 7 bytes per row on metadata. According to section 2.3 of the SQLite File Format document, each ordinary table is stored as a "b-tree table" keyed by the rowid.
Each row therefore carries a rowid, a 64-bit signed integer that SQLite stores as a varint; at this row count that works out to about 4 bytes per row, leaving roughly 3 bytes per row for the remaining overhead (cell pointers, payload-length varints, page headers, and so on). Larger rows consume proportionally less overhead, since this per-row cost is essentially fixed.
Having said all that, size-on-disk is only one aspect to consider when storing data.
Implementing a DIY storage system is fraught with problems that have already been solved rather extensively by various database management systems. For instance, an ACID-compliant DBMS such as SQL Server, PostgreSQL, or SQLite gives you guaranteed atomicity, consistency, isolation, and durability.
Locating a specific row will be extremely fast using a good DBMS, and aggregating data will be easy, fast, and extremely versatile.
I most often work with SQL Server, so I decided to create a quick mock-up of your data to see what the space requirements would be.
First, I'll create a new, blank database for testing.
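Something along these lines (the database name is just a placeholder):
USE master;
GO
CREATE DATABASE FloatTest;
GO
USE FloatTest;
GO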
Here, I'll create a table with 4 floating point values per row:
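A sketch of what that can look like (the table and column names simply mirror your SQLite schema, and I'm clustering on signal_0 since the data has no natural key):
CREATE TABLE dbo.DataTable
(
      signal_0 float NOT NULL
    , signal_1 float NOT NULL
    , signal_2 float NOT NULL
    , signal_3 float NOT NULL
);
GO
-- non-unique clustered index; see the note on the uniqueifier below
CREATE CLUSTERED INDEX CX_DataTable ON dbo.DataTable (signal_0);
GO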
The table will have a clustered index, since that is the most commonly used data structure in SQL Server. Note this index is not defined as a UNIQUE index, so it will have a uniqueifier added automatically, adding 4 bytes per row.
Here I'm inserting exactly 6,253,011 rows into the table, as per your question. Every value inserted will be a cryptographically-generated random number.
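Roughly like this (a sketch; CRYPT_GEN_RANDOM returns varbinary, which I convert to float via bigint, and sys.all_objects is only used as a convenient row source):
INSERT INTO dbo.DataTable (signal_0, signal_1, signal_2, signal_3)
SELECT TOP (6253011)
      CONVERT(float, CONVERT(bigint, CRYPT_GEN_RANDOM(8)))
    , CONVERT(float, CONVERT(bigint, CRYPT_GEN_RANDOM(8)))
    , CONVERT(float, CONVERT(bigint, CRYPT_GEN_RANDOM(8)))
    , CONVERT(float, CONVERT(bigint, CRYPT_GEN_RANDOM(8)))
FROM sys.all_objects AS o1
CROSS JOIN sys.all_objects AS o2
CROSS JOIN sys.all_objects AS o3;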
Show the row-count:
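Using the same table name as above:
SELECT COUNT(*) AS row_count FROM dbo.DataTable;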
Results:
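Assuming all the inserts succeeded, this comes back with the expected count:
row_count
-----------
6253011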
Show the space used by the table:
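sp_spaceused gives a quick read on that:
EXEC sys.sp_spaceused N'dbo.DataTable';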
So, the table consumes ~250 MB, slightly less than your table in SQLite. Now, if we're so inclined, we can rebuild the table with page compression, which for our randomly-generated values will likely not make much difference.
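One way to do that (again, just a sketch reusing the names above):
ALTER TABLE dbo.DataTable REBUILD WITH (DATA_COMPRESSION = PAGE);
GO
EXEC sys.sp_spaceused N'dbo.DataTable';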
Checking the space consumed again shows that, as expected, page compression didn't make very much difference. Now, if we can make use of the clustered columnstore index feature available in modern versions of SQL Server, we might see better compression.
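For example, by replacing the row-store clustered index with a clustered columnstore index (index names as above):
DROP INDEX CX_DataTable ON dbo.DataTable;
GO
CREATE CLUSTERED COLUMNSTORE INDEX CCX_DataTable ON dbo.DataTable;
GO
EXEC sys.sp_spaceused N'dbo.DataTable';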
The space-used results (this was on SQL Server 2016) show that's made quite a difference; we're now at 195 MB. With non-randomly generated data you'd see far greater benefit from both page compression and clustered columnstore compression; the space figures above should be considered worst-case scenarios.