SQLite Internals – Records

database-internalsrecordsqlitestorage

Hey I'm trying to wrap my head around SQLite data storage, specifically how it is storing Records. I've found a book The Definitive guide to SQLlite, where the author explains internal record format (Figure 9-5, page 351):

Given table:

sqlite> SELECT * FROM episodes ORDER BY id LIMIT 1;
id   season  name
---  ------  --------------------
0    1       Good News Bad News

Its internal record format is:

| 04 | 01 | 01 | 49 |   | 00 | 01 | Good News Bad News |

"The header is 4 bytes long. The header size reflects this and itself is encoded as a single byte. The first type, corresponding to the id field, is a 1-byte signed integer. The second type, corresponding to the season field, is as well. The name type entry is an odd number, meaning it is a text value. Its size is therefore given by (49-13)/2=18 bytes."

Specifically I'm curious about TEXT attribute, in the example above we have a string of length 18 characters. And the rule for TEXT in SQLite is as follows:

Type Value     Meaning   Length of Data
----------     -------   --------------------
N>13 and odd   TEXT      (N-13)/2

What hapenns though when the string is longer ? It'll get out of range of that one byte.

Best Answer

Either the book authors describe some really ancient version of SQLite, or they make a mistake when they say "[t]he header size [...] is encoded as a single byte."

Documentation that "describes and defines the on-disk database file format used by all releases of SQLite since version 3.0.0 (2004-06-18)" states that

The record format makes extensive use of the variable-length integer or varint representation of 64-bit signed integers[...]

A record contains a header and a body, in that order. The header begins with a single varint which determines the total number of bytes in the header. [...] Following the size varint are one or more additional varints, one per column.

while

A variable-length integer or "varint" is a static Huffman encoding of 64-bit twos-complement integers that uses less space for small positive values. A varint is between 1 and 9 bytes in length.

so

The serial type varints for large strings and BLOBs might extend to two or three byte varints

which certainly allows them to represent pretty long values.