MySQL Storage – Do varchar/text/longtext Columns Occupy Full Size in Filesystem?

MySQLvarchar

Do varchar/text/longtext columns in MySQL occupy full length size in filesystem even for incomplete cells?

Referring to this blog article for example I have the following columns and i have put in a given string in it.

CHAR (72) - can store upto 72 characters - i put in a string "abcd"
VARCHAR (72) - can store upto 72 characters - i put in a string "abcd"
TINYTEXT - can store upto 255 characters - i put in a string "abcd"
TEXT - can store upto 65,535 characters - i put in a string "abcd"
LONGTEXT - can store upto 4,294,967,295 characters - i put in a string "abcd"

How much actual physical space will be occupied by each of these columns for a row? The full size upto which each column is capable of storing? Or only as much as required for storing "abcd"?

Best Answer

'abcd' in CHAR(72) CHARACTER SET ascii occupies 72 bytes on disk.
'abcd' in CHAR(72) CHARACTER SET utf8 occupies 3*72 bytes on disk.
'abcd' in CHAR(72) CHARACTER SET utf8mb4 occupies 4*72 bytes on disk.
'abcd' in VARCHAR(72) occupies 1+4 bytes on disk.
'abcd' in TINYTEXT occupies 1+4 bytes on disk.
'abcd' in TEXT occupies 2+4 bytes on disk.
'abcd' in LONGTEXT occupies 4+4 bytes on disk.

Update: In some versions, InnoDB will store only 4 bytes for the CHAR cases.

Notes:
The 1,2,4 is for length; VARCHAR could be 2 in some situations.
The CHARACTER SET is important in all the cases, but does not impact the space occupied by 'abcd' except for CHAR.

@akuzminsky -- You are wrong about the *3. CHAR(N) and VARCHAR(N) can hold up to N characters in the declared CHARACTER SET. That will become up to 3*N bytes for CHAR or 1+3*N for VARCHAR.

TEXT is limited to 65535 bytes (plus 2-byte length).

There is overhead on top of the lengths described above. In InnoDB, there are record flags, transaction ids, blocking factors, pointers to overflow areas, etc, etc. So, if you use this kind of arithmetic, it will underestimate the disk space used.

MyISAM is more frugal; it has at least 1 byte of overhead per record; in some cases, only 1.