Mysql – Large discrepancy between information_schema database size vs thesqldump database size

backupmariadbMySQL

We currently have a database on our server that seems to be excessively large. We run the following query to see all the databases sizes on our server –

SELECT Round(Sum(data_length + index_length) / 1024 / 1024, 1) 'DB Size in MB' FROM information_schema.tables where table_schema like 'some_db' GROUP BY table_schema;

It returns the following –

+---------------+ 
| DB Size in MB | 
+---------------+ 
| 23663.4       | 
+---------------+

However, we also take nightly backups. Looking at this nightly backup for the specific database, it is only 6.5 GB, rather than the 23 GB that the query is reporting. Every other database that we are checking for appears to be roughly the same.

I saw this – https://stackoverflow.com/questions/4709521/mysql-database-size, however we arent using phpmyadmin to takle the backups. The backups are done inside of a RHEL7 server running MariaDB, and are taken using mysqldump.

The command being run when taking the backup is –

mysqldump -uroot -p --single-transaction some_db > /tmp/somedb_backup.sql

Best Answer

A BIGINT takes 8 bytes (plus some overhead) on disk, but after going through mysqldump, a value of '1' will take only 2 bytes: INSERT ... VALUES(...,1,...). That's a 4:1 shrinkage. (Pro tip: 90+% of BIGINT columns don't need to be 'big'.)

On the other hand, a 1-byte TINYINT with 100 in it expands 1:4.

BLOBs tend to incur a 1:2 expansion, maybe more, due to escaping the bytes to encode in the dump.

INDEXes are not dumped.

"Free" space (and other fragmentation) is not dumped. This might account for 2:1 shrinkage in the typical dump.

But the dump spells out the column names at least twice -- once for the CREATE TABLE, and at least once for the INSERTs. On disk, the column names occur only once -- in the .frm file. (That is, not included in the SELECT you did.)

And there are many other cases of expansion/shrinkages.

Since you are seeing 23 : 6.5 shrinkage, I would guess that you have some over-sized integer columns and a lot of indexes. (Pro tip: It is folly to "index every column.")