Mysql – Large discrepancy between information_schema database size vs thesqldump database size

backupmariadbMySQL

We currently have a database on our server that seems to be excessively large. We run the following query to see all the databases sizes on our server –

SELECT Round(Sum(data_length + index_length) / 1024 / 1024, 1) 'DB Size in MB' FROM information_schema.tables where table_schema like 'some_db' GROUP BY table_schema;

It returns the following –

+---------------+ 
| DB Size in MB | 
+---------------+ 
| 23663.4       | 
+---------------+

However, we also take nightly backups. Looking at this nightly backup for the specific database, it is only 6.5 GB, rather than the 23 GB that the query is reporting. Every other database that we are checking for appears to be roughly the same.

I saw this – https://stackoverflow.com/questions/4709521/mysql-database-size, however we arent using phpmyadmin to takle the backups. The backups are done inside of a RHEL7 server running MariaDB, and are taken using mysqldump.

The command being run when taking the backup is –

mysqldump -uroot -p --single-transaction some_db > /tmp/somedb_backup.sql

Best Answer

A BIGINT takes 8 bytes (plus some overhead) on disk, but after going through mysqldump, a value of '1' will take only 2 bytes: INSERT ... VALUES(...,1,...). That's a 4:1 shrinkage. (Pro tip: 90+% of BIGINT columns don't need to be 'big'.)

On the other hand, a 1-byte TINYINT with 100 in it expands 1:4.

BLOBs tend to incur a 1:2 expansion, maybe more, due to escaping the bytes to encode in the dump.

INDEXes are not dumped.

"Free" space (and other fragmentation) is not dumped. This might account for 2:1 shrinkage in the typical dump.

But the dump spells out the column names at least twice -- once for the CREATE TABLE, and at least once for the INSERTs. On disk, the column names occur only once -- in the .frm file. (That is, not included in the SELECT you did.)

And there are many other cases of expansion/shrinkages.

Since you are seeing 23 : 6.5 shrinkage, I would guess that you have some over-sized integer columns and a lot of indexes. (Pro tip: It is folly to "index every column.")

Related Solutions

MySQL – How to Optimize mysqldump of a Large Database

The main bottleneck in the dump like this is drive I/O. You are reading a load of data and writing it again. You can speed this up in a number of ways:

Make sure your output is going to a different drive(s) than the one(s) the database files are stored on - this will make a massive difference with spinning disks as the drive heads will not be constantly flicking between the location being read from and the location being written to.
The output of mysqldump will be very compressible, so if you can not separate the output from the input as mentioned above pipe the output through gzip or similar. This will reduce the amount of writing being done (so reduce the overall IO load, and the amount of head movement) at the expense of some CPU time (which you may have a lot of spare at these times anyway).
Also, (as well or instead of compression) pass the output through a pipe utility (like pv) that supports large write buffers to group blocks written to the drives together more, again to reduce the effect of head-movement latency - this will make quite a difference if using the --quick option to reduce the RAM impact of backing up large tables).
Only run your backup process when IO load is otherwise low.

You may be fixing the wrong issue though: it might be easier to address the connection drops instead (though reducing the I/O load imposed by your backups will help reduce the effect you have on other users so is worth trying anyway). Could you run your manual backups through screen (or similar tools like tmux)? That way if your connection to the server drops you can just reconnect and reattach to the screen session without any processes getting interrupted.

If you are sending the data directly over the connection (i.e. you are running mysqldump on your local machine against a remote database, so the dump appears locally) you might be better off running the dump on the server first, compressing as needed, then transferring the data over the network using a tool (such as rsync) which supports partial transfers so you can resume the transfer (instead of restarting) if a connection drop interrupts it.

As part of your "reducing the size of the overall database to resolve this issue" I would guess that a large chunk of your data does not change. You might be able to move a large chunk of the 1.2Gb from that main table off into another and remove that from those that are copied by the mysqldump call. You don't need to backup this data every time if it never changes. Splitting data between tables and databases this way is usually referred to as data partitioning and can also allow you to spread the data and I/O load over multiple drives. High-end database have built in support for automatic partitioning, though in mysql you will probably have to do it manually and alter your data access layer to account for it.

Straying off-topic for this site (so you should probably nip over to ServerFault or SuperUser to ask if you need more detail): If you seem to be losing connections due to inactivity, check the options in your SSH server and SSH client to make sure keep-alive packets are enabled and being sent often enough. If seeing drops even if the connection is active you could also try using OpenVPN or similar to wrap the connection - it should handle a short drop, even a complete drop if your entire connection is down for a few seconds, such that the SSH client and server don't notice.

Mysql – Different MySQL Datafile Sizes After Restoration

This makes all the sense in the world to me.

InnoDB creates data pages and index pages that are 16K each. If rows of data are being inserted, updated, deleted, committed, and rolled back, you are going to have FRAGMENTATION !!!

There are two cases where you can have internal fragmentation:

A single row could be written in multiple data pages because certain column values make a row too big to fit in the data page.
Having a TEXT column with 32K of data in it.

In those two cases, a single row spanning multiple data pages would have to be chained like a linked list. The internally generated list of data pages would always have to be navigated when the row is read.

Giving credit where credit is due, PostgreSQL implemented a very brilliant mechanism called TOAST (The Oversized-Attribute Storage Technique) to keep oversized data outside of tables to stem the tide of this kind of internal fragmentation.

Have used mysqldump to make a file with CREATE TABLE statements, followed by lots of INSERTs, you get a fresh table with no unused space along with contiguous data and index pages when loading the mysqldump into a new server.

For my explanantions, let's assume you have an InnoDB table in the CUSTODIA database called userinfo

If you would like to compress a table, you have three(3) options

Option 1

OPTIMIZE TABLE CUSTODIA.userinfo;

Option 2

ALTER TABLE CUSTODIA.userinfo ENGINE=InnoDB;

Option 3

CREATE TABLE CUSTODIA.userinfo2 LIKE CUSTODIA.userinfo;
INSERT INTO CUSTODIA.userinfo2 SELECT * FROM CUSTODIA.userinfo;
DROP TABLE CUSTODIA.userinfo;
ALTER TABLE CUSTODIA.userinfo2 RENAME CUSTODIA.userinfo;

CAVEAT : Option 3 is no good on a table with constraints. Option 3 is perfect for MyISAM.

Now for your questions:

Question 1. Why is there this difference between original and restored database size?

As explained above

Question 2. Is it safe to assume that restored database is OK, although this difference in size?

If you want to make absolutely sure that the data on both servers are identical, simply run this command on both DB servers:

CHECKSUM TABLE CUSTODIA.userinfo;

Hopefully, the checksum value is identical for the same table on both servers. If you have dozens, even hundreds, of tables, you may have to script it.

Question 3 : How does MySQL calculate data_length? Is it an estimate?

You are using the correct method in summing up the data_length and index_length. Based on my explanation of fragmentation, it is an estimate.

Question 4. Can I safely reduce production's ibdata file size to 3.6GiB with no down-time?

GOOD NEWS !!! You absolutely can compress it !!! In fact, how would like to compress it to a fraction of that number ??? Follow these two links because I addressed this issue in StackOverflow and ServerFault.

https://stackoverflow.com/questions/3927690/howto-clean-a-mysql-innodb-storage-engine/4056261#4056261

https://serverfault.com/questions/230551/mysql-innodb-innodb-file-per-table-cons/231400#231400

BAD NEWS !!! Sorry, but you will have a 3-5 minute window of downtime for rebuilding ib_logfile0 and ib_logfile1 as well shrinking ibdata1 once and for all. It's well worth it since it will be a one-time operation.

Best Answer

Related Solutions

MySQL – How to Optimize mysqldump of a Large Database

Mysql – Different MySQL Datafile Sizes After Restoration

Related Question