MySQL – Table Size Query Does Not Match IBD File Size on Disk

disk-spaceMySQLmysql-5.7

I have a table with about 600 million rows. I ran the following commands and noticed

select table_schema "DB name (table_schema)",
    -> sum((data_length+index_length)/1024/1024/1024) AS "DB size in GB" from
    -> information_schema.tables group by table_schema;
+------------------------+-------------------+
| DB name (table_schema) | DB size in GB     |
+------------------------+-------------------+
| my_db                  | 4423.946937561025 |
+------------------------+-------------------+

The whole db is 4.4GB

The table with 600 million rows is:

+------------------------------------+-----------+
| Table                              | Size (MB) |
+------------------------------------+-----------+
| my_table                           |   3729084 |

So this table is 3.7TB in size.

I went to the file system and issued:

sudo du -hs * | sort -rh | head -5
11T my_table.ibd

My table is taking up 11TB in storage.

I ran the following command on the database:

SHOW VARIABLES LIKE 'innodb_file_per_table';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| innodb_file_per_table | ON    |
+-----------------------+-------+
1 row in set (0.01 sec)

So is it normal for an innodb file to be that large? If not, is there something that can be done to optimize its storage on disk?

I am running a master slave database:
sudo mysql –version
mysql Ver 14.14 Distrib 5.7.26, for Linux (x86_64) using EditLine wrapper

I am running this on an Ubuntu instance;
lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.5 LTS
Release: 16.04
Codename: xenial

Be gentle with your answers as managing mysql is new to me. I usually just write programs to read from mysql and haul the data to something like Elasticsearch or HBase for big data analytics.

Thank you…Amro

Best Answer

Of course, it never does.

In your case, you have about 66.67% fragmentation

I have wickedly good script you can run to calculate the fragmentation based on the difference between the .ibd filesize and the (data_length+index_length) of the any table:

MYSQL_AUTH="-h... -u... -p..."

DATADIR=`mysql -ANe"SELECT @@global.datadir"`

TAG_LIST_FILE=/tmp/list_of_tables.txt
TAG_LIST_ERRS=/tmp/list_of_tables.err
DAT_LIST_FILE=/tmp/list_of_databases.txt
FULL_RPT_FILE=Full_FragReport.txt

MB=${1}

if [ "${MB}" == "" ] ; then exit ; fi

NUM_OK=0
if [ "${MB}" ==  "128" ] ; then NUM_OK=1 ; fi
if [ "${MB}" ==  "256" ] ; then NUM_OK=1 ; fi
if [ "${MB}" ==  "512" ] ; then NUM_OK=1 ; fi
if [ "${MB}" == "1024" ] ; then NUM_OK=1 ; fi
if [ "${MB}" == "2048" ] ; then NUM_OK=1 ; fi
if [ ${NUM_OK} -eq 0 ] ; then exit ; fi

MINIMUM_SIZE="(1024 * 1024 * ${MB})"
(( MIN_SIZE = MB * 1048576 ))

SQL="SELECT CONCAT(table_schema,':',table_name,':',data_length+index_length,':',data_free) FROM information_schema.tables"
SQL="${SQL} WHERE table_schema NOT IN ('information_schema','performance_schema','mysql') AND engine IN ('InnoDB','MyISAM')"
SQL="${SQL} AND data_length+index_length > ${MIN_SIZE}"
mysql ${MYSQL_AUTH} -ANe"${SQL}" >${TAG_LIST_FILE} 2>${TAG_LIST_ERRS}
TAG_FILESIZE=`ls -l ${TAG_LIST_FILE} | awk '{print $5}'`
ERR_FILESIZE=`ls -l ${TAG_LIST_ERRS} | awk '{print $5}'`
if [ ${ERR_FILESIZE} -gt 0 ]
then
        echo "Errors"
        echo "======"
        cat ${ERR_FILESIZE}
        echo
        exit
fi
if [ ${TAG_FILESIZE} -eq 0 ]
then
        echo "No Databases with Tables > ${MB}M (${MIN_SIZE})"
        exit
fi

SPC=""
TAGS=""
for TAG in `cat ${TAG_LIST_FILE}`
do
        NEWTAG=${TAG}
        DB=`echo ${TAG} | sed 's/:/ /g' | awk '{print $1}'`
        TB=`echo ${TAG} | sed 's/:/ /g' | awk '{print $2}'`
        IS=`echo ${TAG} | sed 's/:/ /g' | awk '{print $3}'`
        FR=`echo ${TAG} | sed 's/:/ /g' | awk '{print $4}'`
        IBD=${DATADIR}/${DB}/${TB}.ibd
        MYD=${DATADIR}/${DB}/${TB}.MYD
        MYI=${DATADIR}/${DB}/${TB}.MYI
        if [ -f ${IBD} ]
        then
                (( N1 = IS + FR ))
                N2=`ls -l ${IBD} | awk '{print $5}'`
                if [ ${N1} -lt ${N2} ]
                then
                        (( DF = N2 - N1 ))
                        NEWTAG="${DB}:${TB}:${IS}:${DF}"
                fi
        else
                ND=`ls -l ${MYD} | awk '{print $5}'`
                NI=`ls -l ${MYI} | awk '{print $5}'`
                (( N1 = IS + FR ))
                (( N2 = ND + NI ))
                if [ ${N1} -lt ${N2} ]
                then
                        (( DF = N2 - N1 ))
                        NEWTAG="${DB}:${TB}:${IS}:${DF}"
                fi
        fi
        TAGS="${TAGS}${SPC}${NEWTAG}"
        SPC=" "
done

# Collect List of DBs
DBLIST=""
SPC=""
for DB in `cat ${TAG_LIST_FILE} | sed 's/:/ /' | awk '{print $1}' | sort -u`
do
        DBLIST="${DBLIST}${SPC}${DB}"
        SPC=" "
done

for D2 in ${DBLIST}
do
        SUM=0
        SUM_DF=0
        SQL=""
        SELECT="SELECT"
        for TAG in ${TAGS}
        do
                DB=`echo ${TAG} | sed 's/:/ /g' | awk '{print $1}'`
                if [ "${DB}" == "${D2}" ]
                then
                        TB=`echo ${TAG} | sed 's/:/ /g' | awk '{print $2}'`
                        IS=`echo ${TAG} | sed 's/:/ /g' | awk '{print $3}'`
                        FR=`echo ${TAG} | sed 's/:/ /g' | awk '{print $4}'`
                        (( OS = IS + FR ))
                        SQL="${SQL} ${SELECT} '${TB}' TableName,LPAD(FORMAT(${OS},0),18,' ') FileSize,LPAD(FORMAT(${OS}/POWER(1024,3),2),10,' ') FileSizeGB,LPAD(FORMAT(${IS},0),18,' ') TableSize,LPAD(FORMAT(${IS}/POWER(1024,3),2),11,' ') TableSizeGB,LPAD(FORMAT(${FR},0),18,' ') Difference,LPAD(FORMAT(${FR}/POWER(1024,3),2),12,' ') DifferenceGB,LPAD(FORMAT(${FR}*100.0/${OS},2),7,' ') FragPct"
                        SELECT="UNION SELECT"
                        (( SUM    += OS ))
                        (( SUM_DF += FR ))
                fi
        done
        FRAG_RPT=FragReport_${D2}.txt
        DT=`date +"%Y-%m-%d %H:%M:%S"`
        echo "Fragmentation for Database ${D2} (${DT})" > ${FRAG_RPT}
        echo "Minimum Table Size (${MB} MB = ${MIN_SIZE})" >> ${FRAG_RPT}
        echo >> ${FRAG_RPT}
        BIGSQL="SELECT * FROM (${SQL}) A ORDER BY Difference DESC;"
        BIGSQL="${BIGSQL} SELECT FORMAT(${SUM}/POWER(1024,3),2) DiskSpaceUsed,FORMAT(${SUM_DF}/POWER(1024,3),2) SumDiff;"
        mysql ${MYSQL_AUTH} --table -Ae"${BIGSQL}" 2>&1 >> ${FRAG_RPT}
done

rm -f ${FULL_RPT_FILE}
for TXT in `ls FragReport*.txt`
do
        if [ -f ${FULL_RPT_FILE} ]
        then
                echo -e "\n================================================================================\n" >> ${FULL_RPT_FILE}
        fi
        cat ${TXT} >> ${FULL_RPT_FILE}
done
less ${FULL_RPT_FILE}

You will have too run this script as the OS root user or another OS that has rights to read the datadir

GIVE IT A TRY !!!

I wrote How to monitor mysql percentage disk utilization? like 6 years ago based on the same principles