Index fragmentation is much overrated. Do not worry about it.
Two adjacent, somewhat-empty, blocks are merged together by InnoDB as the natural processing.
Random actions on a BTree cause it to naturally gravitate toward an average of 69% full. Sure, this is not 100%, but the overhead of "fixing" it is not worth it.
SHOW TABLE STATUS
gives you some metrics, but they are flawed -- "Data_free" includes certain "free" space, but not other "free" space.
There is unused space in each block; free 16KB blocks; free "extents" (nMB chunks); MVCC rows waiting to be reaped; non-leaf nodes have their own fragmentation; etc.
Percona and Oracle have different ways of looking at how big (number of blocks) an index is. I find neither of them to be useful because of the limited definition of "free". It seems that blocks (16KB each) are allocated in chunks (several MB), thereby leading one to believe that there is all sorts of fragmentation. In reality, it is usually just most of one of these multi-MB chunks. And OPTIMIZE TABLE
does not necessarily recoup any of the space.
If SQL Server is using BTrees, then it is lying to say that there is "no fragmentation". Think of what happens on a "block split". Or think of the overhead of continually defragmenting. Either way you lose.
Further note that a table and an index are essentially identical structures:
- B+Tree, based on some index
- The "data" is based on the PRIMARY KEY; each secondary index is a B+Tree based on its index.
- The leaf node of the "data" contains all the columns of the table.
- The leaf node of a secondary index contains the columns of that secondary index, plus the columns of the PRIMARY KEY.
If you have innodb_file_per_table = ON
, you can clearly see the shrinkage (if any) after OPTIMIZE TABLE by looking at the .ibd
file's size. For OFF
, the info is buried in ibdata1
, but SHOW TABLE STATUS
may be reasonably accurate since all "free" space belongs to every table. Well, except for the pre-allocated chunks.
You may notice that a freshly optimized file-per-table table has exactly 4M, 5M, 6M, or 7M of Data_free. Again, this is the pre-allocation, and the failure to give you the minute details.
I have worked with InnoDB for over a decade; I have worked with thousands of different tables, large and small. I say that only one table in a thousand really needs OPTIMIZE TABLE
. Using it on other tables is a waste.
105 columns is a lot, but perhaps not too many.
Do you have 55 indexes on one table? That is bad. That is 55 updates per INSERT
. Let's discuss that further. Keep in mind that INDEX(a)
is useless if you also have INDEX(a,b)
. And INDEX(flag)
is useless because of low cardinality. (But INDEX(flag, foo)
may be useful.)
Q1: There is no good way to check for all forms of fragmentation in either the data or the secondary indexes.
Q2, Q3: OPTIMIZE TABLE
rebuilds the table by CREATEing
a new table and INSERTing
all the rows, then RENAMEing
and DROPping
. The re-inserting of the data in PK order assures that the data is well-defragmented. The indexes are another matter.
Q4: You could DROP
and reCREATE
each index to clean it up. But this is an extremely slow process. 5.6 has some speedups, but I don't know if they help with defragmentation.
It is also possible to ALTER TABLE ... DISABLE KEYS
, then ENABLE
them. This may to a more efficient rebuild of all the secondary indexes at once.
Best Answer
You're not alone on seeing sync latency with AGs when doing index rebuilds. From Sean Gallardy's blog post SQL Server Index Maintenance – You’re Doing It Wrong (emphasis mine):
Note that an important caveat mentioned in the comments of that post is that fragmentation can affect the performance of SQL Server's readahead mechanism, which is used heavily during large table scans (ETL loads, big reporting queries).
I honestly like your suggestion of disabling maintenance and seeing what happens. You could disable it on just the very large tables, and see if there is any negative impact vs the gains you get in terms of shorter maintenance windows.
To comment on this specific statement in your question:
FYI, rebuilding an index will result in statistics being updated for that index with
FULLSCAN
. So make sure you're not updating stats twice on these tables that are getting rebuilt.You should also take a look at Erik Darling's blog post Because Your Index Maintenance Script Is Measuring The Wrong Thing. You could check
avg_page_space_used_in_percent
to determine if there are indexes where you can reclaim a lot of space (on disk and in memory) by rebuilding. This would probably be the biggest "win" from rebuilds.On a related note, another of Erik's posts (What Metrics Does Rebuilding Indexes Improve?) points out some specific situations that might warrant an ad-hoc rebuild:
Unfortunately, there isn't an easy button here. Hopefully the links help inform your decision process though.