Mysql – The benefits of ALGORITHM=INPLACE in ALTER TABLE queries

For some reason I can't find any information on this.
A colleague insists that I use it in an ALTER TABLE query but can't really say why.
It is a small table holding the application's basic configuration, with very few rows.

What are the benefits of ALGORITHM=INPLACE in a query that adds one MEDIUMINT NOT NULL type of column to a table which will have 10 rows at most?

The database is MySQL.

Best Answer

A 10-row ALTER will be so fast that the algorithm won't make a noticeable difference. You probably won't see a difference even at 1000 rows.

Some history:

Originally, ALTER was performed in exactly one way under the hood. That kept the code simple, but performance was not as good as it could be. The steps were:

1. Create a new table like the existing table.
2. Make the modifications to that empty table.
3. INSERT ... SELECT ... to copy all the data over.
4. Juggle the table names and drop the old data.
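The copy approach described above can be imitated by hand, which may make it clearer why it gets expensive on large tables. This is only a rough sketch: the table `app_config`, its columns, and the new column `max_items` are made-up names, and the server's internal implementation also deals with locking and metadata, which this does not show.

```sql
-- Hypothetical configuration table; every name here is an assumption.
CREATE TABLE app_config (
  id            INT UNSIGNED NOT NULL PRIMARY KEY,
  setting_name  VARCHAR(64)  NOT NULL,
  setting_value VARCHAR(255) NOT NULL
) ENGINE=InnoDB;

-- Step 1: create an empty table with the same definition.
CREATE TABLE app_config_new LIKE app_config;

-- Step 2: apply the change to the empty copy (cheap, since it holds no rows).
ALTER TABLE app_config_new
  ADD COLUMN max_items MEDIUMINT NOT NULL DEFAULT 0;

-- Step 3: copy every row across; this is the part that scales with table size.
INSERT INTO app_config_new (id, setting_name, setting_value)
SELECT id, setting_name, setting_value
FROM app_config;

-- Step 4: swap the names in one statement, then drop the old data.
RENAME TABLE app_config TO app_config_old,
             app_config_new TO app_config;
DROP TABLE app_config_old;
```

With 10 rows, step 3 is instantaneous, which is why the choice of algorithm hardly matters for the table in the question.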

Beginning with 5.6, there was a concerted effort to optimize specific cases -- mostly to avoid the full table copy. This led to a variety of syntaxes and options. Confusing.

Fortunately, if you ask for an algorithm that is not applicable to the action in question, MySQL rejects the statement with an error instead of quietly doing something else. Simply change the algorithm and try again.
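As a sketch of what that looks like in practice, assuming InnoDB on MySQL 5.6 or later and a hypothetical table `app_settings` (the table and column names are invented for illustration):

```sql
-- Hypothetical table; names are assumptions, not from the question.
CREATE TABLE app_settings (
  id           INT UNSIGNED NOT NULL PRIMARY KEY,
  setting_name VARCHAR(64)  NOT NULL
) ENGINE=InnoDB;

-- Adding a column is something InnoDB can do in place, so this should be
-- accepted. LOCK=NONE additionally asks that concurrent reads and writes
-- stay allowed while the ALTER runs.
ALTER TABLE app_settings
  ADD COLUMN max_items MEDIUMINT NOT NULL DEFAULT 0,
  ALGORITHM=INPLACE, LOCK=NONE;

-- Changing a column's data type requires rebuilding the table by copying,
-- so requesting INPLACE here is expected to fail with an error rather than
-- silently fall back to a copy.
ALTER TABLE app_settings
  MODIFY COLUMN max_items INT NOT NULL DEFAULT 0,
  ALGORITHM=INPLACE;

-- Change the algorithm and try again.
ALTER TABLE app_settings
  MODIFY COLUMN max_items INT NOT NULL DEFAULT 0,
  ALGORITHM=COPY;
```

Seen this way, the main practical use of spelling out ALGORITHM=INPLACE is as a guard: on a big table, the statement refuses to run if it would need a full copy. On a 10-row table that guard buys essentially nothing.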

I think, without proof, that ALTER TABLE will use the best algorithm available for the use case at hand. The ALGORITHM options are there in case something goes wrong and you need to force a different algorithm.

So why have ALGORITHM? I think it is like a lot of VARIABLES -- if something goes wrong, the end-user has the ability to "fix" the problem by turning something off. This does come into play in various evaluation optimizations. A visible case is FORCE INDEX.

Alas, the documentation of ALGORITHM is wimpy.

Related Solutions

Mysql – Why would splitting up a query make it faster, and can I/should I fix this

There is no direct way to tell MySQL to work on a single query "in chunks," since SQL is (primarily, at least) a declarative language ("platform, I need you to accomplish this end result") not a procedural one ("platform, I need you to process this data using these steps: for each...").

Memory might be your issue -- which storage engine do these tables use? ...but really, it seems like there are some more likely possibilities:

According to the EXPLAIN SELECT you posted, no index is being used to join the maps table (key is NULL), even though the mapExpireTime index is available. When the optimizer determines that an index isn't selective enough, it won't use it. The more rows in the maps table that your @mapExpireTime fails to eliminate, the less likely the optimizer is to choose that index and the more likely it is to opt for a full table scan, which may explain why performance falls off. On shorter date ranges, does the index appear in key for the maps table in EXPLAIN? If so, that's the short answer to "why does it slow down?"

It's possible that ANALYZE TABLE maps; might build some better index stats and improve the query performance. Conventional wisdom is that the constant "shiftiness" of InnoDB index stats makes this assertion untrue, but I've seen too many times where the stats on an InnoDB table are in a state that biases the optimizer towards a phenomenally bad query plan and ANALYZE TABLE cleans it right up. (I think the reason the conventional wisdom on this is so widespread is the fact that SHOW TABLE STATUS actually triggers a random index stats dive, so the act of looking makes what you intended to look at different than what it was before you looked, but I digress).
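A minimal way to try this, using the maps table from the question, might look like the following. Which values count as "better" depends on the data, so treat it as a starting point rather than a recipe.

```sql
-- Recalculate the index statistics for the table.
ANALYZE TABLE maps;

-- Inspect what the optimizer now believes: the Cardinality column is the
-- estimated number of distinct values in each index.
SHOW INDEX FROM maps;
```

After that, re-running the original EXPLAIN SELECT shows whether the key column for maps is still NULL.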

Another possibility would be that you could add an index (ID,mapExpireTime) on the maps table, which seems at first glance quite redundant, but might be used as a covering index... which would be far better than the full table scan and join buffer that's getting used now.
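Assuming the column names are exactly ID and mapExpireTime as written above, adding that index could look like this. The index name `idx_id_expire` is made up for illustration.

```sql
-- Composite index on the join column plus the filtered column.
-- idx_id_expire is a hypothetical name; pick one that fits your conventions.
ALTER TABLE maps
  ADD INDEX idx_id_expire (ID, mapExpireTime);
```

If the optimizer picks it up as a covering index, the Extra column of EXPLAIN for the maps table should include "Using index".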

Mysql – Why search by date field requires engine to touch all records

As @Razor commented, the number reported by EXPLAIN is only an estimate, and it can be off from your true row count by +/- 10% or even more.

MySQL prefers not to use an index at all if the portion of the table selected by the query conditions is too large. Think of the index at the back of a book. Why don't they index the word "the"? Because it would just list every page in the book. If the word occurs on too many pages in the book, it's actually easier to just read the book cover-to-cover instead of using an index.

Similarly, in MySQL if the optimizer estimates that the values you are searching for are found on a significantly high number of rows (in my experience "significant" is at least 16-18%) then MySQL skips the use of the index and just does a table-scan instead.

Sometimes this is the wrong choice, and you can give MySQL a hint that a table-scan is actually much more expensive than the preferred index choice:

EXPLAIN SELECT * FROM playdays FORCE INDEX (realdate) WHERE realdate >= DATE('2010-01-01');

Be careful about using index hints too liberally, because you effectively hard-code your index choices into your queries by using them, and that may become the wrong strategy as your data changes over time. They should be used in exceptional cases, when you can demonstrate that the optimizer makes the wrong choice.
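One rough way to gather that evidence, using the playdays table and realdate column from the question, is to measure how much of the table the condition really matches and then compare the plans with and without the hint. The column aliases are made up for readability.

```sql
-- How selective is the condition in reality? In MySQL a comparison
-- evaluates to 1 or 0, so SUM() counts the matching rows.
SELECT COUNT(*)                                 AS total_rows,
       SUM(realdate >= '2010-01-01')            AS matching_rows,
       SUM(realdate >= '2010-01-01') / COUNT(*) AS fraction_matched
FROM playdays;

-- Plan the optimizer chooses on its own.
EXPLAIN SELECT * FROM playdays
WHERE realdate >= '2010-01-01';

-- Plan with the index forced, for comparison.
EXPLAIN SELECT * FROM playdays FORCE INDEX (realdate)
WHERE realdate >= '2010-01-01';
```

If fraction_matched is large, the table scan is probably the right call and the hint is unlikely to help. Timing both versions of the real query is the more reliable test.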
