Mysql – How does performance_schema in MySQL 5.6 improve performance?

information-schema MySQL mysql-5.5 mysql-5.6 performance

Since MySQL 5.5, the performance_schema database has been available alongside information_schema.
As we know, the tables in information_schema contain metadata and statistical information about tables, plugins, partitions, the processlist, status, global variables, and so on.
Likewise, performance_schema contains a lot of tables.

But how can we use these tables to improve the performance of an
application based on MySQL 5.6?

Best Answer

Please note the difference between the information_schema and performance_schema databases.

INFORMATION_SCHEMA

The information_schema database is an inventory of all objects within the MySQL instance.

Such objects include:

databases
tables
columns
constraints
indexes (called statistics)
processlist
locks
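
As a rough illustration of using information_schema as an inventory, the queries below list the tables and index statistics for one schema. The schema name `mydb` is a placeholder and not something taken from the question.

```sql
-- Tables in one schema with estimated row counts and sizes in bytes.
-- 'mydb' is a placeholder schema name.
-- TABLE_ROWS is only an estimate for InnoDB tables.
SELECT TABLE_NAME, ENGINE, TABLE_ROWS, DATA_LENGTH, INDEX_LENGTH
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'mydb';

-- Indexes (exposed through the STATISTICS table) with estimated cardinality.
SELECT TABLE_NAME, INDEX_NAME, SEQ_IN_INDEX, COLUMN_NAME, CARDINALITY
FROM information_schema.STATISTICS
WHERE TABLE_SCHEMA = 'mydb'
ORDER BY TABLE_NAME, INDEX_NAME, SEQ_IN_INDEX;
```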

I wrote a nice post about this 3 years ago: How is INFORMATION_SCHEMA implemented in MySQL?

PERFORMANCE_SCHEMA

The performance_schema database is an instrumentation platform that collects statistics about the phases of a running query. That information can give you hints for tuning the query or the environment. You only need to enable it when debugging and troubleshooting queries, in conjunction with adding indexes (which may or may not be necessary) and adjusting buffer settings (again, may or may not be necessary).
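
As a minimal sketch of what "hints" can look like in MySQL 5.6, the queries below check that the instrumentation is active and then list the statement digests that consumed the most time. This assumes the statement digest consumer is enabled. Timer columns are reported in picoseconds, hence the division.

```sql
-- Is the Performance Schema active? (The variable is set at server startup.)
SHOW VARIABLES LIKE 'performance_schema';

-- Which statement-related consumers are currently enabled?
SELECT NAME, ENABLED
FROM performance_schema.setup_consumers
WHERE NAME LIKE 'events_statements%' OR NAME = 'statements_digest';

-- Top 10 normalized statements by total execution time.
-- A high SUM_ROWS_EXAMINED or SUM_NO_INDEX_USED points at indexing candidates.
SELECT DIGEST_TEXT,
       COUNT_STAR,
       SUM_TIMER_WAIT / 1e12 AS total_seconds,
       SUM_ROWS_EXAMINED,
       SUM_NO_INDEX_USED
FROM performance_schema.events_statements_summary_by_digest
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 10;
```

Statements that rank high here are the ones worth running through EXPLAIN before deciding on new indexes or buffer changes.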

I have not written much about performance_schema since I do not use it that often (I mentioned it in my old post, Is there any way to get the estimate cost of executing a query in MySQL?).

Derek Downey also wrote a quick post on it: What is the usage of performance_schema in MySQL 5.5?

There is a wealth of information on how to use the performance_schema that I think you will enjoy.

Related Solutions

Mysql – Improve `Update` performance (rows locking issue)

First, each time you UPDATE the status column, you are having to update the index as well (source). Evaluate your indexing to see if you really need the index on the status column. My guess is no, since it has an extremely low cardinality and MySQL probably won't use it anyway.

If you ignore me and think you do need it, follow the advice in the article to drop the index before your loop and re-add it after you're done.
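
The drop-and-re-add approach could look roughly like this. The index name `status_idx` is hypothetical, since the original table definition is not shown. Take the real name from the `Key_name` column of `SHOW INDEX`.

```sql
-- Inspect index cardinality first; a very low Cardinality value for the
-- status index suggests the optimizer will rarely choose it.
SHOW INDEX FROM data;

-- Drop the index before the bulk UPDATE loop.
-- `status_idx` is a hypothetical index name.
ALTER TABLE data DROP INDEX status_idx;

-- ... run the UPDATE loop here ...

-- Re-create the index afterwards, only if it is really needed.
ALTER TABLE data ADD INDEX status_idx (status);
```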

Here are some other things you might do if that doesn't help:

You are taking all the columns from the data but only using number. Don't do a SELECT *, but instead a SELECT number. That won't help your writes, but it is a good performance practice. Only select the columns you're using.
Your number index isn't getting used at all. This means it is not unique enough to be useful for updating. (Slight tangent: how many rows does a single UPDATE affect?) I would drop it, or at least add number to the process index.
It looks like process is unique enough for MySQL to whittle the number of rows down to 16k, instead of 1 million. In light of this, I would add AND process=x to your update statement (I'm assuming you know process from the original SELECT statement):
```
-- FAILED --
UPDATE data SET status = 2, error='$error' WHERE process=X AND number = $data['number']

-- SUCCESS --
UPDATE data SET status = 1 WHERE process=X AND number = $data['number']
```
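
To let both conditions in those UPDATE statements use a single index, one option is a composite index on (process, number), sketched below. The index name `idx_process_number` and the literal values are placeholders. Whether this replaces the existing single-column indexes depends on your other queries.

```sql
-- Composite index covering both WHERE conditions of the UPDATE.
-- `idx_process_number` is a hypothetical name.
ALTER TABLE data ADD INDEX idx_process_number (process, number);

-- Check the plan with an equivalent SELECT; the `key` column should show
-- the new index and `rows` should be far below the table size.
-- The values 5 and 12345 are placeholders.
EXPLAIN SELECT number
FROM data
WHERE process = 5 AND number = 12345;
```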

A hint about unnecessary indexes in InnoDB: since you don't have a primary key defined, InnoDB uses a hidden 'primary key' and stores it in every index entry it writes. So for each index you have, you add the size of the index plus the size of the hidden primary key to the data file. If you're not using the index (or MySQL can't use it), you are wasting space and adding overhead each time you insert a new number (the same goes for status, as discussed earlier).
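
If you want to confirm whether the table really has no explicit primary key, a query along these lines can help. The schema name `mydb` is a placeholder. The ALTER statement is only one possible follow-up, and the `id` column it adds is hypothetical.

```sql
-- Returns a row only if `data` has an explicit PRIMARY KEY.
-- 'mydb' is a placeholder schema name.
SELECT CONSTRAINT_NAME, CONSTRAINT_TYPE
FROM information_schema.TABLE_CONSTRAINTS
WHERE TABLE_SCHEMA = 'mydb'
  AND TABLE_NAME = 'data'
  AND CONSTRAINT_TYPE = 'PRIMARY KEY';

-- Optional: give the table an explicit surrogate key.
-- `id` is a hypothetical column; this rebuilds the table, so test it first.
ALTER TABLE data
  ADD COLUMN id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST;
```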

Mysql – Partitioning MySQL for “expired” transactions to improve performance

If a lot of your queries have conditions similar to what you describe, e.g. range condition on the date column:

WHERE dateColumn >= (CURRENT_DATE() - INTERVAL 1 MONTH)

WHERE dateColumn >= '2012-01-01' 
  AND dateColumn < '2013-01-01'

it will be useful to define the primary key of the table as (dateColumn, tableAI), where tableAI is an auto incremented integer column.

For InnoDB tables, the above choice means that this index will be the clustered index of the table and any range (on the date) queries on the table will use it and will only scan the relevant part of the table.
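
A table definition along these lines would give that clustered index. The table name `transactions` and the `amount` column are made up for illustration. The extra key on tableAI is there because InnoDB requires an AUTO_INCREMENT column to be the first column of some index.

```sql
-- Hypothetical table; only dateColumn and tableAI come from the text above.
CREATE TABLE transactions (
  tableAI    INT UNSIGNED NOT NULL AUTO_INCREMENT,
  dateColumn DATE NOT NULL,
  amount     DECIMAL(10,2) NOT NULL,
  PRIMARY KEY (dateColumn, tableAI),  -- clustered index, ordered by date
  UNIQUE KEY (tableAI)                -- AUTO_INCREMENT must lead an index
) ENGINE=InnoDB;

-- A date-range query should now scan only the matching part of the table;
-- EXPLAIN is expected to report a range access on PRIMARY.
EXPLAIN SELECT *
FROM transactions
WHERE dateColumn >= '2012-01-01'
  AND dateColumn <  '2013-01-01';
```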

If the HasExpired condition is more complex, e.g. some rows should be marked expired after a month, some others after a year, some after a few days, I would consider splitting the table(s) into 2 parts: one having only not-expired data and one having all data.

So, when some data expires, you'll simply delete it from the "not_expired" table. The data will already be in the other one. When you search for not-expired data, you'll be searching only one table (the smaller one).
You also have the benefit of being able to use different indexes on the two tables (for example, only the clustered index suggested above on the "all-data" table, and many indexes on the not-expired table, depending on your queries). Without the split, you would need a HasExpired column and would have to attach it to all your indexes (as the first column) to have equally good indexes.

If the (not-expired) data is updated many times, it might be beneficial to have the split as "not-expired" and "expired" data, so you won't need to do any updates on the "expired" table. But when some data expires, you'll move it (instead of just deleting it) from the "not_expired" table to the "expired" one.
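
The move step might be sketched as follows. It assumes two InnoDB tables named `not_expired` and `expired` with identical column layouts, and an `expires_at` column that encodes when a row expires. All three names are assumptions. Review the locking behaviour under your own isolation level before relying on it.

```sql
-- Fix the cutoff once so both statements see exactly the same boundary.
SET @cutoff = NOW();

START TRANSACTION;

-- Copy the rows that have expired (assumes identical column layouts).
INSERT INTO expired
SELECT * FROM not_expired
WHERE expires_at < @cutoff;

-- Remove the same rows from the hot table.
DELETE FROM not_expired
WHERE expires_at < @cutoff;

COMMIT;
```

Running this from a scheduled job keeps the "not_expired" table small without ever updating rows in the "expired" one.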