First, each time you UPDATE the status column, MySQL has to update the index on it as well (source). Evaluate your indexing to see whether you really need the index on the status column. My guess is no: it has extremely low cardinality, so MySQL probably won't use it anyway.
If you ignore me and think you do need it, follow the advice in the article to drop the index before your loop and re-add it after you're done.
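Below is a minimal sketch of the drop-before-the-loop, re-add-after pattern. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere. The table layout data(number, process, status, error) and the index name idx_status are hypothetical. In MySQL the equivalent statements would be ALTER TABLE ... DROP INDEX and ADD INDEX, and DDL there commits implicitly.

```python
import sqlite3


def bulk_update_status(conn, results):
    """Apply (number, ok) pairs while the status index is dropped.

    Assumes a hypothetical table data(number, process, status, error)
    with a hypothetical index idx_status on data(status).
    """
    # Drop the low-cardinality index so the UPDATEs below don't maintain it.
    conn.execute("DROP INDEX IF EXISTS idx_status")
    for number, ok in results:
        conn.execute(
            "UPDATE data SET status = ? WHERE number = ?",
            (1 if ok else 2, number),
        )
    # Rebuild the index once, after all rows have been written.
    conn.execute("CREATE INDEX idx_status ON data (status)")
    conn.commit()
```

The index is rebuilt in a single pass at the end instead of being maintained row by row inside the loop.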
Here are some other things you might do if that doesn't help:
You are taking all the columns from the data table but only using number. Don't do a SELECT *; do a SELECT number instead. That won't help your writes, but it is good performance practice: only select the columns you're using.
Your number index isn't getting used at all, which means it is not selective enough to be useful for these updates. (Slight tangent: how many rows does a single UPDATE affect?) I would drop it, or at least add number to the process index, making it a composite (process, number) index.
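To check whether a composite (process, number) index actually gets picked for the per-row UPDATE, you can ask the query planner. The sketch below uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 purely as a stand-in; in MySQL you would run EXPLAIN on the statement instead, and its optimizer may decide differently. The index name idx_process_number is hypothetical.

```python
import sqlite3


def plan_for_update(conn):
    """Return SQLite's query-plan steps for the per-row UPDATE."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN "
        "UPDATE data SET status = 1 WHERE process = ? AND number = ?",
        (7, 42),
    ).fetchall()
    # The last column of each plan row is the human-readable step.
    return [row[-1] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (number INTEGER, process INTEGER, status INTEGER, error TEXT)")
# One composite index instead of separate single-column indexes.
conn.execute("CREATE INDEX idx_process_number ON data (process, number)")
print(plan_for_update(conn))
```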
It looks like process is selective enough for MySQL to whittle the number of rows examined down to about 16k instead of 1 million. In light of this, I would add AND process=X to your UPDATE statements (I'm assuming you know process from the original SELECT statement):
-- FAILED --
-- (bind $error, the process value, and $data['number'] as parameters)
UPDATE data SET status = 2, error = ? WHERE process = ? AND number = ?
-- SUCCESS --
-- (bind the process value and $data['number'])
UPDATE data SET status = 1 WHERE process = ? AND number = ?
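Putting the last few suggestions together: select only number, then issue each UPDATE with the extra process condition and bound parameters rather than string interpolation. This is a sketch in Python's sqlite3 standing in for the PHP/MySQL code; the process_batch name and the check callable (returning None on success or an error message on failure) are hypothetical.

```python
import sqlite3


def process_batch(conn, process_id, check):
    """Mark each row of one process as success (1) or failure (2)."""
    # Only fetch the column we actually use; materialize before updating.
    numbers = [row[0] for row in conn.execute(
        "SELECT number FROM data WHERE process = ?", (process_id,)
    )]
    for number in numbers:
        error = check(number)
        if error is None:
            conn.execute(
                "UPDATE data SET status = 1 WHERE process = ? AND number = ?",
                (process_id, number),
            )
        else:
            conn.execute(
                "UPDATE data SET status = 2, error = ? "
                "WHERE process = ? AND number = ?",
                (error, process_id, number),
            )
    conn.commit()
```

Because every UPDATE also filters on process, rows with the same number under a different process are left alone.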
A hint about unnecessary indexes in InnoDB: since you don't have a primary key defined, InnoDB uses a hidden one (an internally generated 6-byte row ID) and stores it in every secondary index entry. So for each index you keep, you add the size of the indexed column(s) plus the size of the hidden primary key, per row, to the data file. If you're not using the index (or MySQL can't use it), you are wasting space and adding overhead each time you insert a new number (the same goes for status, as discussed earlier).
If a lot of your queries have conditions similar to what you describe, e.g. a range condition on the date column:
WHERE dateColumn >= (CURRENT_DATE() - INTERVAL 1 MONTH)
WHERE dateColumn >= '2012-01-01'
AND dateColumn < '2013-01-01'
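it will be useful to define the primary key of the table as (dateColumn, tableAI), where tableAI is an auto-incremented integer column. (InnoDB requires an AUTO_INCREMENT column to be the first column of some index, so you will also need a secondary index such as KEY (tableAI).)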
For InnoDB tables, this choice makes that primary key the clustered index of the table, so any range queries on the date will use it and scan only the relevant part of the table.
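A rough illustration of why a date-leading clustered key helps range scans, using SQLite through Python's sqlite3 as a stand-in: a WITHOUT ROWID table stores rows in primary-key order, which is the closest SQLite analogue to an InnoDB clustered index. The table name events and the payload column are hypothetical, and because SQLite has no AUTOINCREMENT for WITHOUT ROWID tables, the tableAI values are supplied by hand here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Rows are stored ordered by (dateColumn, tableAI), like a clustered index.
conn.execute(
    "CREATE TABLE events ("
    " dateColumn TEXT NOT NULL,"
    " tableAI INTEGER NOT NULL,"
    " payload TEXT,"
    " PRIMARY KEY (dateColumn, tableAI)"
    ") WITHOUT ROWID"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [("2011-12-31", 1, "a"), ("2012-06-15", 2, "b"), ("2013-01-01", 3, "c")],
)

RANGE_SQL = (
    "SELECT tableAI FROM events "
    "WHERE dateColumn >= '2012-01-01' AND dateColumn < '2013-01-01'"
)
# Each plan row's last column is the readable step description.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + RANGE_SQL)]
matches = conn.execute(RANGE_SQL).fetchall()
print(plan)
print(matches)
```

The plan should show a search on the primary key bounded by the date range rather than a full scan; InnoDB's optimizer is a different engine, so check with EXPLAIN there.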
If the HasExpired condition is more complex, e.g. some rows should be marked expired after a month, some others after a year, some after a few days, I would consider splitting the table(s) into two parts: one having only the not-expired data and one having all data.
So, when some data expires, you simply delete it from the "not_expired" table; the data is already in the other one. When you search for not-expired data, you search only one table (the smaller one).
You also have the benefit of defining different indexes on each table (for example, only the clustered index suggested above on the "all-data" table, and as many indexes as your queries need on the "not_expired" table). Without the split, you would need a HasExpired column and would have to attach it to all your indexes (as the first column) to get equally good indexes.
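A minimal sketch of the "all data" plus "not_expired" split, again with Python's sqlite3 standing in for MySQL. The table names follow the text, but the columns are hypothetical, and the complex expiry rules are assumed to be precomputed into a single expires_on date per row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema; expiry rules are assumed precomputed into expires_on.
conn.execute("CREATE TABLE all_data (id INTEGER PRIMARY KEY, expires_on TEXT)")
conn.execute("CREATE TABLE not_expired (id INTEGER PRIMARY KEY, expires_on TEXT)")


def insert_row(conn, row_id, expires_on):
    # New rows are written to both tables.
    conn.execute("INSERT INTO all_data VALUES (?, ?)", (row_id, expires_on))
    conn.execute("INSERT INTO not_expired VALUES (?, ?)", (row_id, expires_on))


def expire(conn, today):
    # Expiring is only a DELETE from the small table; all_data keeps the row.
    conn.execute("DELETE FROM not_expired WHERE expires_on <= ?", (today,))


for row_id, expires_on in [(1, "2012-01-10"), (2, "2012-03-01"), (3, "2013-01-01")]:
    insert_row(conn, row_id, expires_on)
expire(conn, "2012-06-01")
conn.commit()

live = conn.execute("SELECT id FROM not_expired").fetchall()
total = conn.execute("SELECT COUNT(*) FROM all_data").fetchone()[0]
print(live, total)
```

Queries for live data touch only not_expired, while history stays in all_data.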
If the not-expired data is updated many times, it might be beneficial to split it as "not_expired" and "expired" data instead. That way, updates only touch the "not_expired" table and you never need to update the "expired" one. The trade-off is that when some data expires, you move it (instead of deleting it) from the "not_expired" table to the "expired" one.
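The move can be sketched as an INSERT ... SELECT followed by a DELETE inside one transaction, so a row is never in both tables or in neither. This uses Python's sqlite3 as a stand-in; the move_expired name and the expires_on and status columns are hypothetical, and the sketch assumes no concurrent writers change the matching rows between the two statements.

```python
import sqlite3


def move_expired(conn, today):
    """Move rows whose expires_on date has passed into `expired`."""
    with conn:  # one transaction: both statements commit, or neither does
        conn.execute(
            "INSERT INTO expired (id, expires_on, status) "
            "SELECT id, expires_on, status FROM not_expired WHERE expires_on <= ?",
            (today,),
        )
        conn.execute("DELETE FROM not_expired WHERE expires_on <= ?", (today,))


conn = sqlite3.connect(":memory:")
for name in ("not_expired", "expired"):
    conn.execute(
        f"CREATE TABLE {name} (id INTEGER PRIMARY KEY, expires_on TEXT, status INTEGER)"
    )
conn.executemany(
    "INSERT INTO not_expired VALUES (?, ?, 0)",
    [(1, "2012-01-10"), (2, "2013-01-01")],
)
conn.commit()
move_expired(conn, "2012-06-01")
```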
Please note the difference between the information_schema and performance_schema databases.
INFORMATION_SCHEMA
The information_schema database is an inventory of all objects within the MySQL instance.
Such objects include:
I wrote a nice post about this 3 years ago: How is INFORMATION_SCHEMA implemented in MySQL?
PERFORMANCE_SCHEMA
The performance_schema database is an instrumentation platform that collects statistics about phases of a running query. Such info can give you hints to tune the query or the environment. You only need to enable it when debugging and troubleshooting queries in conjunction with adding indexes (may or may not be necessary) and adjusting buffer settings (again, may or may not be necessary).
I have not written much about performance_schema since I do not use it that often (mentioned in my old post Is there any way to get the estimate cost of executing a query in MySQL?). Derek Downey also wrote a quick post on it: What is the usage of performance_schema in MySQL 5.5?
There is a wealth of information on how to use the performance_schema.
I think you will enjoy