MySQL Index Optimization – Why Adding a Secondary Index Didn’t Improve Select Performance

indexMySQLoptimization

I have a table in MySQL called messages, which looks like this :

id (primary key)   | description    |  created_at(timestamp)

This table is supposed to hold chat messages between users of my application. As a result, the number of write operations on it will be high.

The application has an API that returns all the messages between two time stamps. This query will be fairly common too, but less than the number of write operations.

I ran mysqlslap with 100 concurrent connections and around 15000 rows in that table, and got a total time of around 8.6 seconds.

Then I added a secondary index on created_at, hoping to get the results in much less time for my search between two times, but I got an increase of 0.3 seconds for the same inputs.

Why did I not see a significant performance increase?

Edit:

this is what my table looks likes :

DROP TABLE IF EXISTS `mssg`;

CREATE TABLE `mssg` (
  `id` INTEGER NULL AUTO_INCREMENT DEFAULT NULL,
  `body` MEDIUMTEXT NULL DEFAULT NULL,
  `length` VARCHAR NULL DEFAULT NULL,
  `created_at` TIMESTAMP NULL DEFAULT NULL,
  PRIMARY KEY (`id`)
);

and this is how i added the index:

ALTER TABLE testalter_tbl ADD INDEX (created_at);

Typically, there will be around 150 to 350 messages returned from the select statements

Best Answer

I'll skip my comments about that CREATE TABLE statement that would be classified from wrong to insulting, like having all columns declared NULL (what would a row with (NULL, NULL, NULL, NULL) mean?) and concentrate on the question asked.

The difference between 8.6 and 8.9 seconds is negligible. For all we know, the running times of the 2 tests are almost identical. If you want to test properly, you should run longer tests and with larger size of the table (try with 15K, 150K, 1500K, 15M rows) and see whether the efficiency changes or stays the same between having or not the specific index.

You should also examine the execution plans created for the queries in the 2 cases (at various table sizes), to see what is happening behind the scenes, what indexes are used, if any, etc.

My prediction would be that as the table grows larger, the index will get more useful, as the queries will require an index seek to find the (IDs of the) rows in the wanted time range and then some additional seeks from the clustering PK index (I assume you use InnoDB.)

Without the index, a whole table scan will have to be done each time. The larger the table gets, the biggest the difference you'll see between the 2 test cases (with and without the index).

If the vast majority of your queries is similar to the following, getting rows from a small time range:

SELECT <columns_wanted>
FROM mssg
WHERE created_at >= @start_timestamp
  AND created_at < @end_timestamp ;

then I would suggest (assuming you use InnoDB) an alternative design: make the (created_at) the clustering key of the table. Since you may have 2 or more rows with exact same timestamp, you can set the primary key to (create_at, id) for this effect (as InnoDB allows only unique keys for clustering). You will also need an additional index on (id) for the AUTO_INCREMENT to be allowed. This can be declared UNIQUE or not, it's your choice:

CREATE TABLE mssg (
  mssg_id INTEGER NOT NULL AUTO_INCREMENT,
  body MEDIUMTEXT NULL,
  length VARCHAR(20) NULL,
  created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (created_at, mssg_id),
  -- UNIQUE
    INDEX mssg_id_ix (mssg_id)             -- this is needed for the AutoIncrement
);

This way, all the rows that your queries need will be stored in consecutive pages of the clustering (PK) index and the efficiency of the queries - as long as they ask for a small time range - will not be affected by the size of the table.

Further comments, partially related to the question;

Why are all columns declared as NULL? I suggest you change them all to NOT NULL except for those columns that there is a specific reason to allow nulls.
Why mediumtext for the message? Are you going to store so large messages? It might be better for performance, if you could reduce the size, to say VARCHAR(250).
Why is the length declared as VARCHAR and not INTEGER?
Further, if it's going to be storing the length of the text, it is not needed at all. You can get the length at any time using the proper text function.
You could declare the TIMESTAMP column to get a default CURRENT_TIMESTAMP if you want (if it is not already provided by the application.)

Related Solutions

Innodb – Should I normalise this database

The first solution is clearly more efficient than the second one, because you don't need to do the join (every join slows down a query, more or less depending on the access plan generated by the system).

More, I think the second solution has some problem, since from the schema it seems that you are using the column id inside mssg_grp as foreign key referencing the table mssg, while in that table you use as primary key the couple of columns created_at and id, which conflicts with the definition of the foreign key, from which one could assume that id is already a primary key for mssg.

So, if, as it seems from the diagram, the relation between mssg and mssg_grp is 1:1, you should opt for the first solution which is still normalized and more efficient.

Mysql – Need help improving sql query performance

ORDER BY in the sub query is what's making it slow.

ORDER BY makes query slow since you need to order (sort) all the objects in the query results. It is in order of N Square if you don't have an index, and N log N if you do.

And you don't even need to order by all of it.

What you need is only to filter out the minimum, so instead of:

ORDER BY  
  DATE(`t2`.`created_at`) DESC LIMIT 1

I would add another condition in the WHERE clause saying:

WHERE DATE(t2.'created at') <> (SELECT MAX (DATE ('created at') FROM 't2'))

Since it needs to look at all the dates in the query, but not necessarily to order them.

Google "Create Composite Index". MySQL is not my variant of SQL. It is always good practice to check subjects like indexes and Primary Keys outside of Q&A forums (like this one) since they are very broad and you need to read at least 5 articles in order to decide which composite index you need, on top of how to write it (but since you query that field quite often in that query, I'd suggest you should create one).

Best Answer

Related Solutions

Innodb – Should I normalise this database

Mysql – Need help improving sql query performance

Related Question