I have a table in MySQL called messages, which looks like this:
id (primary key) | description | created_at (timestamp)
This table is supposed to hold chat messages between users of my application. As a result, the number of write operations on it will be high.
The application has an API that returns all the messages between two timestamps. This query will be fairly common too, but less frequent than the writes.
I ran mysqlslap with 100 concurrent connections and around 15000 rows in that table, and got a total time of around 8.6 seconds.
Then I added a secondary index on created_at, hoping my search between two times would run in much less time, but instead the total time increased by 0.3 seconds for the same inputs.
Why did I not see a significant performance increase?
Edit:
this is what my table looks like:
DROP TABLE IF EXISTS `mssg`;
CREATE TABLE `mssg` (
`id` INTEGER NULL AUTO_INCREMENT DEFAULT NULL,
`body` MEDIUMTEXT NULL DEFAULT NULL,
`length` VARCHAR(255) NULL DEFAULT NULL,
`created_at` TIMESTAMP NULL DEFAULT NULL,
PRIMARY KEY (`id`)
);
and this is how I added the index:
ALTER TABLE mssg ADD INDEX (created_at);
Typically, there will be around 150 to 350 messages returned by these SELECT statements.
Best Answer
I'll skip my comments about that CREATE TABLE statement, which would range from wrong to insulting, like having all columns declared NULL (what would a row of (NULL, NULL, NULL, NULL) mean?), and concentrate on the question asked.

The difference between 8.6 and 8.9 seconds is negligible. For all we know, the running times of the 2 tests are almost identical. If you want to test properly, you should run longer tests and with larger table sizes (try with 15K, 150K, 1500K, 15M rows) and see whether the efficiency changes or stays the same with and without the specific index.
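A benchmark invocation along these lines would let you repeat the test at each table size (the schema name, credentials, and timestamp bounds are placeholders; adjust them to your setup):

```shell
# Run the range query with 100 concurrent clients, 10 iterations each;
# repeat after growing the table to 150K, 1.5M, 15M rows and compare.
mysqlslap --user=root --password \
  --concurrency=100 --iterations=10 \
  --create-schema=test \
  --query="SELECT * FROM mssg WHERE created_at BETWEEN '2013-01-01 00:00:00' AND '2013-01-01 01:00:00'"
```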
You should also examine the execution plans created for the queries in the 2 cases (at various table sizes), to see what is happening behind the scenes, what indexes are used, if any, etc.
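For instance (a sketch; the exact query is whatever your API runs, and the bounds here are made up):

```sql
-- Compare the plan with and without the secondary index; watch the
-- `type`, `key` and `rows` columns of the EXPLAIN output to see whether
-- the index on created_at is used and how many rows are examined.
EXPLAIN
SELECT *
  FROM mssg
 WHERE created_at BETWEEN '2013-01-01 00:00:00' AND '2013-01-01 01:00:00';
```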
My prediction would be that as the table grows larger, the index will become more useful: the queries will use an index seek to find the (IDs of the) rows in the wanted time range and then some additional seeks into the clustered PK index (I assume you use InnoDB.)
Without the index, a full table scan will have to be done each time. The larger the table gets, the bigger the difference you'll see between the 2 test cases (with and without the index).
If the vast majority of your queries are similar to the following, getting rows from a small time range:
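(The query shown in the original post is not reproduced here; a representative shape, with @from and @to standing in for the two timestamps, would be:)

```sql
-- Fetch all messages between two timestamps (bounds are placeholders).
SELECT id, body, created_at
  FROM mssg
 WHERE created_at BETWEEN @from AND @to
 ORDER BY created_at;
```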
then I would suggest (assuming you use InnoDB) an alternative design: make (created_at) the clustering key of the table. Since you may have 2 or more rows with the exact same timestamp, you can set the primary key to (created_at, id) for this effect (as InnoDB allows only unique keys for clustering). You will also need an additional index on (id) for the AUTO_INCREMENT to be allowed. That index can be declared UNIQUE or not, it's your choice.

This way, all the rows that your queries need will be stored in consecutive pages of the clustering (PK) index, and the efficiency of the queries - as long as they ask for a small time range - will not be affected by the size of the table.
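One possible sketch of that layout, using the table and column names from the question (treat it as an illustration, not a drop-in migration):

```sql
-- Cluster the table on (created_at, id) so that rows in a time range
-- sit in contiguous pages; keep a secondary index on id so that the
-- AUTO_INCREMENT column is still the leading column of some key.
DROP TABLE IF EXISTS `mssg`;
CREATE TABLE `mssg` (
  `id` INTEGER NOT NULL AUTO_INCREMENT,
  `body` MEDIUMTEXT NULL DEFAULT NULL,
  `length` INTEGER NULL DEFAULT NULL,
  `created_at` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`created_at`, `id`),
  KEY (`id`)          -- required for AUTO_INCREMENT; UNIQUE is optional
) ENGINE=InnoDB;
```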
Further comments, partially related to the question:

- Why are all the columns declared NULL? I suggest you change them all to NOT NULL, except for those columns where there is a specific reason to allow nulls.
- MEDIUMTEXT for the message body? Are you going to store messages that large? It might be better for performance if you could reduce the size, to say VARCHAR(250).
- Why is length declared as VARCHAR and not INTEGER?
- You can also give the TIMESTAMP column a default of CURRENT_TIMESTAMP if you want (if it is not already provided by the application.)
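That last change could look like this (assuming the mssg table from the question; this MODIFY syntax is standard MySQL):

```sql
-- Let MySQL fill created_at automatically on insert.
ALTER TABLE mssg
  MODIFY created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP;
```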