Mysql – sql indexing and performance

MySQLoptimization

I am using mysql5.1, i have table which has about 15 lakh (1.5 million) records.This table has records for different entities i.e child records for all master entities.

There are 8 columns in this table , out of which 6 columns are clubbed to make a primary key. These columns could be individual foreign keys but due to performance we have made this change.

Below is the show create table output

 CREATE TABLE `hybrid_exp_trait_dtl` (
  `HYBRID_NUMBER` varchar(100) NOT NULL,
  `TRIAL_STATUS` char(4) NOT NULL,
  `LOCATION_CODE` char(5) NOT NULL,
  `EXPERIMENT_ID` char(20) NOT NULL,
  `REPLICATION_NUMBER` int(3) unsigned NOT NULL,
  `TRAIT_NAME` varchar(30) NOT NULL,
  `TRAIT_VALUE` varchar(50) DEFAULT NULL,
  `MACHINE_NO` char(2) NOT NULL,
  `USER_ID` char(8) NOT NULL,
  `MYTIME` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  KEY `FK_HYBRID_EXP_TRAIT_DTL_2` (`LOCATION_CODE`),
  KEY `FK_hybrid_exp_trait_dtl_3` (`USER_ID`),
  KEY `Index_4` (`HYBRID_NUMBER`, `EXPERIMENT_ID`, `TRIAL_STATUS`, `LOCATION_CODE`, `REPLICATION_NUMBER`, `TRAIT_NAME`, `USER_ID`) USING BTREE,
  CONSTRAINT `FK_HYBRID_EXP_TRAIT_DTL_2` FOREIGN KEY (`LOCATION_CODE`) REFERENCES `location_mst` (`LOCATION_CODE`),
  CONSTRAINT `FK_hybrid_exp_trait_dtl_3` FOREIGN KEY (`USER_ID`) REFERENCES `user_data` (`USER_ID`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

As you can see I have four indexes

  1. KEY FK_HYBRID_EXP_TRAIT_DTL_2 (LOCATION_CODE)
  2. KEY FK_hybrid_exp_trait_dtl_3 (USER_ID)
  3. KEY Index_4 (HYBRID_NUMBER, EXPERIMENT_ID, TRIAL_STATUS, LOCATION_CODE, REPLICATION_NUMBER, TRAIT_NAME, USER_ID) USING BTREE

Below are the two queries which take long time 4.26 and 0.64 respectively.

select  distinct 
        location_code, 
        Max(replication_number) as replication
from hybrid_exp_trait_dtl  t
where t.experiment_id='NSKOK12K1102' and t.trial_status='MLT';

And this one

SELECT * 
FROM  hybrid_exp_trait_dtl h  
where h.experiment_id='NSKOK12K1102' 
  and h.TRAIT_NAME='YLD' ;

In order to improve performance i modified 3rd index and added one more index , so below are the indexes for my tables

  1. KEY FK_HYBRID_EXP_TRAIT_DTL_2 (LOCATION_CODE)
  2. KEY FK_hybrid_exp_trait_dtl_3 (USER_ID)
  3. KEY Index_4 (EXPERIMENT_ID,TRIAL_STATUS,LOCATION_CODE,REPLICATION_NUMBER)
  4. KEY hbrid (HYBRID_NUMBER,EXPERIMENT_ID,TRAIT_NAME)

Now performance has increased significantly for both the queries and now performance time is .45 and .3 sec respectively.

Is this indexing correct , will it have effect on my update and insert statement as i will be doing bulk update.

Best Answer

I took the liberty of editing your SQL so the clauses are on separate lines. I violated CodeReview group discipline by doing that. Sorry! My suggestion to you is to format your queries so the various clauses are clearly visible, as I did. This kind of formatting, in my experience, makes it easier to understand the logic of queries and to spot errors.

You've done a good job declaring your columns NOT NULL where possible. That helps performance a lot.

Your first query seems to be illogical, as it combines an aggregate function with a DISTINCT qualifier. MySQL allows a lot of sloppiness in aggregate queries unfortunately.

It looks like you're looking for a resultset with a row for each location that shows the the largest replication_number at that location. Do you want this query instead?

select  location_code, 
        Max(replication_number) as replication
   from hybrid_exp_trait_dtl  t
  where t.experiment_id='NSKOK12K1102' and t.trial_status='MLT'
  group by location_code

Your Index_4 will allow this query to be satisfied using a so-called loose index scan, which is really very fast.

Your second query will most likely be a lot faster if you change your fourth compound key (the one you've named hbrid) to have the following order.

(EXPERIMENT_ID,TRAIT_NAME,HYBRID_NUMBER)

That is because your query looks for exact values of EXPERIMENT_ID and TRAIT_NAME. When those two columns come first in the index, the query can jump to the exact position in the index and then sequentially reel off your results.

Now, you may have some other query that needs to have HYBRID_NUMBER first, but you didn't mention it in your question.

You asked what effect these indexes will have on load performance. It's hard to answer that question exactly without knowing a lot about your load process.

You don't have any indexes that call for uniqueness. That is very good for loading performance.

But, there are a few things you can do to speed up loading. They are detailed here. The most important speed trick is probably to avoid autocommit. Start your import process with this SQL command.

 SET autocommit=0;

Then do a few hundred INSERT or UPDATE statements in a row, then do

 COMMIT;

Don't try to do too many INSERT or UPDATE statements before COMMIT or you'll use up a lot of RAM with transaction buffers.

You can also turn of foreign key checks while loading. See the web page mentioned above. But be careful. If your loaded data violates foreign key constraints, when you try to turn checking back on you'll have a mess.

When you're done with a load operation that changes a large fraction of the rows, do

 OPTIMIZE LOCAL TABLE hybrid_exp_trait_dtl;

Your table will be unavailable while this OPTIMIZE operation runs: it rebuilds the table and indexes.