Mysql – Optimizing a query that returns several avg() values

MySQLoptimizationperformancequery-performance

I need help optimizing these two queries:

select avg(h_ab),avg(h_h),avg(h_2b),
       avg(h_3b),avg(h_hr),avg(h_rbi),
       avg(h_sh),avg(h_sf),avg(h_hbp),
       avg(h_bb),avg(h_ibb),avg(h_k),
       avg(h_sb),avg(h_cs),avg(h_gdp),
       avg(h_ci),avg(h_lob),avg(h_num_pitchers),
       avg(h_ier),avg(h_er),avg(h_wp),
       avg(h_bk),avg(h_po),avg(h_a),
       avg(h_e),avg(h_pb),avg(h_dp),avg(h_tp) 
from gamelogs 
where(h_1_id = var1 
      or h_2_id= var1 
      or h_3_id= var1 
      or h_4_id= var1 
      or h_5_id= var1 
      or h_6_id= var1 
      or h_7_id= var1 
      or h_8_id= var1 
      or h_9_id= var1 
      or h_start_pitcher_id= var1) 
and date<var2;

select avg(a_ab),avg(a_h),avg(a_2b),
       avg(a_3b),avg(a_hr),avg(a_rbi),
       avg(a_sh),avg(a_sf),avg(a_hbp),
       avg(a_bb),avg(a_ibb),avg(a_k),
       avg(a_sb),avg(a_cs),avg(a_gdp),
       avg(a_ci),avg(a_lob),avg(a_num_pitchers),
       avg(a_ier),avg(a_er),avg(a_wp),
       avg(a_bk),avg(a_po),avg(a_a),
       avg(a_e),avg(a_pb),avg(a_dp),avg(a_tp) 
from gamelogs 
where(a_1_id = var1 
      or a_2_id= var1  
      or a_3_id= var1  
      or a_4_id= var1  
      or a_5_id= var1  
      or a_6_id= var1  
      or a_7_id= var1  
      or a_8_id= var1  
      or a_9_id= var1  
      or a_start_pitcher_id= var1) 
and date<var2;

I am using matlab to execute the queries, and 20 of each need to be done for each row in the table (to get the averages for 20 players). This takes a very long time. I need whatever help making it faster.

The queries are identical, except with different where conditions (I could not find a way around this). I cannot add an index, as it would need to be greater than 16 parts (according to my understanding, there's no point in doing this in several parts).

Each currently about .4 seconds to run, and there is basically no optimization on the table. It has 150,000 rows, and I have versions in both InnoDB and MyIsam. The latter is marginally faster. The structure of the table is below.

My only idea left is to switch the player ids, (see h_n_id and a_n_id) to integers, though this is a somewhat complicated problem, I can do it if it will significantly improve results.

create table gamelogs (
game_id int(11) unsigned not null auto_increment,
date int(11),
game_num int(11),
day VARCHAR(100),
away VARCHAR(100),
a_league VARCHAR(100),
a_game_num int(11),
home VARCHAR(100),
h_league VARCHAR(100),
h_game_num int(11),
a_score int(11),
h_score int(11),
len_outs int(11),
dn VARCHAR(100),
Completion VARCHAR(100),
Forfeit VARCHAR(100),
Protest VARCHAR(100),
park_id VARCHAR(100),
attendance int(11),
len_min int(11),
a_line VARCHAR(100),
h_line VARCHAR(100),
h_ab int(11),
h_h int(11),
h_2b int(11),
h_3b int(11),
h_hr int(11),
h_rbi int(11),
h_sh int(11),
h_sf int(11),
h_hbp int(11),
h_bb int(11),
h_ibb int(11),
h_k int(11),
h_sb int(11),
h_cs int(11),
h_gdp int(11),
h_ci int(11),
h_lob int(11),
h_num_pitchers int(11),
h_ier int(11),
h_er int(11),
h_wp int(11),
h_bk int(11),
h_po int(11),
h_a int(11),
h_e int(11),
h_pb int(11),
h_dp int(11),
h_tp int(11),
a_ab int(11),
a_h int(11),
a_2b int(11),
a_3b int(11),
a_hr int(11),
a_rbi int(11),
a_sh int(11),
a_sf int(11),
a_hbp int(11),
a_bb int(11),
a_ibb int(11),
a_k int(11),
a_sb int(11),
a_cs int(11),
a_gdp int(11),
a_ci int(11),
a_lob int(11),
a_num_pitchers int(11),
a_ier int(11),
a_er int(11),
a_wp int(11),
a_bk int(11),
a_po int(11),
a_a int(11),
a_e int(11),
a_pb int(11),
a_dp int(11),
a_tp int(11),
h_ump_id VARCHAR(100),
h_ump VARCHAR(100),
1b_ump_id VARCHAR(100),
1b_ump VARCHAR(100),
2b_ump_id VARCHAR(100),
2b_ump VARCHAR(100),
3b_ump_id VARCHAR(100),
3b_ump VARCHAR(100),
lf_ump_id VARCHAR(100),
lf_ump VARCHAR(100),
rf_ump_id VARCHAR(100),
rf_ump VARCHAR(100),
a_manager_id VARCHAR(100),
a_manager VARCHAR(100),
h_manager_id VARCHAR(100),
h_manager VARCHAR(100),
w_pitcher_id VARCHAR(100),
w_pitcher VARCHAR(100),
l_pitcher_id VARCHAR(100),
l_pitcher VARCHAR(100),
s_pitcher_id VARCHAR(100),
s_pitcher VARCHAR(100),
w_batter_id VARCHAR(100),
w_batter VARCHAR(100),
h_start_pitcher_id VARCHAR(100),
h_start_pitcher VARCHAR(100),
a_start_pitcher_id VARCHAR(100),
a_start_pitcher VARCHAR(100),
h_1_id VARCHAR(100),
h_1 VARCHAR(100),
h_1_pos int(11),
h_2_id VARCHAR(100),
h_2 VARCHAR(100),
h_2_pos int(11),
h_3_id VARCHAR(100),
h_3 VARCHAR(100),
h_3_pos int(11),
h_4_id VARCHAR(100),
h_4 VARCHAR(100),
h_4_pos int(11),
h_5_id VARCHAR(100),
h_5 VARCHAR(100),
h_5_pos int(11),
h_6_id VARCHAR(100),
h_6 VARCHAR(100),
h_6_pos int(11),
h_7_id VARCHAR(100),
h_7 VARCHAR(100),
h_7_pos int(11),
h_8_id VARCHAR(100),
h_8 VARCHAR(100),
h_8_pos int(11),
h_9_id VARCHAR(100),
h_9 VARCHAR(100),
h_9_pos int(11),
a_1_id VARCHAR(100),
a_1 VARCHAR(100),
a_1_pos int(11),
a_2_id VARCHAR(100),
a_2 VARCHAR(100),
a_2_pos int(11),
a_3_id VARCHAR(100),
a_3 VARCHAR(100),
a_3_pos int(11),
a_4_id VARCHAR(100),
a_4 VARCHAR(100),
a_4_pos int(11),
a_5_id VARCHAR(100),
a_5 VARCHAR(100),
a_5_pos int(11),
a_6_id VARCHAR(100),
a_6 VARCHAR(100),
a_6_pos int(11),
a_7_id VARCHAR(100),
a_7 VARCHAR(100),
a_7_pos int(11),
a_8_id VARCHAR(100),
a_8 VARCHAR(100),
a_8_pos int(11),
a_9_id VARCHAR(100),
a_9 VARCHAR(100),
a_9_pos int(11),
additional_info VARCHAR(500),
acquistional_info VARCHAR(100),
primary key(game_id));

UPDATE #1

Ok, I created a third table of the same structure, but the id values are all bigints (because it was easy to switch to bigint using Python's hash() function, with low chance of collision).

I've also limited the date search to only 10 years. This means each query is only looking at 10% of the table at once.

Now the queries look like this:

select avg(h_ab),avg(h_h),avg(h_2b),avg(h_3b),avg(h_hr),avg(h_rbi),avg(h_sh),avg(h_sf),avg(h_hbp),avg(h_bb),avg(h_ibb),avg(h_k),avg(h_sb),avg(h_cs),avg(h_gdp),avg(h_ci),avg(h_lob),avg(h_num_pitchers),avg(h_ier),avg(h_er),avg(h_wp),avg(h_bk),avg(h_po),avg(h_a),avg(h_e),avg(h_pb),avg(h_dp),avg(h_tp) from gamelogs where(h_1_id = var1 or h_2_id= var1 or h_3_id= var1 or h_4_id= var1 or h_5_id= var1 or h_6_id= var1 or h_7_id= var1 or h_8_id= var1 or h_9_id= var1 or h_start_pitcher_id= var1) and date between var2 and var3
select avg(a_ab),avg(a_h),avg(a_2b),avg(a_3b),avg(a_hr),avg(a_rbi),avg(a_sh),avg(a_sf),avg(a_hbp),avg(a_bb),avg(a_ibb),avg(a_k),avg(a_sb),avg(a_cs),avg(a_gdp),avg(a_ci),avg(a_lob),avg(a_num_pitchers),avg(a_ier),avg(a_er),avg(a_wp),avg(a_bk),avg(a_po),avg(a_a),avg(a_e),avg(a_pb),avg(a_dp),avg(a_tp) from gamelogs where(a_1_id = var1 or a_2_id= var1 or a_3_id= var1 or a_4_id= var1 or a_5_id= var1 or a_6_id= var1 or a_7_id= var1 or a_8_id= var1 or a_9_id= var1 or a_start_pitcher_id= var1) and date between var2 and var3

This has improved things very slightly. I see a difference of about .05 seconds with these improvements. This is not really a significant enough improvement though.

UPDATE #2

Ok, I've updated the table structure, stripping out everything that wasn't used in the queries. Effectively, I took the table from broad to a little less broad, and made sure all variable types were the smallest they could be.

This had the most significant impact thus far, cutting query time in half, to around .27 seconds a query.

While much better, this is still longer than I really need. If anyone has any other ideas, that would be really helpful.

The updated table is below.

+--------------------+------------------+------+-----+---------+----------------+
| Field              | Type             | Null | Key | Default | Extra          |
+--------------------+------------------+------+-----+---------+----------------+
| game_id            | int(11) unsigned | NO   | PRI | NULL    | auto_increment |
| date               | int(11)          | YES  |     | NULL    |                |
| game_num           | int(11)          | YES  |     | NULL    |                |
| a_score            | tinyint(3)       | YES  |     | NULL    |                |
| h_score            | tinyint(3)       | YES  |     | NULL    |                |
| h_ab               | tinyint(3)       | YES  |     | NULL    |                |
| h_h                | tinyint(3)       | YES  |     | NULL    |                |
| h_2b               | tinyint(3)       | YES  |     | NULL    |                |
| h_3b               | tinyint(3)       | YES  |     | NULL    |                |
| h_hr               | tinyint(3)       | YES  |     | NULL    |                |
| h_rbi              | tinyint(3)       | YES  |     | NULL    |                |
| h_sh               | tinyint(3)       | YES  |     | NULL    |                |
| h_sf               | tinyint(3)       | YES  |     | NULL    |                |
| h_hbp              | tinyint(3)       | YES  |     | NULL    |                |
| h_bb               | tinyint(3)       | YES  |     | NULL    |                |
| h_ibb              | tinyint(3)       | YES  |     | NULL    |                |
| h_k                | tinyint(3)       | YES  |     | NULL    |                |
| h_sb               | tinyint(3)       | YES  |     | NULL    |                |
| h_cs               | tinyint(3)       | YES  |     | NULL    |                |
| h_gdp              | tinyint(3)       | YES  |     | NULL    |                |
| h_ci               | tinyint(3)       | YES  |     | NULL    |                |
| h_lob              | tinyint(3)       | YES  |     | NULL    |                |
| h_num_pitchers     | tinyint(3)       | YES  |     | NULL    |                |
| h_ier              | tinyint(3)       | YES  |     | NULL    |                |
| h_er               | tinyint(3)       | YES  |     | NULL    |                |
| h_wp               | tinyint(3)       | YES  |     | NULL    |                |
| h_bk               | tinyint(3)       | YES  |     | NULL    |                |
| h_po               | tinyint(3)       | YES  |     | NULL    |                |
| h_a                | tinyint(3)       | YES  |     | NULL    |                |
| h_e                | tinyint(3)       | YES  |     | NULL    |                |
| h_pb               | tinyint(3)       | YES  |     | NULL    |                |
| h_dp               | tinyint(3)       | YES  |     | NULL    |                |
| h_tp               | tinyint(3)       | YES  |     | NULL    |                |
| a_ab               | tinyint(3)       | YES  |     | NULL    |                |
| a_h                | tinyint(3)       | YES  |     | NULL    |                |
| a_2b               | tinyint(3)       | YES  |     | NULL    |                |
| a_3b               | tinyint(3)       | YES  |     | NULL    |                |
| a_hr               | tinyint(3)       | YES  |     | NULL    |                |
| a_rbi              | tinyint(3)       | YES  |     | NULL    |                |
| a_sh               | tinyint(3)       | YES  |     | NULL    |                |
| a_sf               | tinyint(3)       | YES  |     | NULL    |                |
| a_hbp              | tinyint(3)       | YES  |     | NULL    |                |
| a_bb               | tinyint(3)       | YES  |     | NULL    |                |
| a_ibb              | tinyint(3)       | YES  |     | NULL    |                |
| a_k                | tinyint(3)       | YES  |     | NULL    |                |
| a_sb               | tinyint(3)       | YES  |     | NULL    |                |
| a_cs               | tinyint(3)       | YES  |     | NULL    |                |
| a_gdp              | tinyint(3)       | YES  |     | NULL    |                |
| a_ci               | tinyint(3)       | YES  |     | NULL    |                |
| a_lob              | tinyint(3)       | YES  |     | NULL    |                |
| a_num_pitchers     | tinyint(3)       | YES  |     | NULL    |                |
| a_ier              | tinyint(3)       | YES  |     | NULL    |                |
| a_er               | tinyint(3)       | YES  |     | NULL    |                |
| a_wp               | tinyint(3)       | YES  |     | NULL    |                |
| a_bk               | tinyint(3)       | YES  |     | NULL    |                |
| a_po               | tinyint(3)       | YES  |     | NULL    |                |
| a_a                | tinyint(3)       | YES  |     | NULL    |                |
| a_e                | tinyint(3)       | YES  |     | NULL    |                |
| a_pb               | tinyint(3)       | YES  |     | NULL    |                |
| a_dp               | tinyint(3)       | YES  |     | NULL    |                |
| a_tp               | tinyint(3)       | YES  |     | NULL    |                |
| h_start_pitcher_id | bigint(20)       | YES  |     | NULL    |                |
| h_start_pitcher    | varchar(100)     | YES  |     | NULL    |                |
| a_start_pitcher_id | bigint(20)       | YES  |     | NULL    |                |
| a_start_pitcher    | varchar(100)     | YES  |     | NULL    |                |
| h_1_id             | bigint(20)       | YES  |     | NULL    |                |
| h_1                | varchar(100)     | YES  |     | NULL    |                |
| h_1_pos            | tinyint(3)       | YES  |     | NULL    |                |
| h_2_id             | bigint(20)       | YES  |     | NULL    |                |
| h_2                | varchar(100)     | YES  |     | NULL    |                |
| h_2_pos            | tinyint(3)       | YES  |     | NULL    |                |
| h_3_id             | bigint(20)       | YES  |     | NULL    |                |
| h_3                | varchar(100)     | YES  |     | NULL    |                |
| h_3_pos            | tinyint(3)       | YES  |     | NULL    |                |
| h_4_id             | bigint(20)       | YES  |     | NULL    |                |
| h_4                | varchar(100)     | YES  |     | NULL    |                |
| h_4_pos            | tinyint(3)       | YES  |     | NULL    |                |
| h_5_id             | bigint(20)       | YES  |     | NULL    |                |
| h_5                | varchar(100)     | YES  |     | NULL    |                |
| h_5_pos            | tinyint(3)       | YES  |     | NULL    |                |
| h_6_id             | bigint(20)       | YES  |     | NULL    |                |
| h_6                | varchar(100)     | YES  |     | NULL    |                |
| h_6_pos            | tinyint(3)       | YES  |     | NULL    |                |
| h_7_id             | bigint(20)       | YES  |     | NULL    |                |
| h_7                | varchar(100)     | YES  |     | NULL    |                |
| h_7_pos            | tinyint(3)       | YES  |     | NULL    |                |
| h_8_id             | bigint(20)       | YES  |     | NULL    |                |
| h_8                | varchar(100)     | YES  |     | NULL    |                |
| h_8_pos            | tinyint(3)       | YES  |     | NULL    |                |
| h_9_id             | bigint(20)       | YES  |     | NULL    |                |
| h_9                | varchar(100)     | YES  |     | NULL    |                |
| h_9_pos            | tinyint(3)       | YES  |     | NULL    |                |
| a_1_id             | bigint(20)       | YES  |     | NULL    |                |
| a_1                | varchar(100)     | YES  |     | NULL    |                |
| a_1_pos            | tinyint(3)       | YES  |     | NULL    |                |
| a_2_id             | bigint(20)       | YES  |     | NULL    |                |
| a_2                | varchar(100)     | YES  |     | NULL    |                |
| a_2_pos            | tinyint(3)       | YES  |     | NULL    |                |
| a_3_id             | bigint(20)       | YES  |     | NULL    |                |
| a_3                | varchar(100)     | YES  |     | NULL    |                |
| a_3_pos            | tinyint(3)       | YES  |     | NULL    |                |
| a_4_id             | bigint(20)       | YES  |     | NULL    |                |
| a_4                | varchar(100)     | YES  |     | NULL    |                |
| a_4_pos            | tinyint(3)       | YES  |     | NULL    |                |
| a_5_id             | bigint(20)       | YES  |     | NULL    |                |
| a_5                | varchar(100)     | YES  |     | NULL    |                |
| a_5_pos            | tinyint(3)       | YES  |     | NULL    |                |
| a_6_id             | bigint(20)       | YES  |     | NULL    |                |
| a_6                | varchar(100)     | YES  |     | NULL    |                |
| a_6_pos            | tinyint(3)       | YES  |     | NULL    |                |
| a_7_id             | bigint(20)       | YES  |     | NULL    |                |
| a_7                | varchar(100)     | YES  |     | NULL    |                |
| a_7_pos            | tinyint(3)       | YES  |     | NULL    |                |
| a_8_id             | bigint(20)       | YES  |     | NULL    |                |
| a_8                | varchar(100)     | YES  |     | NULL    |                |
| a_8_pos            | tinyint(3)       | YES  |     | NULL    |                |
| a_9_id             | bigint(20)       | YES  |     | NULL    |                |
| a_9                | varchar(100)     | YES  |     | NULL    |                |
| a_9_pos            | tinyint(3)       | YES  |     | NULL    |                |
+--------------------+------------------+------+-----+---------+----------------+

Best Answer

This is an alternative scheme to the ones already proposed, it's a bit like the summary table, except that it isn't aggregated, just 'partitioned' by player.

Create Table gamelogs_by_player (
    player_id bigint(20) Not Null,
    home_or_away char(1) Not Null, 
    `date` int(11) Not Null,
    game_id int(11) Not Null,
    ab tinyint(3),
    h tinyint(3),
    2b,
    3b,
    hr,
    rbi,
    ...,
    Constraint pk_gamelogs_by_player Primary Key Clustered (player_id, home_or_away, `date`);

Then, once for each player for home games (and likewise for away games):

Insert Into gamelogs_by_player (...)
Select var1 As player_id,
    'H' As home_or_away,
    `date`,
    game_id,
    h_ab,
    h_2b,
    ...
  From gamelogs
  Where h_1_id = var1
     Or h_2_id = var1
     Or ...

This table will have more rows than your original table, but each time you query the table should be faster because the query can go straight to the correct rows rather than reading the whole table and evaluating the filter each time.