VIEWPOINT #1 : You many need to take a look at column value population
SELECT COUNT(1) rowount,type FROM games GROUP BY type WITH ROLLUP;
SELECT COUNT(1) rowcount,leaguelevel FROM games GROUP BY leaguelevel WITH ROLLUP;
From your question, I gather two things:
- The number of rows in games with type='R' must be a low number against the number of rows in the games table.
- The number of rows in games with leaguelevel='mlb' must be a high number (greater than 5% of the table) against the number of rows in the games table. (5% is a rule-of-thumb number in the eyes of Query Optimizers)
VIEWPOINT #2 : You may need to refactor this query
Notice that query will perform the WHERE portion after all JOINs are complete. If the WHERE portion could be performed earlier that could help reduce the time. Try reorganizing the query like this:
SELECT * FROM hits
JOIN stadiums ON stadiums.gameName = hits.gameName
JOIN players ON (players.gameName = hits.gameName AND players.id = hits.batter)
JOIN (SELECT * FROM games WHERE leagueLevel = 'mlb') games
ON games.gameName = hits.gameName
LIMIT 50;
VIEWPOINT #3 : Retrieve only the columns you really need
I see you have SELECT * and you have four tables (hits, stadiums, players, games). You will have a lot of duplicate data to drag into the query, particularly when dragging the gameName column from all four tables.
You should reorganize the query to bring only one gameName column:
SELECT hits.gameName,hits.*,players.*,staduims.*,games.* FROM hits
JOIN stadiums ON stadiums.gameName = hits.gameName
JOIN players ON (players.gameName = hits.gameName AND players.id = hits.batter)
JOIN (SELECT * FROM games WHERE leagueLevel = 'mlb') games
ON games.gameName = hits.gameName
LIMIT 50;
Additionally, if you do not need every column from the hits tables, then only include the column you know you will access. The same goes for players, stadiums, and games.
In other words, as an example, if you only need the playerName from the player table, then you do not need player.* in the SELECT. You will need just player.playerName.
VIEWPOINT #4 : You may need to index the leagueLevel column
You will need to do the following to make the needed index:
ALTER TABLE games ADD INDEX (leagueLevel);
Before doing so, run this
SELECT COUNT(1) rowcount,leaguelevel FROM games GROUP BY leaguelevel WITH ROLLUP;
Any value for leagueLevel whose count is greater than 5% of the table will cause the MySQL Query Optimizer not to use the index.
UPDATE
Now I see your error. The inner query uses aggregation, and takes out sizeRange
which is not a column you aggregate on. So you only get "samples" of that column. Strictly speaking, your query is not valid SQL but MySQL allows it given a relaxed sql_mode
.
So your query is inherently erroneous. Will see if I can help fix it.
ORIGINAL answer
My guess for you would be to check the value of group_concat_max_len
.
It is by default just 1024
, though you typically don't really want a limit for that.
The problem might be that you are only getting partial results, where, by chance or by order of evaluation, "S" and "XL" occupy first 1024 characters or more. I see no reason why "M" or "L" would not be there -- the GROUP_CONCAT
doesn't do such distinctions.
So, try out:
SET group_concat_max_len := 1000000;
And execute your query again. If this works, make sure to set said param in you MySQL configuration file.
You may find my related post useful.
Best Answer
Your problem is that you are inner joining both subqueries together, meaning that if there are no rows for the join condition (in your case, p.sender_mark = b.sender_mark) in one of the subqueries, you won't get a row in the results for the row in the other subquery that does exist.
You could get around this by turning the join into an outer join, but (IMO) the better option is to use conditional aggregation, like so:
Aggregate functions (including
COUNT
, except for when you doCOUNT(*)
) nearly all exclude null values from their calculations. Therefore by outputting NULL values if a condition is not met, you can getCOUNT
to ignore those rows.Doing it like that means you are now only querying the table once instead of twice, which is nearly always going to be more performant.