MySQL – How Group By Affects Aggregate Functions

aggregategroup byMySQL

I tried to understand the following scenarios:

Table name: family
+------+-------+
| id   | name  |
+------+-------+
|    1 | alice |
|    2 |  bob  |
+------+-------+

That table will be used for illustration purposes.

Server version: 5.6.20 MySQL Community Server (GPL)

Lets try:

SELECT id, name, COUNT(name) FROM family WHERE id = 50;

The query will result in:

+------+------+-------------+
| id   | name | COUNT(name) |
+------+------+-------------+
| NULL | NULL |           0 |
+------+------+-------------+

My first question is about how I got this result?, what I think is that:

The COUNT(name) gets called after the query(internally) resolved into:

+------+------+
| id   | name |
+------+------+
| NULL | NULL |
+------+------+

Now at this point the COUNT(name) is ignoring the NULL value and that is why I get this extra column:

+-------------+
| COUNT(name) |
+-------------+
|           0 |
+-------------+

I only want to understand the steps that build the result.

The following step is taken by me so I can get Empty set and not the table above.

So to get Empty set as the result I add:

SELECT id, name, COUNT(name) FROM family WHERE id=50 GROUP BY id;

At this point I pretty desperate, not just I couldn't understand the problem I'm lost with how the solution is even working.

My second question is about the solution, again my focus is on the process, the actual steps the database has taken to achive that result.

If anyone can walk me step by step I will be very thankful.

Best Answer

There are several aspects of your query. The first is that mysql by default will give you a random answer if there are several possible answers to a group by question. SQL92 demanded that all non aggregated columns in the select clause must be part of the group by clause. SQL99 loosened this restriction and requires that all non aggregated columns are functionally dependent of the group by clause. By default mysql ignores both of these and returns a random row if there are several possible ones. You can force the SQL92 behaviour by adding ONLY_FULL_GROUP_BY to @@sql_mode, most DBMS's use such implementation. A small example:

create table t (x int not null, y int not null);
insert into t (x,y) values (1,1),(1,2);

For a query like:

select x,y, count(*) from t group by x;

There are two possible answers:

1 | 1 |        2
1 | 2 |        2

Mysql will randomly return one of them. In some sense the result is undefined.

To see what happens when there is no group by clause at all involved, let's add another row to the equation:

insert into t (x,y) values (2,3);

select x,y, count(*) from t

Since we have no group by count(*) will be applied to the whole result, for x and y there are 3 possible combinations:

1 |    1 |        3
1 |    2 |        3
2 |    3 |        3

Once again mysql will shoot you in the foot by randomly returning one of these rows, say:

1 |    1 |        3

For an empty set the aggregate function count(*) should be 0 (WHERE is evaluated before GROUP BY), but there are no x and y:

select x,y, count(*) from t where x=4;
+------+------+----------+
| x    | y    | count(*) |
+------+------+----------+
| NULL | NULL |        0 |
+------+------+----------+

so they are shown as non-existing (NULL).

The short answer is that you get a non-sense answer because you asked a non-sense question ;-)

The scenario you describe is is probably the one thing that causes the most confusion in the mysql community. I wish that ONLY_FULL_GROUP_BY where the default behaviour and that there where a setting ALLOW_PARTIAL_GROUPING that could be used for backward compability.