MySQL – Resolving Incorrect Output from MAX() of SUM()

aggregateMySQL

I have a table employee

id    name   salary  city
1     ram    50000   c1
2     sham   20000   c2
3     jadu   80000   c1
4     madhu  90000   c4
5     hari   10000   c2
6     gopal  34000   c3
7     komal  55000   c3
8     bappa  98000   c4

query is which city earning the highest.
i tried

SELECT city, SUM(salary) AS maxSalary 
FROM employee GROUP BY city ORDER BY salary DESC LIMIT 1;

it works fine but if there are more than one max earning cities then it doesn't output other max cities, only the first one.

so i tried this query

SELECT city, MAX(totalSalary) maxSalary 
FROM( SELECT city, SUM(salary) AS totalSalary FROM employee GROUP BY city  ) AS tempTable

it is giving

city  max
c1    188000

but correct is

city  max
c4    188000

It means it is outputting the first most city name of the table which is c1 but not the correct max earning city name which is c4.
What is the right query?

Best Answer

You sent the wrong query to the database. You hit a mysql extension that is described in the manual. A query like

SELECT city, MAX(salary) 
from employee

will not work in standard SQL. It raises an error in Oracle (error message: "ORA-00937: not a single-group group function") , in MSSqlServer 2012 (error message: "Column 'employee.city' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause") or postgresql (error message:"ERROR: column "employee.city" must appear in the GROUP BY clause or be used in an aggregate function").

In standard SQL expressions in the select list can only be expressions of columns that are also used in the group by clause and of aggregate functions. A column values that is used in the group by clause is the same for all rows in these group. The aggregate is also unique for a group. The select returns one row for each group containing this uniquely defined values for a group.

If the aggregate query does not have a group by clause ther is only one group of rows.

The standard does not allow to use an arbitrary column in the select list of an aggregate query because its value not uniquely defined: the column has different values for the rows of the query so which one should be returned for this group by the query?

In mysql there is the extension that if there is a column in an expression (not an in an aggregate function expression) that is not in the group by clause the value returned by the select statement for this group is the value of this column of an arbitrary row of this group.

So the query

SELECT city, MAX(salary) 
from employee

returns the sumof all employees' salaries and the city of one of these rows. But the query return exactly one reow becuase it is an aggregate query and there is only one group that contains all records from employee table.

The query

SELECT city, MAX(salary) 
from employee
group by city

returns a row for each city with the city and of the salaries of each city.

The query

SELECT city, salary
from employee
group by city

returns arow for each city with and salaty of an arbitrary employee of this city. The server decides which employee of a city is choosen.

The query

SELECT city, zipcode
from employee
group by city

also return a row for each city with the zipcode of this city. If we assume that every city haas only one zipcode than the rows of each group defined by a city contain al the same zipcode. so independent of the row the server selects the retuned zicode wil always be the zipcode of the city.

In standard SQL this query would be written as

SELECT city, zipcode
from employee
group by city, zipcode 

to create the expected result

group by city,zipcode and group city define the same groups because of the one to one correspondence between zipcdoe an city.


The query

SELECT city, SUM(salary) 
from employee
group by city

will contain not only the needed rows but more.

A way to filter out the needed rows from this query is the following

SELECT city, SUM(salary) 
from employee
group by city 
having SUM(salary) >= all (select SUM(salary)  city_salary 
     from employee
     group by city)

There are other ways to filter.

The largest salary sum can be found by the following query

select MAX(city_salary)  
   from (select SUM(salary)  city_salary 
     from employee
     group by city) tab

(Note: In Oracle the same can be achieved by

select MAX(SUM(salary))  city_salary 
  from employee
  group by city

)

So we can use it to filter out the needed one:

SELECT city, SUM(salary) 
from employee
group by city 
having SUM(salary) = (select MAX(city_salary)  
   from (select SUM(salary)  city_salary 
     from employee
     group by city) tab) 

This solution was already presented by @Mihai in his comment.