Oracle – Unexpected Results from GROUP BY in Beginner SQL

aggregategroup byoracle

I have a sidewalk_deficiency table:

create table sidewalk_deficiency (
    def_id number(10), 
    insp_id number(10),
    def_length number(10,2));
insert into sidewalk_deficiency values (10,1,.5);
insert into sidewalk_deficiency values (11,1,1);
insert into sidewalk_deficiency values (12,1,1.5);
insert into sidewalk_deficiency values (13,2,2);
insert into sidewalk_deficiency values (14,2,2.5);
commit;

    DEF_ID    INSP_ID  DEF_LENGTH
---------- ----------  ----------
        10          1          .5
        11          1           1
        12          1         1.5
        13          2           2
        14          2         2.5

I can create a query that summarizes the sidewalk inspections by tallying up the deficiency lengths, per inspection:

select 
    insp_id,
    sum(def_length) as def_length 
from 
    sidewalk_deficiency 
group by 
    insp_id

   INSP_ID DEF_LENGTH
---------- ----------
         1          3
         2        4.5

This works as I had expected. Two rows are output; one for each insp_id.


I understand that the above query does, in fact, provide the desired output. However, I don't really understand why it works the way it does.

Admittedly, when I write queries like this, it makes more sense to me to include fields such as def_length in both the aggregate function and the group by clause. To me, I know I want to collapse the def_length into a sum, so it seems to me that I should group by it.

select 
    insp_id,
    sum(def_length) as def_length 
from 
    sidewalk_deficiency 
group by 
    insp_id,
    def_length       --<<--Problem

   INSP_ID DEF_LENGTH
---------- ----------
         2        2.5
         1         .5
         1          1
         2          2
         1        1.5

Of course, as we can see, my thought process is flawed. When I include the def_length field in the group by, it produces more records than I had expected.

Forgive my ignorance, but why does the second query produce more results than the first? I would have thought that including the field in both the aggregate function and the group by would have grouped the rows as I had planned.

Best Answer

This answer is a bit unlike the others, but I understood your question to be more related to "what is the logic behind adding/removing columns to/from the SELECT and GROUP BY". If it's out of place, please let me know and I'll delete.

It's not really a quirk, it's just a basic rule of aggregation. If I'm counting people that live at my house (3 total), I count people that live at my address. So I can say

SELECT Address
      ,COUNT(PersonID) AS PeopleCount
FROM Household
GROUP BY Address

That will give me the number of people at my address.

If I try to do it by name:

SELECT Address
       Name
      ,COUNT(PersonID) AS PeopleCount
FROM Household
GROUP BY Address
        ,Name

Then it will give me a "1" value in People count as there is only 1 person per address per name in my household.

If I had 2 people with the same name living at my house, then the "PeopleCount" value for those two people would be "2" until I found something else to add to make the records unique.

So the reason you add items to the SELECT/GROUP BY is to perform aggregates based on UNIQUE records. The more columns you add to the SELECT statement, the more unique the records are.