I have a sidewalk_deficiency
table:
create table sidewalk_deficiency (
def_id number(10),
insp_id number(10),
def_length number(10,2));
insert into sidewalk_deficiency values (10,1,.5);
insert into sidewalk_deficiency values (11,1,1);
insert into sidewalk_deficiency values (12,1,1.5);
insert into sidewalk_deficiency values (13,2,2);
insert into sidewalk_deficiency values (14,2,2.5);
commit;
DEF_ID INSP_ID DEF_LENGTH
---------- ---------- ----------
10 1 .5
11 1 1
12 1 1.5
13 2 2
14 2 2.5
I can create a query that summarizes the sidewalk inspections by tallying up the deficiency lengths, per inspection:
select
insp_id,
sum(def_length) as def_length
from
sidewalk_deficiency
group by
insp_id
INSP_ID DEF_LENGTH
---------- ----------
1 3
2 4.5
This works as I had expected. Two rows are output; one for each insp_id
.
I understand that the above query does, in fact, provide the desired output. However, I don't really understand why it works the way it does.
Admittedly, when I write queries like this, it makes more sense to me to include fields such as def_length
in both the aggregate function and the group by
clause. To me, I know I want to collapse the def_length
into a sum, so it seems to me that I should group by it.
select
insp_id,
sum(def_length) as def_length
from
sidewalk_deficiency
group by
insp_id,
def_length --<<--Problem
INSP_ID DEF_LENGTH
---------- ----------
2 2.5
1 .5
1 1
2 2
1 1.5
Of course, as we can see, my thought process is flawed. When I include the def_length
field in the group by
, it produces more records than I had expected.
Forgive my ignorance, but why does the second query produce more results than the first? I would have thought that including the field in both the aggregate function and the group by
would have grouped the rows as I had planned.
Best Answer
This answer is a bit unlike the others, but I understood your question to be more related to "what is the logic behind adding/removing columns to/from the SELECT and GROUP BY". If it's out of place, please let me know and I'll delete.
It's not really a quirk, it's just a basic rule of aggregation. If I'm counting people that live at my house (3 total), I count people that live at my address. So I can say
That will give me the number of people at my address.
If I try to do it by name:
Then it will give me a "1" value in People count as there is only 1 person per address per name in my household.
If I had 2 people with the same name living at my house, then the "PeopleCount" value for those two people would be "2" until I found something else to add to make the records unique.
So the reason you add items to the SELECT/GROUP BY is to perform aggregates based on UNIQUE records. The more columns you add to the SELECT statement, the more unique the records are.