CREATE TABLE [newtable](
[id],
[HLevel],
[LeftBower],
[RightBower],
[NodeNumber],
[NodeCount],
[SortPath],
[ID:1] INT,
[parent] INT,
[name] TEXT,
[folder] NUM,
[size] INT);
CREATE TABLE [tally]([N]);
This is the index I used:
explain query plan select pid ,newtable.size from (
select id as pid, leftbower as lft, rightbower as rgt from newtable where parent =1 and folder = 1
)
as base join newtable on newtable.id> base.lft and newtable.id<base.rgt
output:
id parent notused detail
5 0 0 SEARCH TABLE newtable USING INDEX vdsew (folder=?)
12 0 0 SEARCH TABLE newtable USING INDEX dd (id>? AND id<?)
but with group by:
explain query plan select pid ,newtable.size from (
select id as pid, leftbower as lft, rightbower as rgt from newtable where parent =1 and folder = 1
)
as base join newtable on newtable.id> base.lft and newtable.id<base.rgt group by pid;
output:
id parent notused detail
8 0 0 SCAN TABLE newtable
10 0 0 SEARCH TABLE newtable USING INDEX vdsew (folder=?)
23 0 0 USE TEMP B-TREE FOR GROUP BY
As rediculous consequence, If I want to get the group by result, create a temp table to save the result, then select from temp group by. This 2-step takes only about 200ms and the combination takes much much longer, depending on what index I am using. For instance, in this case(the indices I posted here).
This query:
create table temp as select pid ,newtable.size from (
select id as pid, leftbower as lft, rightbower as rgt from newtable where parent =1 and folder = 1
)
as base join newtable on newtable.id> base.lft and newtable.id<base.rgt ;
select pid, sum(size) from temp group by pid;
takes only 250ms to get the result, while
select pid ,sum(size) from (
select id as pid, leftbower as lft, rightbower as rgt from newtable where parent =1 and folder = 1
)
as base join newtable on newtable.id> base.lft and newtable.id<base.rgt
group by pid
takes very very long time, depends on which Index I used, and got the same out. In this case, it freezes forever.(It is because it use folder index. After I delete folder index, the query will use parent index and the time is about 14seconds. Anyway, this is another question. Let's focus why the behavoir of using indecies by group by operation is so strange first.
Best Answer
It's probably because your index doesn't include the
size
field so it doesn't cover it so when aggregating oversize
with aGROUP BY
the operation cost determined by the query optimizer thinks using aTEMP B-TREE
is the better choice. See the Temporary Sorting B-Trees section of the "EXPLAIN QUERY PLAN" doc.You can create an index on
newtable
that includes thesize
field as well and re-check theEXPLAIN
. More information on a similar issue in this DBA.StackExchange answer.