Sqlite: strange behavor on index , about group by operation

sqlite

CREATE TABLE [newtable](
  [id], 
  [HLevel], 
  [LeftBower], 
  [RightBower], 
  [NodeNumber], 
  [NodeCount], 
  [SortPath], 
  [ID:1] INT, 
  [parent] INT, 
  [name] TEXT, 
  [folder] NUM, 
  [size] INT);

CREATE TABLE [tally]([N]);

This is the index I used:
enter image description here

explain query plan select pid ,newtable.size from (
select id as pid, leftbower as lft, rightbower as rgt from newtable where parent =1 and folder = 1
)
as base join newtable on newtable.id> base.lft and newtable.id<base.rgt

output:
id  parent  notused detail
5   0   0   SEARCH TABLE newtable USING INDEX vdsew (folder=?)
12  0   0   SEARCH TABLE newtable USING INDEX dd (id>? AND id<?)

but with group by:

explain query plan select pid ,newtable.size from (
select id as pid, leftbower as lft, rightbower as rgt from newtable where parent =1 and folder = 1
)
as base join newtable on newtable.id> base.lft and newtable.id<base.rgt group by pid;

output:
id  parent  notused detail
8   0   0   SCAN TABLE newtable
10  0   0   SEARCH TABLE newtable USING INDEX vdsew (folder=?)
23  0   0   USE TEMP B-TREE FOR GROUP BY

As rediculous consequence, If I want to get the group by result, create a temp table to save the result, then select from temp group by. This 2-step takes only about 200ms and the combination takes much much longer, depending on what index I am using. For instance, in this case(the indices I posted here).

This query:

create table temp as select pid ,newtable.size from (
select id as pid, leftbower as lft, rightbower as rgt from newtable where parent =1 and folder = 1
)

as base join newtable on newtable.id> base.lft and newtable.id<base.rgt ;
select pid, sum(size) from temp group by pid;

takes only 250ms to get the result, while

select pid ,sum(size) from (
select id as pid, leftbower as lft, rightbower as rgt from newtable where parent =1 and folder = 1
)

as base join newtable on newtable.id> base.lft and newtable.id<base.rgt 
group by pid

takes very very long time, depends on which Index I used, and got the same out. In this case, it freezes forever.(It is because it use folder index. After I delete folder index, the query will use parent index and the time is about 14seconds. Anyway, this is another question. Let's focus why the behavoir of using indecies by group by operation is so strange first.

Best Answer

It's probably because your index doesn't include the size field so it doesn't cover it so when aggregating over size with a GROUP BY the operation cost determined by the query optimizer thinks using a TEMP B-TREE is the better choice. See the Temporary Sorting B-Trees section of the "EXPLAIN QUERY PLAN" doc.

You can create an index on newtable that includes the size field as well and re-check the EXPLAIN. More information on a similar issue in this DBA.StackExchange answer.