How to Create Row Groups Using Subgroup Limits in SQLite

ctesqlite

I had originally posted this in SO but reasoned that it would be more appropriate here.

I have a view in an sqlite database such as this:

Original view

SQL Fiddle
(http://www.sqlfiddle.com/#!5/ae95b/1)

This represents a list of boxes and items that go into each box. The items are always ordered by box_start and the ranges box_start and box_end never overlap.

As shown in the first row there's a box which can store items with codes from 1 inclusive to 3 inclusive. For example the items that would go in the box A would be 'a' and 'b'.
The box-items grouping is denoted by the coloring, rows with the same color means that those are a group.
There are also items without a designated box (box label is null), such as items 'c', 'f', 'g', 'h'.

I need to write a query, if it is possible at all, to create ad-hoc boxes for items without a box and group contiguous items into the same box as follows:

Desired result

As seen in the image the item 'c' and its corresponding box were labeled with '4-4' and the previously unassigned items 'f', 'g', 'h' are now grouped under the same box labeled '8-10' which corresponds to 'min(box_start of f,g,h)-max(box_end of f,g,h)'

I not sure how to do this in SQLite. I thought about using somekind of recursive query with CTEs but don't have a clue how to do it.

Best Answer

After some work I have the following query:

select
    min(box_start) as box_start,
    box_end,
    box_label,
    is_box,
    item_code,
    item
from
    (
        select
            box_start,
            box_end,
            box_label,
            is_box,
            item_code,
            item
        from
            table1
        where
            box_label is not null

        union all

        select
            table1.box_start as box_start,
            table1.box_end as box_end,
            intervals.A || '-' || intervals.B as box_label,
            table1.is_box as is_box,
            table1.item_code as item_code,
            table1.item as item
        from
            (
                select
                    box_start,
                    box_end,
                    box_label,
                    is_box,
                    A,
                    B,
                    max(max_interval_size) as max_interval_size
                from
                    (
                        select
                            box_start,
                            box_end,
                            box_label,
                            is_box,
                            A,
                            B,
                            max(interval_size) as max_interval_size
                        from
                            (
                                select
                                  fixed_table.box_start as box_start,
                                  fixed_table.box_end as box_end,
                                  fixed_table.box_label as box_label,
                                  fixed_table.is_box as is_box,
                                  fixed_table.box_start as A,
                                  windowed_table.box_end as B,
                                  (windowed_table.box_end - fixed_table.box_start) as interval_size
                                from
                                  table1 fixed_table
                                  join table1 windowed_table on
                                    fixed_table.box_start <= windowed_table.box_end
                                where 
                                  interval_size >= 0
                                  and fixed_table.box_label is null
                                  and windowed_table.box_label is null
                                  and fixed_table.is_box = 'FALSE'
                                  and windowed_table.is_box = 'FALSE'

                                except

                                select
                                  without_a_box.*
                                from
                                    (
                                        select
                                          fixed_table.box_start as box_start,
                                          fixed_table.box_end as box_end,
                                          fixed_table.box_label as box_label,
                                          fixed_table.is_box as is_box,
                                          fixed_table.box_start as A,
                                          windowed_table.box_end as B,
                                          (windowed_table.box_end - fixed_table.box_start) as interval_size
                                        from
                                          table1 fixed_table
                                          join table1 windowed_table on
                                            fixed_table.box_start <= windowed_table.box_end
                                        where 
                                          interval_size >= 0
                                          and fixed_table.box_label is null
                                          and windowed_table.box_label is null
                                          and fixed_table.is_box = 'FALSE'
                                          and windowed_table.is_box = 'FALSE'
                                    ) as without_a_box
                                    ,
                                    (
                                        select distinct
                                            with_box.box_start as start_with_box
                                         from
                                            table1 with_box
                                         where
                                            with_box.is_box = 'FALSE'
                                            and with_box.box_label is not null
                                    ) as items_inside_a_box
                                where
                                    items_inside_a_box.start_with_box > without_a_box.A
                                    and items_inside_a_box.start_with_box < without_a_box.B
                            ) as without_intervals_that_intersect_boxed_items
                        group by
                            A
                    ) as final
                group by
                    B
            ) as intervals
            join table1 on
                table1.box_start >= intervals.A
                and table1.box_end <= intervals.B
                and table1.box_label is null
    )
group by
    box_label,
    is_box,
    item_code,
    item
order by
    box_start,
    item_code

SQL Fiddle: http://www.sqlfiddle.com/#!7/4a643e/142

Even though it seems to do its job I am not sure is correct in all cases and if it won't be a performance bottleneck.

I hope someone has a better solution