How to group results in batches of consecutive rows with the same ‘module’

oracle

I'm looking for a seemingly very simple SELECT.

I have a logging table LOGGING of which three columns are of interest to me: MODULE (text), PDATE (date), PTEXT (text).

Actually, I care only about the first two columns, i.e. which module has been called at all (which is the name of a procedure in a PL/SQL package). The date I only need for ordering (ASC) the result, and one could also use the ID column for the same purpose, which is an auto-incremented integer value (Oracle: sequence.nextval).

Since the logging produces waaaaayyyyy too much data, tens to hundreds of PTEXT entries for a single MODULE, I would like to condense the MODULE rows, which seems to call for a simple GROUP BY.

Maybe this illustrates the issue:

I start with the inner query below, which gives me the correct sequence of "module" calls, but does not aggregate them (example 1 above). So my idea was okay, I get the correct result in the correct order – so if I now select just the one column I'm interested in and use "group by" on this ordered(!) data I should get what I want – but adding the "group by" below gives me a COMPLETELY different order. Apparently my knowledge about what "group by" does is inadequate.

SELECT module FROM
(
  SELECT module, pdate FROM logging
  ORDER BY pdate asc
)
--GROUP BY module

Result without "GROUP BY" (first 12 lines):

set_option_value
set_option_value
set_option_value
set_option_value
get_rights_usecase
get_rights_usecase
get_rights_usecase
get_rights_usecase
get_rights_usecase
get_rights_usecase
get_fun_AppOptionsSingleVal
get_rights_usecase
...

Result incl. "GROUP BY":

get_fun_AppOptionsSingleVal
get_option_value
get_parameter
get_usecase_customer
get_kungesId_FY_cusper
fill_missing_products
update_chart_budget_uc1
get_Berein
set_hierarchy
get_element_num_value
icas_dyn_create_chart_table
get_xmlTemplate
...

As one can see the "GROUP BY" produces a completely different order.

What I WOULD HAVE WANTED:

set_option_value
get_rights_usecase
get_fun_AppOptionsSingleVal
get_rights_usecase
...

Basically, what I want can be achieved very simply by taking the text output of

  SELECT module FROM logging
  ORDER BY pdate asc

and pipe it through "uniq" on the Unix command line. Now I'm just curious why this isn't as simple in SQL.

Best Answer

The question is which of the pdates for a module do you want to use for ordering purposes?

If you are happy to use the largest, you could use:

select module from logging order by max(pdate); 

--- EDIT

I think I get it now - you probably want something like this:

select module
from( select module, 
             sum(module_step) over (order by pdate rows unbounded preceding) 
                 as batch_number, 
             pdate
      from( select module, 
                   case lag(module) over (order by pdate) when module then 0 
                        else 1 end as module_step, 
                   pdate from logging) )
group by module, batch_number
order by batch_number;