Efficiently finding distinct values

oracle

I have a number of tables with the primary key (month, year, number) and differing cardinalities differ somewhat. For the tuple (month, year) the history doesn't go back very far, this will probably not grow beyond 50 in the very long term. For every (month, year) tuple there are not more than 2 million unique numbers. I want to know which combinations of month and years are available. I do this using this query:

select month, year from table group by month, year

This returns the correct result but does not seem to be very efficient. What is an efficient way to obtain this result (utilizing the unique index)?

The tuning advisor suggests to add an index on month-year for this query but this seems wasteful because a larger index is already available.

Best Answer

You may be able to use a variation of the following technique - which forces repeated 'MIN/MAX' range scans:

Assumptions

  1. You can produce a list of all possible year/month combinations
  2. number is not null (which it can't be as it is in the PK, but I mention it as there is a way of working around if nulls are permitted)

testbed:

create table foo(month, year, num, primary key(month, year, num)) as
with m as ( select extract(month from d) as month, extract(year from d) as year
            from (select add_months(sysdate,1-level) as d from dual connect by level<50) )
select month, year, num
from m cross join 
     (select level as num from dual connect by level<100000 order by dbms_random.random());

normal query:

select distinct month, year from foo;
--gets=11656

min/max technique:

with m as ( select extract(month from d) as month, extract(year from d) as year
            from (select add_months(sysdate,1-level) as d from dual connect by level<50) )
select month, year, decode(( select min(num)
                             from foo
                             where month=m.month and year=m.year )
                           ,null, 'N', 'Y') as has_data_yn
from m;
--gets=294

Some explanation in response to comments:

In each case (the testbed and the min/max query), the subquery factoring clause just generated a list of (year, month) tuples:

with m as ( select extract(month from d) as month, extract(year from d) as year
            from (select add_months(sysdate,1-level) as d from dual connect by level<50) )
select * from m;
/*
MONTH                  YEAR                   
---------------------- ---------------------- 
1                      2012                   
12                     2011                   
11                     2011                   
10                     2011           
...
...
*/

Then the technique uses a subquery in the select clause to check if any rows are present for the (month, year) - this subquery necessarily must only produce at most 1 row:

select min(num)
from foo
where month=m.month and year=m.year;

This is very quick because it makes use of the ordered nature of the PK - however it needs to be executed once for each month - if there are millions of rows for each month that makes sense, but not if there are few enough to fit in a small number of block.