Efficiently finding distinct values

oracle

I have a number of tables with the primary key (month, year, number) whose cardinalities differ somewhat. For the tuple (month, year) the history doesn't go back very far; the number of distinct tuples will probably not grow beyond 50 even in the very long term. For every (month, year) tuple there are no more than 2 million unique numbers. I want to know which combinations of month and year are available. I do this using this query:

select month, year from table group by month, year

This returns the correct result but does not seem to be very efficient. What is an efficient way to obtain this result (utilizing the unique index)?

The tuning advisor suggests adding an index on (month, year) for this query, but this seems wasteful because a larger index is already available.

Best Answer

You may be able to use a variation of the following technique - which forces repeated 'MIN/MAX' range scans:

Assumptions

You can produce a list of all possible year/month combinations
number is not null (it can't be, as it is part of the PK, but I mention it because there is a way of working around this if nulls are permitted)

testbed:

create table foo(month, year, num, primary key(month, year, num)) as
with m as ( select extract(month from d) as month, extract(year from d) as year
            from (select add_months(sysdate,1-level) as d from dual connect by level<50) )
select month, year, num
from m cross join 
     (select level as num from dual connect by level<100000 order by dbms_random.random());

normal query:

select distinct month, year from foo;
--gets=11656

min/max technique:

with m as ( select extract(month from d) as month, extract(year from d) as year
            from (select add_months(sysdate,1-level) as d from dual connect by level<50) )
select month, year, decode(( select min(num)
                             from foo
                             where month=m.month and year=m.year )
                           ,null, 'N', 'Y') as has_data_yn
from m;
--gets=294
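
The gets figures in the comments above are presumably buffer (consistent) gets. One way to obtain comparable numbers is SQL*Plus autotrace, sketched below. SQL*Plus and a user with the PLUSTRACE role are assumptions here; the answer does not say how the figures were measured, and the exact values will vary with block size and data.

-- assumption: run in SQL*Plus by a user that is allowed to use autotrace
set autotrace traceonly statistics
-- normal query: read the "consistent gets" line in the statistics output
select distinct month, year from foo;
-- min/max technique: compare its "consistent gets" with the figure above
with m as ( select extract(month from d) as month, extract(year from d) as year
            from (select add_months(sysdate,1-level) as d from dual connect by level<50) )
select month, year, decode(( select min(num)
                             from foo
                             where month=m.month and year=m.year )
                           ,null, 'N', 'Y') as has_data_yn
from m;
set autotrace off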

Some explanation in response to comments:

In each case (the testbed and the min/max query), the subquery factoring clause just generates a list of (month, year) tuples:

with m as ( select extract(month from d) as month, extract(year from d) as year
            from (select add_months(sysdate,1-level) as d from dual connect by level<50) )
select * from m;
/*
MONTH                  YEAR                   
---------------------- ---------------------- 
1                      2012                   
12                     2011                   
11                     2011                   
10                     2011           
...
...
*/

Then the technique uses a subquery in the select clause to check whether any rows are present for each (month, year). Because it is a scalar subquery it may return at most one row, which the min aggregate guarantees:

select min(num)
from foo
where month=m.month and year=m.year;

This is very quick because it makes use of the ordered nature of the PK index. However, it needs to be executed once for each (month, year) tuple. That makes sense if there are millions of rows for each month, but not if there are few enough to fit in a small number of blocks.
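
Since the question asks only for the combinations that actually exist, the same correlated min lookup could be moved into the where clause so that empty months are filtered out. This is a sketch against the foo testbed, not something taken from the answer. Whether the optimizer still uses a MIN/MAX range scan for each probe should be confirmed in the execution plan.

-- sketch: return only the (month, year) tuples that have at least one row in foo
with m as ( select extract(month from d) as month, extract(year from d) as year
            from (select add_months(sysdate,1-level) as d from dual connect by level<50) )
select m.month, m.year
from m
where ( select min(f.num)
        from foo f
        where f.month=m.month and f.year=m.year ) is not null
order by m.year, m.month;

The order by is only there to make the output easier to read.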

Related Solutions

Oracle 11g – Best Way to Delete Very Large Recordsets

The logic with 'A' and 'B' might be "hidden" behind a virtual column on which you could do the partitioning:

alter session set nls_date_format = 'yyyy-mm-dd';
drop   table tq84_partitioned_table;

create table tq84_partitioned_table (
  status varchar2(1)          not null check (status in ('A', 'B')),
  date_a          date        not null,
  date_b          date        not null,
  date_too_old    date as
                       (  case status
                                 when 'A' then add_months(date_a, -7*12)
                                 when 'B' then            date_b
                                 end
                        ) virtual,
  data            varchar2(100) 
)
partition   by range  (date_too_old) 
( 
  partition p_before_2000_10 values less than (date '2000-10-01'),
  partition p_before_2000_11 values less than (date '2000-11-01'),
  partition p_before_2000_12 values less than (date '2000-12-01'),
  --
  partition p_before_2001_01 values less than (date '2001-01-01'),
  partition p_before_2001_02 values less than (date '2001-02-01'),
  partition p_before_2001_03 values less than (date '2001-03-01'),
  partition p_before_2001_04 values less than (date '2001-04-01'),
  partition p_before_2001_05 values less than (date '2001-05-01'),
  partition p_before_2001_06 values less than (date '2001-06-01'),
  -- and so on and so forth..
  partition p_ values less than (maxvalue)
);

insert into tq84_partitioned_table (status, date_a, date_b, data) values 
('B', date '2008-04-14', date '2000-05-17', 
 'B and 2000-05-17 is older than 10 yrs, must be deleted');


insert into tq84_partitioned_table (status, date_a, date_b, data) values 
('B', date '1999-09-19', date '2004-02-12', 
 'B and 2004-02-12 is younger than 10 yrs, must be kept');


insert into tq84_partitioned_table (status, date_a, date_b, data) values 
('A', date '2000-06-16', date '2010-01-01', 
 'A and 2000-06-16 is older than 3 yrs, must be deleted');


insert into tq84_partitioned_table (status, date_a, date_b, data) values 
('A', date '2009-06-09', date '1999-08-28', 
 'A and 2009-06-09 is younger than 3 yrs, must be kept');

select * from tq84_partitioned_table order by date_too_old;

-- drop partitions older than 10 or 3 years, respectively:

alter table tq84_partitioned_table drop partition p_before_2000_10;
alter table tq84_partitioned_table drop partition p_before_2000_11;
alter table tq84_partitioned_table drop partition p_before_2000_12;

select * from tq84_partitioned_table order by date_too_old;
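
Dropping a partition is DDL and cannot be rolled back, so it may be worth checking what a partition contains before running the alter table ... drop partition statements. The sketch below uses Oracle's partition-extended table name syntax and the USER_TAB_PARTITIONS dictionary view, with the table and partition names from the script above.

-- list the partitions and their upper bounds
select partition_name, high_value
from   user_tab_partitions
where  table_name = 'TQ84_PARTITIONED_TABLE'
order  by partition_position;

-- show the rows that would disappear with one specific partition
select status, date_a, date_b, date_too_old, data
from   tq84_partitioned_table partition (p_before_2000_10);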

Expand sparse table with self outer join on distinct values

I don't know if this is any more straightforward, but I would change the syntax to use ANSI JOIN syntax instead of the (+) operator:

select t1.g1,
  t1.g2,
  t2.x
from
(
  select distinct t1.g1, t2.g2
  from yourtable t1
  cross join (select g2 from yourtable) t2
) t1
left join yourtable t2
  on t1.g1 = t2.g1
  and t1.g2 = t2.g2
order by t1.g1, t1.g2

See SQL Fiddle with Demo
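
A possible variation, assuming the same yourtable with columns g1, g2 and x: reducing each column to its distinct values before the cross join keeps the intermediate result at (distinct g1 values) x (distinct g2 values) rows, instead of cross joining the full table with itself and applying distinct afterwards. The aliases a, b and t are only illustrative, and the result should match the query above.

-- sketch: build the grid from distinct values first, then outer join the data
select a.g1,
  b.g2,
  t.x
from (select distinct g1 from yourtable) a
cross join (select distinct g2 from yourtable) b
left join yourtable t
  on t.g1 = a.g1
  and t.g2 = b.g2
order by a.g1, b.g2;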
