The logic with 'A' and 'B' might be "hidden" behind a virtual column on which you could do the partitioning:
alter session set nls_date_format = 'yyyy-mm-dd';
drop table tq84_partitioned_table;
create table tq84_partitioned_table (
status varchar2(1) not null check (status in ('A', 'B')),
date_a date not null,
date_b date not null,
date_too_old date as
( case status
when 'A' then add_months(date_a, -7*12)
when 'B' then date_b
end
) virtual,
data varchar2(100)
)
partition by range (date_too_old)
(
partition p_before_2000_10 values less than (date '2000-10-01'),
partition p_before_2000_11 values less than (date '2000-11-01'),
partition p_before_2000_12 values less than (date '2000-12-01'),
--
partition p_before_2001_01 values less than (date '2001-01-01'),
partition p_before_2001_02 values less than (date '2001-02-01'),
partition p_before_2001_03 values less than (date '2001-03-01'),
partition p_before_2001_04 values less than (date '2001-04-01'),
partition p_before_2001_05 values less than (date '2001-05-01'),
partition p_before_2001_06 values less than (date '2001-06-01'),
-- and so on and so forth..
partition p_ values less than (maxvalue)
);
insert into tq84_partitioned_table (status, date_a, date_b, data) values
('B', date '2008-04-14', date '2000-05-17',
'B and 2000-05-17 is older than 10 yrs, must be deleted');
insert into tq84_partitioned_table (status, date_a, date_b, data) values
('B', date '1999-09-19', date '2004-02-12',
'B and 2004-02-12 is younger than 10 yrs, must be kept');
insert into tq84_partitioned_table (status, date_a, date_b, data) values
('A', date '2000-06-16', date '2010-01-01',
'A and 2000-06-16 is older than 3 yrs, must be deleted');
insert into tq84_partitioned_table (status, date_a, date_b, data) values
('A', date '2009-06-09', date '1999-08-28',
'A and 2009-06-09 is younger than 3 yrs, must be kept');
select * from tq84_partitioned_table order by date_too_old;
-- drop partitions older than 10 or 3 years, respectively:
alter table tq84_partitioned_table drop partition p_before_2000_10;
alter table tq84_partitioned_table drop partition p_before_2000_11;
alter table tq84_partitioned_table drop partition p2000_12;
select * from tq84_partitioned_table order by date_too_old;
Your ix_hugetable
looks quite useless because:
- it is the clustered index (PK)
- the INCLUDE makes no difference because a clustered index INCLUDEs all non-key columns (non-key values at lowest leaf = INCLUDEd = what a clustered index is)
In addition:
- added or fk should be first
- ID is first = not much use
Try changing the clustered key to (added, fk, id)
and drop ix_hugetable
. You've already tried (fk, added, id)
. If nothing else, you'll save a lot of disk space and index maintenance
Another option might be to try the FORCE ORDER hint with table order boh ways and no JOIN/INDEX hints. I try not to use JOIN/INDEX hints personally because you remove options for the optimiser. Many years ago I was told (seminar with a SQL Guru) that FORCE ORDER hint can help when you have huge table JOIN small table: YMMV 7 years later...
Oh, and let us know where the DBA lives so we can arrange for some percussion adjustment
Edit, after 02 Jun update
The 4th column is not part of the non-clustered index so it uses the clustered index.
Try changing the NC index to INCLUDE the value column so it doesn't have to access the value column for the clustered index
create nonclustered index ix_hugetable on dbo.hugetable (
fk asc, added asc
) include(value)
Note: If value is not nullable then it is the same as COUNT(*)
semantically. But for SUM it need the actual value, not existence.
As an example, if you change COUNT(value)
to COUNT(DISTINCT value)
without changing the index it should break the query again because it has to process value as a value, not as existence.
The query needs 3 columns: added, fk, value. The first 2 are filtered/joined so are key columns. value is just used so can be included. Classic use of a covering index.
Best Answer
The problem of deleting large portions of a table is far from a trivial problem. The best approach, by far, is partitioning. A daily partition scheme with a sliding window is really a magic bullet for this problem, see How to Implement an Automatic Sliding Window in a Partitioned Table.
If you cannot afford partitioning (eg. non-enterprise license on site) then I would recommend clustering by
schDateCreated
. If you need primary key on the GUID+smallint then move it to non-clustered. Delete in batches (eg.TOP 10000
), in a loop, to reduce pressure on the log. Consider updating stats after the operation.