The logic with 'A' and 'B' might be "hidden" behind a virtual column on which you could do the partitioning:
alter session set nls_date_format = 'yyyy-mm-dd';
drop table tq84_partitioned_table;
create table tq84_partitioned_table (
status varchar2(1) not null check (status in ('A', 'B')),
date_a date not null,
date_b date not null,
date_too_old date as
( case status
when 'A' then add_months(date_a, -7*12)
when 'B' then date_b
end
) virtual,
data varchar2(100)
)
partition by range (date_too_old)
(
partition p_before_2000_10 values less than (date '2000-10-01'),
partition p_before_2000_11 values less than (date '2000-11-01'),
partition p_before_2000_12 values less than (date '2000-12-01'),
--
partition p_before_2001_01 values less than (date '2001-01-01'),
partition p_before_2001_02 values less than (date '2001-02-01'),
partition p_before_2001_03 values less than (date '2001-03-01'),
partition p_before_2001_04 values less than (date '2001-04-01'),
partition p_before_2001_05 values less than (date '2001-05-01'),
partition p_before_2001_06 values less than (date '2001-06-01'),
-- and so on and so forth..
partition p_ values less than (maxvalue)
);
insert into tq84_partitioned_table (status, date_a, date_b, data) values
('B', date '2008-04-14', date '2000-05-17',
'B and 2000-05-17 is older than 10 yrs, must be deleted');
insert into tq84_partitioned_table (status, date_a, date_b, data) values
('B', date '1999-09-19', date '2004-02-12',
'B and 2004-02-12 is younger than 10 yrs, must be kept');
insert into tq84_partitioned_table (status, date_a, date_b, data) values
('A', date '2000-06-16', date '2010-01-01',
'A and 2000-06-16 is older than 3 yrs, must be deleted');
insert into tq84_partitioned_table (status, date_a, date_b, data) values
('A', date '2009-06-09', date '1999-08-28',
'A and 2009-06-09 is younger than 3 yrs, must be kept');
select * from tq84_partitioned_table order by date_too_old;
-- drop partitions older than 10 or 3 years, respectively:
alter table tq84_partitioned_table drop partition p_before_2000_10;
alter table tq84_partitioned_table drop partition p_before_2000_11;
alter table tq84_partitioned_table drop partition p2000_12;
select * from tq84_partitioned_table order by date_too_old;
The best solution to this problem is nearly always method 2, but with two tables UNION ALL instead of LEFT JOIN together (one to hold the deleted, one for the active). There are several reasons why this is superior:
- You can maintain separate statistics on active and deleted rows. Assumption: you will have more deleted than active over time. This means that the "active rows" table stays small and doing fullscans is easier. It also makes things like index rebuilds (if you do them) faster on the active rows.
- Statistics of old rows will not affect news rows. This helps you avoid skew problems.
- Each table can live on it's own filegroup. This means you can move deleted rows off to cheaper storage.
- You can have different indexing/partitioning strategies on the deleted rows vs. the active ones. For example, you may choose to use a column store index on the old rows if they are read via a scan often, but changed very rarely.
- The deleted table can be taken offline with a table SWITCH (for maintenance) without disrupting the system too much.
The two table solution can be implemented with a DELETE instead of trigger that moves the rows to deleted instead of deleting them.
The only downside of solution 2 is that you will need to modify your queries to distinguish between deleted and not deleted rows and tables. This can be done via a view, but it is safer to avoid this if possible. The view can confuse the optimiser and there are cases where using a view instead of a table gives you horrible execution plans. If you do use a view, you should add an `IsDeleted' column to both tables and put a check constraint on it. This ensure that the optimiser will not try to seek BOTH table for every query (which would double your IOPS).
For completeness, and to answer your last question, there are some other ways to "solve" this:
Method 4: Add an IsDeleted
column and partition on this column. Drawback: Does not allow separate statistics on the two partitions (statistics are table, not partition based, in SQL Server)
Method 5: Use a filtered index to apply different indexing (and secondary storage) on deleted and active rows. Drawback: Good luck getting the optimiser to behave properly with this method.
Best Answer
Using the second method will result in less locking, transaction log usage and Tempdb resources, because SQL Server doesn't have to perform all of the delete operations in one transaction.
However, instead of the "while exists" statement, you can use a variable that will get the value of @@rowcount after the delete. This can help a little more with performance.