Why are the backups using so much space? ORA-19804

archive-logoracleoracle-12crman

I had a backup job that was failing like this:

RMAN-03002: failure of backup plus archivelog command at 10/20/2016 01:16:23
ORA-19809: limit exceeded for recovery files
ORA-19804: cannot reclaim 67108864 bytes disk space from 107374182400 limit

So, following some advice found elsewhere, I increased db_recovery_file_dest_size (doubling it from 100G to 200G) and the error stopped happening.

Now, less than a month later, it's happening again:

RMAN-03002: failure of backup plus archivelog command at 04/14/2017 01:16:08
ORA-19809: limit exceeded for recovery files
ORA-19804: cannot reclaim 67108864 bytes disk space from 214748364800 limit

The database isn't nearly that big. A full data pump export is less than 5G. And I can't believe it's changing so fast that a month of backups actually needs to contain that much extra information.

The database and the backup scripts were installed as part of a larger application. I haven't set all this up manually, but I do have the power to change it. The script looks like this:

run {
configure controlfile autobackup on;
configure controlfile autobackup format for device type disk to '%F';
configure retention policy to recovery window of  31 days;
allocate channel d1 type disk maxpiecesize=5G;
backup as compressed backupset incremental level 0 cumulative tag 'L0'database
plus archivelog delete all input;
restore database validate;
release channel d1;
delete noprompt obsolete;
restore database check logical validate;
backup validate database;
restore archivelog from time 'SYSDATE-14' validate;
host 'copy E:\oracle\product\12.1.0\dbhome_1\NETWORK\ADMIN\tnsnames.ora E:\oradata\fast_recovery_area\';
host 'copy E:\oracle\product\12.1.0\dbhome_1\NETWORK\ADMIN\sqlnet.ora E:\oradata\fast_recovery_area\';
host 'copy E:\oracle\product\12.1.0\dbhome_1\NETWORK\ADMIN\listener.ora E:\oradata\fast_recovery_area\';
host 'copy E:\oracle\product\12.1.0\dbhome_1\database\pwd*.ora E:\oradata\fast_recovery_area\';
}
list incarnation of database;
list backup summary;
list backup by file;
list recoverable backup of database;
report schema;
report need backup redundancy=3;
report need backup recovery window of 3 days;

It runs weekly. There is a daily script that is identical except it has level 1 and tag L1 where the weekly script has level 0 and tag L0.

I have the full output from this script so I can fill in other details as needed.

Looking in the fast_recovery_area, the biggest subdirectory is ARCHIVELOG, with 516G of stuff in it. The files go back 4 months, which is much longer than the 3 day and 31 day windows mentioned in the script. I don't understand why the older files haven't been deleted. If it's complaining about hitting a 200G limit, I don't understand how it managed to grow to 516G in the first place. Obviously I've missed something fundamental about db_recovery_file_dest_size.

The BACKUPSET subdirectory is not as big, but also has some very old files in it, going back 2 years. I don't understand why those haven't been deleted either.

Most confusing of all is the last command in the backup script:

report need backup recovery window of 3 days;

The output is an empty list. If I understand correctly, that means I can recover everything to the state it was in 3 days ago. That's already surprising, since I've been getting these errors from the backup script for 2 or 3 days now. More surprising is this:

report need backup recovery window of 9999 days;

What I wanted it to do was say something along the lines of "You can't recover anything from 9999 days ago, you ridiculous person." But it said nothing. I have no idea how to interpret this.

How can I get rid of the old junk and get backups to start behaving reasonably?

Additional information

RMAN> report obsolete;
RMAN retention policy will be applied to the command
RMAN retention policy is set to recovery window of 31 days
no obsolete backups found

Additional 2

RMAN> list expired archivelog all;
specification does not match any archived log in the repository

Additional 3

Since this question was posted, the disk got close to filling up, and rman was still not deleting anything. There was no other choice but to delete the oldest files manually. rman didn't seem to notice this. It still says

ORA-19804: cannot reclaim 67108864 bytes disk space from 214748364800 limit

even though the actual amount of space being used by fast_recovery_area is well below that 200G limit (about 127G used so it should have 73G left, but it says it can't find 64M! What is going on?)

I got a suggestion elsewhere to increase control_file_record_keep_time and run catalog recovery area. The result was:

RMAN> alter system set control_file_record_keep_time = 39;
Statement processed
RMAN> catalog recovery area;
List of files in Recovery Area not managed by the database
==========================================================

(… list of every file in the fast_recovery_area/ARCHIVELOG directory …)

number of files not managed by recovery area is 175, totaling 55.28GB

I don't know what this means, but it didn't stop the ORA-19804 from coming again on the next backup attempt.

Best Answer

I use the following command in RMAN at the end of the backup:

delete noprompt archivelog until time 'SYSDATE-1' backed up 1 times to device type 'SBT-TAPE';

All files that were backed up to tape and are older than 1 day are removed from the directory and freeing up the db_recovery_file_dest.

When I do a restore the tape storage delivers to me the archivelogs that RMAN needs.