Oracle9i User-Managed Backup and Recovery Guide Release 2 (9.2) Part Number A96572-01 |
|
This chapter describes how to troubleshoot user-managed media recovery, and includes the following topics:
Table 5-1 describes potential problems that can occur during media recovery.
Problem | Description |
---|---|
Missing or misnamed archived log |
Recovery stops because Oracle cannot find the archived log recorded in the control file. |
When you attempt to open the database, error |
This error commonly occurs because:
|
Redo record problems |
Two possible cases are as follows:
|
Corrupted archived logs |
Logs may be corrupted while they are stored on or copied between storage systems. If |
Archived logs with incompatible parallel redo format |
If you enable the parallel redo feature in Oracle9i Release 2 (9.2), then Oracle generates redo logs in a new format. Prior releases of Oracle are unable to apply parallel redo logs. However, Oracle9i Release 1 (9.0.1) can detect the parallel redo format and indicate the inconsistency with the following error message: See Also: Oracle9i Database Performance Tuning Guide and Reference to learn about the parallel redo feature |
Corrupted data blocks |
A datafile backup may have contained a corrupted data block, or the data block may become corrupted either during recovery or when it was copied to the backup. If checksums are being used, then Oracle signals a checksum error. Otherwise, the problem may also appear as a redo corruption. |
Random problems |
Memory corruptions and other transient problems can occur during recovery. |
The symptoms of media recovery problems are usually external or internal errors signaled during recovery. For example, an external error indicates that a redo block or a data block has failed checksum verification checks. Internal errors can be caused by either bugs in Oracle or errors arising from the underlying operating system and hardware.
If media recovery encounters a problem while recovering a database backup, whether it is a stuck recovery problem or a problem during redo application, Oracle always stops and leaves the datafiles undergoing recovery in a consistent state, that is, at an SCN preceding the failure. You can then do one of the following:
RESETLOGS
option, as long as the requirements for opening RESETLOGS
have been met (as described in "Opening the Database After User-Managed Media Recovery"). Note that the RESETLOGS
restrictions apply to opening the standby database as well, because a standby database is updated by a form of media recovery.In general, opening the database read-only or opening with the RESETLOGS
option require all online datafiles to be recovered to the same SCN. If this requirement is not met, then Oracle may signal ORA-1113
or other errors when you attempt to open. Some common causes of ORA-1113
are described in Table 5-1.
The basic methodology for responding to media recovery problems occurs in the following phases:
RESETLOGS
option if you are recovering a whole database backup. If you have performed serial media recovery, then the database contains all the changes up to but not including the changes at the SCN where the corruption occurred. No changes from this SCN onward are in the recovered part of the database. If you have restored online backups, opening RESETLOGS
succeeds only if you have recovered through all the ALTER
...
END
BACKUP
operations in the redo stream.See Also:
Oracle9i Recovery Manager User's Guide to learn about block media recovery |
If media recovery encounters a problem, then obtain as much information as possible after recovery halts. You do not want to waste time fixing the wrong problem, which may in fact make matters worse.
The goal of this initial investigation is to determine whether the problem is caused by incorrect setup, corrupted logs, corrupted data blocks, memory corruption, or other problems. If you see a checksum error on a data block, then the data block is corrupted. If you see a checksum error on a redo log block, then the redo log is corrupted.
Sometimes the cause of a recovery problem can be difficult to determine. Nevertheless, the methods in this chapter allow you to quickly recover a database sometimes even when you do not completely understand the cause of the problem.
To investigate media recovery problems:
alert.log
to see whether the error messages give general information about the nature of the problem. For example, does the alert_
SID
.log
indicate any checksum failures? Does the alert_
SID
.log
indicate that media recovery may have to corrupt data blocks in order to continue?Depending on the type of media recovery problem you suspect, you have different solutions at your disposal. You can try one or a combination of the methods described in Table 5-2. Note that these methods are fairly safe: in almost all cases, they should not cause any damage to the database.
If you suspect . . . | Then . . . |
---|---|
Missing/misnamed archived logs |
Determine whether you entered the correct filename. If you did, then check to see whether the log is missing from the operating system. If it is missing, and you have a backup, then restore the backup and apply the log. If you do not have a backup, then if possible perform incomplete recovery up to the point of the missing log. |
|
Review the causes of this error in Table 5-1. Make sure that all read/write datafiles requiring recovery are online. If you use a backup control file for recovery, then the control file and datafiles must be at a consistent SCN for the database to be opened. If you do not have the necessary redo, then you must re-create the control file. |
Corrupt archived logs |
The log is corrupted if the checksum verification on the log redo block fails. If The |
Archived logs with incompatible parallel redo format |
If you are running an Oracle release prior to Oracle9i Release 2, and if you are attempting to apply redo logs created with the parallel redo format, then you must do the following steps:
See Also: Oracle9i Database Performance Tuning Guide and Reference to learn about the parallel redo feature |
Memory corruption or transient problems |
Shut down the database and restart recovery. Sometimes this tactic fixes the problem. Oracle should leave the database in a consistent state if the second attempt also fails. |
Corrupt data blocks |
Restore and recover the datafile again with user-managed methods, or restore and recover individual data blocks with the RMAN Note that a data block is corrupted if the checksum verification on the data block fails. If |
If you cannot fix the problem with the methods described in Table 5-2, then there may be no easy way to fix the problem without losing data. You have these options:
RESETLOGS
option (for whole database recovery). This solution discards all changes after the point where the redo problem occurred, but guarantees a logically consistent database.alert_
SID
.log
indicates that recovery can continue if it is allowed to corrupt a data block, which should be the case for most recovery problems. This option is best if it is important to bring up the database quickly and recover all changes. If you are contemplating this option as a last resort, then proceed to "Deciding Whether to Allow Recovery to Corrupt Blocks: Phase 3".
See Also:
Oracle9i Recovery Manager User's Guide to learn how to perform block media recovery with the |
When media recovery encounters a problem, the alert_
SID
.log
may indicate that recovery can continue if it is allowed to corrupt the data block causing the problem. The alert_
SID
.log
always contains information about the block: its block type, block address, the tablespace it belongs to, and so forth. For blocks containing user data, the log may also report the data object number.
In this case, Oracle can proceed with recovery if it is allowed to mark the problem block as corrupt. Nevertheless, this response is not always advisable. For example, if the block is an important block in the SYSTEM
tablespace, marking the block as corrupt can eventually prevent you from opening the recovered database. Another consideration is whether the recovery problem is isolated. If this problem is followed immediately by many other problems in the redo stream, then you may want to open the database with the RESETLOGS
option.
For a block containing user data, you can usually query the database to find out which object or table owns this block. If the database is not open, then you should be able to open the database read-only, even if you are recovering a whole database backup. The following example cancels recovery and opens read-only:
CANCEL ALTER DATABASE OPEN READ ONLY;
Assume that the data object number reported in the alert_
SID
.log
is 8031
. You can determine the owner, object name, and object type by issuing this query:
SELECT OWNER, OBJECT_NAME, SUBOBJECT_NAME, OBJECT_TYPE FROM DBA_OBJECTS WHERE DATA_OBJECT_ID = 8031;
To determine whether a recovery problem is isolated, you can run a diagnostic trial recovery, which scans the redo stream for problems but does not actually make any changes to the recovered database. If a trial recovery discovers any recovery problems, it reports them in the alert_
SID
.log
. You can use the RECOVER
...
TEST
statement to invoke trial recovery.
After you have done these investigations, you can follow the guidelines in Table 5-3 to decide whether to allow recovery to corrupt blocks.
See Also:
"Performing Trial Recovery" to learn how to perform trial recovery, and "Allowing Recovery to Corrupt Blocks: Phase 4" if you decide to corrupt blocks |
If you decide to allow recovery to proceed in spite of block corruptions, then run the RECOVER
command with the ALLOW
n
CORRUPTION
clause, where n
is the number of allowable corrupt blocks.
To allow recovery to corrupt blocks:
RECOVER
command, allowing a single corruption, repeating as necessary for each corruption to be made. The following statements shows a valid example:
RECOVER DATABASE ALLOW 1 CORRUPTION
This section contains these topics:
When problems such as stuck recovery occur, you have a difficult choice. If the block is relatively unimportant, and if the problem is isolated, then it is better to corrupt the block. But if the problem is not isolated, then it may be better to open the database with the RESETLOGS
option.
Because of this situation, Oracle supports trial recovery. A trial recovery applies redo in a way similar to normal media recovery, but it never writes its changes to disk and it always rolls back its changes. Trial recovery occurs only in memory.
By default, if a trial recovery encounters a stuck recovery or similar problem, then it always marks the data block as corrupt in memory when this action can allow recovery to proceed. Oracle writes errors generated during trial recovery to alert files. Oracle clearly marks these errors as test run errors.
Like normal media recovery, trial recovery can prompt you for archived log filenames and ask you to apply them. Trial recovery ends when:
When trial recovery ends, Oracles removes all effects of the test run from the system--except the possible error messages in the alert files. If the instance fails during trial recovery, then Oracle removes all effects of trial recovery from the system because trial recovery never writes changes to disk.
Trial recovery lets you foresee what problems might occur if you were to continue with normal recovery. For problems caused by ongoing memory corruption, trial recovery and normal recovery can encounter different errors.
You can use the TEST
option for any RECOVER
command. For example, you can start SQL*Plus and then issue any of the following commands:
RECOVER DATABASE TEST RECOVER DATABASE USING BACKUP CONTROLFILE UNTIL CANCEL TEST RECOVER TABLESPACE users TEST RECOVER DATABASE UNTIL CANCEL TEST
By default, trial recovery always attempts to corrupt blocks in memory if this action allows trial recovery to proceed. In other words, trial recovery by default can corrupt an unlimited number of data blocks. You can specify the ALLOW
n
CORRUPTION
clause on the RECOVER
...
TEST
statement to limit the number of data blocks trial recovery can corrupt in memory.
Note that a trial recovery command is usable in any scenario in which a normal recovery command is usable. Nevertheless, you should only need to run trial recovery when recovery runs into problems.
|
Copyright © 2002 Oracle Corporation. All Rights Reserved. |
|