Oracle® Database Backup and Recovery Advanced User's Guide 10g Release 1 (10.1) Part Number B10734-01
This chapter describes how to troubleshoot user-managed media recovery.
Table 20-1 describes potential problems that can occur during media recovery.

| Problem | Description |
|---|---|
| Missing or misnamed archived log | Recovery stops because the database cannot find the archived log recorded in the control file. |
| When you attempt to open the database, error | This error commonly occurs because: |
| Redo record problems | Two possible cases are as follows: |
| Corrupted archived logs | Logs may be corrupted while they are stored on or copied between storage systems. If |
| Archived logs with incompatible parallel redo format | If you enable the parallel redo feature, then the database generates redo logs in a new format. Prior releases of Oracle are unable to apply parallel redo logs. However, releases prior to Oracle9i Release 2 (9.2) can detect the parallel redo format and indicate the inconsistency with the following error message: See Also: Oracle Database Performance Tuning Guide to learn about the parallel redo feature |
| Corrupted data blocks | A datafile backup may have contained a corrupted data block, or the data block may become corrupted either during recovery or when it was copied to the backup. If checksums are being used, then the database signals a checksum error. Otherwise, the problem may also appear as a redo corruption. |
| Random problems | Memory corruptions and other transient problems can occur during recovery. |

See Also: "Performing Block Media Recovery with RMAN" to learn about block media recovery
If media recovery encounters a problem, then obtain as much information as possible after recovery halts. You do not want to waste time fixing the wrong problem, which may in fact make matters worse.
The goal of this initial investigation is to determine whether the problem is caused by incorrect setup, corrupted redo logs, corrupted data blocks, memory corruption, or other problems. If you see a checksum error on a data block, then the data block is corrupted. If you see a checksum error on a redo log block, then the redo log is corrupted.
Sometimes the cause of a recovery problem can be difficult to determine. Nevertheless, the methods in this chapter allow you to quickly recover a database even when you do not completely understand the cause of the problem.
To investigate media recovery problems, check the alert.log to see whether the error messages give general information about the nature of the problem. For example, does the alert_SID.log indicate any checksum failures? Does the alert_SID.log indicate that media recovery may have to corrupt data blocks in order to continue?
Depending on the type of media recovery problem you suspect, you have different solutions at your disposal. You can try one or a combination of the methods described in Table 20-2. Note that these methods are fairly safe: in almost all cases, they should not cause any damage to the database.

| If you suspect . . . | Then . . . |
|---|---|
| Missing/misnamed archived logs | Determine whether you entered the correct filename. If you did, then check to see whether the log is missing from the operating system. If it is missing, and you have a backup, then restore the backup and apply the log. If you do not have a backup, then if possible perform incomplete recovery up to the point of the missing log. |
| | Review the causes of this error in Table 20-1. Make sure that all read/write datafiles requiring recovery are online. If you use a backup control file for recovery, then the control file and datafiles must be at a consistent SCN for the database to be opened. If you do not have the necessary redo, then you must re-create the control file. |
| Corrupt archived logs | The log is corrupted if the checksum verification on the log redo block fails. If The |
| Archived logs with incompatible parallel redo format | If you are running an Oracle release prior to Oracle9i Release 2, and if you are attempting to apply redo logs created with the parallel redo format, then you must do the following steps: See Also: Oracle Database Performance Tuning Guide to learn about the parallel redo feature |
| Memory corruption or transient problems | You may be able to fix the problem by shutting down the database and restarting recovery. The database should be left in a consistent state if the second attempt also fails. |
| Corrupt data blocks | Restore and recover the datafile again with user-managed methods, or restore and recover individual data blocks with RMAN block media recovery. A data block is corrupted if the checksum verification on the block fails. If |

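
The checks in Table 20-2 for missing archived logs and offline datafiles can be sketched with queries against the dynamic performance views. This is a minimal sketch rather than part of the documented procedure: it assumes the database is mounted and that you are connected in SQL*Plus with administrator privileges. V$RECOVER_FILE, V$DATAFILE, and V$ARCHIVED_LOG are standard views, but the columns selected here are only one reasonable choice.

```sql
-- Datafiles that still need media recovery, with their names and online status.
SELECT r.FILE#, d.NAME, r.ONLINE_STATUS, r.ERROR, r.CHANGE#
  FROM V$RECOVER_FILE r, V$DATAFILE d
 WHERE d.FILE# = r.FILE#
 ORDER BY r.FILE#;

-- Archived logs recorded in the control file, for comparison against the
-- filenames that actually exist on the operating system.
SELECT THREAD#, SEQUENCE#, NAME, FIRST_CHANGE#, NEXT_CHANGE#
  FROM V$ARCHIVED_LOG
 ORDER BY THREAD#, SEQUENCE#;
```

A file that the first query reports with an ONLINE_STATUS of OFFLINE is one to review before restarting recovery, because read/write datafiles requiring recovery must be online.
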
If you cannot fix the problem with the methods described in Table 20-2, then there may be no easy way to fix the problem without losing data. One option applies when the alert_SID.log indicates that recovery can continue if it is allowed to corrupt a data block, which should be the case for most recovery problems. This option is best if it is important to bring up the database quickly and recover all changes. If you are contemplating this option as a last resort, then proceed to "Deciding Whether to Allow Recovery to Corrupt Blocks: Phase 3".
See Also: "Performing Block Media Recovery with RMAN" to learn how to perform block media recovery
When media recovery encounters a problem, the alert_SID.log may indicate that recovery can continue if it is allowed to corrupt the data block causing the problem. The alert_SID.log always contains information about the block: its block type, block address, the tablespace it belongs to, and so forth. For blocks containing user data, the alert log may also report the data object number.
In this case, the database can proceed with recovery if it is allowed to mark the problem block as corrupt. Nevertheless, this response is not always advisable. For example, if the block is an important block in the SYSTEM tablespace, marking the block as corrupt can eventually prevent you from opening the recovered database. Another consideration is whether the recovery problem is isolated. If this problem is followed immediately by many other problems in the redo stream, then you may want to open the database with the RESETLOGS option.
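If you decide against corrupting blocks, the alternative mentioned above is incomplete recovery followed by opening with RESETLOGS. The following is a minimal sketch of that sequence in SQL*Plus, assuming cancel-based recovery of the whole database from restored backups and a current control file (with a backup control file you would add USING BACKUP CONTROLFILE). It is an outline rather than a procedure from this section, and opening with RESETLOGS discards all redo after the point at which you stop.

```sql
-- Apply redo one log at a time; SQL*Plus prompts for each archived log.
RECOVER DATABASE UNTIL CANCEL
-- When prompted for the log that contains the problem redo, enter CANCEL.
-- Then open the database, discarding the redo that was not applied:
ALTER DATABASE OPEN RESETLOGS;
```
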
For a block containing user data, you can usually query the database to find out which object or table owns this block. If the database is not open, then you should be able to open the database read-only, even if you are recovering a whole database backup. The following example cancels recovery and opens read-only:

    CANCEL
    ALTER DATABASE OPEN READ ONLY;

Assume that the data object number reported in the alert_SID.log is 8031. You can determine the owner, object name, and object type by issuing this query:

    SELECT OWNER, OBJECT_NAME, SUBOBJECT_NAME, OBJECT_TYPE
    FROM DBA_OBJECTS
    WHERE DATA_OBJECT_ID = 8031;

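If the alert log gives you a block address but no data object number, a common alternative is to map the file number and block number to a segment through DBA_EXTENTS. This is a sketch under assumptions: the file number 5 and block number 1234 are hypothetical values, FILE_ID is the absolute file number, and the database must be open (read-only is enough) for the query to run.

```sql
-- Hypothetical values: file 5, block 1234. Substitute the values from your alert log.
SELECT OWNER, SEGMENT_NAME, PARTITION_NAME, SEGMENT_TYPE
  FROM DBA_EXTENTS
 WHERE FILE_ID = 5
   AND 1234 BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1;
```

If the query returns no rows, the block does not belong to any allocated extent. On large databases this query can be slow.
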
To determine whether a recovery problem is isolated, you can run a diagnostic trial recovery, which writes any errors that it encounters to the alert_SID.log. You can use the RECOVER ... TEST statement to invoke trial recovery.
After you have done these investigations, you can follow the guidelines in Table 20-3 to decide whether to allow recovery to corrupt blocks.

| If the problem is . . . | and the block is . . . | Then . . . |
|---|---|---|
| not isolated | n/a | You should probably open the database with the RESETLOGS option. |
| isolated | in the SYSTEM tablespace | Do not corrupt the block, because it may eventually prevent you from opening the database. However, sometimes data in the |
| isolated | index data | Consider corrupting index blocks because the index can be rebuilt later after the database has been recovered. |
| isolated | user data | Decide based on the importance of the data. If you continue with datafile recovery and corrupt a block, you lose data in the block. However, you can use RMAN to perform block media recovery later after datafile recovery completes. If you open |
| isolated | rollback or undo data | Consider corrupting the rollback or undo block because it does not harm the database if the transactions that generated the undo are never rolled back. However, if those transactions are rolled back, then corrupting the undo block can cause problems. If you are unsure, then call Oracle Support. |

See Also: "Performing Trial Recovery" to learn how to perform trial recovery, and "Allowing Recovery to Corrupt Blocks: Phase 4" if you decide to corrupt blocks
If you decide to allow recovery to proceed in spite of block corruptions, then run the RECOVER command with the ALLOW n CORRUPTION clause, where n is the number of allowable corrupt blocks.
To allow recovery to corrupt blocks, run the RECOVER command, allowing a single corruption, and repeat as necessary for each corruption to be made. The following statement shows a valid example:

    RECOVER DATABASE ALLOW 1 CORRUPTION

When problems such as stuck recovery occur, you have a difficult choice. If the block is relatively unimportant, and if the problem is isolated, then it is better to corrupt the block. But if the problem is not isolated, then it may be better to open the database with the RESETLOGS option.
Because of this situation, the Oracle database supports trial recovery. A trial recovery applies redo in a way similar to normal media recovery, but it never writes its changes to disk and it always rolls back its changes. Trial recovery occurs only in memory.
By default, if a trial recovery encounters a stuck recovery or similar problem, then it always marks the data block as corrupt in memory when this action can allow recovery to proceed. The database writes errors generated during trial recovery to alert files. These errors are clearly marked as test run errors.
Like normal media recovery, trial recovery can prompt you for archived log filenames and ask you to apply them. Trial recovery ends when:
When trial recovery ends, the database removes all effects of the test run from the system, except for the possible error messages in the alert files. If the instance fails during trial recovery, then the database removes all effects of trial recovery from the system because trial recovery never writes changes to disk.
Trial recovery lets you foresee what problems might occur if you were to continue with normal recovery. For problems caused by ongoing memory corruption, trial recovery and normal recovery can encounter different errors.
You can use the TEST option for any RECOVER command. For example, you can start SQL*Plus and then issue any of the following commands:

    RECOVER DATABASE TEST
    RECOVER DATABASE USING BACKUP CONTROLFILE UNTIL CANCEL TEST
    RECOVER TABLESPACE users TEST
    RECOVER DATABASE UNTIL CANCEL TEST

By default, trial recovery always attempts to corrupt blocks in memory if this action allows trial recovery to proceed. In other words, trial recovery by default can corrupt an unlimited number of data blocks. You can specify the ALLOW n CORRUPTION clause on the RECOVER ... TEST statement to limit the number of data blocks trial recovery can corrupt in memory.
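As a sketch of how the two clauses combine, the following SQL*Plus commands run trial recovery with a limit on in-memory corruptions. The limits 1 and 5 are arbitrary illustrative values, and the clause order shown is an assumption based on the RECOVER syntax, which accepts TEST and ALLOW n CORRUPTION as independent options.

```sql
-- Trial recovery that may mark at most one block as corrupt in memory.
RECOVER DATABASE TEST ALLOW 1 CORRUPTION

-- Trial recovery of one tablespace that tolerates up to five corrupt blocks in memory.
RECOVER TABLESPACE users TEST ALLOW 5 CORRUPTION
```

A small limit can help you judge whether a problem is isolated: if trial recovery needs more corruptions than the limit allows, the problem is probably not confined to a single block.
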
A trial recovery command is usable in any scenario in which a normal recovery command is usable. Nevertheless, you should only need to run trial recovery when recovery runs into problems.