    Oracle® Database Concepts
    10g Release 1 (10.1)

    Part Number B10743-01

    17 High Availability

    Computing environments configured to provide nearly full-time availability are known as high availability systems. Oracle has a number of products and features that provide high availability in cases of unplanned downtime or planned downtime.

    This chapter includes the following topics:

    Introduction to High Availability

    Computing environments configured to provide nearly full-time availability are known as high availability systems. Such systems typically have redundant hardware and software that makes the system available despite failures. Well-designed high availability systems avoid having single points of failure.

    Oracle has a number of products and features that provide high availability in cases of unplanned downtime or planned downtime.

    Overview of Unplanned Downtime

    Various things can cause unplanned downtime. Oracle offers the following features to maintain high availability during unplanned downtime:

    Oracle Solutions to System Failures

    This section covers some Oracle solutions to system failures, including the following:

    Overview of Fast-Start Fault Recovery

    Oracle Enterprise Edition features include a fast-start fault recovery functionality to control instance recovery. This reduces the time required for cache recovery and makes the recovery bounded and predictable by limiting the number of dirty buffers and the number of redo records generated between the most recent redo record and the last checkpoint.

    The foundation of fast-start recovery is the fast-start checkpointing architecture. Instead of conventional event-driven (that is, log switching) checkpointing, which does bulk writes, fast-start checkpointing occurs incrementally. Each DBWn process periodically writes buffers to disk to advance the checkpoint position. The oldest modified blocks are written first to ensure that every write lets the checkpoint advance. Fast-start checkpointing eliminates bulk writes and the resultant I/O spikes that occur with conventional checkpointing.
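
    The oldest-first write order can be illustrated with a toy model. The following Python sketch is not Oracle code: the class BufferCache, its methods, and the integer redo positions are hypothetical names invented for illustration. It only shows why writing the oldest dirty buffer first lets the checkpoint position advance with every small write.

```python
from collections import OrderedDict


class BufferCache:
    """Toy buffer cache: dirty buffers are kept in order of first modification."""

    def __init__(self):
        self.next_position = 1        # hypothetical redo position counter
        self.dirty = OrderedDict()    # block id -> position of oldest unwritten change

    def modify(self, block_id):
        """Record a change; a block keeps the position of its oldest unwritten change."""
        if block_id not in self.dirty:
            self.dirty[block_id] = self.next_position
        self.next_position += 1

    def checkpoint_position(self):
        """Recovery would have to start at the oldest change not yet on disk."""
        if self.dirty:
            return next(iter(self.dirty.values()))
        return self.next_position

    def incremental_write(self, batch_size):
        """Write a small batch of the oldest dirty buffers; return the blocks written."""
        written = []
        for _ in range(min(batch_size, len(self.dirty))):
            block_id, _position = self.dirty.popitem(last=False)
            written.append(block_id)
        return written


cache = BufferCache()
for block in ["A", "B", "A", "C", "D"]:
    cache.modify(block)

before = cache.checkpoint_position()
written = cache.incremental_write(batch_size=2)
after = cache.checkpoint_position()
print(before, written, after)
```

    In this model, the redo that instance recovery would have to replay shrinks from positions 1 through 5 to positions 4 and 5 after a single two-buffer write, with no bulk write involved.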

    With fast-start fault recovery, the Oracle database is opened for access by applications without having to wait for the undo, or rollback, phase to be completed. The rollback of data locked by uncommitted transactions is done dynamically on an as-needed basis. If a user process encounters a row locked by a crashed transaction, then it just rolls back that row. The impact of rolling back the rows requested by a query is negligible.

    Fast-start fault recovery is very fast, because undo data is stored in the database, not in the log files. Undoing a block does not require an expensive sequential scan of a log file. It is simply a matter of locating the right version of the data block within the database.

    Fast-start recovery can greatly reduce mean time to recover (MTTR) with minimal effects on online application performance. Oracle continuously estimates the recovery time and automatically adjusts the checkpointing rate to meet the target recovery time.


    See Also:

    Oracle Database Performance Tuning Guide for information on fast-start fault recovery

    Overview of Real Application Clusters

    Real Application Clusters (RAC) databases are inherently high availability systems. The clusters that are typical of RAC environments can provide continuous service for both planned and unplanned outages. RAC builds higher levels of availability on top of the standard Oracle features. All single instance high availability features, such as fast-start recovery and online reorganizations, apply to RAC as well.

    In addition to all the regular Oracle features, RAC exploits the redundancy provided by clustering to deliver availability with n-1 node failures in an n-node cluster. In other words, all users have access to all data as long as there is one available node in the cluster.

    Oracle Solutions to Data Failures

    This section covers some Oracle solutions to data failures, including the following:

    Overview of Backup and Recovery Features for High Availability

    In addition to fast-start fault recovery and mean time to recovery, Oracle provides several solutions to protect against and recover from data and media failures. A system or network fault may prevent users from accessing data, but media failures without proper backups can lead to lost data that cannot be recovered. These solutions include the following:

    • Recovery Manager (RMAN) is Oracle's utility to manage the backup and recovery of the database. It determines the most efficient method of running the requested backup, restore, or recovery operation. RMAN and the server automatically identify modifications to the structure of the database and dynamically adjust the required operation to adapt to the changes. You have the option to specify the maximum disk space when restoring logs during media recovery, thus enabling efficient space management during the recovery process.

    • Oracle Flashback Database lets you quickly recover an Oracle database to a previous time to correct problems caused by logical data corruptions or user errors.

    • Oracle Flashback Query lets you view data at a point in time in the past. This can be used to view and reconstruct lost data that was deleted or changed by accident. Developers can use this feature to build self-service error correction into their applications, empowering end users to undo and correct their errors.

    • Backup information can be stored in an independent flash recovery area. This increases the resilience of the information, and allows easy querying of backup information. It also acts as a central repository for backup information for all databases across the enterprise, providing a single point of management.

    • When performing a point-in-time recovery, you can query the database without terminating recovery. This helps determine whether errors affect critical data or non-critical structures, such as indexes. Oracle also provides trial recovery in which recovery continues but can be backed out if an error occurs. It can also be used to "undo" recovery if point-in-time recovery has gone on for too long.

    • With Oracle's block-level media recovery, if only a single block is damaged, then only that block needs to be recovered. The rest of the file, and thus the table containing the block, remains online and accessible.

    • LogMiner lets a DBA find and correct unwanted changes. Its simple SQL interface allows searching by user, table, time, type of update, value in update, or any combination of these. LogMiner provides the SQL statements needed to undo the erroneous operation. The GUI shows the change history. Damaged log files can be searched with the LogMiner utility, thus recovering some of the transactions recorded in the log files.


    Overview of Partitioning

    Partitioning addresses key issues in supporting very large tables and indexes by letting you decompose them into smaller and more manageable pieces called partitions. SQL queries and DML statements do not need to be modified in order to access partitioned tables. However, after partitions are defined, DDL statements can access and manipulate individual partitions rather than entire tables or indexes. This is how partitioning can simplify the manageability of large database objects. Also, partitioning is entirely transparent to applications.

    Overview of Transparent Application Failover

    Transparent Application Failover enables an application user to automatically reconnect to a database if the connection fails. Active transactions roll back, but the new database connection, made by way of a different node, is identical to the original. This is true regardless of how the connection fails.

    With Transparent Application Failover, a client notices no loss of connection as long as there is one instance left serving the application. The database administrator controls which applications run on which instances and also creates a failover order for each application. This works best with Real Application Clusters: If one node dies, then you can quickly reconnect to another node in the cluster.

    Elements Affected by Transparent Application Failover

    During normal client/server database operations, the client maintains a connection to the database so the client and server can communicate. If the server fails, then so does the connection. The next time the client tries to use the connection, the client issues an error. At this point, the user must log in to the database again.

    With Transparent Application Failover, however, Oracle automatically obtains a new connection to the database. This enables users to continue working as if the original connection had never failed.

    There are several elements associated with active database connections. These include:

    • Client/server database connections

    • Users' database sessions executing commands

    • Open cursors used for fetching

    • Active transactions

    • Server-side program variables

    Transparent Application Failover can be used to restore client/server database connections, users' database sessions and optionally an active query. To restore other elements of an active database connection, such as active transactions and server-side package state, the application code must be capable of re-running statements that occurred after the last commit.
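
    The requirement that application code re-run the statements issued after the last commit can be sketched with a small simulation. This Python example does not use any Oracle client interface: Instance, Session, ConnectionLost, and run_transaction are hypothetical names, and the explicit reconnect() call stands in for the reconnection that Transparent Application Failover performs automatically.

```python
class ConnectionLost(Exception):
    """Raised by the toy session when its instance has failed."""


class Instance:
    def __init__(self, name, calls_before_failure=None):
        self.name = name
        self.calls_left = calls_before_failure   # None means the node never fails

    @property
    def alive(self):
        return self.calls_left is None or self.calls_left > 0

    def use(self):
        if not self.alive:
            raise ConnectionLost(self.name)
        if self.calls_left is not None:
            self.calls_left -= 1


class Session:
    def __init__(self, instances, table):
        self.instances = instances   # failover order
        self.table = table           # committed rows, shared by all instances
        self.uncommitted = []
        self.current = None
        self.reconnect()

    def reconnect(self):
        self.uncommitted = []        # the active transaction is rolled back
        for instance in self.instances:
            if instance.alive:
                self.current = instance
                return
        raise RuntimeError("no surviving instance")

    def insert(self, row):
        self.current.use()
        self.uncommitted.append(row)

    def commit(self):
        self.current.use()
        self.table.extend(self.uncommitted)
        self.uncommitted = []


def run_transaction(session, rows):
    """Insert all rows and commit; after a failover, re-run from the last commit."""
    while True:
        try:
            for row in rows:
                session.insert(row)
            session.commit()
            return
        except ConnectionLost:
            session.reconnect()


table = []
session = Session([Instance("node1", calls_before_failure=2), Instance("node2")], table)
run_transaction(session, ["r1", "r2", "r3"])
print(session.current.name, table)
```

    In this run, the first node fails after two inserts, the uncommitted rows are discarded, and the whole transaction is replayed on the second node, so each row is committed exactly once.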

    Oracle Solutions to Disasters

    Oracle's primary solution to disasters is the Oracle Data Guard product.

    Overview of Oracle Data Guard

    Oracle Data Guard maintains up to nine standby databases, each of which is a real-time copy of the production database, to protect against all threats: corruptions, data failures, human errors, and disasters. If a failure occurs on the production (primary) database, then you can fail over to one of the standby databases to become the new primary database. In addition, planned downtime for maintenance can be reduced, because you can quickly and easily move (switch over) production processing from the current primary database to a standby database, and then back again.

    Oracle Data Guard Configurations

    An Oracle Data Guard configuration is a collection of loosely connected systems, consisting of a single primary database and up to nine standby databases that can include a mix of both physical and logical standby databases. The databases in a Data Guard configuration can be connected by a LAN in the same data center, or, for maximum disaster protection, geographically dispersed over a WAN and connected by Oracle Net Services.

    A Data Guard configuration can be deployed for any database. This is possible because its use is transparent to applications; no application code changes are required to accommodate a standby database. Moreover, Data Guard lets you tune the configuration to balance data protection levels and application performance impact; you can configure the protection mode to maximize data protection, maximize availability, or maximize performance.

    As application transactions make changes to the primary database, the changes are logged locally in redo logs. For physical standby databases, the changes are applied to each physical standby database that is running in managed recovery mode. For logical standby databases, the changes are applied using SQL regenerated from the archived redo logs.
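
    The flow of changes from a primary database to its standby databases, followed by a failover, can be sketched as a toy model. The Python below is only an illustration under simplifying assumptions: Database, Configuration, ship_and_apply, and failover are hypothetical names rather than Data Guard interfaces, and redo apply is reduced to replaying key/value records in order.

```python
class Database:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.applied = 0            # number of redo records applied so far

    def apply(self, record):
        key, value = record
        self.data[key] = value
        self.applied += 1


class Configuration:
    MAX_STANDBYS = 9                # one primary and up to nine standby databases

    def __init__(self, primary, standbys):
        if len(standbys) > self.MAX_STANDBYS:
            raise ValueError("at most nine standby databases")
        self.primary = primary
        self.standbys = list(standbys)
        self.redo = []              # changes logged on the primary

    def write(self, key, value):
        record = (key, value)
        self.redo.append(record)    # the change is logged locally first
        self.primary.apply(record)

    def ship_and_apply(self):
        """Bring every standby up to date with the redo generated so far."""
        for standby in self.standbys:
            for record in self.redo[standby.applied:]:
                standby.apply(record)

    def failover(self):
        """Promote the standby that has applied the most redo."""
        new_primary = max(self.standbys, key=lambda db: db.applied)
        self.standbys.remove(new_primary)
        self.primary = new_primary
        return new_primary


cfg = Configuration(Database("prod"), [Database("stby1"), Database("stby2")])
cfg.write("a", 1)
cfg.write("b", 2)
cfg.ship_and_apply()
cfg.write("c", 3)                   # not yet shipped when the primary is lost
new_primary = cfg.failover()
print(new_primary.name, new_primary.data)
```

    The change made after the last shipment is absent from the new primary in this model, which is the kind of exposure that the choice of protection mode balances against application performance impact.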

    Physical Standby Databases

    A physical standby database is physically identical to the primary database. While the primary database is open and active, a physical standby database is either performing recovery (by applying logs), or open for reporting access. A physical standby database can be queried read only when not performing recovery while the production database continues to ship redo data to the physical standby site.

    The on-disk database structures of a physical standby must be identical to the primary database on a block-for-block basis, because a recovery operation applies changes block-for-block using the physical rowid. The database schema, including indexes, must be the same, and the database cannot be opened (other than for read-only access). If opened, the physical standby database will have different rowids, making continued recovery impossible.

    Logical Standby Databases

    A logical standby database takes standard Oracle archived redo logs, transforms the redo records they contain into SQL transactions, and then applies them to an open standby database. Although changes can be applied concurrently with end-user access, the tables being maintained through regenerated SQL transactions allow read-only access to users of the logical standby database. Because the database is open, it is physically different from the primary database. The database tables can have different indexes and physical characteristics from their primary database peers, but must maintain logical consistency from an application access perspective, to fulfill their role as a standby data source.

    Oracle Data Guard Broker

    Oracle Data Guard Broker automates complex creation and maintenance tasks and provides dramatically enhanced monitoring, alert, and control mechanisms. It uses background agent processes that are integrated with the Oracle database server and associated with each Data Guard site to provide a unified monitoring and management infrastructure for an entire Data Guard configuration. Two user interfaces are provided to interact with the Data Guard configuration: a command-line interface (DGMGRL) and a graphical user interface called Data Guard Manager.

    Oracle Data Guard Manager, which is integrated with Oracle Enterprise Manager, provides wizards to help you easily create, manage, and monitor the configuration. This integration lets you take advantage of other Enterprise Manager features, such as the event service for alerts, the discovery service for easier setup, and the job service to ease maintenance.

    Oracle Solutions to Human Errors

    This section covers some Oracle solutions to human errors, including the following:

    Overview of Oracle Flashback Features

    If a major error occurs, such as a batch job being run twice in succession, the database administrator can request a Flashback operation that quickly recovers the entire database to a previous point in time, eliminating the need to restore backups and do a point-in-time recovery. In addition to Flashback operations at the database level, it is also possible to flash back an entire table. Similarly, the database can recover tables that have been inadvertently dropped by a user.

    • Oracle Flashback Database lets you quickly bring your database to a prior point in time by undoing all the changes that have taken place since that time. This operation is fast, because you do not need to restore the backups. This in turn results in much less downtime following data corruption or human error.

    • Oracle Flashback Table lets you quickly recover a table to a point in time in the past without restoring a backup.

    • Oracle Flashback Drop provides a way to restore accidentally dropped tables.

    • Oracle Flashback Query lets you view data at a point-in-time in the past. This can be used to view and reconstruct lost data that was deleted or changed by accident. Developers can use this feature to build self-service error correction into their applications, empowering end-users to undo and correct their errors.

    • Oracle Flashback Version Query uses undo data stored in the database to view the changes to one or more rows along with all the metadata of the changes.

    • Oracle Flashback Transaction Query lets you examine changes to the database at the transaction level. As a result, you can diagnose problems, perform analysis, and audit transactions.
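
    The idea shared by several of these features, answering questions about the past from retained row versions, can be illustrated with a toy versioned table. This Python sketch is an analogy only: VersionedTable and its methods are hypothetical names, the logical clock stands in for real timestamps, and nothing here reflects how Oracle stores or reads undo data.

```python
class VersionedTable:
    """Toy table that keeps prior row versions, loosely like undo data."""

    def __init__(self):
        self.clock = 0
        self.history = {}    # key -> list of (time, value); value None means deleted

    def put(self, key, value):
        self.clock += 1
        self.history.setdefault(key, []).append((self.clock, value))
        return self.clock

    def delete(self, key):
        return self.put(key, None)

    def as_of(self, time):
        """Flashback Query analogue: table contents as of a past time."""
        snapshot = {}
        for key, versions in self.history.items():
            value = None
            for stamp, candidate in versions:
                if stamp <= time:
                    value = candidate
            if value is not None:
                snapshot[key] = value
        return snapshot

    def versions(self, key):
        """Flashback Version Query analogue: every recorded version of one row."""
        return list(self.history.get(key, []))

    def flashback_to(self, time):
        """Flashback Table analogue: restore the contents as of a past time."""
        target = self.as_of(time)
        current = self.as_of(self.clock)
        for key in current:
            if key not in target:
                self.delete(key)
        for key, value in target.items():
            if current.get(key) != value:
                self.put(key, value)


t = VersionedTable()
t.put("emp1", 1000)
good_time = t.put("emp2", 2000)
t.put("emp1", 1500)      # erroneous update
t.delete("emp2")         # erroneous delete

past = t.as_of(good_time)
emp1_versions = t.versions("emp1")
t.flashback_to(good_time)
restored = t.as_of(t.clock)
print(past, emp1_versions, restored)
```

    Note that the restore is itself recorded as new versions rather than by erasing history, so the erroneous changes remain visible to a later version query in this model.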


    Overview of LogMiner

    Oracle LogMiner lets you query redo log files through a SQL interface. Redo log files contain information about the history of activity on a database. Oracle Enterprise Manager includes the Oracle LogMiner Viewer graphical user interface (GUI).

    All changes made to user data or to the database dictionary are recorded in the Oracle redo log files. Therefore, redo log files contain all the necessary information to perform recovery operations. Because redo log file data is often kept in archived files, the data is already available. To take full advantage of all the features LogMiner offers, you should enable supplemental logging.



    Overview of Security Features for High Availability

    Oracle Internet Directory (OID) lets you manage the security attributes and privileges for users, including users authenticated by X.509 certificates. OID also enforces attribute-level access control. This enables read, write, or update privileges on specific attributes to be restricted to specific named users, such as an enterprise security administrator. Directory queries and responses can use SSL encryption for enhanced protection during authentication and other interactions. Other database security features including Virtual Private Database (VPD), Label Security, audit, and proxy authentication can be leveraged for these directory-based users when configured as enterprise users.

    The Oracle Advanced Security User Migration Utility assists in migrating existing database users to OID. After a user is created in the directory, organizations can continue to build new applications in a Web environment and leverage the same user identity in OID for provisioning the user access to these applications.


    See Also:

    Chapter 11, "Oracle Utilities"

    See Also:

    Chapter 20, "Database Security"

    Overview of Planned Downtime

    Oracle provides a number of capabilities to reduce or eliminate planned downtime. These include the following:

    System Maintenance

    Oracle provides a high degree of self-management, automating routine DBA tasks and reducing the complexity of space, memory, and resource administration. These capabilities include the following:

    • Automatic undo management: database administrators do not need to plan or tune the number and sizes of rollback segments or consider how to strategically assign transactions to a particular rollback segment.

    • Dynamic memory management to resize the Oracle shared memory components dynamically. Oracle also provides advisories to help administrators size the memory allocation for optimal database performance.

    • Oracle-managed files to automatically create and delete files as needed

    • Free space management within a table with bitmaps. Additionally, Oracle provides automatic extension of data files, so the files can grow automatically based on the amount of data in the files.

    • Data Guard for hardware and operating system maintenance

    Data Maintenance

    Database administrators can perform a variety of online operations on table definitions, including online reorganization of heap-organized tables. This makes it possible to reorganize a table while users have full access to it.

    This online architecture provides the following capabilities:

    • Any physical attribute of the table can be changed online. The table can be moved to a new location. The table can be partitioned. The table can be converted from one type of organization (such as heap-organized) to another (such as index-organized).

    • Many logical attributes can also be changed. Column names, types, and sizes can be changed. Columns can be added, deleted, or merged. One restriction is that the primary key of the table cannot be modified.

    • Online creation and rebuilding of secondary indexes on index-organized tables (IOTs). Secondary indexes support efficient use of block hints (physical guesses). Invalid physical guesses can be repaired online.

    • Indexes can be created online and analyzed at the same time. Online fix-up of the physical guess component of logical rowids (used in secondary indexes on index-organized tables) also can be used.

    • Fix the physical guess component of logical rowids stored in secondary indexes on IOTs. This allows online repair of invalid physical guesses.

    Database Maintenance

    Oracle provides technology to do maintenance of database software with little or no database downtime. Patches can be applied to Real Application Clusters instances one at a time, such that database service is always available.

    A Real Application Clusters system can run in this mixed mode for an arbitrary period to test the patch in the production environment. When satisfied that the patch is successful, this procedure is repeated for the remaining nodes in the cluster. When all nodes in the cluster have been patched, the rolling patch upgrade is complete, and all nodes are running the same version of Oracle.
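
    The one-node-at-a-time procedure can be sketched as a short loop. The following Python is a simulation under stated assumptions, not a patching tool: rolling_patch, the node dictionaries, the is_healthy callback, and the version labels "old" and "new" are all hypothetical.

```python
def rolling_patch(nodes, new_version, is_healthy):
    """Patch one node at a time; stop early if a patched node fails its check.

    nodes maps a node name to {"version": str, "up": bool}.
    Returns the names of the nodes that were patched.
    """
    patched = []
    for name, node in nodes.items():
        node["up"] = False                       # take a single instance down
        if not any(other["up"] for other in nodes.values()):
            raise RuntimeError("no instance left to serve the database")
        node["version"] = new_version
        node["up"] = True
        patched.append(name)
        if not is_healthy(node):
            break                                # leave remaining nodes unpatched
    return patched


cluster = {
    "node1": {"version": "old", "up": True},
    "node2": {"version": "old", "up": True},
    "node3": {"version": "old", "up": True},
}
observed = []


def check(node):
    # Record the version mix seen while the cluster runs in mixed mode.
    observed.append([n["version"] for n in cluster.values()])
    return True


patched = rolling_patch(cluster, "new", check)
print(patched, observed)
```

    The recorded snapshots show the cluster passing through mixed-version states before converging on a single version, and the availability check inside the loop is what would reject this procedure on a single-node system.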