| Oracle®
Database Data Warehousing Guide 10g Release 1 (10.1) Part Number B10736-01 |
|
|
View PDF |
This chapter describes integrity constraints, and discusses:
Integrity constraints provide a mechanism for ensuring that data conforms to guidelines specified by the datab ase administrator. The most common types of constraints include:
UNIQUE constraints
To ensure that a given column is unique
NOT NULL constraints
To ensure that no null values are allowed
FOREIGN KEY constraints
To ensure that two keys share a primary key to foreign key relationship
Constraints can be used for these purposes in a data wareho use:
Data cleanliness
Constraints verify that the data in the data warehouse conforms to a basic level of data consistency and correctness, preventing the introduction of dirty data.
Query optimization
The Oracle Database utilizes constraints when optimizing SQL queries. Although constraints can be useful in many aspects of q uery optimization, constraints are particularly important for query rewrite of materialized views.
Unlike data in ma ny relational database environments, data in a data warehouse is typically added or modified under controlled circumstances during th e extraction, transformation, and loading (ETL) process. Multiple users normally do not update the data warehouse directly, as they do in an OLTP system.
Many significant constraint features have been introduced for data warehousing. Readers familiar with Oracle's constraint functionality in Oracle dat abase version 7 and Oracle database version 8.x should take special note of the functionality described in this chapter. In fact, man y Oracle database version7-based and Oracle database version8-based data warehouses lacked constraints because of concerns about cons traint performance. Newer constraint functionality addresses these concerns.
To understand how best to use constraints in a data warehouse, you should first understand the basic purposes of constraints. Some of these purposes are:
Enforcement
In order to use a constraint for enforcement, the constraint must be in the ENABLE state. An en
abled constraint ensures that all data modifications upon a given table (or tables) satisfy the conditions of the constraints. Data m
odification operations which produce data that violates the constraint fail with a constraint violation error.
Validation
To use a constraint for validation, the constraint must be in the VALIDATE state. If the constr
aint is validated, then all data that currently resides in the table satisfies the constraint.
Note that validation is indepen dent of enforcement. Although the typical constraint in an operational system is both enabled and validated, any constraint could be validated but not enabled or vice versa (enabled but not validated). These latter two cases are useful for data warehouses.
Belief
In some cases, you will know that the conditions for a given constraint are true, so you do not nee
d to validate or enforce the constraint. However, you may wish for the constraint to be present anyway to improve query optimization
and performance. When you use a constraint in this way, it is called a belief or RELY constraint, and the constraint mus
t be in the RELY state. The RELY state provides you with a mechanism for telling Oracle that a given constr
aint is believed to be true.
Note that the RELY state only affects constraints that have not been validated.
This s ection assumes that you are familiar with the typical use of constraints. That is, constraints that are both enabled and validated. F or data warehousing, many users have discovered that such constraints may be prohibitively costly to build and maintain. The topics d iscussed are:
A UNIQUE constraint is typically enforced using a UNIQUE index. However,
in a data warehouse whose tables can be extremely large, creating a unique index can be costly both in processing time and in disk s
pace.
Suppose that a data warehouse contains a table sales, which includes a column sales_id.
sales_id uniquely identifies a single sales transaction, and the data warehouse administrator must ensure that this column is
unique within the data warehouse.
One way to create the constraint is as follows:
ALTER TABLE sa les ADD CONSTRAINT sales_uk UNIQUE (prod_id, cust_id, promo_id, channel_id, time_id);
By default, this constraint is both
enabled and validated. Oracle implicitly creates a unique index on sales_id to support this constraint. However, this i
ndex can be problematic in a data warehouse for three reasons:
The unique index can be very large, becau
se the sales table can easily have millions or even billions of rows.
The unique index is r arely used for query execution. Most data warehousing queries do not have predicates on unique keys, so creating this index will prob ably not improve performance.
If sales is partitioned along a column other than sales
_id, the unique index must be global. This can detrimentally affect all maintenance operations on the sales table
.
A unique index is required for unique constraints to ensure that each individual row modified in the sales
code> table satisfies the UNIQUE constraint.
For data warehousing tables, an alternative mechanism for unique con straints is illustrated in the following statement:
ALTER TABLE sales ADD CONSTRAINT sales_uk UNIQUE (p rod_id, cust_id, promo_id, channel_id, time_id) DISABLE VALIDATE;
This statement creates a unique constraint, but, becaus e the constraint is disabled, a unique index is not required. This approach can be advantageous for many data warehousing environment s because the constraint now ensures uniqueness without the cost of a unique index.
However, there are trade-offs for the data
warehouse administrator to consider with DISABLE VALIDATE constraints. Because this constraint is disabled
, no DML statements that modify the unique column are permitted against the sales table. You can use one of two strategi
es for modifying this table in the presence of a constraint:
Use DDL to add data to this table (such as exchanging partitions). See the example in Chapter 15, " Maintaining the Data Warehouse".
Before modifying this table, drop the constraint. Then, make all necessary data modifications. Finally, re-creat
e the disabled constraint. Re-creating the constraint is more efficient than re-creating an enabled constraint. However, this approac
h does not guarantee that data added to the sales table while the constraint has been dropped is unique.
In a star schema data warehouse, FOREIGN KEY co
nstraints validate the relationship between the fact table and the dimension tables. A sample constraint might be:
ALTER TABLE sales ADD CONSTRAINT sales_time_fk FOREIGN KEY (time_id) REFERENCES times (time_id) ENABLE VALIDATE;
However, in some situations, you may choose to use a different state for the FOREIGN KEY constraints, i
n particular, the ENABLE NOVALIDATE state. A data warehouse administrator might use an ENABLE NOVALIDATE co
nstraint when either:
The tables contain data that currently disobeys the constraint, but the data wareh ouse administrator wishes to create a constraint for future enforcement.
An enforced constraint is requi red immediately.
Suppose that the data warehouse loaded new data into the fact tables every day, but refreshed the d
imension tables only on the weekend. During the week, the dimension tables and fact tables may in fact disobey the FOREIGN KEY constraints. Nevertheless, the data warehouse administrator might wish to maintain the enforcement of this constr
aint to prevent any changes that might affect the FOREIGN KEY constraint outside of the ETL process. Thus,
you can create the FOREIGN KEY constraints every night, after performing the ETL process, as shown here:
ALTER TABLE sales ADD CONSTRAINT sales_time_fk FOREIGN KEY (time_id) REFERENCES times (time_id) ENABLE NO VALIDATE;
ENABLE NOVALIDATE can quickly create an enforced constraint, even when the constraint
is believed to be true. Suppose that the ETL process verifies that a FOREIGN KEY constraint is true. Rathe
r than have the database re-verify this FOREIGN KEY constraint, which would require time and database resou
rces, the data warehouse administrator could instead create a FOREIGN KEY constraint using ENABLE NOVALIDATE.
The ETL process commonly verifies that certain constraints are tru
e. For example, it can validate all of the foreign keys in the data coming into the fact table. This means that you can trust it to p
rovide clean data, instead of implementing constraints in the data warehouse. You create a RELY constraint as follows:
p>
ALTER TABLE sales ADD CONSTRAINT sales_time_fk FOREIGN KEY (time_id) REFERENCES times (time_id) RELY DI SABLE NOVALIDATE;
This statement assumes that the primary key is in the RELY state. RELY constr
aints, even though they are not used for data validation, can:
Enable more sophisticated query rewrites for materialized views. See Chapter 18, " Query Rewrite" for further details.
Enable other data warehousing tools to retrieve information regarding constraints directly from the Oracle data dictionary.
li>Creating a RELY constraint is inexpensive and does not impose any overhead during DML or load. Because the c
onstraint is not being validated, no data processing is necessary to create it.
All constraints can be validated in parallel. When validating constraint s on very large tables, parallelism is often necessary to meet performance goals. The degree of parallelism for a given constraint op eration is determined by the default degree of parallelism of the underlying table.
You can create and maintain
constraints before you partition the data. Later chapters discuss the significance of partitioning for data warehousing. Partitioning
can improve constraint management just as it does to management of many other operations. For example, Chapter 15, " Maintaining the Data Warehouse" provides a scenario creating UNIQUE and FOREIGN K
EY constraints on a separate staging table, and these constraints are maintained during the
EXCHANGE PARTITION statement.
You can create constraints on view
s. The only type of constraint supported on a view is a RELY constraint.
This type of constraint is useful when q ueries typically access views instead of base tables, and the database administrator thus needs to define the data relationships betw een views rather than tables. View constraints are particularly useful in OLAP environments, where they may enable more sophisticated rewrites for materialized views.