| Oracle® Database Data
Warehousing Guide 10g Release 1 (10.1) Part Number B10736-01 |
|
|
View PDF |
The following sec tions will help you create and manage a data warehouse:
A dimension is a structure that categorizes da ta in order to enable users to answer business questions. Commonly used dimensions are customers, products, and time. For example, ea ch sales channel of a clothing retailer might gather and store data regarding sales and reclamations of their Cloth assortment. The r etail chain management can build a data warehouse to analyze the sales of its products across all stores over time and help answer qu estions such as:
What is the effect of promoting one product on the sale of a related product that is no t promoted?
What are the sales of a product before and after a promotion?
H ow does a promotion affect the various distribution channels?
The data in the retailer's data warehouse system has t wo important components: dimensions and facts. The dime nsions are products, customers, promotions, channels, and time. One approach for identifying your dimensions is to review your refere nce tables, such as a product table that contains everything about a product, or a promotion table containing all information about p romotions. The facts are sales (units sold) and profits. A data warehouse contains facts about the sales of each product at on a dail y basis.
A typical relational implementation for such a data warehouse is a star schema. The fact information is stored in wha t is called a fact table, whereas the dimensional information is stored in dimension tables. In our example, each sales transaction r ecord is uniquely defined as for each customer, for each product, for each sales channel, for each promotion, and for each day (time) .
In Oracle Database, the dimensional information itself is stored in a dimension
table. In addition, the database object dimension helps to organize and group dimensional information into hierarchies. This represe
nts natural 1:n relationships between columns or column groups (the levels of a hierarchy) that cannot be represented wi
th constraint conditions. Going up a level in the hierarchy is called rolling up the data and going down a lev
el in the hierarchy is called drilling down the data. In the retailer example:
Within the time dimension, months roll up to quarters, quarters roll up to years, and years roll up to all
years.
Within the product dimension, products roll up to subcategories, subcategories roll
up to categories, and categories roll up to all products.
Within the customer dimension, c
ustomers roll up to city. Then cities roll up to state. Then states roll up to country. Then c
ountries roll up to subregion. Finally, subregions roll up to region, as shown in Figur
e 10-1.
Data analysis typically starts at hig her levels in the dimensional hierarchy and gradually drills down if the situation warrants such analysis.
Dimensions do not h ave to be defined. However, if your application uses dimensional modeling, it is worth spending time creating them as it can yield si gnificant benefits, because they help query rewrite perform more complex types of rewrite. Dimensions are also beneficial to certain types of materialized view refresh operations and with the SQLAccess Advisor. They are only mandatory if you use the SQLAccess Adviso r (a GUI tool for materialized view and index management) without a workload to recommend which materialized views and indexes to cre ate, drop, or retain.
|
See Also: Chapter 18, " Query Rewrite" for further details regarding query rewrite and Chapter 17, " SQLAccess Advisor" for further details regarding the SQLAccess Advisor |
In spite of the benefits of dimensions, you must not create dimensions in any s chema that does not fully satisfy the dimensional relationships described in this chapter. Incorrect results can be returned from que ries otherwise.
Before you can create a dimension object, the dimension tables must exist in the database possibly containing the d imension data. For example, if you create a customer dimension, one or more tables must exist that contain the city, state, and count ry information. In a star schema data warehouse, these dimension tables already exist. It is therefore a simple task to identify whic h ones will be used.
Now you can draw the hierarchies of a dimension as shown in Figure 10-1. For exam
ple, city is a child of state (because you can aggregate city-level data up to state), and country. This hierarchical information will be stored in the database object dimension.
In the case of normalized or partially no
rmalized dimension representation (a dimension that is stored in more than one table), identify how these tables are joined. Note whe
ther the joins between the dimension tables can guarantee that each child-side row joins with one and only one parent-side row. In th
e case of denormalized dimensions, determine whether the child-side columns uniquely determine the parent-side (or attribute) columns
. If you use constraints to represent these relationships, they can be enabled with the NOVALIDATE and RELY
clauses if the relationships represented by the constraints are guaranteed by other means.
You create a dimension using eithe
r the CREATE DIMENSION statement or the Dimension Wizard in Oracle E
nterprise Manager. Within the CREATE DIMENSION statement, use the LEVEL clause to identify the
names of the dimension levels.
|
See Als o: Oracle Databa se SQL Reference for a complete description of theCREATE DIMENSION statement |
|
See Also: Chapter 18, " Query Rewrite" for further details of using dimensional information |
|
|
You can use the attribute clause in a CREATE DIMENSION statement to specify one or mu
ltiple columns that are uniquely determined by a hierarchy level.
If you use the extended_attribute_clause to create multiple columns determined by a hierarchy level, you can drop one attribute column without dropping them all. Alternati
vely, you can specify an attribute name for each attribute clause CREATE or ALTER DIMENSION st
atement so that an attribute name is specified for each attribute clause where multiple level-to-column relationships can be individu
ally specified.
The following statement illustrates how you can drop a single column without dropping all columns:
CREATE DIMENSION products_dim
LEVEL product IS (products.prod_id)
LEVEL subcategory IS (products.prod_
subcategory)
LEVEL category IS (products.prod_category)
HIERARCHY prod_rollup (
product CHILD OF
su
bcategory CHILD OF category)
ATTRIBUTE product DETERMINES
(products.prod_name, products.prod_desc,
prod_weight_
class, prod_unit_of_measure,
prod_pack_size,prod_status, prod_list_price, prod_min_price)
ATTRIBUTE subcategory_att DETERMI
NES
(prod_subcategory, prod_subcategory_desc)
ATTRIBUTE category DETERMINES
(prod_category, prod_category_desc);
A
LTER DIMENSION products_dim
DROP ATTRIBUTE subcategory_att LEVEL subcategory COLUMN prod_subcategory;
|
See Also: Oracle Database SQL Reference for a complete description of theCREATE DIMENSION statement |
A s
ingle dimension definition can contain multiple hierarchies. Suppose our retailer wants to track the sales of certain items over time
. The first step is to define the time dimension over which sales will be tracked. Figure 10-2 illustrates a
dimension times_dim with two time hierarchies.
Figure 10-2 times_dim Dimension with Two Time H ierarchies

From the illustration, you can construct the hierarchy of the denormalized time_dim dimension's CREATE DIMENSION statement as follows. The complete CREATE DIMENSION statement as well as the TABLE statement are shown in Oracl
e Database Sample Schemas.
CREATE DIMENSION times_dim
LEVEL day IS times.time_id
LEVEL month IS times.calendar_month_desc
LEVEL quarter IS times.calendar_quarter_desc
LEVEL year IS times.cal
endar_year
LEVEL fis_week IS times.week_ending_day
LEVEL fis_month IS times.fiscal_month_desc
LEVEL fis_quarter IS time
s.fiscal_quarter_desc
LEVEL fis_year IS times.fiscal_year
HIERARCHY cal_rollup (
day CHILD OF
month CHILD OF
quarter CHILD OF
year
)
HIERARCHY fis_rollup (
day CHILD
OF
fis_week CHILD OF
fis_month CHILD OF
fis_quarter CHILD OF
fis_year
) &
lt;attribute determination clauses>;
The tables used to define a dimension may be normalized or denormalized a
nd the individual hierarchies can be normalized or denormalized. If the levels of a hierarchy come from the same table, it is called
a fully denormalized hierarchy. For example, cal_rollup in the times_dim dimension is a denormalized hierar
chy. If levels of a hierarchy come from different tables, such a hierarchy is either a fully or partially normalized hierarchy. This
section shows how to define a normalized hierarchy.
Suppose the tracking of a customer's location is done by city, state, and
country. This data is stored in the tables customers and countries. The customer dimension customers_dim is partially no
rmalized because the data entities cust_id and country_id are taken from different tables. The clause JOIN KEY within the dimension definition specifies how to join together the levels in the hierarchy. The dimensi
on statement is partially shown in the following. The complete CREATE DIMENSION statement as well as the TABLE statement are shown in Orac
le Database Sample Schemas.
CREATE DIMENSION customers_dim
LEVEL customer IS (customers.cu
st_id)
LEVEL city IS (customers.cust_city)
LEVEL state IS (customers.cust_state_province)
LEVEL country IS (count
ries.country_id)
LEVEL subregion IS (countries.country_subregion)
LEVEL region IS (countries.country_region)
HIERARCHY geog_
rollup (
customer CHILD OF
city CHILD OF
state CHILD OF
country CHILD OF
subregion CHILD OF
region
JOIN KEY (customers.country_id) REFERENCES country);
Dimensions can be viewed through one of two methods:
All of the dimensions that exist in the data warehouse can be viewed using Oracle Enterprise Manager. Select the
|
See Also: Oracle Enterprise Manager Administrator's Guide for details regarding creating and using dimensions | <
/tr>
To view the definition of a dimension, use the DESCRIBE_DIMENSION proc
edure in the DBMS_DIMENSION package. For example, if a dimension is created in the sh sample schema with th
e following statements:
CREATE DIMENSION channels_dim
LEVEL channel IS (channels.channel_
id)
LEVEL channel_class IS (channels.channel_class)
HIERARCHY channel_rollup (
channel CHILD OF chann
el_class)
ATTRIBUTE channel DETERMINES (channel_desc)
ATTRIBUTE channel_class DETERMINES (channel_class);
Execute the DESCRIBE_DIMENSION procedure as follows:
SET SERVEROUTPUT ON FORMAT WRAPPED;
--to improve the display of info
EXECUTE DBMS_DIMENSION.DESCRIBE_DIMENSION('SH.CHANNELS_DIM');
You then see the followin g output results:
EXECUTE DBMS_DIMENSION.DESCRIBE_DIMENSION('SH.CHANNELS_DIM');
DIMENSION SH.CHANNEL
S_DIM
LEVEL CHANNEL IS SH.CHANNELS.CHANNEL_ID
LEVEL CHANNEL_CLASS IS SH.CHANNELS.CHANNEL_CLASS
HIERARCHY CHANNEL_ROLL
UP (
CHANNEL CHILD OF
CHANNEL_CLASS)
ATTRIBUTE CHANNEL LEVEL CHANNEL DETERMINES
SH.CHANNELS.CHANNEL
_DESC
ATTRIBUTE CHANNEL_CLASS LEVEL CHANNEL_CLASS DETERMINES
SH.CHANNELS.CHANNEL_CLASS
Constraints play an important role with dimensions. Full referential integrity is som etimes enabled in data warehouses, but not always. This is because operational databases normally have full referential integrity and you can ensure that the data flowing into your data warehouse never violates the already established integrity rules.
It is r
ecommended that constraints be enabled and, if validation time is a concern, then the NOVALIDATE clause should be used a
s follows:
ENABLE NOVALIDATE CONSTRAINT pk_time;
Primary and foreign keys should be impleme
nted also. Referential integrity constraints and NOT NULL constraints on the fact tables provide informatio
n that query rewrite can use to extend the usefulness of materialized views.
In addition, you should use the RELY
clause to inform query rewrite that it can rely upon the constraints being correct as follows:
ALTER T ABLE time MODIFY CONSTRAINT pk_time RELY;
This information is also used for query rewrite. See Chapter 18, " Query Rewrite" for more information.
The information of a dimension obje
ct is declarative only and not enforced by the database. If the relationships described by the dimensions are incorrect, incorrect re
sults could occur. Therefore, you should verify the relationships specified by CREATE DIMENSION using the <
code>DBMS_DIMENSION.VALIDATE_DIMENSION procedure periodically.
This procedure is easy to use and has only four paramete rs:
Dimension owner and name.
Set to TRUE to check only the ne
w rows for tables of this dimension.
Set to TRUE to verify that all columns are not null.
p>
A user-supplied unique identifier to identify the result of each run of the procedure.
T
he following example validates the dimension TIME_FN in the sh schema:
@utldi
m.sql
EXECUTE DBMS_DIMENSION.VALIDATE_DIMENSION ('SH.TIME_FN', FALSE, TRUE,
'my 1st example');
Before running the DIMENSION_EXCEPTIONS, by running the provided
script utldim.sql. If the VALIDATE_DIMENSION procedure encounters any errors, they are placed in this table
. Querying this table will identify the exceptions that were found. The following illustrates a sample:
SELECT * FROM dimension_exceptions WHERE statement_id = 'my 1st example'; STATEMENT_ID OWNER TABLE_NAME DIMENSION_NAME RELA TIONSHIP BAD_ROWID ------------ ----- ---------- -------------- ------------ --------- my 1st example SH MONTH TIME_FN FOREIGN KEY AAAAuwAAJAAAARwAAA
However, rather than query this table, it may be better to query the
rowid of the invalid row to retrieve the actual row that has violated the constraint. In this example, the dimension TIME_FN
code> is checking a table called month. It has found a row that violates the constraints. Using the rowid, you can see e
xactly which row in the month table is causing the problem, as in the following:
SELECT * FROM month
WH
ERE rowid IN (SELECT bad_rowid
FROM dimension_exceptions
WHERE statement_id = 'my 1st example');
MON
TH QUARTER FISCAL_QTR YEAR FULL_MONTH_NAME MONTH_NUMB
------ ------- ---------- ---- --------------- -----
-----
199903 19981 19981 1998 March 3
You can mod
ify the dimension using the ALTER DIMENSION statement. You can add or drop a level, hierarchy, or attribute
from the dimension using this command.
Referring to the time dimension in Figure 10-2, y
ou can remove the attribute fis_year, drop the hierarchy fis_rollup, or remove the level fiscal_year<
/code>. In addition, you can add a new level called f_year as in the following:
ALTER DIME NSION times_dim DROP ATTRIBUTE fis_year; ALTER DIMENSION times_dim DROP HIERARCHY fis_rollup; ALTER DIMENSION times_dim DROP LEVEL fi s_year ALTER DIMENSION times_dim ADD LEVEL f_year IS times.fiscal_year;
If you used the extended_attribute_clau
se when creating the dimension, you can drop one attribute column without dropping all attribute columns. This is illustr
ated in "Dropping and Creating Attributes with Columns", which shows the following statement:
ALTER DIMENSION product_dim DROP ATTRIBUTE size LEVEL prod_type COLUMN Prod_TypeSize;
If you try to remov e anything with further dependencies inside the dimension, Oracle Database rejects the altering of the dimension. A dimension becomes invalid if you change any schema object that the dimension is referencing. For example, if the table on which the dimension is defin ed is altered, the dimension becomes invalid.
To check the status of a dimension, view the contents of the column invali
d in the ALL_DIMENSIONS data dictionary view. To revalidate the dimension, use the COMPILE option as
follows:
ALTER DIMENSION times_dim COMPILE;
Dimensions can also be modified or deleted usi ng Oracle Enterprise Manager. See Oracle Enterprise Manager Admin istrator's Guide for more information.