Skip Headers

Oracle® Database Data Warehousing Guide
10g Release 1 (10.1)

Part Number B10736-01
Go to Documentation Home
Home
Go to Book
 List
Book List
Go to Table of Contents
Contents
Go to Index
Index
Go to Master Index
Master Index
Go to Feedback page
Feedback

Go to previous page
Previous
Go to next page
Next
View PDF

19 Schema Modeling Techniques< /font>

The following topics provide information about schemas in a data warehouse:

Schemas in Data Warehouses

A schema is a collection of database objects, including tables, views, indexes, and synonyms.

There is a variety of ways of arranging schema objects in the schema models designed for data warehousing. One data warehouse schema model is a star schema. The sh sample schema (the basis for most of the examples in this book) uses a star schema. Howev er, there are other schema models that are commonly used for data warehouses. The most prevalent of these schema models is the third normal form (3NF) schema. Additionally, some data warehouse schemas are neither star schemas nor 3NF schemas, but instead share characteristics of both schemas; these are re ferred to as hybrid schema models.

The Oracle Database is designed to support all data warehouse schemas. Some features may be specific to one schema model (such as the star transformation feature, described in "Using Star Transformation"< /a>, which is specific to star schemas). However, the vast majority of Oracle's data warehousing features are equally applicable to s tar schemas, 3NF schemas, and hybrid schemas. Key data warehousing capabilities such as partitioning (including the rolling window lo ad technique), parallelism, materialized views, and analytic SQL are implemented in all schema models.

The determination of wh ich schema model should be used for a data warehouse should be based upon the requirements and preferences of the data warehouse proj ect team. Comparing the merits of the alternative schema models is outside of the scope of this book; instead, this chapter will brie fly introduce each schema model and suggest how Oracle can be optimized for those environments.

Third Normal Form

Although this guide primarily uses star schemas in its examples, you can also use the third normal form for your data warehou se implementation.

Third normal form modeling is a classical relational-database modeling technique that minimizes data redund ancy through normalization. When compared to a star schema, a 3NF schema typically has a larger number of tables due to this normaliz ation process. For example, in Figure 19-1, orders and order items tab les contain similar information as sales table in the star schema in Figure 19-2.

3NF sch emas are typically chosen for large data warehouses, especially environments with significant data-loading requirements that are used to feed data marts and execute long-running queries.

The main advantages of 3NF schemas are that they:

  • Provide a neutral schema design, independent of any application or data-usage considerations

  • May require less data-transformation than more normalized schemas such as star schemas

Figure 19-1 presents a graphical representation of a third normal form schema.

Figure 19-1 Third Normal Form Schem a

Description of dwhsg108.gif follows
Description of the illustration dwhsg108.gif

Optimizing Third Normal Form Queries

Queries on 3NF schemas are often very complex and i nvolve a large number of tables. The performance of joins between large tables is thus a primary consideration when using 3NF schemas .

One particularly important feature for 3NF schemas is partition-wise joins. The largest tables in a 3NF schema should be par titioned to enable partition-wise joins. The most common partitioning technique in these environments is composite range-hash partiti oning for the largest tables, with the most-common join key chosen as the hash-partitioning key.

Parallelism is often heavily utilized in 3NF environments, and parallelism should typically be enabled in these environments.

< font face="arial, helvetica, sans-serif" color="#330099">Star Schemas

The star schema is perhaps the simplest data warehouse schema. It is called a star schem a because the entity-relationship diagram of this schema resembles a star, with points radiating from a central table. The center of the star consists of a large fact table and the points of the star are the dimension tables.

A star query is a join between a fact table and a number of di mension tables. Each dimension table is joined to the fact table using a primary key to foreign key join, but the dimension tables ar e not joined to each other. The optimizer recognizes star queries and generates efficient execution plans for them.

A typical fact table contains keys and measures. For example, in the sh sample schema, the fact table, sales, contain the measures quantity_sold, amo unt, and cost, and the keys cust_id, time_id, prod_id, channel_id, and promo_id. The dimension tables are customers, times, products, ch annels, and promotions. The products dimension table, for example, contains information about each p roduct number that appears in the fact table.

A star join is a primary key to foreign key join of the dimension tables to a fact table.

The main advantages of star schemas are that they:

  • Provi de a direct and intuitive mapping between the business entities being analyzed by end users and the schema design.

  • Provide highly optimized performance for typical star queries.

  • Are widely supported by a larg e number of business intelligence tools, which may anticipate or even require that the data warehouse schema contain dimension tables .

Star schemas are used for both simple data marts and very large data warehouses.

Figure 19-2 presents a graphical representation of a star schema.

Snowflake Schemas

The snowflake schema is a more complex data warehouse model than a star schema, and is a type of star schema. It is called a snowflake schema because the diagram of the schema resembles a snowflake.

Snowflake schema s normalize dimensions to eliminate redundancy. That is, the dimension data has been grouped into multiple tables instead of one larg e table. For example, a product dimension table in a star schema might be normalized into a products table, a product_category< /code> table, and a product_manufacturer table in a snowflake schema. While this saves space, it increases the number of dimension tables and requires more foreign key joins. The result is more complex queries and reduced query performance. Fi gure 19-3 presents a graphical representation of a snowflake schema.

Figure 19-3 Snowflake Schema

Description of dwhsg008.gif follows
Description of the illustration dwhsg008.gif


Note:

Oracle Corpor ation recommends you choose a star schema over a snowflake schema unless you have a clear reason not to.

Optimizing Star Queries

You should consider the following when using star queries:

< a id="i1006327" name="i1006327">

Tuning Star Queries

To get the best possible performance for star queries, it is importan t to follow some basic guidelines:

  • A bitmap index should be built on each of the foreign key columns of the fact table or tables.

  • The initialization parameter STAR_TRANSFORMATION_ENABLED should be set to TRUE. This enables an importa nt optimizer feature for star-queries. It is set to FALSE by default for backward-compatibility.

When a data warehouse satisfies these conditions, the majority of the star queries running in the data warehouse will use a query execution strategy known as the star transformation. The star transformation provides very efficient query performance for star queries.

Using Star Transformation

The star transformation is a power ful optimization technique that relies upon implicitly rewriting (or transforming) the SQL of the original star query. The end user n ever needs to know any of the details about the star transformation. Oracle's query optimizer automatically chooses the star transfor mation where appropriate.

The star transform ation is a query transformation aimed at executing star queries efficiently. Oracle processes a star query using two basic phases. Th e first phase retrieves exactly the necessary rows from the fact table (the result set). Bec ause this retrieval utilizes bitmap indexes, it is very efficient. The second phase joins this result set to the dimension tables. An example of an end user query is: "What were the sales and profits for the grocery department of stores in the west and southwest sal es districts over the last three quarters?" This is a simple star query.


Note:

Bitmap indexes are available only if you have purchased the Oracle Database Ent erprise Edition. In Oracle Database Standard Edition, bitmap indexes and star transformation are not available.

Star Transformation with a Bitmap Index

A prerequisite of the star transformation is that there be a single-column bitmap index on every join column of the fact table. T hese join columns include all foreign key columns.

For example, the sales table of the sh sample schema has bitma p indexes on the time_id, channel_id, cust_id, prod_id, and promo_id columns.

Consider the following star query:

SELECT ch.channel_class, c.cust_city, t.calendar_qu
arter_desc,
   SUM(s.amount_sold) sales_amount
FROM sales s, times t, customers c, channels ch
WHERE s.time_id = t.time_id
AND   s.cu
st_id = c.cust_id
AND   s.channel_id = ch.channel_id
AND   c.cust_state_province = 'CA'
AND   ch.channel_desc in ('Internet','Catalog
')
AND   t.calendar_quarter_desc IN ('1999-Q1','1999-Q2')
GROUP BY ch.channel_class, c.cust_city, t.calendar_quarter_desc;

< p>This query is processed in two phases. In the first phase, Oracle Database uses the bitmap indexes on the foreign key columns of th e fact table to identify and retrieve only the necessary rows from the fact table. That is, Oracle Database will retrieve the result set from the fact table using essentially the following query:

SELECT ... FROM sales
WHERE time_id IN
 (SELECT time_id FROM times 
   WHERE calendar_quarter_desc IN('1999-Q1','1999-Q2'))
   AND cust_id IN
  (SELECT cust_id FROM custome
rs WHERE cust_state_province='CA')
   AND channel_id IN
  (SELECT channel_id FROM channels WHERE channel_desc IN('Internet','Catalog'
));

This is the transformation step of the algorithm, because the original star query has been transformed into this subq uery representation. This method of accessing the fact table leverages the strengths of bitmap indexes. Intuitively, bitmap indexes p rovide a set-based processing scheme within a relational database. Oracle has implemented very fast methods for doing set operations such as AND (an intersection in standard set-based terminology), OR (a set-based union), MINUS , and COUNT.

In this star query, a bitmap index on time_id is used to identify the set of all rows i n the fact table corresponding to sales in 1999-Q1. This set is represented as a bitmap (a string of 1's an d 0's that indicates which rows of the fact table are members of the set).

A similar bitmap is retrieved for the fact table ro ws corresponding to the sale from 1999-Q2. The bitmap OR operation is used to combine this set of Q1< /code> sales with the set of Q2 sales.

Additional set operations will be done for the customer dimen sion and the product dimension. At this point in the star query processing, there are three bitmaps. Each bitmap corresp onds to a separate dimension table, and each bitmap represents the set of rows of the fact table that satisfy that individual dimensi on's constraints.

These three bitmaps are combined into a single bitmap using the bitmap AND operation. This fina l bitmap represents the set of rows in the fact table that satisfy all of the constraints on the dimension table. This is the result set, the exact set of rows from the fact table needed to evaluate the query. Note that none of the actual data in the fact table has been accessed. All of these operations rely solely on the bitmap indexes and the dimension tables. Because of the bitmap indexes' com pressed data representations, the bitmap set-based operations are extremely efficient.

Once the result set is identified, the bitmap is used to access the actual data from the sales table. Only those rows that are required for the end user's query are retriev ed from the fact table. At this point, Oracle has effectively joined all of the dimension tables to the fact table using bitmap index es. This technique provides excellent performance because Oracle is joining all of the dimension tables to the fact table with one lo gical join operation, rather than joining each dimension table to the fact table independently.

The second phase of this query is to join these rows from the fact table (the result set) to the dimension tables. Oracle will use the most efficient method for ac cessing and joining the dimension tables. Many dimension are very small, and table scans are typically the most efficient access meth od for these dimension tables. For large dimension tables, table scans may not be the most efficient access method. In the previous e xample, a bitmap index on product.department can be used to quickly identify all of those products in the grocery depart ment. Oracle's optimizer automatically determines which access method is most appropriate for a given dimension table, based upon the optimizer's knowledge about the sizes and data distributions of each dimension table.

The specific join method (as well as in dexing method) for each dimension table will likewise be intelligently determined by the optimizer. A hash join is often the most eff icient algorithm for joining the dimension tables. The final answer is returned to the user once all of the dimension tables have bee n joined. The query technique of retrieving only the matching rows from one table and then joining to another table is commonly known as a semijoin.

Execution Plan for a Star Transformation with a Bitmap Index

The following typical execu tion plan might result from "Star Transformation with a Bitmap Index":

SELECT S
TATEMENT
 SORT GROUP BY
  HASH JOIN
   TABLE ACCESS FULL                          CHANNELS
   HASH JOIN
    TABLE ACCESS FULL
                 CUSTOMERS
    HASH JOIN
     TABLE ACCESS FULL                        TIMES
     PARTITION RANGE ITERATOR
      TABL
E ACCESS BY LOCAL INDEX ROWID       SALES
       BITMAP CONVERSION TO ROWIDS
        BITMAP AND
         BITMAP MERGE
          BITMA
P KEY ITERATION
           BUFFER SORT
            TABLE ACCESS FULL                 CUSTOMERS
           BITMAP INDEX RANGE SCAN
        SALES_CUST_BIX
         BITMAP MERGE
          BITMAP KEY ITERATION
           BUFFER SORT
            TABLE ACCESS FULL
            CHANNELS
           BITMAP INDEX RANGE SCAN            SALES_CHANNEL_BIX
         BITMAP MERGE
          BITMAP KEY ITERA
TION
           BUFFER SORT
            TABLE ACCESS FULL                 TIMES
           BITMAP INDEX RANGE SCAN            SALES_T
IME_BIX

In this plan, the fact table is accessed through a bitmap access path based on a bitmap AND, of thre e merged bitmaps. The three bitmaps are generated by the BITMAP MERGE row source being fed bitmaps from row source trees underneath it. Each such row source tree consists of a BITMAP KEY ITERATION row source which fetches values from the subquery row source tree, which in this example is a full table access. For each such value, the BITMAP KEY ITERATION row source retrieves the bitmap from the bitmap index. After the relevan t fact table rows have been retrieved using this access path, they are joined with the dimension tables and temporary tables to produ ce the answer to the query.

Star Transformation with a Bitmap Join Index

In addition to bitmap indexes, you can use a bitmap join index during star transformations. Assume you have the following additional index structure:

CREATE BITMAP INDEX sales_c_state_bjix
ON sales(customers.cust_state_province)
FROM sales, customers
WHERE
sales.cust_id = customers.cust_id
LOCAL NOLOGGING COMPUTE STATISTICS;

The processing of the same star query using the bit map join index is similar to the previous example. The only difference is that Oracle will utilize the join index, instead of a singl e-table bitmap index, to access the customer data in the first phase of the star query.

Execution Plan for a Star Transformation with a Bitmap Join Index

The following typical execution pl an might result from "Execution Plan for a Star Transformation with a Bitmap Join Index":

SELECT STATEMENT
 SORT GROUP BY
  HASH JOIN
   TABLE ACCESS FULL                          CHANNELS
   HASH JOIN
    TABLE
ACCESS FULL                         CUSTOMERS
    HASH JOIN
     TABLE ACCESS FULL                        TIMES
     PARTITION RANGE
ALL
      TABLE ACCESS BY LOCAL INDEX ROWID       SALES
       BITMAP CONVERSION TO ROWIDS
        BITMAP AND
         BITMAP INDEX S
INGLE VALUE            SALES_C_STATE_BJIX
         BITMAP MERGE
          BITMAP KEY ITERATION
           BUFFER SORT
            TAB
LE ACCESS FULL                 CHANNELS
           BITMAP INDEX RANGE SCAN            SALES_CHANNEL_BIX
         BITMAP MERGE
   BITMAP KEY ITERATION
           BUFFER SORT
            TABLE ACCESS FULL                 TIMES
           BITMAP INDEX RANGE SCAN
            SALES_TIME_BIX

The difference between this plan as compared to the previous one is that the inner part of the bitmap index scan for the customer dimension has no subselect. This is because the join predicate information on customer.cust_state_province can be satisfied with the bitmap join index sales_c_state_bjix.

How Oracle Chooses to Use Star Transformation

The optimizer generates and saves the best plan it can produce witho ut the transformation. If the transformation is enabled, the optimizer then tries to apply it to the query and, if applicable, genera tes the best plan using the transformed query. Based on a comparison of the cost estimates between the best plans for the two version s of the query, the optimizer will then decide whether to use the best plan for the transformed or untransformed version.

If t he query requires accessing a large percentage of the rows in the fact table, it might be better to use a full table scan and not use the transformations. However, if the constraining predicates on the dimension tables are sufficiently selective that only a small po rtion of the fact table needs to be retrieved, the plan based on the transformation will probably be superior.

Note that the o ptimizer generates a subquery for a dimension table only if it decides that it is reasonable to do so based on a number of criteria. There is no guarantee that subqueries will be generated for all dimension tables. The optimizer may also decide, based on the propert ies of the tables and the query, that the transformation does not merit being applied to a particular query. In this case the best re gular plan will be used.

Star Transformation Restrictions

Star transformation is not supported for tables with any of the following characteristics:

  • Queries with a table hint that is incompatible with a bitmap access path

  • Queries that contain bind variables

  • Tables with too few bitmap indexes. There must be a bitmap index on a fact table column for the optimizer t o generate a subquery for it.

  • Remote fact tables. However, remote dimension tables are allowed in the su bqueries that are generated.

  • Anti-joined tables

  • Tables that are already use d as a dimension table in a subquery

  • Tables that are really unmerged views, which are not view partition s

The star transformation may not be chosen by the optimizer for the following cases:

  • T ables that have a good single-table access path

  • Tables that are too small for the transformation to be w orthwhile

In addition, temporary tables will not be used by star transformation under the following conditions:

  • The database is in read-only mode

  • The star query is part of a transaction that is in serializable mode