| Ora
cle® Database Data Warehousing Guide 10g Release 1 (10.1) Part Number B10736-01 |
|
|
View PDF |
This chapter explains how to create a logical design for a data warehousing environment and includes the f ollowing topics:
Your organization has decided to build a data warehouse. You have define d the business requirements and agreed upon the scope of your application, and created a conceptual design. Now you need to translate your requirements into a system deliverable. To do so, you create the logical and physical design for the data warehouse. You then d efine:
The specific data content
Relationships within and between groups of data
The system environment supporting your data warehouse
The data transf ormations required
The frequency with which data is refreshed
The logical design is mor e conceptual and abstract than the physical design. In the logical design, you look at the logical relationships among the objects. I n the physical design, you look at the most effective way of storing and retrieving the objects as well as handling them from a trans portation and backup/recovery perspective.
Orient your design toward the needs of the end users. End users typically want to p erform analysis and look at aggregated data, rather than at individual transactions. However, end users might not know what they need until they see it. In addition, a well-planned design allows for growth and changes as the needs of users change and evolve.
By beginning with the logical design, you focus on the information requirements and save the implementation details for later.
A logical design is conceptual and abstract. You do not deal with the physical implementation details yet. You deal only with defining the types of information that you need.
One technique you can use to model your organization's logical information requirement s is entity-relationship modeling. Entity-relationship modeling involves identifying the things of importance (entities), the propert ies of these things (attributes), and how they are related to one another (relationships).
The process of logical design invol ves arranging data into a series of logical relationships called entities and attributes. An entity represents a chunk of information. In relational databases, an entity often m aps to a table. An attribute is a component of an entity that helps define the uniqueness of the entity. In relational databases, an attribute maps to a column.
To be sur e that your data is consistent, you need to use unique identifiers. A unique identifier is something you add to tables so that you can differentiate between the same item when i t appears in different places. In a physical design, this is usually a primary key.
While entity-relationship diagramming has traditionally been associated with highly normalized models such as OLTP applications, the technique is still useful for data warehou se design in the form of dimensional modeling. In dimensional modeling, instead of seeking to di scover atomic units of information (such as entities and attributes) and all of the relationships between them, you identify which in formation belongs to a central fact table and which information belongs to its associated dimension tables. You identify business sub jects or fields of data, define relationships between business subjects, and name the attributes for each subject.
Your logical design should result in (1) a set of entities and attributes corresponding to f act tables and dimension tables and (2) a model of operational data from your source into subject-oriented information in your target data warehouse schema.
You can create the logical design using a pen and paper, or you can use a design tool such as Oracle W arehouse Builder (specifically designed to support modeling the ETL process) or Oracle Designer (a general purpose modeling tool).
|
See Also: Oracle Desi gner and Oracle Warehouse Builder documentation sets |
A schema is a collection of database objects, including tables, views, indexes, and synonyms. You ca n arrange schema objects in the schema models designed for data warehousing in a variety of ways. Most data warehouses use a dimensio nal model.
The model of your source data and the requirements of your users help you design the data warehouse schema. You can sometimes get the source model from your company's enterprise data model and reverse-engineer the logical data model for the data wa rehouse from this. The physical implementation of the logical data warehouse model may require some changes to adapt it to your syste m parameters—size of machine, number of users, storage capacity, type of network, and software.
The star schema is the simplest d ata warehouse schema. It is called a star schema because the diagram resembles a star, with points radiating from a center. The cente r of the star consists of one or more fact tables and the points of the star are the dimension tables, as shown in Figure 2-1.
The most natural way to model a data warehouse is as a star schema, where only one join establishes the relationship between the fact table and any one of the dimension tables.
A star schema optimizes performance by keeping queries simple and providing fast response time. All the information ab out each level is stored in one row.
|
Note: Oracle Corporation recommends that you choose a star schema unless you have a clear reason not to. |
Some schemas in data warehousin g environments use third normal form rather than star schemas. Another schema that is sometimes useful is the snowflake schema, which is a star schema with normalized dimensions in a tree structure.
|
See Also: Chapter 19, " Schema Modeling Techniques" for further information regarding star and snowflake schemas in data warehouses and Oracle Database Concepts for further conceptual material |
Fact tables and dimension tables are the two types of objects commonly used in dimensional data warehouse sche mas.
Fact tables are the large tables in your data warehouse schema that store business m
easurements. Fact tables typically contain facts and foreign keys to the dimension tables. Fact tables represent data, usually numeri
c and additive, that can be analyzed and examined. Examples include sales, cost, and profit.
p>
Dimension tables, also known as lookup or reference tables, contain the relatively static
data in the data warehouse. Dimension tables store the information you normally use to contain queries. Dimension tables are usually
textual and descriptive and you can use them as the row headers of the result set. Examples are customers or produ
cts.
A fact table typically has two types of columns: those that contain numeric facts (often called measurements), and those that are foreign keys to dimension tabl es. A fact table contains either detail-level facts or facts that have been aggregated. Fact tables that contain aggregated facts are often called summary tables. A fact table usually contains facts with the same level of aggrega tion. Though most facts are additive, they can also be semi-additive or non-additive. Additive facts can be aggregated by simple arit hmetical addition. A common example of this is sales. Non-additive facts cannot be added at all. An example of this is averages. Semi -additive facts can be aggregated along some of the dimensions and not along others. An example of this is inventory levels, where yo u cannot tell what a level means simply by looking at it.
A dimension is a structu re, often composed of one or more hierarchies, that categorizes data. Dimensional attributes help to describe the dimensional value. They are normally descriptive, textual values. Several distinct dimensions, combined with facts, enable you to answer business questi ons. Commonly used dimensions are customers, products, and time.
Dimension data is typically collected at the lowest level of detail and then aggregated into higher level totals that are more useful for analysis. These nat ural rollups or aggregations within a dimension table are called hierarchies.
Hierarchies are logical structures that use ordered levels as a means of organizing data. A hierarchy can be used to define data aggregation. For example, in a time dimension, a hierarchy migh t aggregate data from the month level to the quarter level to the year level. A hierarchy can also be used to define a navigational d rill path and to establish a family structure.
Within a hierarchy, each level is logically connected to the levels above and b elow it. Data values at lower levels aggregate into the data values at higher levels. A dimension can be composed of more than one hi erarchy. For example, in the product dimension, there might be two hierarchies—one for product categories and one for product suppliers.
Dimension hierarchies also group levels from general to granular. Query tools use hierarchies to enable you to drill down into your data to view different levels of granulari ty. This is one of the key benefits of a data warehouse.
When designing hierarchies, you must consider the relationships in bu siness structures. For example, a divisional multilevel sales organization.
Hierarchies impose a family structure on dimension values. For a particular level value, a value at the next higher level is its parent, and values at the next lower level are its chi ldren. These familial relationships enable analysts to access data quickly.
A level represents a position in a hierarchy. For example, a tim e dimension might have a hierarchy that represents data at the month, quarter, and year levels. Levels range from general to specific , with the root level as the highest or most general level. The levels in a dimension are organi zed into one or more hierarchies.
Level relationships specify top-to-bottom ordering of levels from most general (the root) to most specific information. They define the pare nt-child relationship between the levels in a hierarchy.
Hierarchies are also essential components in enabling more complex re writes. For example, the database can aggregate an existing sales revenue on a quarterly base to a yearly aggregation when the dimens ional dependencies between quarter and year are known.
Figure 2-2 illustrates a dimension hierarchy based on customers.
|
See Also: Chapter 10, " Dimensions" and Chapte r 18, " Query Rewrite" for further information regarding hierarchies |
Unique identifiers are specified for
one distinct record in a dimension table. Artificial unique identifiers are often used to avoid the potential problem of unique iden
tifiers changing. Unique identifiers are represented with the # character. For example, #customer_id.
Relationships guarantee business integrity. An example is that if a business sells someth ing, there is obviously a customer and a product. Designing a relationship between the sales information in the fact table and the di mension tables products and customers enforces the business rules in databases.
Figure 2-3 illustrates a common example of a sales fac
t table and dimension tables customers, products, promotions, times, and ch
annels.