| Oracle® Database Administrator's Guide 10g Release 1 (10.1) Part Number B10739-01 |
|
|
View PDF | tr>
This chapter describes how to manage hash clusters, and contains the following topics:
Storing a tab le in a hash cluster is an optional way to improve the performance of data retrieval. A hash cluster provides an alternative to a nonclustered table with an index or an index cluster. With an indexed table or index cluster, O racle Database locates the rows in a table using key values that the database stores in a separate index. To use hashing, you create a hash cluster and load tables into it. The database physically stores the rows of a table in a hash cluster and retrieves them accor ding to the results of a hash function.
Oracle Database uses a hash function to generate a distribution of numeric values, called hash values, that are based on specific cluster key values. The key of a hash cluster, like the key of an index cluster, can be a single column or composite key (multiple column key). To find or store a row in a hash cluster, the database applies the hash function to the cluster key value of the row. The resulting hash value corresponds to a data block in the cluster, which the database then reads or writes o n behalf of the issued statement.
To find or store a row in an indexed table or cluster, a minimum of two (there are usually m ore) I/Os must be performed:
One or more I/Os to find or store the key value in the index
Another I/O to read or write the row in the table or cluster
In contrast, the database uses a hash fu nction to locate a row in a hash cluster; no I/O is required. As a result, a minimum of one I/O operation is necessary to read or wri te a row in a hash cluster.
|
See Also: Chapter 13, " Managing Space for Schema Objects" is recommended reading before attempting tasks described in this chapter. |
This section helps you decide when to use hash clusters by contrasting situations where hashing is most useful against situations where there is no advantage. If you find your decision is to use indexing rather than hashing, then you should consider whether to store a table individually or as p art of a cluster.
|
Note: E ven if you decide to use hashing, a table can still have separate indexes on any columns, including the cluster key. |
Hashing is useful when you have the following conditions:
Most queries are equality queries on the c luster key:
SELECT ... WHERE cluster_key = ...;
In such cases, the cluster key in the equal ity condition is hashed, and the corresponding hash key is usually found with a single read. In comparison, for an indexed table the key value must first be found in the index (usually several reads), and then the row is read from the table (another read).
The tables in the hash cluster are primarily static in size so that you can determine the number of rows and amou nt of space required for the tables in the cluster. If tables in a hash cluster require more space than the initial allocation for th e cluster, performance degradation can be substantial because overflow blocks are required.
A hash cluster is created using a CREATE CLUSTER statement, but you specify a HASHKEYS clause. The following example contains a statement to create a cluster named trial_cluster that stores the trial
table, clustered by the trialno column (the cluster key); and another statement creating a table in the cluster.
CREATE CLUSTER trial_cluster (trialno NUMBER(5,0))
PCTUSED 80 PCTFREE 5
TABLESPACE users
S
TORAGE (INITIAL 250K NEXT 50K
MINEXTENTS 1 MAXEXTENTS 3
PCTINCREASE 0)
HASH IS trialno HASHKEYS 150;
CREATE
TABLE trial (
trialno NUMBER(5,0) PRIMARY KEY,
...)
CLUSTER trial_cluster (trialno);
As with index clusters, the key of a hash cluster can be a single column or a composite key (multiple column key). In this example, it is a single column.
The HASHKEYS value, in this case
150, specifies and limits the number of unique hash values that can be generated by the hash
function used by the cluster. The database rounds the number specified to the nearest prime number.
If no HASH IS clause is specified, the database uses an interna
l hash function. If the cluster key is already a unique identifier that is uniformly distributed over its range, you can bypass the i
nternal hash function and specify the cluster key as the hash value, as is the case in the preceding example. You can also use the
You cannot create a cluster index on a hash cluster, and yo u need not create an index on a hash cluster key.
For additional information about creating tables in a cluster, guidelines fo
r setting parameters of the CREATE CLUSTER statement common to index and hash clusters, and the privileges required to c
reate any cluster, see Chapter 17, " Managing Clusters". The following sections explain and provid
e guidelines for setting the parameters of the CREATE CLUSTER statement specific to hash clusters:
In a sorted hash cluster, the rows corresponding to each value of the hash function are sor ted on a specified set of columns in ascending order, which can improve response time during subsequent operations on the clustered d ata.
For example, a telecommunications company needs to store detailed call records for a fixed number of originating telephon e numbers through a telecommunications switch. From each originating telephone number there can be an unlimited number of telephone c alls.
Calls are stored as they are made and processed later in first-in, first-out order (FIFO) when bills are generated for e ach originating telephone number. Each call has a detailed call record that is identified by a timestamp. The data that is gathered i s similar to the following:
| Originating Telephone Numbers | Call Records Identified by Timestamp |
|---|---|
| 650-555-1212 | t0, t1, t2, t3, t4 , ... |
| 650-555-1213 | t0, t1, t2, t3, t4, ... |
| 650-555-1214 | t0, t1, t2, t3, t4, ... |
| ... | ... |
In the following SQL statements, the telephone_number column is the hash key. Th
e hash cluster is sorted on the call_timestamp and call_duration columns. The number of hash keys is based
on 10-digit telephone numbers.
CREATE CLUSTER call_detail_cluster ( telephone_number NUMBER, ca ll_timestamp NUMBER SORT, call_duration NUMBER SORT ) HASHKEYS 10000 HASH IS telephone_number SIZE 256; CREATE TABLE call _detail ( telephone_number NUMBER, call_timestamp NUMBER SORT, call_duration NUMBER SORT, other_ info VARCHAR2(30) ) CLUSTER call_detail_cluster ( telephone_number, call_timestamp, call_duration );
Giv en the sort order of the data, the following query would return the call records for a specified hash key by oldest record first.
SELECT * WHERE telephone_number = 6505551212;
You can also create a single-table hash cluster, which provides fas
t access to rows in a table. However, this table must be the only table in the hash cluster. Essentially, there must be a one-to-one
mapping between hash keys and data rows. The following statement creates a single-table hash cluster named peanut with t
he cluster key variety:
CREATE CLUSTER peanut (variety NUMBER) SIZE 512 SINGLE TABLE HA SHKEYS 500;
The database rounds the HASHKEYS value up to the nearest prime number, so this cluster has a max
imum of 503 hash key values, each of size 512 bytes. The SINGLE TABLE clause is valid only for hash clusters. HASH
KEYS must also be specified.
When creating a hash cluster, it is important to choose the cluster key c
orrectly and set the HASH IS, SIZE, and HASHKEYS parameters so that performance and space use
are optimal. The following guidelines describe how to set these parameters.
Choosing the correct cluster key is dependent on the most common types of queries issued against the clustered tables. For
example, consider the emp table in a hash cluster. If queries often select rows by employee number, the empno column should be the cluster key. If queries often select rows by department number, the deptno column should be the
cluster key. For hash clusters that contain a single table, the cluster key is typically the entire primary key of the contained tab
le.
The key of a hash cluster, like that of an index cluster, can be a single column or a composite key (multiple column key). A hash cluster with a composite key must use the internal hash function of the database.
a>Specify the HASH IS parameter only if the cluster key is a single column of the NUMBER datatype, and cont
ains uniformly distributed integers. If these conditions apply, you can distribute rows in the cluster so that each unique cluster ke
y value hashes, with no collisions (two cluster key values having the same hash value), to a unique hash value. If these conditions d
o not apply, omit this clause so that you use the internal hash function.
SIZE should be set to the average amount of space required to hold all rows for any given hash key
. Therefore, to properly determine SIZE, you must be aware of the characteristics of your data:
If the hash cluster is to contain only a single table and the hash key values of the rows in that table are unique (one row for
each value), SIZE can be set to the average row size in the cluster.
If the hash cluster is
to contain multiple tables, SIZE can be set to the average amount of space required to hold all rows associated with a
representative hash value.
Further, once you have determined a (preliminary) value for SIZE, consider t
he following. If the SIZE value is small (more than four hash keys can be assigned for each data block) you can use this
value for SIZE in the CREATE CLUSTER statement. However, if the value of SIZE is large (four
or fewer hash keys can be assigned for each data block), then you should also consider the expected frequency of collisions and wheth
er performance of data retrieval or efficiency of space usage is more important to you.
If the hash clus
ter does not use the internal hash function (if you specified HASH IS) and you
expect few or no collisions, you can use your preliminary value of SIZE. No collisions occur and space is used as effici
ently as possible.
If you expect frequent collisions on inserts, the likelihood of overflow blocks being
allocated to store rows is high. To reduce the possibility of overflow blocks and maximize performance when collisions are frequent,
you should adjust SIZE as shown in the following chart.
| Available Space for each Block / Calculated SIZE | Setting for SIZE |
|---|---|
| 1 | SIZE |
| 2 | SIZE + 15% |
| 3 | SIZE + 12% |
| 4 | SIZE + 8% | >4 | < code>SIZE |
Overestimating the value of SIZE increases the amount of unused space in the cluster. If space efficiency is more important than the performance of data retrieval,
disregard the adjustments shown in the preceding table and use the original value for SIZE.
For maximum
distribution of rows in a hash cluster, the database rounds the HASHKEYS value up to the nearest prime number.
The following examples show how to correctly choose the cluster key
and set the HASH IS, SIZE, and HASHKEYS parameters. For all examples, assume that the data blo
ck size is 2K and that on average, 1950 bytes of each block is available data space (block size minus overhead).
You decide to load the emp table into a has
h cluster. Most queries retrieve employee records by their employee number. You estimate that the maximum number of rows in the emp table at any given time is 10000 and that the average row size is 55 bytes.
In this case, empno shoul
d be the cluster key. Because this column contains integers that are unique, the internal hash function can be bypassed. SIZE
code> can be set to the average row size, 55 bytes. Note that 34 hash keys are assigned for each data block. HASHKEYS ca
n be set to the number of rows in the table, 10000. The database rounds this value up to the next highest prime number: 10007.
CREATE CLUSTER emp_cluster (empno NUMBER) . . . SIZE 55 HASH IS empno HASHKEYS 10000;
Conditions similar to the previous example exist. In this case, ho wever, rows are usually retrieved by department number. At most, there are 1000 departments with an average of 10 employees for each department. Department numbers increment by 10 (0, 10, 20, 30, . . . ).
In this case, deptno should be the cluste
r key. Since this column contains integers that are uniformly distributed, the internal hash function can be bypassed. A preliminary
value of SIZE (the average amount of space required to hold all rows for each department) is 55 bytes * 10, or 550 bytes
. Using this value for SIZE, only three hash keys can be assigned for each data block. If you expect some collisions and
want maximum performance of data retrieval, slightly alter your estimated SIZE to prevent collisions from requiring ove
rflow blocks. By adjusting SIZE by 12%, to 620 bytes (refer to "Setting SIZE"), there is more sp
ace for rows from expected collisions.
HASHKEYS can be set to the number of unique department numbers, 1000. The
database rounds this value up to the next highest prime number: 1009.
CREATE CLUSTER emp_cluster (deptn o NUMBER) . . . SIZE 620 HASH IS deptno HASHKEYS 1000;
As with index clusters, it is important to estimate the storage required for the data in a hash cluster.
Oracle
Database guarantees that the initial allocation of space is sufficient to store the hash table according to the settings SIZE
code> and HASHKEYS. If settings for the storage parameters INITIAL, NEXT, and MINEXTENTS
do not account for the hash table size, incremental (additional) extents are allocated until at least SIZE*HASHKEYS is reached. For example, assume that the data block size is 2K, the available data space for each block is approximately 1900 byt
es (data block size minus overhead), and that the STORAGE and HASH parameters are specified in the CR
EATE CLUSTER statement as follows:
STORAGE (INITIAL 100K
NEXT 150K
MINEXTENTS 1
PCTI
NCREASE 0)
SIZE 1500
HASHKEYS 100
In this example, only one hash key can be assigned for each data block. Therefore, the initial space required for the hash cluster is at least 100*2K or 200K. The settings for the storage parameters do not account for th is requirement. Therefore, an initial extent of 100K and a second extent of 150K are allocated to the hash cluster.
Alternativ
ely, assume the HASH parameters are specified as follows:
SIZE 500 HASHKEYS 100< p>In this case, three hash keys are assigned to each data block. Therefore, the initial space required for the hash cluster is at lea st 34*2K or 68K. The initial settings for the storage parameters are sufficient for this requirement (an initial extent of 100K is al located to the hash cluster).
You can alter a hash cl
uster with the ALTER CLUSTER statement:
ALTER CLUSTER emp_dept . . . ;
The imp
lications for altering a hash cluster are identical to those for altering an index cluster, described in "Altering Clusters". However, the SIZE, HASHKEYS, and HASH IS parameters cannot be spec
ified in an ALTER CLUSTER statement. To change these parameters, you must re-create the cluster, then copy the data from
the original cluster.
You can drop a hash cluster using the DROP CLUSTER statement:
DROP CLUSTER emp_dept;
A table in a hash cluster is dropped using the
DROP TABLE statement. The implications of dropping hash clusters and tables in hash clusters are the same as those for droppin
g index clusters.
| < table summary="layout table" cellspacing="0" cellpadding="0" align="left" width="90"> | |
![]() Previous |
![]() Next |

Home |
Book List |
Contents |
Index |
<
td align="center" valign="top">![]() Feedback a> |