'; zhtm += ''; zhtm += '

' + pPage + ''; zhtm += '
'; figDoc.write(zhtm); figDoc.close(); popUpWin.focus(); } //--> Using Oracle8 -- Chapter 18 - Using Indexes, Clusters, Caching, and Sorting Effectively

Using Oracle8


Chapter 18

Using Indexes, Clusters, Caching, and Sorting Effectively


Tuning Indexes

Indexes are the first in line of means to reduce disk I/O and to provide a short access path to the desired data. Until version 7.2, Oracle used to provide only b*tree indexes. The task for selecting an index was relatively simple then: have one or not? Several indexing options are available with Oracle8, and now it's challenging to select an optimal indexing strategy.

Indexes are transparent to users
Indexes are transparent to the end-user application. Oracle automatically maintains the indexes during a DML operation on the table. Oracle Optimizer chooses the suitable index during a SQL SELECT operation.

Chapter 8 "Adding Segments for Different Types of Indexes," discusses indexes in more detail; the following sections give you a better idea of how to use indexes effectively with your databases.

When to Use B*Tree Indexes

Balanced tree (b*tree) indexes have been part of the Oracle RDBMS since its inception. It's still the most commonly used indexing mechanism. You should create a b*tree index on a table column if it meets any of the following criteria:

order#
order_date
part#
Quantity
customer#
sales_person
Increase data selectivity
Use a concatenated index (that is, an index on multiple table columns) if these columns are involved in any of these situations. You can create a concatenated index to increase the selectivity of the data by using indexed columns. However, ensure that leading columns from a concatenated index are used frequently in an application's SQL statements.
select * from orders where order# = <order number>;
select * from orders where qty = '1'
select ord.order#, ord.customer#, ord.part#, parts.Description,
ord.Qty, parts.Price
from  orders ord, parts_master parts
where ord.part# = parts.part# ;

When to Use Bitmap Indexes

Bitmap indexes, first introduced in Oracle 7.3, are very frequently used in data warehouse environments. Creating a bitmap index is recommended in the following situations:

When to Use Reverse-Key Indexes

A reverse-key index is very similar to a regular b*tree index, except that the byte order for the indexed column is reversed. Consider a table containing the orders where the primary key, order numbers, is generated in increasing sequence order. Because all the entered orders are nearby, all the new values will go to the same index leaf block, thus causing multiple concurrent inserts into the same block. This is especially a problem in an Oracle parallel server environment in which that block will be pinged among the instances for insertion of new values. If the byte order for the order number is switched now, the numerically adjacent order number will go to different leaf blocks, and pinging is greatly reduced.

Reverse-key indexes are recommended in the following situations:

When to Use Index-Organized Tables

An index-organized table (or index-only table) is very similar to a b*tree index. If your table data is very static and will invariably be retrieved via an indexed key, an index-only table can be a better alternative to combining a regular table and its associated index-it will save the additional I/O to retrieve the additional column from the table data after the index lookup. You can't create another index on an index-only table, however.

Use index-organized tables in the following situations:

The data is frequently retrieved via the primary key.

Consider a table containing a Social Security number, name, and address information. If queries similar to the following are going to use the table extensively, it is advantageous to create the table as an index-organized table:

Select SSN, Name, Address from cust_info where ssn = <SSN> ;

Evaluating Index Usage

The only direct way to confirm the usage of an index is to look at your application code; do this to determine whether the indexed column is appearing in the where clause of a SQL statement. You should reconfirm the index usage by looking at the query's execution plan. However, using an index and avoiding a full table scan doesn't guarantee the optimal indexing strategy. You should further examine the possibility of creating a more selective concatenated index on a table if multiple columns from the same table are appearing in the SQL statement's where clause. Decisions could be simple in an OLTP environment where a transaction deals with a relatively low volume of data indexing. During batch processing, during report generation, or in a DSS environment, however, the decision between various indexing strategies and full table scans performed in parallel could be tricky and might be greatly influenced by machine resources, database layout, the nature of the query, and so on.

A full table scan can be faster than using an index
Under favorable conditions, a full table scan using the Oracle parallel query option might be faster than using an existing index. This is especially true while joining multiple tables. At what point a full table scan becomes faster than an index retrieval depends on many factors, including the available CPU capacity and a table's physical layout.

The following command sequence shows how to use the auto-trace utility to quickly get the execution plan of a SQL statement and confirm the index usage:

Confirm the presence of needed indexes with na execution plan
Look at your time-consuming queries' execution plan to see whether performance can be improved by adding an index or by increasing the selectivity of an existing index (via adding another column to it). In an OLTP system where resource usage by a transaction isn't significant, however, the overall system can reduce the I/O marginally, because every transaction is executed numerous times.

SQL> set autotrace on ;
SQL> select * from item where i_id=1000 ;

  ITEM_ID   ITEM_NAME                   ITEM_PRICE
----------   ------------------------- ----------
ITEM_DATA
--------------------------------------------------
      1000    FAbLm1A84sVcZXgkJbZvSVe        7590
HTXSxYlPUMW5HGc5umArHcJofKDlwiOXPN

 Execution Plan
----------------------------------------------------------

Index used to retrieve data from ITEM table

To measure the overall effectiveness with which an Oracle instance is using indexes, calculate the Index Use Ratio as follows:

Index Use Ratio = 'table fetch by rowid'/
    (table fetch by rowid + table scan rows gotten)

table fetch by rowid and table scan rows gotten are the statistics from the dynamic performance view V$SYSSTAT.

A value of 90 percent or higher for this ratio is recommended. A lower value might be acceptable in a data-warehousing or decision-support system where the full table scans are frequently used.

Index usage doesn't come without a cost. Before creating an index, consider the following factors along with the advantages you'll gain by the presence of an index:

Watch for SQL Statements That Don't Use Indexes

Several SQL queries don't use indexes:

Select name, address from persons where upper(name)='JOHN';
Select name, street_address, city 
from persons 
where city not in ('BOSTON','NEWYORK');
select * from part_master where part# = 999999;
select * from part_master where part# = '999999';

Using Clusters

A cluster is a group of tables. If the application SQL statements frequently join two or more tables, you can improve performance by clustering the tables. Oracle stores the clustered tables in the same data blocks. This reduces the amount of I/O required during the retrieval, because the rows needed from multiple tables are available in the same block. Similarly, you can group multiple table rows with the same value for a non-unique column together if you know that these rows will be processed together. This type of cluster-where a key column is used to group multiple rows-is known as an index cluster. The key column used to group the rows together is the cluster key.

When clusters hurt performance
Using clusters for frequently updated or deleted tables affects performance adversely.

Another type of cluster that Oracle offers is hash clusters-single-table clusters where rows of a table with the same hash value are stored together to enable fast access. These are discussed in detail in a later section.

Consider a telephone billing system where the calls table stores information about each customer's usage. If billing rules indicate that all calls made by a customer in a particular month be billed together, you can store these calls in a cluster whose cluster key could be concatenated for the columns phone_number, year, and month.

Figure 18.1 shows the difference between a normal table and a single-table cluster. Figure 18.2 shows the data storage in a three-table cluster.

Figure 18.1 : Index clusters store the data with the same index key together.

Figure 18.2 : Index clusters store the data from multiple tables with the same cluster key together.

Creating and Managing Index Clusters

Create an index cluster and its tables (general steps)

  1. Create the cluster.
  2. Create the cluster index.
  3. Create the tables.
  4. Insert data and perform DML/query operations.

When to use index clusters
If two or more tables are related via referential integrity and frequently used with joins in the SQL operations, create an index cluster for these tables. If multiple rows from a table are frequently fetched together with a non-unique column, create a single-table index cluster with that column as the cluster key to improve performance.

Creating an Index Cluster

Use the following command to create a single-table cluster for the previously discussed calls table:

SQL> create cluster C_phone_calls ( phone_no number(10),
       year number(4), month number(2))
     pctused 80 pctfree 5
     tablespace users
     storage (initial 10M Next 10M);

Creating a Cluster-Key Index

Oracle Server requires an index on the cluster key before it allows any DML performed against the tables in the index cluster. You can't perform any DML operations until you create the index. To locate a row with a given cluster key, Oracle first looks in the index and reads the corresponding rowid. This rowid is, in turn, used to retrieve the table data for the clustered table(s). The following command creates the cluster-key index I_phone_calls on the c_phone_calls cluster:

Cluster-key indexes can't be unique
Oracle doesn't allow cluster-key indexes to be unique. You can create indexes on a clustered table's other columns. These indexes are maintained independently of the cluster index.
SQL> create index I_phone_calls on cluster c_phone_calls
     pctfree 10
     tablespace user
     storage (initial 1M next 1M);

Creating Tables in an Index Cluster

You can create a table within a cluster after creating the cluster. A cluster-key index can be created before or after the tables within the cluster. However, you can't insert any rows in a cluster's tables until you create the cluster-key index.

The following CREATE TABLE command creates the table phones_calls within the cluster c_phone_calls:

SQL> create table phones_calls 
        ( phone_no number(10), year number(4),
               month number(2),day number(2), 
               duration number(3), .......... )
               cluster C_phone_calls
               (phone_no number(10), year number(4),
               month number(2));

Creating and Managing Hash Clusters

Hash clusters are single-table clusters in which rows with the same hash-key values are stored together. Oracle uses a mathematical hash function to choose the location of a row within the cluster.

A cluster key and the data block in which it will be stored are directly related. The space used for a cluster needs to be allocated during cluster creation. Thus, the size and the number of rows should be accurately known.

When to use hash clusters
Use hash clusters for lookup tables whose data is static and is retrieved mostly by using their primary key.

Consider an item table containing information about 100,000 items and having the following structure:

item_id            number(6,0),
item_name          varchar2(24),
item_price         number(5,0),
item_data          varchar2(50)

Also assume that the item_id is unique and the average row length is 75 bytes. If a 3,750-byte space is available within each Oracle data block, each data block will accommodate 45 rows (3,750 divided by 75) per block. The total storage space needed for this cluster will be 2,223 data blocks (100,000 divided by 45). Oracle rounds it off to a certain higher number and might allocate a few more blocks. The data stored in the cluster will look similar to Figure 18.3.

Figure 18.3 : Hash clusters preallocate the storage location for a row based on the hash-key value.

When Oracle needs to store or retrieve a row with an item_id, it simply applies the hash function to the given item_id to get its block number. In the example, the hash function is mod(45); to retrieve data for item_id 10576, Oracle looks in data block 236 (10,576 divided by 45), needing to read only one disk block.

Use the following SQL statement to create a hash cluster for this data:

create cluster item_cluster (
 item_id            number(6,0)



        create table item (
        item_id            number(6,0),
        item_name          varchar2(24),
        item_price         number(5,0),
        item_data          varchar2(50)
        )
        cluster item_cluster(item_id);
Total number of cluster keys to be stored
Cluster-key column, based on data most frequently retrieved from table
Space (in bytes) used to store data for single-cluster key

Carefully choose the size amount
If space used by the rows associated with a cluster key isn't predictable, exercise caution while using clusters. If space usage frequently exceeds allocated space, chaining will take place and will result in wasted disk space and increased I/O.

Restrictions on Clustered Tables

Clustered tables have some restrictions:

Evaluating Cluster Usage

Oracle Server automatically maintains the clusters once they're created. Oracle will also use the underlying hash or the index cluster key for data retrieval whenever the cluster key is used in a SQL statement's where clause.

The following example uses the autotrace utility to see the SQL query's execution plan. The execution plan shows that the table is accessed via HASH of ITEM, thus using the hash cluster key to retrieve the specified data.

SQL> set autotrace on ;
SQL> select item_id,i_name from item where item_id=1000;

   ITEM_ID I_NAME
---------- ------------------------
      1000 FAbLm1A84sVcZXgkJbZvSVe
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE
   1    0   TABLE ACCESS (HASH) OF 'ITEM'

Caching Data

Caching is the most common approach for improving retrieval rates of large random data. The effectiveness of caching depends on the size of the cache memory available and the access pattern of the data. Oracle uses a significant part of the Shared Global Area (SGA) as the buffer cache. Its size can be specified by the DB_BLOCK_BUFFERS initialization parameter in terms of total number of Oracle data blocks, and it's generally referred to as the buffer cache or buffer pool. By default, Oracle uses the Least Recently Used (LRU) algorithm for maintaining the buffer cache, in which the least recently used blocks are used first.

When an Oracle user process needs a data block it proceeds as follows:

  1. It looks in the buffer cache for the desired block.
  2. It finds a free buffer in the buffer cache and reserves it.
  3. If it doesn't find a free buffer after searching a predetermined number of buffers, it signals the DBWR. The DBWR, in turn, writes the modified data blocks to the disk. The buffers written to the disk are available for reuse.
  4. The process reads the block from the disk in the buffer cache.

Figure 18.4 depicts these steps in the form of a flowchart.

Figure 18.4 : An Oracle process follows this algorithm to access a data block.

The effectiveness of caching depends on the access pattern for the data. The access pattern may vary vastly from object to object, however. To improve caching effectiveness, Oracle8 offers a multiple-buffer pool and a CACHE attribute for the objects.

Setting Up a Multiple-Buffer Pool

Oracle8 lets you divide the buffer pool into three different pools:

By default, Oracle8 enables only one buffer pool-that is, the default buffer pool and all the buffers allocated to it. To assign buffers to the KEEP and RECYCLE buffers, use the following initialization parameters:


Assigns 10,000 total buffers to the pool
Assigns 10 LRU latches
Assigns 1,000 buffers and 2 latches to the KEEP pool
Assigns 3,000 buffers and 2 latches to the RECYCLE pool

Figure 18.5 shows the buffer pool allocation specified by these parameters.

Figure 18.5 : Oracle8 can divide a buffer pool in three sections.

Assigning Objects to Buffer Pools

By default, all objects use the DEFAULT buffer pool. To cache the objects in other pools, you need to create the object with the desired buffer pool attribute or alter the object's buffer pool attribute by using the ALTER command. ALTER and CREATE for the table, partition, index, cluster, snapshot, and snapshot log support the buffer pool attribute in their storage clause. You can assign different buffer pools to different partitions of a partitioned object.

Use the following command to create and assign the example item table to the KEEP buffer pool:

    item_id            number(6,0),
    item_name          varchar2(24),
    item_price         number(5,0),
    item_data          varchar2(50)
    )
storage (initial 1M next 1M buffer_pool keep);

If the table already exists, use the following command to assign it to the KEEP buffer pool:

Alter table item storage (buffer_pool keep);

Caching Objects with the CACHE Attribute

In addition to grouping the objects in the KEEP, RECYCLE, and DEFAULT buffer pools according to their access patterns, you can also specify the CACHE attribute for very frequently accessed tables; this makes it so that buffers for these tables are read at the most recently used end of the LRU during their full table scans.

Choosing objects to cache
The caching strategy for a particular object depends on its access pattern and the available buffer cache. If object is a frequently used reference table, assign it to the KEEP buffer pool. On the other hand, assign large, randomly accessed table to the RECYCLE buffer pool. The proper caching strategy should factor the available memory, disk I/and overall response time for transactions.

By default, Oracle always keeps the read blocks at the least recently used end of the LRU list during a full table scan. You can specify the CACHE attribute for the table with the CREATE or ALTER command to keep the read blocks at the most recently used end of the LRU chain. Use this command to alter the item table to use the cache attribute:

Alter table item cache;

If you don't want to permanently change a table's CACHE attribute but still want to read it at the most recently used end of the LRU list during the current full table scan (due to special processing requirements), use the CACHE hint in the SELECT statement. Similarly, you can use the NOCACHE hint to disable a table's CACHE attribute during the current full table scan.

Evaluating Caching Effectiveness

The effectiveness of a buffer cache is measured in terms of the cache hit ratio or the buffer's hit ratio, defined as the percentage of times a desired data block was found in the buffer cache. You can calculate the buffer hit ratio from the statistics available in the V$SYSSTAT dynamic performance view as follows:

Cache hit ratio = 100*(1- 'physical reads'/ 'Logical reads')

Logical reads is further derived as the following:

Logical Reads = 'db block gets' + 'consistent gets'

These statistics from the V$SYSSTAT view give you the combined statistics for the buffer cache as a whole. If you've defined KEEP and RECYCLE buffer pools, it's strongly recommended that you evaluate the hit ratios for them separately. The V$BUFFER_POOL dynamic performance view gives physical reads, block gets, and consistent gets statistics for each buffer pool.

Tuning Sorts

Sorting is the rearranging of a random collection of data sets into an ordered collection-sorting the data in order. You need to sort the data to make it more presentable and usable.

Sorting data consumes memory and CPU resources. The time taken and resources consumed to sort are proportional to the amount of data sorted. If the volume of the data to be sorted is more than the available memory, the sort operations must use disks to store intermediate sort results (further slowing the sort operation). Reducing disk I/O is one primary focus of Oracle8 (or of any computer system for that matter), because disks operations are comparatively slower; they involve movement of mechanical components and should be optimized to the fullest possible extent.

Oracle8 performs the sort operation in memory, allocating memory up to the maximum specified by the initialization parameter SORT_AREA_SIZE. If the volume of the data to be sorted is more than the SORT_AREA_SIZE, it uses temporary disk segments. The amount of disk space it uses depends mainly on the volume of the data to be sorted.

When does Oracle perform sort operations?
Oracle8 performs sorts while creating indexes and executing SQL operations that contain order by, group by, distinct, union, join, unique, and aggregate operations such as max, min, and so on. The analyze command also sorts data to calculate statistics.

Understanding Sort Behavior

The V$SYSSTAT dynamic performance table contains the statistics sorts (memory) and sorts (disk), indicating the number of in-memory and on-disk sorts the instance has performed since instance startup. It also contains the sorts (rows) statistics, which indicates the total number of rows sorted by the instance. Use the following query to see these statistics:

SQL> column name format a20
SQL> select name,value from v$sysstat where name like 'sort%'

NAME                          VALUE
--------------------     ----------
sorts (memory)              11462
sorts (disk)                    3
sorts (rows)                5187983
Calculating the in-memory sort ratio
In an OLTP environment, the sorts (disk) number should be a very low percentage of the total sorts. In Memory Sort Ratio can be calculated as In Memory Sort Ratio = 'sorts (memory')/( 'sorts (memory)' + 'sorts (disk)'). A value very near 100 percent is recommended for In Memory Sort Ratio. A lower value of this ratio indicates that the workload is performing too many disk sorts.

In this system there were 11,462 sorts in the memory. Three sorts were done by using the disk.

Optimizing Sort Operations

Sorts are one of the most resource-intensive operations in any database. If sorts are taking too much time on your system and are a cause of concern, consider the following options to minimize their impact:

SQL> alter index I_orders rebuild ;
SQL> analyze table orders estimate statistics ;

Setting Sort-Related Parameters

It's certain that the workload will need some amount of sorting. You can minimize the disk sorts or enhance performance of disk sorts by using the following Oracle8 parameters to appropriate value depending on the application nature:

SORT_READ_FAC=(avg. seek time + avg. latency time
+ block transfer rate)/(block transfer time)
Should you set SORT_DIRECT_WRITE to TRUE or AUTO?
The values of SORT_WRITE_ BUFFER_SIZE and SORT_WRITE_BUFFER are considered only when SORT_DIRECT_WRITE is set to TRUE. The default values of these parameters are effective if SORT_ DIRECT_WRITE is set to AUTO.

Managing Temporary Segments

Oracle8 allocates temporary segments used during sorts from the tablespace defined as the user's temporary tablespace. Because the default value for this tablespace is SYSTEM, you must change it to some other tablespace.

Improving sort performance
To improve the performance of sort operations, assign a TEMPORARY-type tablespace as the user's temporary tablespace, setting the DIRECT_SORT_WRITE initialization parameter to TRUE, or setting the value of initialization parameter SORT_AREA_SIZE to a larger value.

A production system will often need one or several tablespaces dedicated as temporary. To improve performance, you should alter these tablespaces to be of type TEMPORARY. If you use a permanent tablespace as temporary tablespace for sorting, each user has his dedicated sort segments deallocated at the end of the current operation. If you use a TEMPORARY-type tablespace as the temporary tablespace for users, the segments, once allocated by the instance, aren't deallocated and can be reused by other users who have been assigned to the same tablespace. This saves the overhead of allocating and deallocating the extents during the sort operations.

It's recommended that you stripe the temporary tablespace to improve I/O performance during the sort write operations. Striping the temporary tablespace on the number of disks equal to the SORT_DIRECT_BUFFERS will yield the maximum performance benefit.

View disk sort usage statistics
The dynamic performance view V$SORT_SEGMENT provides the usage statistics for the TEMPORARY tablespace. The V$SORT_USAGE view contains statistics related to the current sort activity in the instance.

Set the initial and next extent parameters for the temporary tablespace as multiples of the SORT_AREA_SIZE in order to optimize extent allocation.


© Copyright, Macmillan Computer Publishing. All rights reserved.