Today you will learn about some of the advanced options available to Oracle, including the Oracle Parallel Server option (OPS) and the Oracle Parallel Query option. Although these options sound as though they would be similar to each other, they have no correlation.
New Term: The Oracle Parallel Server option, an add-on feature, allows more than one instance of Oracle to share the same database; two or more systems can simultaneously access the same data, providing a performance boost and failover capabilities where one node can take over for another in the event of a failure. This linking of systems to form one larger system has traditionally been called a cluster.
NOTE: The term cluster has historically been used to identify a large system that performs a single task and is made up of two or more smaller systems working in tandem.
Numerous vendors have recently introduced "clustering" systems that consist of an offline standby node or another node running a completely different application that can take over some tasks in the event the primary node fails. In my opinion, this is not a clustering solution. Because vendors are piggybacking on cluster terminology, the end-user is being deceived. Vendors such as Oracle, Tandem, DEC, and others have had true clusters for many years and are being cheated by these standby systems.
My rule of thumb is that if you cannot access all data from all nodes all the time, it is not a cluster.
The Oracle parallel server cluster has been available for many years, but has recently been introduced on the NT platform. If you use the features of the Parallel Server option, both performance and system uptime can be improved.
The Oracle Parallel Query option is not a clustering option; it has nothing to do with clustering (except that you can use it on a cluster). This option allows certain SQL operations to be parallelized. By parallelizing these operations, you can improve performance by utilizing the time the system waits for I/O operations to complete.
The Parallel Query option can provide a good-to-outstanding performance improvement over traditional query operations. Today you will learn how to use the Parallel Query option as well as how to tune your application to better perform with the Parallel Query option.
The Parallel Server option is one of the most innovative and impressive options available from Oracle. With this option you can cluster several computers using a shared-disk subsystem and have multiple Oracle instances access the same database, as shown in Figure 19.1. If your application is suitable, you can see very good scalability by adding extra computers.
New Term: The Oracle Parallel Server option uses a sophisticated locking mechanism in conjunction with a shared-disk subsystem to allow multiple instances to access the same data. Communication between the computers occurs through a server interconnect, which usually consists of high-speed network accesses at a very low level. Using the traditional network stack does not provide the performance required for a server interconnect.
A parallel-server configuration.
The server interconnect provides two functions: to communicate locking information and to act as a system heartbeat. The system heartbeat communicates to other systems in the cluster that the server is still operational. If the heartbeat message does not arrive, other servers in the cluster assume that the system is nonfunctional and roll back transactions that have not been committed.
The shared-disk subsystem allows all members of the cluster to access all shared data in the cluster. This disk subsystem must be shared and should be available at all times, regardless of the state of any particular server. This subsystem can use RAID and data caching as long as both operations are performed on the disk side of the shared channel rather than on the controller side. Because the release of a lock sometimes depends on data being written out to the shared disk, the more performance you have, the better off you are.
Locking is performed with a process called the Distributed Lock Manager (DLM). The DLM is responsible for locking data that is being modified so that the data cannot be modified in another instance. Locking ensures data integrity across the entire cluster. A data block or group of blocks is locked until another instance needs that data.
If you can partition your users so that users accessing data in a particular table all use the same instance to access that data, you will have reduced lock contention. You can enhance performance by carefully partitioning the data and the users. If you partition the data into update-intensive and read-intensive tables, you will also benefit.
At instance startup, a number of Parallel Cache Management (PCM) locks are created. PCM locks lock data blocks being accessed within each instance to guarantee that multiple instances do not alter the same data.
You can use PCM locks to lock data blocks for reading or for updating. If a PCM lock is used as a read-lock, other instances can acquire read-locks on the same data blocks. It is only when updating that an exclusive lock must be acquired.
PCM locks are allocated to datafiles; as such, they give you some flexibility over the configuration of the locks. A PCM lock locks one or more data blocks, depending on the number of PCM locks allocated to the datafile and the size of the datafile. Because an inherent overhead is associated with PCM locks, it is not beneficial to overconfigure the locks.
If you know your data-access patterns, you can configure your system based on these general rules:
The dynamic performance tables V$BH, V$CACHE, and V$PING contain information about the frequency of PCM lock contention. By looking at the FREQUENCY column in these tables, you can get an idea of the number of times lock conversions took place because of contention between instances.
The dynamic performance table, V$LOCK_ACTIVITY, provides information about all types of PCM lock conversions. From this information you can determine whether a particular instance is seeing a dramatic change in lock activity. An increase in lock activity might indicate that you have an insufficient number of PCM locks on that instance. With this information, you can use the V$BH, V$CACHE, and V$PING tables to identify the problem area.
The Parallel Server option can be effective if your application is partitionable. If all the users in your system must access the same data, a parallel server might not be for you. But if you can partition your workload into divisions based on table access or if you need a fault-tolerant configuration, the Parallel Server option may work.
If you use the Parallel Server option, you must take special care to properly configure the system. By designing the system properly, you can take maximum advantage of the parallel server features.
The Oracle Parallel Query option makes it possible for some Oracle functions to be processed by multiple server processes. These functions are queries, index creation, data loading, and recovery. In each of these functions, the general principle is the same: Keep processing while Oracle waits for I/O.
For most queries, the time spent waiting for the data to be retrieved from disk usually overshadows the amount of time actually spent processing the results. With the Parallel Query option, you can compensate for this by using several server processes to execute the query. While one process is waiting for I/Os to complete, other processes can be executing. If you are running on a Symmetric Multiprocessor (SMP) computer, a cluster, or a Massively Parallel Processing (MPP) machine, you can take maximum advantage of the Parallel Query option.
Many different types of SQL functions can take advantage of the Parallel Query option, including
These operations and several others will automatically be parallelized based on the database-initialization parameters and table definitions.
Parallel query processing allows certain Oracle statements to be run in parallel by multiple server processes. The Oracle server can process the following statements in parallel:
Parallel queries are effective on large operations such as table scans and sorts.
With traditional queries such as table scans, the server process reads the data sequentially, as shown in Figure 19.2. Much of the time spent in this query is spent waiting for I/Os to complete.
A table scan performed without the use of the Parallel Query option.
New Term: A parallel query splits the query into several pieces, each handled by a different server process. These processes, called query servers, are dispatched by a process known as the query coordinator. The query coordinator dispatches the query servers and coordinates the results from all the servers to send back to the user. The result of this arrangement is that many smaller table scans take place under the hood (transparent to the user). From the user's standpoint, it is simply a much faster table scan. Figure 19.3 shows a parallel query.
A table scan performed with the use of the Parallel Query option.
The query coordinator is given a SQL statement and a degree of parallelism, and is responsible for dividing the query among the query servers and integrating the individual results into one result. The degree of parallelism is the number of query servers assigned to the particular query.
The Oracle server can make parallel the following operations:
Each of these operations has requirements that determine how the query is parallelized. The performance achieved by the parallel query is determined both by the size of the data to be accessed and the degree of parallelism achieved.
The query coordinator determines how the query is parallelized (if at all). The decision is made in this order:
2. The query coordinator determines which operations can be performed in parallel.
3. The query coordinator determines how many query servers to enlist.
4. The query coordinator enlists query servers that perform the query.
5. The query coordinator reassembles the resulting data and passes it back to the user.
The degree of parallelism is determined using the following precedence:
2. Table definition--The default degree of parallelism as defined for the table has second precedence.
3. Initialization parameters--Finally, the Oracle initialization parameters are used.
Regardless of what these values are set to, the number of query servers cannot exceed the number of query servers available in the query-server pool. This number is specified by the Oracle initialization parameter PARALLEL_MAX_SERVERS.
Hints for the degree of parallelism are set within a comment string in the SQL statement. The syntax of this comment is as follows:
PARALLEL (alias_or_tablename , [integer/DEFAULT] [, integer/DEFAULT])
The PARALLEL hint specifies the table or alias being scanned, followed by a value for the number of query servers to be used (or DEFAULT). The final optional value specifies how the table is to be split among different instances of a parallel server. Here is an example using the dogs table that you have seen throughout this book:
SELECT /*+ FULL(dogs) PARALLEL(dogs, 4) */ dogname FROM dogs;
Because of the addition of the FULL and PARALLEL hints to this statement, the Oracle optimizer will create an execution plan that uses a full-table scan. Furthermore, this table scan will be executed with a parallel degree of 4 if the query servers are available. This statement overrides both the degree of parallelism specified in the table definition and the default Oracle initialization parameters.
The hint NOPARALLEL disables parallel scanning of a table and overrides the specified degree of parallelism. The NOPARALLEL hint has the following syntax:
Parallel query operations can be very effective on multiprocessor or parallel-processing computers; they can also be effective on uniprocessor systems where much of the time is spent waiting for I/O operations to complete. Systems with sufficient I/O bandwidth--and especially systems with disk arrays--benefit from parallel query operations.
If your system is typically processing at 100% of your CPU utilization and you have a small number of disk drives, you probably won't benefit from parallel query operations. The same is true if your system is extremely limited in memory.
The two areas that can be tuned for parallel queries are I/O and parallel servers. By properly configuring your datafiles, you can help parallel queries be more effective.
The function of a parallel query is to split up query operations so that they more effectively take advantage of the system. One of the ways a parallel query does this is by allowing the processing of the query to continue while pieces of the query operation are stalled, waiting for I/Os to complete. Parallel queries are not effective if the entire table is limited to one disk drive.
If you stripe the table across many drives, I/Os can be distributed and a higher level of parallelism can occur. Striping can be performed with OS striping, with Oracle striping, or (better yet) with a hardware disk array. See Day 20, "Effectively Tuning and Optimizing the Database," for more detailed information about tuning I/O.
Large contiguous extents can also help performance in parallel query operations. During scan operations, the query coordinator splits contiguous ranges of blocks into large, medium, and small groups of blocks. Each query server is given a large group of blocks to start with, and each server progressively works its way down to the small group of blocks until the scan is completed. This is done in an attempt to balance the load handled by each query server. If several large extents are in a table, the query coordinator can find blocks to dispatch to the query servers much more easily.
TIP: Remember, your temporary tablespace should consist of several large extents on a striped volume. This arrangement helps sorting performance.
Degree of Parallelism
The proper distribution of I/Os and the degree of parallelism are the two most important things to tune in the Parallel Query option. Tuning the degree of parallelism is partially trial and error and partially analysis. It is very important to take notes when you are experimenting with the degree of parallelism. Your first guess should be based on the following factors:
All these parameters should have some influence on the degree of parallelism you set for your system. Remember that the preceding points are simply guidelines to help you guess a starting point. Here are a few other suggestions:
Once you determine your starting point, you can monitor your system by querying
the dynamic performance table, V$PQ_SYSSTAT. This can be performed with
the query shown in Listing 19.1.
SQL> select * from v$pq_sysstat;
STATISTIC VALUE ------------------------------ ---------- Servers Busy 0 Servers Idle 12 Servers Highwater 16 Server Sessions 380 Servers Started 4 Servers Shutdown 4 Servers Cleaned Up 0 Queries Initiated 21 DFO Trees 77 Local Msgs Sent 2459361 Distr Msgs Sent 0 Local Msgs Recv'd 2459318 Distr Msgs Recv'd 0 13 rows selected.
ANLYSIS: When looking at the output from this query, the following statistics are quite useful:
After you determine your degree of parallelism, begin testing; evaluate the information you get from V$PQ_SYSSTAT and from your operating system's monitoring facilities. Keep an eye out for CPU usage and excessive waiting for I/O. If the CPU usage is too high, try reducing the degree of parallelism. If the CPU usage is too low and there is significant waiting for I/O, try increasing the degree of parallelism.
Remember that the degree of parallelism is determined by SQL hints, table definitions, and initialization parameters. The total number of query servers is determined by the initialization parameter PARALLEL_MAX_SERVERS; the number started initially is determined by the initialization parameter PARALLEL_MIN_SERVERS.
The total number of query servers in use is the number of queries executed in parallel multiplied by their degree of parallelism. If you try to use more than PARALLEL_MAX_SERVERS, you will not be able to parallelize your query.
You can use the Direct Write Sort option with the Parallel Query option and have each query server perform its own direct writes.
As you saw earlier, using direct writes causes the server processes to write the output of sort operations directly to disk, bypassing the buffer cache. The effect of direct writes is that for sort operations, large amounts of block buffers are not ejected from the buffer cache. This leaves the buffer cache available for normal queries and updates. When you use direct-write sorts with the Parallel Query option, each query server gets its own set of direct-write buffers.
Remember, direct-write sorts require more memory than normal sorts. The amount of memory that these sorts use with the Parallel Query option can be determined with the following formula:
Use direct-write sorts only if you have sufficient memory and temporary disk space. The temporary disk space should have a sufficient I/O bandwidth to handle the load.
Another feature of the Parallel Query option is its ability to create indexes in parallel. With the parallel index creation feature, the time it takes to create an index can be greatly reduced.
As with parallel query processing, a coordinator process dispatches two sets of query servers. One set of query servers scans the table to be indexed to obtain the ROWIDs and column values needed for the index. Another set performs the sorting on those values and passes the results to the coordinator process. The coordinator process then assembles the B*-tree index from these sorted items.
When creating an index, the degree of parallelism follows the same precedence as it does in parallel query processing. The first value used is an optional PARALLEL clause in the CREATE INDEX statement, followed by the table definition, and finally the initialization parameters.
Creating an index in parallel can be several times faster than creating an index by normal means. The same conditions apply for index creation as were given for parallel query processing. A system that has been configured to take advantage of parallel query processing will also see good performance from parallel index creation.
You can load in parallel by having multiple concurrent sessions perform a direct path load into the same table. Depending on the configuration of the system, you can see excellent load performance by loading in parallel. Because loading is both CPU and I/O intensive, you should see good results in an SMP or MPP environment with a high-bandwidth I/O subsystem.
Parallel loads are performed by multiple direct loader processes, each using the PARALLEL=TRUE and DIRECT=TRUE options. When you specify PARALLEL=TRUE, the loader does not place an exclusive lock on the table being loaded as it would otherwise. During the parallel load, the loader creates temporary segments for each of the concurrent processes and merges them on completion.
Although parallel loading performs best when each temporary file is located on a separate disk, the increased performance of the load does not usually justify the complexity of the manual striping needed to do this. I still recommend striping the tables on an OS level--or preferably on a hardware disk. You can improve performance by putting each of the input files on a separate volume to take advantage of the sequential nature of the reads.
Parallel loading can be beneficial, especially if load time is critical in your environment. By putting each of the input files on separate disk volumes, you can increase performance. Overall, the general tuning principles used in parallel query processing are also valid in parallel loading.
Parallel recovery is probably my favorite feature of the Parallel Query option. When benchmarking Oracle and testing hardware and software, it is often necessary to intentionally crash the system to prove recoverability. With the Parallel Recovery option, the time it takes to perform an instance recovery can be dramatically reduced.
Recovery time is significantly reduced when the system being recovered has many disks and supports asynchronous I/O. For a small system that has few drives or for an operating system that does not support asynchronous I/O, it might not be wise to enable parallel recovery.
In traditional recovery, one process reads from the redo log files and applies changes to the datafiles, as shown in Figure 19.4. This operation can take a significant amount of time because the recovery process must wait for disk I/Os to complete.
Traditional instance recovery without parallel query.
With the Parallel Recovery option, one process is responsible for reading and dispatching redo entries from the redo log files and passing those entries to the recovery processes that apply the changes to the datafiles, as shown in Figure 19.5.
Because the dispatcher process reads sequentially from the redo log files, the I/O performance is much higher than that of the recovery processes that are writing random data throughout the datafiles. Because writing the data is very seek intensive, it is a good idea to have one or two recovery processes for each data disk in the system.
By having more recovery processes, you can have more outstanding I/Os and thus use all the data drives simultaneously. Because recovery occurs at instance startup, this arrangement reduces dead time when no other database processing can be performed.
Instance recovery with parallel recovery.
The number of concurrent recovery processes is set with the initialization parameter RECOVERY_PARALLEL. The value of this parameter cannot exceed the value specified in the initialization parameter PARALLEL_MAX_SERVERS.
If you specify a sufficient number of recovery servers, you will see an immediate improvement in instance recovery time. Do not use parallel recovery if your system does not support asynchronous I/O or if you are limited to a few disk drives. If your I/O subsystem is high bandwidth and your data is properly striped (either through software or hardware), you should see very good improvement.
In summary, the Parallel Query option is useful for distributing processing loads so that CPUs process while other processes wait for I/Os to complete. With multiprocessor machines, the Parallel Query option can be quite beneficial; this is not to say that the option is not beneficial on uniprocessor machines as well.
NOTE: Probably the biggest performance problem I have come across is a lack of disk drives. As larger and larger disks are produced at lower and lower prices, many installations end up with I/O problems because a few large disks can provide the space of many smaller disks. The larger disks provide more disk space but not much more performance (per disk). The Parallel Query option can help only in systems where I/O is not a bottleneck. When I/O is not a problem, you will see significant gains from parallel queries.
If you have processes waiting for queries to complete and a sufficient number of disk drives, you will see an improvement with parallel queries, regardless of whether you are on a multiprocessor or uniprocessor system.
Today you learned about several features available to Oracle, including the Oracle Parallel Query and Oracle Parallel Server options. These features are quite different from each other and serve completely different purposes.
The Oracle Parallel Server option (OPS), a clustering option, can be used as a performance option and as a fault-tolerant option. Because multiple systems can access the same database, a significant performance enhancement can be seen in some situations.
NOTE: Under some conditions, the Oracle Parallel Server option can provide a significant performance increase. The performance improvement you see is very dependent on your application and data layout. Without careful planning, you might not see any performance improvement, and you might even see performance degradation.
The Oracle Parallel Query option allows certain SQL operations to be parallelized, thus reducing idle time waiting for I/Os to complete. If you take advantage of the Parallel Query option, you can see very good performance enhancements under the right conditions. The Parallel Query option is probably my favorite feature of the Oracle RDBMS because such incredible results can be seen from the right parallelism.
In tomorrow's lesson, "Effectively Tuning and Optimizing the Database," you will learn to tune the Oracle RDBMS, and how to use OS and RDBMS information to make changes that will improve your system's performance. You will learn how to use the Oracle administration scripts UTLBSTAT.SQL and UTLESTAT.SQL to obtain Oracle performance information, and you will learn how to interpret some of that information.
A The Oracle Parallel Server option is a feature of Oracle that allows you to cluster two or more systems sharing the same database.
Q Does the Parallel Server option require any special hardware?
A Yes. The Parallel Server option requires a server interconnect that might be as simple as a fast network card and a shared-disk subsystem. The shared-disk subsystem is provided by your hardware vendor.
Q What is the Parallel Query option?
A The Parallel Query option allows certain SQL operations to be parallelized. If you parallelize the operation, time spent waiting for I/O by one thread of operation can be spent processing another thread of operation.
Q Do single-processor systems benefit from the Parallel Query option?
A Definitely. Because the greatest benefit of the Parallel Query option is to keep the system busy while it is waiting for I/Os to complete, this will work well even with a single-CPU system.
The workshop provides quiz questions to help you solidify your understanding of the material covered and exercises to provide you with experience in using what you've learned. Find answers to the quiz questions in Appendix A, "Answers."
2. What is the server interconnect for?
3. What is a shared-disk subsystem?
4. What does DLM stand for?
5. What does PCM stand for?
6. What is the Parallel Query option used for?
7. How does the Parallel Query option improve performance?
8. Name the primary way to set the degree of parallelism.
9. Name the secondary way to set the degree of parallelism.
10. Name some operations that can be parallelized.
2. Run the same query using a parallel degree of 2 set in a hint. Compare the time.
3. Run the same query using a parallel degree of 4 set in a hint. Compare the time.
4. Run the same query using a parallel degree of 8 set in a hint. Compare the time.
© Copyright, Macmillan Computer Publishing. All rights reserved.