Over the next three days, you will learn about some of my favorite topics: backup, recovery, and ways to engineer maximum uptime. This area is a favorite of mine because of the many ways you can design backup and recovery strategies and because of the enormous importance of the task.
Before you learn how backups work, I would like to review a little bit about how Oracle works. This section does not cover all the Oracle functions and processes, just those pertinent to the integrity of the data in the database.
If none of the data in the database ever changed, there would never be a need to back up that data more than once. Because the data never changed, any backup taken in the past could be used to restore the data to a useful state. This is not the case in most systems, although with read-only tablespaces it can be achieved for selected data.
Because most systems' data changes, the database is a constantly changing system, and it is of the utmost importance that this data be protected. If a system failure occurs, it is unreasonable to expect that changes still in progress be recovered, but it is a reasonable expectation that all committed transactions be recovered.
New Term: In fact, if you have a good backup-and-recovery plan, no committed transactions should ever be lost. In recovering the database from a failure, you have to determine which of the two types of failure occurred so you know which recovery mechanism to use. The two types of recovery are instance recovery and data recovery.
New Term: An instance failure can occur due to a system failure, such as a power outage or a hardware or software problem. When an instance failure occurs, the Oracle recovery process can completely recover the database on instance startup. All committed transactions will be recovered or rolled forward, and all noncommitted transactions will be rolled back.
NOTE: Instance recovery can be quite time consuming. How long it takes depends on the number of dirty buffers in the SGA, which in turn depends on the amount of time since the last checkpoint. Recovery time also depends on the recovery parallelism: the amount of parallelism used in the recovery process, which is part of the Parallel Query option's Parallel Recovery feature.
As you will see later today and tomorrow, instance recovery can and should be tuned to provide for the shortest possible recovery time.
TIP: By tuning the checkpoint interval and recovery parallelism, you can shorten the recovery time.
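Checkpoint frequency and recovery parallelism are both controlled through initialization parameters. The following init.ora fragment is a sketch only; the parameter names are standard, but the values shown are illustrative and must be tuned for your own system:

```
# init.ora -- illustrative values only; tune for your system
LOG_CHECKPOINT_INTERVAL = 10000   # redo log blocks written between checkpoints
LOG_CHECKPOINT_TIMEOUT  = 1800    # maximum seconds between checkpoints
RECOVERY_PARALLELISM    = 4       # processes used for parallel recovery
```

Lowering the checkpoint interval shortens instance recovery but increases checkpoint overhead during normal operation, so it is a trade-off between run-time performance and recovery time.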
Data recovery is necessary when a media failure has occurred--for example, when a datafile is damaged. If a datafile is damaged, you must first restore the damaged file from backup before recovery can occur. The Oracle recovery process then applies archive log files and redo log files to bring the restored datafile back to the state it was in prior to the failure. Recovery may require every archive log file created since the backup was taken, so this process can be quite time consuming.
TIP: By scheduling frequent backups, you can shorten the restoration and recovery time.
As you will see today and tomorrow, the goal is to have the shortest possible downtime in the event of a catastrophic system failure.
You cannot do anything to recover a lost log file. If a log file is damaged and the system is still functional, you can drop the log file, re-create it, and immediately perform a full backup. (The backup is necessary because you will not be able to use the archive log files and redo log files for any recovery on this database.) This is why I always recommend that log volumes be fault tolerant.
Remember, the term transaction is used to describe a logical group of work. This group of work can consist of one or many SQL statements and must end with a commit or a rollback. This definition assumes a client/server application, so SQL*Net is necessary. To review the steps required to execute a transaction, review Day 2, "Exploring the Oracle Architecture."
For today's lesson it is important to note that a transaction is not considered committed until its redo log entry has been written to disk. If a failure occurs before that entry has been written, the transaction will not be recovered.
As mentioned on Day 2, the redo log records all changes made to the Oracle database. The purpose of the redo log is to ensure that, in the event of the loss of a datafile caused by some sort of system failure, the database can be recovered. By restoring the datafiles back to a known good state from backups, the redo log files (including the archive log files) can replay all the transactions to the restored datafile, thus recovering the database to the point of failure.
When a redo log file is filled in normal operation, a log switch occurs and the LGWR process starts writing to a different redo log file. When this switch occurs, the ARCH process copies the filled redo log file to an archive log file. When this archive is complete, the redo log file is marked as available. It is critical that this archive log file be safely stored, because it might be needed for recovery.
NOTE: Remember that a transaction has not been committed until the redo log file has been written. Slow I/Os to the redo log files can slow down the entire system.
The Oracle RDBMS is not required to run in ARCHIVELOG mode, so the archiving mechanism can be disabled. However, if you do not run in ARCHIVELOG mode, it is impossible to recover any transactions that are not in the current redo log files.
WARNING: If you do not run in ARCHIVELOG mode, you will most likely lose valuable data in the event that a datafile is lost. I recommend that all production systems be run in ARCHIVELOG mode. You can run test systems without archiving.
If your data is important to you, you should always run your production systems in ARCHIVELOG mode.
Archiving is enabled by running the ALTER DATABASE command with the parameter ARCHIVELOG. To enable archiving, use
ALTER DATABASE ARCHIVELOG;
Archiving is disabled by running the ALTER DATABASE command with the parameter NOARCHIVELOG. To disable archiving, use
ALTER DATABASE NOARCHIVELOG;
If you are in ARCHIVELOG mode and want to disable it for a particular startup, you can set the Oracle parameter LOG_ARCHIVE_START=FALSE.
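The ALTER DATABASE ARCHIVELOG command can be issued only when the database is mounted but not open. The following is a sketch of the full sequence using a Server Manager-style session; the CONNECT step will vary with how your administrative accounts are set up:

```sql
-- The database must be mounted but not open to change the archiving mode.
CONNECT internal
SHUTDOWN IMMEDIATE
STARTUP MOUNT
ALTER DATABASE ARCHIVELOG;
ALTER DATABASE OPEN;
-- Verify the new mode:
ARCHIVE LOG LIST
```

The same sequence with NOARCHIVELOG in place of ARCHIVELOG disables archiving.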
Setting the Archiving Destination
Use the Oracle parameter LOG_ARCHIVE_DEST to set the destination for Oracle archiving. Archive log files will be created in this directory with a different filename for each archive log file. This filename is defined by the Oracle parameter LOG_ARCHIVE_FORMAT. Both of these parameters are described in the Oracle documentation.
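As a sketch, a typical set of archiving parameters in the init.ora file might look like the following. The directory path is illustrative only; %s in the format string is replaced by the log sequence number:

```
# init.ora -- example archiving parameters (path is illustrative)
LOG_ARCHIVE_START  = TRUE
LOG_ARCHIVE_DEST   = d:\orant\database\archive
LOG_ARCHIVE_FORMAT = arch%s.log
```

With these settings, archived logs appear in the destination directory as arch1.log, arch2.log, and so on, one for each filled redo log file.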
In configuring your hardware system, it is important to plan for protecting your data. By carefully configuring your system for data integrity, you can optimally configure for both performance and protection. In this section, I give you a few tips on how to configure your system using RAID disk subsystems.
New Term: A disk array is a collection of disk drives configured to act as one larger disk drive. Both hardware and software disk arrays are available today. Hardware disk arrays consist of a disk array controller and a set of disk drives (typically SCSI). Software disk arrays are made up of a software layer that lies between the file system and the device driver layers.
Disk arrays usually support disk striping, disk mirroring, disk parity, and so on. Hardware disk arrays also support other features such as hot-swappable disks. An array with fault tolerance and hot-swappable disks allows you to replace a defective disk while the system is active. Depending on the manufacturer and model, the disk array controller can support several different types of RAID fault tolerance.
Disk arrays offer many benefits to the system administrator and the end users.
New Term: A disk array is a set of disk drives that make up a larger logical disk, which is sometimes called a logical volume. The logical volume is made up of identical-sized pieces of the individual drives, called stripes. The data is said to be striped across the logical volume because the logical drive has pieces of all the individual drives striped within it, as shown in Figure 16.1. If you look at the logical volume, the physical drives seem to make stripes.
The stripes are all the same size and in sequence (that is, disk 1 holds the first stripe, disk 2 the second, and so on). The size of the stripe varies based on the manufacturer and model of the disk array controller. The stripe size also varies in a software array. The size of the stripe is called the striping factor and is given in number of blocks. A typical striping factor is 32. Given a size of 512 bytes per block, a striping factor of 32 makes the disk stripe 16KB in size.
Disk stripes on a logical volume.
As far as the user and the RDBMS are concerned, there is only one large disk drive. It is up to the OS and the hardware to manage the individual disks within the array.
The configuration of disks in an array is sometimes called a Redundant Array of Inexpensive Disks (RAID) configuration. RAID technology allows systems to maintain very large amounts of storage at relatively low cost. The inexpensive SCSI drives that make up the array have not only improved in performance and quality over the years, but they have significantly dropped in price.
New Term: The term RAID is also used to describe the type of striping you use. Striping methods vary in both performance and space overhead. The type of configuration used in your system depends on your individual needs. Different configurations of striping and fault-tolerant striping are identified by RAID levels.
The various RAID levels are determined by a body of computer vendors called the RAID Advisory Board. This standards body determines the basics of the various RAID levels but does not force any particular implementation, thus allowing vendors to differentiate themselves. Following are brief descriptions of the various RAID levels.
RAID-0 is the base RAID level and is used to describe disk striping with no fault tolerance. RAID-0 drives simply have data striped over all the drives, as shown in Figure 16.2.
RAID level 0 has no fault tolerance.
RAID-0 is the highest-performing and most economical of the RAID levels. The amount of data in the logical volume is equal to the sum of the amount of data on each disk drive. RAID-0 causes no additional I/O to be generated on behalf of the fault-tolerant method.
The downside of RAID-0 is that if a disk fails, the entire volume becomes invalid. Because the data in the logical volume is striped across all the disks, the loss of a single disk causes a loss of data throughout the logical volume. If one disk in a 14-disk volume fails, you must restore the data for all 14 disk drives; there is no way to back up and restore data for a single drive in a disk array.
If you are looking for the highest performance and the best value possible and are not worried about fault tolerance, this is the RAID level you should use.
The RAID-1 level is also known as disk mirroring. In RAID-1, all the data stored on a disk drive is duplicated on another disk in the array. Each time a write occurs to the logical disk, the data must be written to both physical disks before the logical write is considered completed. With disk mirroring, a single disk is mirrored to another disk; these disks can also be striped with other disks to form a larger logical volume, as shown in Figure 16.3.
RAID level 1 is also called disk mirroring.
Because the mirroring is on a one-to-one basis, you can actually lose half the disks in your system (depending on which disks they are) and still be operational. With most disk array controllers, you can split the mirror across SCSI buses. This arrangement allows for the failure of an entire SCSI bus (for example, a cabinet failure) without affecting operation.
With disk mirroring, you can use only half the disks in the system (the other half are used for the mirrors). In other words, if you use disk mirroring with two 2.1GB disks, you can use only 2.1GB of space. When writing, you get the performance benefit of only one disk, because a logical write invokes two physical writes.
Under certain conditions, you will see a benefit from reading from mirrored drives. Some disk array controllers support split-reads, in which reads can occur simultaneously on different mirrored drives to different data. The disk with the heads closest to the requested data retrieves the data. Depending on your data-access methods, this feature might or might not provide any benefits to you.
If you can afford the cost of disk mirroring, RAID-1 is the best choice when fault tolerance is required. With disk mirroring, you achieve the highest level of protection as well as the fastest disk access possible for a fault-tolerant volume.
RAID level 2 is data striping that is protected by a method known as Hamming code, similar to the method used for error detection and correction in RAM. Because of the difficulty of implementing RAID-2, it has not been widely accepted.
RAID level 3 uses a parallel-access parity RAID configuration. RAID-3 stores data and parity on several disks using bit-by-bit parity. RAID-3 uses a parallel access method to achieve high performance but requires that the individual disk rotations be synchronized, which can be difficult. RAID-3 is not widely used.
New Term: RAID level 4 is known as drive parity, or data guarding. In RAID-4, one of four drives is used for data parity. If any one of the four disks fails, the other three continue running. Because the algorithms used in RAID-4 are considered obsolete, RAID-4 is not used much. RAID-4 has essentially been replaced by RAID-5.
When you use RAID-4, a lot of I/O is generated. To calculate the parity, RAID-4 reads data from all the drives in the set. As you can imagine, the overhead involved with RAID-4 is very high.
New Term: RAID-5 is also known as distributed data guarding. In distributed data guarding, enough information is stored about each drive so that any one drive in the set can fail; the data is restored by using all the other drives. The space you get from RAID-5 is equal to the following:
Drive Space = (N-1) * Drive Size
In this equation, N is the number of drives in the logical volume. In other words, if you are running 12 disk drives with RAID-5, you get the space of 11 disk drives.
As with RAID-4, there is a penalty associated with RAID-5. For each write to disk, two reads take place, a calculation of the parity is done, and then two writes are done. Although a read generates only one I/O, a write generates four I/Os.
RAID-6 is almost identical to RAID-5 but with the addition of an independently computed data check. This makes the RAID-6 volume very secure, with extremely high data availability because of the addition of one extra disk for this additional fault tolerance. Thus, with RAID-6 the cost is
Drive Space = (N-2) * Drive Size
Other RAID Levels
You might see other RAID levels mentioned that are a combination of these RAID levels. Some vendors call a combination of RAID-0 (striping) and RAID-1 (mirroring) a RAID-10 or RAID-0+1 array.
New Term: Write caches are designed to cache writes sent to the controller and to post the I/O request to the disk driver. Posting the request means that the device driver believes the I/O is complete. Posting the write masks the overhead associated with queuing the request and writing it to the individual disk. All the OS and the RDBMS see is an incredibly fast write, and they can continue with their operations as though the data had been completely written out to disk.
WARNING: Unless the write cache is protected, it is not usually recommended that write-caching be enabled on the redo log volume. If the RDBMS believes a write to have occurred to the redo log, and a controller failure occurs before that data has actually been written to the disk drives, you might not be able to recover the instance.
Some disk array controllers have incorporated both mirroring or parity and a battery backup into the write cache to ensure the data is protected even in the event of a power outage. Use your own judgment about running a write cache on the redo log volumes.
If you use RAID technology, there are essentially three modes in which you can run your database: no data protection, full data protection, and partial data protection. Each mode is described later in this section.
Although disk drives have become much more reliable in the past few years, they are still the component most likely to fail. Disk drives are susceptible to failure because they are mostly mechanical devices.
Disks are made up of sophisticated electronics and motors. Forces such as heat, dust, and friction can cause even the most well-built disk drive to fail. No matter how well you take care of your system, it is inevitable that you will one day see a disk failure.
TIP: Depending on the size and critical nature of your database, a failure of one disk can be devastating if you are not protected. When taking into account your budget for fault-tolerant components, keep in mind the cost of extended downtime that results from a failure.
In No Data Protection mode, no fault tolerance is used and the data is completely unprotected. Any disk failure causes you to reload all the affected files from the last backup.
If the failure occurs on a data volume, you can restore the data (if you are running in ARCHIVELOG mode, you can use the archive log files to restore the database up to the point of failure). Even so, when a disk drive fails, you must restore the entire volume, which might mean tens of gigabytes of data.
If the failure occurs on the volume containing your redo log files, you can recover only up to the last good archived log file. At instance recovery, you can recover only up to the point at which the LGWR started writing to the damaged log file(s).
If the failure occurs on the volume containing the OS and Oracle binaries, you must restore from a backup or reinstall the operating system, reload Oracle, and then restart Oracle.
As you can see, in each of these unprotected situations, the recovery interval can be quite long. No failure of an unprotected system is easy to recover from.
Full Data Protection mode is by far the most secure option. If you have fault tolerance on every disk drive in the system, any single failure does not cause a system failure. Depending on the level of fault tolerance you run, many multidisk failures can be tolerated. The level of fault tolerance you run depends on the performance you require and the budget you have.
In the last few years, the cost of disk drives has dropped so dramatically that it is not unusual to see entire systems run with disk mirroring. The cost of the extra drives needed to support fault tolerance usually outweighs the cost incurred by extended downtime.
In assessing the level of fault tolerance, look at your required I/O rates. If possible, use RAID-1 because it gives the best performance. If RAID-1 is not in your budget, perhaps you can use it for at least your OS and redo log volumes. (The redo log volumes usually demand the highest protection available.)
Another option for protecting your data is to apply fault tolerance to specific pieces of critical data. This might be the best option if you have a limited budget and require high performance.
Critical areas that should be protected are the OS, the redo logs, and the archive logs. If these areas are protected and a drive volume fails, you will not have to spend extra time restoring or reloading the operating system. If the redo log volumes are not protected and a drive fails, you might not be able to recover your database to the point of failure. If the archive files are not protected and a drive fails, you will not be able to recover from a data loss failure if that archive log file is needed for recovery.
By using partial fault tolerance, you might be able to reduce downtime in the event of a failure and still keep within your budget. Protecting areas such as the OS volumes and redo log volumes will help you reduce the amount of time needed to recover.
Fault Tolerance Summary
Depending on your needs, you might not have to completely protect all your data, but some fault tolerance is recommended. If it is too expensive to protect all your disks, protect at least the critical areas. Depending on your needs, you might want to use different data-protection modes on different volumes. Here are a few suggestions about fault tolerance for you to consider:
I would recommend that you employ some sort of fault tolerance on your system. At the very least, use mirroring on your OS volume as well as on the Oracle binaries, control files, and redo log files. This will allow you to reduce the amount of downtime that you would suffer in the event of a system failure. The worst-case scenario would be to have to reinstall an operating system, reinstall Oracle, rebuild the database, and then restore. This scenario calls for maximum downtime. By carefully configuring your system, downtime can be reduced.
There are a variety of options available for backing up the Oracle database. Each option provides a different amount of protection and typically causes a different amount of downtime. The options include backups of the archive log files, backups of the control files, full backups (offline or online), and tablespace backups.
In this section you will learn how these backup types differ, which you need to run, and how they complement each other.
Archive log files are continually being created, so they are typically backed up to offline storage. Because the archive log information can be quite large, it is usually not feasible to keep all of this data online, but if possible you should keep the archive log files online until a complete database backup is performed.
Remember, once a datafile is restored from backup, the archive log files that were created since that backup was taken are applied to roll-forward transactions. Archive log files older than the last backup are needed only if the restore operation fails and an older backup needs to be used.
TIP: A good procedure would be to save archive log files until the next datafile backup, and then archive them to tape or other archival storage.
While online, these archive log files need to be protected. I recommend keeping your online archive log files on a RAID-protected disk volume or backing up the archive log files to tape as soon as they are created. The loss of an archive log file could mean that you can't recover your database in the event of a failure.
Another way to protect your archive log file is to use the Oracle archive log duplexing feature. If you set the Oracle parameter LOG_ARCHIVE_DUPLEX_DEST, a second copy of each archive log file will be created in this directory. This protects the archive log files against any single point of failure.
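A sketch of duplexed archiving in the init.ora file follows. The two destinations should be on separate physical devices (ideally separate controllers); the paths shown are illustrative only:

```
# init.ora -- duplexed archive destinations (paths are illustrative)
LOG_ARCHIVE_DEST        = e:\oracle\archive
LOG_ARCHIVE_DUPLEX_DEST = f:\oracle\archive
```

With both parameters set, each filled redo log file is archived to both directories, so the loss of either archive volume alone does not prevent recovery.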
The control files are very important to the operation of the Oracle RDBMS because they contain information about the physical construction of the database. If the control file is lost, it might still be possible to re-create it, but it will be very difficult.
On Day 8, "Administering Redo Logs, Control Files, and Rollback Segments," you learned how to back up control files using the ALTER DATABASE command. The two options for the ALTER DATABASE database BACKUP CONTROLFILE command are as follows:
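As a reminder of the syntax covered on Day 8, the two forms look like this; the backup file name shown is illustrative:

```sql
-- Option 1: back up the control file to a named binary copy.
ALTER DATABASE BACKUP CONTROLFILE TO 'd:\backup\control.bak' REUSE;

-- Option 2: write a trace file containing a SQL script that can
-- re-create the control file.
ALTER DATABASE BACKUP CONTROLFILE TO TRACE;
```

The binary copy can be restored directly, whereas the trace-file script is useful when the control file must be rebuilt from scratch.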
Another option is to simply back up the control file(s) when performing operating-system backups. If a control file is lost, it can be restored from that backup. The control file(s) needs to be backed up any time there is a structural change to the database, because such changes alter the control file.
A full backup occurs when the entire database and its associated control files are backed up. A full backup can occur either offline or online. The best type of backup to perform is the full offline backup because you get a consistent view of the entire database, but for many systems this is not an option because little or no downtime is allotted for backups.
The full offline backup is sometimes called a consistent whole database backup. The full offline backup is done when the database instance has been shut down cleanly. With the database shut down, you back up the datafiles and control files using OS or third-party backup software and hardware. If you have the downtime available to you, the full offline backup is the best backup solution.
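The following is a sketch of a full offline backup using a Server Manager-style session. The copy commands are shown as comments because the file copies are done with OS or third-party tools while the instance is down; all paths are illustrative:

```sql
-- Full offline (consistent) backup: shut down cleanly first.
CONNECT internal
SHUTDOWN IMMEDIATE
-- With the instance down, copy all datafiles and control files
-- using OS or third-party backup software, for example:
--   copy d:\orant\database\*.dbf  e:\backup\
--   copy d:\orant\database\*.ctl  e:\backup\
-- When the copies are complete, restart the instance.
STARTUP
```

Because the database was shut down cleanly, the copied files form a consistent image that can be restored without applying redo.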
NOTE: Even though you have backed up the datafiles and the control files, you should only restore the control files if necessary. Never overwrite a newer control file with an older one unless there is no other alternative.
In many cases where systems need to be up and running 7 days a week, 24 hours a day, the offline backup is not an option. In this situation, the online backup is your only choice.
A full online backup is really just a combination of tablespace backups and control file backups. With the full online backup, the control files and, in turn, each tablespace are backed up using the online tablespace backup facility.
NOTE: Online backups are valid only if you are running in ARCHIVELOG mode. If you are not running in ARCHIVELOG mode, your only choice is an offline backup.
The tablespace backup allows you to back up a single tablespace while the database is online and running. When the administrator signals the Oracle instance that a backup will be performed on that tablespace, special actions are taken. Once the tablespace has been marked for backup, one or more of the datafiles that make up the tablespace are backed up using OS or third-party utilities.
When you mark the tablespace for backup, the Oracle RDBMS will know that this online backup has occurred, and it will know how to recover the database if recovery is necessary. In the next section you will learn the actual mechanics of how to perform this type of backup.
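The mechanics can be sketched as follows. The tablespace name and file paths are illustrative; the BEGIN BACKUP/END BACKUP commands are the standard way to mark a tablespace for online backup:

```sql
-- Online (hot) backup of a single tablespace.
ALTER TABLESPACE users BEGIN BACKUP;
-- While the tablespace is marked, copy its datafiles with OS or
-- third-party utilities, for example:
--   copy d:\orant\database\users01.dbf e:\backup\
ALTER TABLESPACE users END BACKUP;
-- Force a log switch so the redo covering the backup is archived:
ALTER SYSTEM SWITCH LOGFILE;
```

Note that the database remains open and usable throughout; Oracle records extra redo while the tablespace is in backup mode so that recovery remains possible.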
With Oracle8 there are several different ways to back up the Oracle database, including the Recovery Manager utility, the graphical Backup Manager, and OS or third-party backup tools.
NOTE: The new Oracle8 utility for backup and recovery is called Recovery Manager. The graphical utility provided with Enterprise Manager that uses Recovery Manager is called Backup Manager. You will see references to both Backup Manager and Recovery Manager. Keep in mind that both are essentially the same utility.
Each of these utilities can perform an effective backup. It is usually your own personal preference that can help you decide which one to use.
Backup Manager is invoked and runs through Enterprise Manager, and therefore can be run either locally or remotely. Backup Manager is a graphical utility and fairly easy to use.
Before Backup Manager can be used, the recovery catalog must be created. This catalog can be created by using the following SQL statements:
SPOOL recovery.log
CREATE USER rman IDENTIFIED BY rman
  TEMPORARY TABLESPACE temporary
  DEFAULT TABLESPACE rcvcat
  QUOTA unlimited ON rcvcat;
GRANT recovery_catalog_owner TO rman;
CONNECT rman/rman
@d:\orant\rdbms80\admin\catrman
NOTE: The recovery catalog should be created in its own database. If the database you are backing up is damaged and contains the recovery catalog, the catalog will also be damaged.
The next step is to set up Oracle for remote administration so that Backup Manager can connect in. The following steps must be taken to allow this to happen:
2. Save your existing password file (if you have one). It will be named orapwSID (where SID is your SID name).
3. Create a new password file with the following syntax:
ORAPWD80 file=orapwSID password=password entries=10
CREATE USER remote IDENTIFIED BY remote;
GRANT connect, resource TO remote;
GRANT sysdba TO remote;
After you have created the remote user, you must set up Backup Manager to use these accounts. The first step is to start up Backup Manager. From Enterprise Manager, select the Oracle Backup Manager. The first screen you will see is the Backup Manager--Subsystem screen, as shown in Figure 16.4.
The Backup Manager--Subsystem screen allows you to choose which method of backup you prefer.
From this screen, choose Oracle8 Recovery Manager. This will bring up the Oracle Recovery Manager's graphical interface, known as Backup Manager (see Figure 16.5). This is Backup Manager's main screen. You might notice that this screen is similar to some of the other Enterprise Manager applications. Because of the nature of the backup-and-recovery process, this screen allows you to browse many of the Oracle schema objects, such as
Backup Manager also allows you to change the state of the instance. This is useful if you want to shut down the instance in order to perform an offline backup.
After you have invoked Backup Manager, you must change the database connection to use the remote user you have just set up. First, select Change Database Connection from the File menu. This will invoke the Target Database Login screen, shown in Figure 16.6. From here you should change the login to select the username, password, and service necessary to connect into the database you will be backing up.
The main screen of Backup Manager.
Change the login in the Target Database Login screen to connect to the database you will be backing up.
Be sure that you have selected to connect as SYSDBA.
It is also necessary to select the connection to the recovery catalog that you created earlier in this section. To do this, select the Catalog Connect String option from the Catalog menu. This will invoke the Recovery Catalog Login screen, which is very similar to its counterpart in Figure 16.6. This is where you select the username, password, and service for the recovery catalog database. Remember that in a production environment, the recovery catalog should be in a database other than the one you're backing up.
Once you have set up the connections for the database and the catalog, you should register the recovery catalog. You do this by selecting Register from the Catalog menu, which will invoke a job to register the recovery catalog. After you have registered the recovery catalog, you are ready to use Oracle Backup Manager.
To invoke Backup Manager, use the procedures described in Figures 16.4 and 16.5. After you have invoked the Backup Manager, you should select the Backup Wizard option from the Backup menu.
If this is the first time you have run the Backup Manager, you will be informed that you do not have any channels defined, as shown in Figure 16.7.
You see this message if this is the first time you have used Backup Manager.
The channels are actually I/O channels. The more channels you have defined, the more parallelism you will achieve. In my case, because I am running on a small, single-processor system, I will create only one channel. Give the backup channel a name and a destination if you want the data to be backed up to disk. (See Figure 16.8.)
The Create Channel screen is used to create a backup channel.
After you have created at least one channel, you can proceed with the Backup wizard. The first screen in the Backup wizard is the introduction (see Figure 16.9). Here you select what type of backup you want to perform. The options are
You can select from several options in the Introduction screen of the Backup wizard.
In the figure, I have selected to perform a database backup. The appearance of the next screen varies based on the selection of the backup type. If you have chosen Database, you will be asked whether you want to back up the archived logs. If you have chosen Tablespace, you will be presented with a list of tablespaces to select from, and if you have chosen Datafiles, you will be presented with a list of datafiles to choose from. If you have chosen Archived logs Only, you will be given a range of archived logs to back up.
For this illustration, the Database option has been selected. The Backup wizard then proceeds to the next screen, where you will be asked whether you want to back up the archived logs. I prefer to handle those separately, so I chose to back up none of the archived logs. This is shown in Figure 16.10.
In the Archived logs screen of the Backup wizard, you choose whether you want to back up any of the archive log files.
In the next screen of the Backup wizard, you will be prompted to select one or more channels to back up to. Using more than one channel will increase the parallelism of the backup. The Channels screen is shown in Figure 16.11.
Here you choose the channels to which you want to back up.
The next screen presents several backup options. Set the tag to a meaningful name for the backup; if you are performing an incremental backup, you can also set its level here. This is shown in Figure 16.12.
Backup wizard--Backup Options. Here you select a tag to identify the backup set.
The next screen is the Parameters screen, where you can set the maximum number of files per backup set, as shown in Figure 16.13. Limiting this number splits the backup into multiple backup sets; because each backup set is written by a single channel, more backup sets allow more channels to work in parallel.
The Backup wizard--Parameters screen allows you to set the maximum number of files per backup set.
The next screen is the Schedule screen, which is very similar to the one you saw in yesterday's lesson on job scheduling. You can set the backup to begin immediately or at some time in the future, as shown in Figure 16.14.
In the Schedule screen of the Backup wizard, you specify whether you want the backup to begin immediately or at some later time.
The last configuration screen of the Backup wizard is the Save/Submit screen, where you can choose to submit the backup, save these parameters to the library, or both, as shown in Figure 16.15. Typically you will submit the job at this point.
The Backup wizard--Save/Submit screen allows you to either submit the job or save the settings for later use.
After you have completed all the screens of the Backup wizard and clicked the Finish button, you are presented with a summary of your selections, as shown in Figure 16.16.
Click OK to submit the backup job. Depending on the scheduling parameters you have set, the backup might begin to execute immediately. This screen allows you only to view the settings. No changes are allowed. If you find that the settings are in error, you can cancel the job and start the process over again.
The Backup wizard's Summary screen shows you the options you've chosen for your backup.
NOTE: At this point the backup has been submitted and will be completed. You can click the Jobs icon (refer to Figure 16.5) if you want to view the current progress of the job. From the Jobs screen you can also view the status of previous jobs and look at the job library.
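Under the covers, the Backup Manager job drives the Recovery Manager (RMAN) utility. A rough sketch of an equivalent RMAN script is shown below; the channel name, tag, and format path are examples only, and the exact syntax can vary with your RMAN release:

```sql
# Hypothetical connection: rman target internal/pwd catalog rman/rman@rcat
run {
  # One I/O channel; allocate more channels for more parallelism
  allocate channel ch1 type disk;

  # Back up the whole database, tagged so it can be identified later
  backup
    tag whole_db
    format '/backup/db_%u'   # %u produces a unique name per backup piece
    (database);

  release channel ch1;
}
```

Allocating a second channel (ch2, ch3, and so on) in the run block is the script equivalent of selecting multiple channels in the wizard's Channels screen.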
Performing a backup with your OS facilities is a very straightforward process. As with Backup Manager, you can perform a full database backup or a tablespace backup. The basic process is different, depending on whether you will be performing an offline backup or an online backup.
The offline backup is perhaps the most straightforward and simplest backup method. You perform an offline database backup by following these steps:
1. Shut down the database.
2. Back up all the datafiles, the control files, and the parameter file, using OS or third-party utilities.
3. Restart the database.
That is all there is to performing an offline database backup. You can make the file copies with either OS utilities or third-party tools; a number of very good third-party backup utilities offer features such as bar coding and catalog management.
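The offline backup steps can be sketched as the following SQL*Plus session. The file names, paths, and copy commands are hypothetical; substitute the locations of your own datafiles, control files, and parameter file:

```sql
-- Connect with internal (SYSDBA) privileges and shut down cleanly
CONNECT internal
SHUTDOWN IMMEDIATE

-- With the instance down, copy the files using an OS utility
-- (paths are examples only)
HOST copy D:\orant\database\*.dbf      E:\backup\
HOST copy D:\orant\database\ctl1orcl.ora E:\backup\
HOST copy D:\orant\database\initorcl.ora E:\backup\

-- Restart the database
STARTUP
```

A SHUTDOWN IMMEDIATE (or NORMAL) is important here; files copied after a SHUTDOWN ABORT are not consistent and would still require recovery.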
The online backup is not actually a full backup because each tablespace is backed up separately. You can think of it as a set of online tablespace backups that together make up the entire database. To perform an online tablespace backup, follow this procedure:
1. Mark the beginning of the tablespace backup:
ALTER TABLESPACE tsname BEGIN BACKUP;
2. Back up that tablespace's datafiles, using OS or third-party utilities.
3. Mark the end of the tablespace backup:
ALTER TABLESPACE tsname END BACKUP;
When you mark the beginning of the backup, Oracle freezes the checkpoint information in the headers of those datafiles and logs extra redo for blocks changed during the backup. Writes to the datafiles continue as normal; this extra bookkeeping is what allows the copied files to be made consistent during recovery. That is all that is necessary to perform an online tablespace backup.
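Putting the pieces together, an online backup of a single tablespace might look like the following sketch. The tablespace name, datafile path, and backup destination are examples only:

```sql
-- Step 1: put the tablespace into backup mode
ALTER TABLESPACE users BEGIN BACKUP;

-- Step 2: copy the tablespace's datafiles with an OS utility
-- (paths are examples only)
HOST copy D:\orant\database\usr1orcl.ora E:\backup\

-- Step 3: take the tablespace out of backup mode
ALTER TABLESPACE users END BACKUP;

-- Force a log switch so the redo generated during the backup
-- is archived and can itself be backed up
ALTER SYSTEM SWITCH LOGFILE;
```

Repeat this sequence for each tablespace, and remember to back up the control file and the archived log files as well so the tablespace copies can be recovered.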
The Export utility can also be used to perform system backups. Because this utility is covered in the lesson on Day 9, "Managing Data," it is not covered here.
Yet another way to back up the Oracle database (if you are running on an NT server) is via the NT Backup utility. This utility comes with the Oracle8 Server for NT and can only be used to back up the local database--you cannot use it to back up databases remotely. The NT Backup utility is very easy to use. It is invoked through the Oracle for NT program group.
Upon invoking the NT Backup Manager, you will be prompted to supply the INTERNAL password.
The internal account is used to perform the backup; specifying this password is all that is necessary to connect to the Oracle instance on your system. If you are running in NOARCHIVELOG mode (which is not recommended), you will see the screen shown in Figure 16.17, where a full offline backup is the only option. Because no archived redo logs are being generated, an online backup could not be recovered, so no other backup type is possible.
NT Backup Manager with only the option for a full offline backup.
When running in NOARCHIVELOG mode, all you need to do is specify the destination for the backup, either by typing it in or by using the Browse button. If you are running in ARCHIVELOG mode (which is recommended because it makes your system more recoverable), you will see the screen shown in Figure 16.18, where a full offline backup, an online tablespace backup, and an online control file backup are all available options.
NT Backup Manager with multiple options, because the system is running in ARCHIVELOG mode.
Again you have the option to specify the destination for the backup, either by typing it in or by using the Browse button. When you have selected the options you want, simply click the Backup button and the backup will be performed.
The NT Backup Manager does not have the flexibility and performance of the Recovery Manager and its graphical interface, Backup Manager, but for a small database it is the ideal utility because of its simplicity and ease of use.
So far in this lesson, you have seen how to back up your system. Equally as important is determining what to back up and how to schedule those backups. In the next section you will learn how to develop a backup strategy and how to implement it.
Knowing how to back up a system is only one part of the process. Putting together a backup-and-recovery plan or strategy is also very important. Many times you are limited in the time allotted for backups and must be very creative. Remember that your highest priority is to protect the data in your database.
In developing a backup-and-recovery strategy, you must first look at the parameters of your system. Many variables affect the backup strategy and the backup process, including
These factors and others will affect the backup strategy. They must be measured against the main goals of the backup process, which include
By putting together all these factors, you should be able to come up with a backup strategy that meets as many of these requirements as possible.
NOTE: It is unlikely that you will be able to meet all the goals of a backup strategy. Do your best to come up with the best compromises you can.
Scheduling backups is usually not a very easy task. Backups should be taken often enough to protect the system's data from loss, yet should not interfere with normal business operations. Depending on the size of the data that needs to be backed up and the operational schedule of your business, you might choose different options. The options vary depending on which type of operation your business runs: a 5x8 shop, where the system is needed only during business hours, or a 7x24 shop, where the system must be available around the clock.
Each of these types of operation has different backup needs, depending on the type of data and the frequency of modifications to that data. Let's look at those cases.
Scheduling backups is a little easier in a 5x8 shop because there is plenty of time when the system need not be available to the users and can be backed up without disturbing them. The frequency of the backups in this type of operation can depend in some part on the activity in the database. Here are some guidelines that might help you determine your backup schedule:
Small database: For a small database, perform a full offline backup every night. This will offer the most protection possible for your data.
Large database: If the database is very large, perform a full offline backup every weekend and incremental backups during the week.
Active tablespaces: If certain tablespaces are very active, target them for tablespace backups as often as possible.
Structural changes: Any time a structural change is made to the database, you should perform a full offline backup.
Unrecoverable operations: Any time unrecoverable operations are performed on the database, a full offline backup should be done.
RESETLOGS: Any time you have to reset the redo logs with the RESETLOGS command, you should perform a full offline backup.
Recover until: Any time you have recovered with the Recover until option, you should perform a full offline backup.
Archive log files: Every night or every few nights, back up the archive log files. The frequency depends on the number of archive log files created every day.
Remember, these are just guidelines. Your specific situation will determine how to create an effective backup strategy for your system.
It is much more difficult to plan a backup strategy in a 7x24 shop because there is never a time when the system is not in use. Whenever you perform the backups, there will be one or more people who might be inconvenienced by a performance drop. Remember, backups do tend to affect performance.
The frequency and type of backups will be influenced by the activity of the system.
Here are some guidelines that might help you determine your backup schedule:
Small database: If the database is small, perform a full online backup every night. This will offer the most protection possible for your data.
Large database: If the database is very large, perform a full online backup every weekend and incremental online backups during the week.
Very large database: It is frequently necessary to rotate the backup schedule so that a tablespace or set of tablespaces is backed up every night. It might take several days to obtain a complete backup set, but this shortens the time each night that the backup is active and affecting performance.
Active tablespaces: If certain tablespaces are very active, target them for tablespace backups as often as possible.
Unrecoverable operations: Any time unrecoverable operations are performed on the database, a full offline backup should be done. If this is not acceptable, avoid performing unrecoverable operations.
RESETLOGS: Any time you have to reset the redo logs with the RESETLOGS command, you should perform a full offline backup. If this is not acceptable, avoid using the RESETLOGS operation.
Recover until: Any time you have recovered with the Recover until option, you should perform a full offline backup.
Archive log files: In a 7x24 shop, it is a good idea to back up the archive log files as soon as they are created. Keep online and available at least as many as are needed to recover from the oldest tablespace backup.
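Before copying the archive log files, it is a good idea to force the current online redo log to be archived so the copy is as up to date as possible. A brief sketch (the archive destination path is an example only):

```sql
-- Force the current online redo log to be archived
ALTER SYSTEM ARCHIVE LOG CURRENT;

-- Then copy the archived log files with an OS utility; remove old
-- copies only after verifying they exist on backup media
HOST copy D:\orant\database\archive\*.* E:\backup\archive\
```

Only delete archived logs from disk once you are certain they are safely on backup media and are older than your oldest needed tablespace backup.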
This section provides a few examples for setting up a backup strategy. Your backup strategy might be more complicated, but I hope these examples can help you in setting it up.
Example of a Full Daily Backup
The scenario for this example is a small business that is open only during the day. There is fairly good activity on the database, and the data is very important. Perhaps in this case the database is small enough that a complete offline backup to tape can be run every night. The previous night's tape is archived to storage every day; after a month, the tapes can be reused, keeping one tape from each month permanently. If a problem occurs and a restore must be performed, a backup tape will always be available. Backup tapes should always be kept offsite to protect them in the event of a disaster, such as a fire. If the computer room and backup tape are destroyed, previous backup tapes are available in offsite storage.
Example of an Online Tablespace Backup
The scenario for this example is a large business that is open 24 hours a day, 7 days a week. There is high database activity, and no downtime is allowed. The data is extremely critical to the operation of the company. In this situation I would recommend some sort of standby system, but I discuss that on Day 18, "Administering Oracle Replication." For backups, I would suggest that a partial online backup be done every night on a rotational basis such that all of the tablespaces are backed up in three or four days, thus minimizing the impact on the users. To further reduce the effect on the users, these backups should be done to disk, perhaps across a network. These backup files on disk should in turn be backed up every night and archived offsite. The backup files should remain online for at least two backup cycles before being removed. The archive log files should also be backed up to disk, either locally or on a networked system, and backed up to tape and sent offsite. This will offer a good level of protection and provide for quick restoration if necessary. A fault-tolerant system should be a part of this solution as well.
In this chapter you have learned about one of the most important topics related to database operations. The database backup can make the difference between being able to effectively and quickly restore the system and suffering a total failure. Without an effective backup strategy, your system is completely unprotected against failure.
In this chapter you learned not only how to develop a backup strategy and perform Oracle backups, but also a little about fault tolerance. By carefully designing your system, you might be able to survive a disk failure that would otherwise have forced you to rebuild your entire system. Careful planning makes all the difference in this type of situation.
On Day 17, "Recovering the Database," you will learn how to recover from a failure using the backup strategy you created in this chapter. A backup without a recovery plan is not much better than having no backup at all. Tomorrow you will see how to develop a recovery solution and how to recover the database from a failure.
Q What kind of failure requires an instance recovery?
A Any failure that causes the Oracle instance to terminate abnormally will require an instance recovery. This includes a system failure, a software failure, or a shutdown abort.
Q What kind of failure requires the application of archive log files?
A Any failure after which a datafile needs to be restored from backup requires the application of archive log files.
Q How can my OS and redo log files be protected against the loss of a disk drive?
A By using a RAID hardware disk controller or software RAID, you might be able to save yourself weeks of work trying to restore your system.
Q What different types of backup methods are available?
A Under Windows NT, you can back up your database by using Backup Manager, the NT Backup utility, OS or third-party utilities, or the Export utility.
The workshop provides quiz questions to help you solidify your understanding of the material covered and exercises to provide you with experience in using what you've learned. For answers to quiz questions, see Appendix A, "Answers."
2. Can you recover from the loss of a log file?
3. Can you recover from the loss of a datafile?
4. Can you recover if you are not running in ARCHIVELOG mode?
5. What different types of backups can you do?
6. What different types of files need to be backed up?
7. What is RAID?
8. Can the NT Backup Manager back up a remote system?
9. What are the archive log files used for?
10. What is the difference between an online backup and an offline backup?
2. Design a database system using partial fault tolerance.
3. Back up a tablespace using Backup Manager.
4. Back up a tablespace using NT Backup Manager.
© Copyright, Macmillan Computer Publishing. All rights reserved.