My backyard.: Tools for monitoring the general health and performance of an oracle database

Statspack

Statspack is a set of performance monitoring and reporting utilities provided by Oracle starting from Oracle 8i and above. Statspack provides improved BSTAT/ESTAT functionality, though the old BSTAT/ESTAT scripts are still available. For more information about STATSPACK, read the documentation in file $ORACLE_HOME/rdbms/admin/spdoc.txt.

Install statspack

cd $ORACLE_HOME/rdbms/admin

sqlplus "/ as sysdba" @spdrop.sql       -- Drop and install statspack

sqlplus "/ as sysdba" @spcreate.sql     -- Enter tablespace names when prompted

Take performance snapshots of the database

sqlplus perfstat/perfstat

exec statspack.snap;                    -- Take a performance snapshots 

-- or :

exec perfstat.statspack.snap(i_snap_level=>10);  -- or instruct statspack to do gather more details in the snapshot 

-- (look up which oracle version supports which level).

The spauto.sql script can be customized and executed to schedule the collection of STATPACK snapshots.

Statspack reporting

-- Get a list of snapshots

select SNAP_ID, SNAP_TIME from STATS$SNAPSHOT; 

@spreport.sql -- Enter two snapshot id's for difference report

Other statspack scripts

Some of the other statspack scripts are:

* sppurge.sql - Purge (delete) a range of Snapshot Id's between the specified begin and end Snap Id's

* spauto.sql - Schedule a dbms_job to automate the collection of STATPACK statistics

* spcreate.sql - Installs the STATSPACK user, tables and package on a database (Run as SYS).

* spdrop.sql - Deinstall STATSPACK from database (Run as SYS)

* spreport.sql - Report on differences between values recorded in two snapshots

* sptrunc.sql - Truncates all data in Statspack tables

Potential problems

Statpack reporting suffers from the following problems:

* Some statistics may only be reported on COMPLETION of a query. For example, if a query runs for 12 hours, its processing won't be reported during any of the snapshots taken while the query was busy executing.

* If queries are aged out of the shared pool, the stats from V$SQL are reset. This can throw off the delta calculations and even make it negative. For example, query A has 10,000 buffer_gets at snapshot 1, but at snapshot #2, it has been aged out of the pool and reloaded and now shows only 1,000 buffer_gets. So, when you run spreport.sql from snapshot 1 to 2, you'll get 1,000-10,000 = -9,000 for this query.

Proactive Database Monitoring

The Oracle Database makes it easy to proactively monitor the health and performance of your database. It monitors the vital signs (or metrics) related to database health, analyzes the workload running against the database, and automatically identifies any issues that need your attention as an administrator. The identified issues are either presented as alerts and performance findings in Enterprise Manager or, if you prefer, can be sent to you through e-mail.

Alerts

Alerts help you monitor your database. Most alerts notify you of when particular metric thresholds are crossed. For each alert, you can set critical and warning threshold values. These threshold values are meant to be boundary values that when crossed indicate that the system is in an undesirable state. For example, when a tablespace becomes 97 percent full this can be considered undesirable, and Oracle will generate a critical alert.

Other alerts correspond to database events such as Snapshot Too Old or Resumable Session suspended. These types of alerts indicate that the event has occurred.

In addition to notification, you can set alerts to perform some action such as running a script. For instance, scripts that shrink tablespace objects can be useful for a Tablespace Usage warning alert.

By default, Oracle enables the following alerts:

* Table Space Usage (warning at 85 percent full, critical at 97 percent full)

* Snapshot Too Old

* Recovery Area Low on Free Space

* Resumable Session Suspended

You can modify these alerts or enable others by setting their metrics.

Performance Self-Diagnostics: Automatic Database Diagnostics Monitor

Oracle Database includes a self-diagnostic engine called the Automatic Database Diagnostic Monitor (ADDM). ADDM makes it possible for the Oracle Database to diagnose its own performance and determine how any identified problems can be resolved.To facilitate automatic performance diagnosis using ADDM, Oracle Database periodically collects snapshots of the database state and workload. The default collection interval is one hour. Snapshots provide a statistical summary of the state of the system at any point in time. These snapshots are stored in the Automatic Workload Repository (AWR), residing in the SYSAUX tablespace. The snapshots are stored in this repository for a set time (a week by default) before they are purged to make room for new snapshots.ADDM analyses data captured in AWR to determine the major problems in the system and in many cases recommends solutions and quantifies expected benefits.

Generally, the performance problems ADDM can flag include the following:

* Resource bottlenecks, such as when your database is using large amounts of CPU time or memory, for example as a result of high load SQL statements

* Poor connection management, such as when your application is making too many logons to the database

* Lock contention in a multiuser environment, when a lock to update data causes other sessions to wait, slowing down the database

Monitoring General Database State and Workload

The Enterprise Manager home page enables you to monitor the health of your database. It provides a central place for general database state information and is updated periodically. This page reports information that is helpful for monitoring database state and workload

The General section provides a quick view of the database, such as whether the database is Up or Down, the time the database was last started, instance name, and host name.

The Host CPU section shows the percentage of CPU time used in the overall system. This chart breaks down CPU percentage into time used by the database and time used by other processes. If your database is taking up most of the CPU time, you can explore the cause further by looking at the Active Sessions summary. This summary tells you what the database processes are doing, such as which ones are using CPU, or waiting on I/O. You can drill down for more information by clicking a link, such as CPU.

If other processes are taking up most of your CPU time, this indicates that some other application running on the database machine may be causing the performance problems. To investigate this further, click the Host link under the General section. This link takes you to the machine overview page where you can see some general information about the machine, such as what operating system it is running, how long the machine has been up, and any potential problems.

Click the Performance property page to see a summary of CPU utilization, average active sessions, instance disk I/O, and instance throughput. Additional Monitoring Links enable you to drill down to Top Activity and other data. The type of actions you can take to improve host performance depends on your system, but can include eliminating unnecessary processes, adding memory, or adding CPUs.

On the Home page, the Diagnostic Summary summarizes the latest ADDM performance findings. This section also summarizes any critical or warning alerts listed in the Alerts section.

The Alerts table provides information about any alerts that have been issued along with the severity rating of each. An alert is a notification that a metric threshold has been crossed. For example, an alert can be triggered when a tablespace is running out of space.

When an alert is triggered, the name of the metric causing it is displayed in the Name column. The severity icon (Warning or Critical) is displayed, along with time of alert, and alert value. You can click the message to learn more about the cause. For more information, see "Alerts" .

The Performance Analysis section provides a quick summary of the latest ADDM findings, highlighting the issues that are causing the most significant performance impact. This analysis can identify problems such as SQL statements that are consuming significant database time.

Managing Alerts

The following sections describe how to manage alerts.
Viewing Metrics and Thresholds

Metrics are a set of statistics for certain system attributes as defined by Oracle. They are computed and stored by the Automatic Workload Repository, and are displayed on the All Metrics page, which is viewable by clicking All Metrics under Related Links on the Database Home page.

For each of these metrics, you are able to define warning and critical threshold values, and whenever the threshold is crossed, Oracle issues an alert.

Alerts are displayed on the Database Home page under the Alerts heading (or Related Alerts for non-database alerts such as a component of Oracle Net).

Also you can see following tools

My backyard.

Thursday, February 11, 2010

Tools for monitoring the general health and performance of an oracle database

No comments:

Post a Comment