Query optimization diagram



Executing explain plan


Explain plan for select * from employee;




Reading explain plan




| Id  | Operation                     |  Name        | Rows  | Bytes | Cost (%CPU)|
|   0 | SELECT STATEMENT              |              |     3 |   189 |    10  (10)|
|   1 |  NESTED LOOPS                 |              |     3 |   189 |    10  (10)|
|   2 |   NESTED LOOPS                |              |     3 |   141 |     7  (15)|
|*  3 |    TABLE ACCESS FULL          | EMPLOYEES    |     3 |    60 |     4  (25)|
|   4 |    TABLE ACCESS BY INDEX ROWID| JOBS         |    19 |   513 |     2  (50)|
|*  5 |     INDEX UNIQUE SCAN         | JOB_ID_PK    |     1 |       |            |
|   6 |   TABLE ACCESS BY INDEX ROWID | DEPARTMENTS  |    27 |   432 |     2  (50)|
|*  7 |    INDEX UNIQUE SCAN          | DEPT_ID_PK   |     1 |       |            |

Predicate Information (identified by operation id):
3 – filter(“E”.”EMPLOYEE_ID”<103)
5 – access(“E”.”JOB_ID”=”J”.”JOB_ID”)
Each row in the output table corresponds to a single step in the execution plan. Note that the step Ids with asterisks are listed in the Predicate Information section. Each step of the execution plan returns a set of rows that either is used by the next step or, in the last step, is returned to the user or application issuing the SQL statement. A set of rows returned by a step is called a row set.


The following steps in Example 11-2 physically retrieve data from an object in the database:

Step 3 reads all rows of the employees table.

Step 5 looks up each job_id in JOB_ID_PK index and finds the rowids of the associated rows in the jobs table.

Step 4 retrieves the rows with rowids that were returned by Step 5 from the jobs table.

Step 7 looks up each department_id in DEPT_ID_PK index and finds the rowids of the associated rows in the departments table.

Step 6 retrieves the rows with rowids that were returned by Step 7 from the departments table.


The following steps in Example 11-2 operate on rows returned by the previous row source:

Step 2 performs the nested loop operation on job_id in the jobs and employees tables, accepting row sources from Steps 3 and 4, joining each row from Step 3 source to its corresponding row in Step 4, and returning the resulting rows to Step 2.

Step 1 performs the nested loop operation, accepting row sources from Step 2 and Step 6, joining each row from Step 2 source to its corresponding row in Step 6, and returning the resulting rows to Step 1.


Understanding Access Paths for the Query Optimizer:

Access paths are ways in which data is retrieved from the database. In general, index access paths should be used for statements that retrieve a small subset of table rows, while full scans are more efficient when accessing a large portion of the table. Online transaction processing (OLTP) applications, which consist of short-running SQL statements with high selectivity, often are characterized by the use of index access paths.


Why a Full Table Scan Is Faster for Accessing Large Amounts of Data

Full table scans are cheaper than index range scans when accessing a large fraction of the blocks in a table. This is because full table scans can use larger I/O calls, and making fewer large I/O calls is cheaper than making many smaller calls.


Index Unique Scans

This scan returns, at most, a single rowid. Oracle performs a unique scan if a statement contains a UNIQUE or a PRIMARY KEY constraint that guarantees that only a single row is accessed.


Index Range Scans

An index range scan is a common operation for accessing selective data. It can be bounded (bounded on both sides) or unbounded (on one or both sides). Data is returned in the ascending order of index columns. Multiple rows with identical values are sorted in ascending order by rowid. If an index can be used to satisfy an ORDER BY clause, then the optimizer uses this option and avoids a sort.


The optimizer uses a range scan when it finds one or more leading columns of an index specified in conditions, such as the following:

col1 = :b1

col1 < :b1

col1 > :b1

col1 like ‘ASD%’


Index Skip Scans

Index skip scans improve index scans by nonprefix columns. Often, scanning index blocks is faster than scanning table data blocks.Skip scanning lets a composite index be split logically into smaller subindexes. In skip scanning, the initial column of the composite index is not specified in the query. In other words, it is skipped.The number of logical subindexes is determined by the number of distinct values in the initial column. Skip scanning is advantageous if there are few distinct values in the leading column of the composite index and many distinct values in the nonleading key of the index. Consider, for example, a table employees (sex, employee_id, address) with a composite index on (sex, employee_id). Splitting this composite index would result in two logical subindexes, one for M and one for F.


For this example, suppose you have the following index data:









The index is split logically into the following two subindexes:

The first subindex has the keys with the value F.

The second subindex has the keys with the value M.


The column sex is skipped in the following query:

FROM employees
WHERE employee_id = 101;
A complete scan of the index is not performed, but the subindex with the value F is searched first, followed by a search of the subindex with the valueM.


Index Joins

An index join is a hash join of several indexes that together contain all the table columns that are referenced in the query. If an index join is used, then no table access is needed, because all the relevant column values can be retrieved from the indexes. An index join cannot be used to eliminate a sort operation.


How the Query Optimizer Chooses an Access Path

The query optimizer chooses an access path based on the following factors:

-The available access paths for the statement

-The estimated cost of executing the statement, using each access path or combination of paths


To choose an access path, the optimizer first determines which access paths are available by examining the conditions in the statement’s WHERE clause and its FROM clause. The optimizer then generates a set of possible execution plans using available access paths and estimates the cost of each plan, using the statistics for the index, columns, and tables accessible to the statement. Finally, the optimizer chooses the execution plan with the lowest estimated cost.


When choosing an access path, the query optimizer is influenced by the following:

Optimizer Hints – You can instruct the optimizer to use a specific access path using a hint.


How the Query Optimizer Chooses Execution Plans for Joins

-The optimizer first determines whether joining two or more tables definitely results in a row source containing at most one row. The optimizer recognizes such situations based on UNIQUE and PRIMARY KEY constraints on the tables. If such a situation exists, then the optimizer places these tables first in the join order. The optimizer then optimizes the join of the remaining set of tables.


With the query optimizer, the optimizer generates a set of execution plans, according to possible join orders, join methods, and available access paths. The optimizer then estimates the cost of each plan and chooses the one with the lowest cost. The optimizer estimates costs in the following ways:

-The cost of a nested loops operation is based on the cost of reading each selected row of the outer table and each of its matching rows of the inner table into memory. The optimizer estimates these costs using the statistics in the data dictionary.

-The cost of a sort merge join is based largely on the cost of reading all the sources into memory and sorting them.

-The cost of a hash join is based largely on the cost of building a hash table on one of the input sides to the join and using the rows from the other of the join to probe it.


Nested Loop Joins

Nested loop joins are useful when small subsets of data are being joined and if the join condition is an efficient way of accessing the second table. It is very important to ensure that the inner table is driven from (dependent on) the outer table. If the inner table’s access path is independent of the outer table, then the same rows are retrieved for every iteration of the outer loop, degrading performance considerably. In such cases, hash joins joining the two independent row sources perform better.

Hint: USE_NL(table1 table2)


Hash Joins

Hash joins are used for joining large data sets. The optimizer uses the smaller of two tables or data sources to build a hash table on the join key in memory. It then scans the larger table, probing the hash table to find the joined rows.


When the Optimizer Uses Hash Joins

The optimizer uses a hash join to join two tables if they are joined using an equijoin and if either of the following conditions are true:

-A large amount of data needs to be joined.

-A large fraction of a small table needs to be joined



Sort Merge Joins

Sort merge joins can be used to join rows from two independent sources. Hash joins generally perform better than sort merge joins. On the other hand, sort merge joins can perform better than hash joins if both of the following conditions exist:

The row sources are sorted already.

A sort operation does not have to be done.


Sort merge joins are useful when the join condition between two tables is an inequality condition (but not a nonequality) like <, <=, >, or >=. Sort merge joins perform better than nested loop joins for large data sets. You cannot use hash joins unless there is an equality condition.


In a merge join, there is no concept of a driving table. The join consists of two steps:

-Sort join operation: Both the inputs are sorted on the join key.

-Merge join operation: The sorted lists are merged together.

If the input is already sorted by the join column, then a sort join operation is not performed for that row source.


The optimizer can choose a sort merge join over a hash join for joining large amounts of data if any of the following conditions are true:

-The join condition between two tables is not an equi-join.

-Because of sorts already required by other operations, the optimizer finds it is cheaper to use a sort merge than a hash join.


Cartesian Joins

A Cartesian join is used when one or more of the tables does not have any join conditions to any other tables in the statement. The optimizer joins every row from one data source with every row from the other data source, creating the Cartesian product of the two sets.



Anti Joins

An antijoin is a join between two data sets that returns a row from the first set when a matching row does not exist in the subquery data set. Like a semijoin, an antijoin stops processing the subquery data set when the first match is found. Unlike a semijoin, the antijoin only returns a row when no match is found. An antijoin avoids unnecessary processing when a query only needs to return a row when a match does not exist. With large data sets, this optimization can result in significant time savings over a nested loops join that must loop through every record returned by the inner query for every row in the outer query. The optimizer can apply the antijoin optimization to nested loops joins, hash joins, and sort merge joins.


The optimizer may choose an antijoin in the following circumstances: The statement uses either the NOT IN or NOT EXISTS clause.



Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s