Wednesday, November 12, 2008

Does the columns of the select list impact the query execution plan ?

Proceed further, if you fall into any of these categories : "I am a Java/Oracle developer or a junior DBA. I am not sure what columns should be included in an Oracle Index ?"
OR "How to decide, which columns should be there in an index (B*tree) "?

First the anwer for the question in the title is "YES".

After 9i, Oracle has gotten much better and better. In the olden days of RBO, it has a "One plan fits all" mentality. But after CBO (Cost based optimizer) , there are advanced access paths (plans) available to access the data.

wow ..wow..wow ...wait a minute, let me cut the crap and come to the point straight away !!

Before we proceed, one important mantra to recite : "Full Scans are not always evil and indexes are not always good".

In general all of us know that, if a column is in the where clause and if we create an index( or concatenated) for that column, optimizer will use the index (in most of the cases).

Is that the only criteria for a column to be part of an index ?

NOT NECESSARILY.

It is important to note that, if the column(s) in the select list are in an index then the CBO might choose to use the index and perform a high speed index fast full scan, instead of a full table scan. Some times the opposite is much faster. So, why are we discussing this at all ? This is infact a learning. If you run into a case, where you
think full table scan is not the right thing that should happen. Then this technique could come as a rescue.

As always, let's take an example to prove the point.

-- Log into sqlplus (preferably Oracle 10g database)

drop table t;

create table t
( object_id number,
object_name char(255),
object_type varchar2(20),
status varchar2(10),
constraint t_pk primary key(object_id)
);

insert into t
select object_id, object_name, object_type,status
from all_objects;

commit;

create index idx_t_1 on t(object_id,object_name);

-- Make sure we have CBO
analyze table t compute statistics
for table
for all indexes
for all indexed columns;

select count(*) from t; -- 53541

select object_id, object_name
from t
where object_id = nvl(:n ,object_id);

-- After looking at the above query, you would immediately get a question .. .why the hell we need that nvl() in the where clause ?

-- The requirement is : If a value is passed to the bind variable, we will get the data for that particular row, otherwise we get all the rows back.

-- Let's say we are using 10 for the bind variable n.

set autotrace traceonly;

select object_id, object_name
from t
where object_id = nvl(:n ,object_id);

Execution Plan
----------------------------------------------------------
Plan hash value: 2828772549

--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 259 | 458 (2)| 00:00:06 |
|* 1 | INDEX FAST FULL SCAN| IDX_T_1 | 1 | 259 | 458 (2)| 00:00:06 |
--------------------------------------------------------------------------------

So, CBO did a index fast full scan on the index IDX_T_1 and the number of consistent gets is 2076.

Say now we need object_type also in the select.

select object_id, object_name, object_type
from t
where object_id = nvl(10 ,object_id);

Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 271 | 489 (2)| 00:00:06 |
|* 1 | TABLE ACCESS FULL| T | 1 | 271 | 489 (2)| 00:00:06 |
--------------------------------------------------------------------------

As you see CBO did a full table can and the number of consistent gets is 2160.

Say we are not passing null for the above bind variable.

select object_id, object_name, object_type
from t
where object_id = nvl(null,object_id);

Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 271 | 489 (2)| 00:00:06 |
|* 1 | TABLE ACCESS FULL| T | 1 | 271 | 489 (2)| 00:00:06 |
--------------------------------------------------------------------------

CBO did a full table access and the number of consistent gets is : 5301 (very high).

If you did notice, the plans change with the columns in the select list.

So, what to do now ?

Just add the new column in the select list to the index. CBO will do a index fast full scan, a high speed method to use an index as a skinnier version of the table.
It is fast since it does a multi-block read unlike a single block read like index full scan.
Following quote from Tom Kyte : "Index fast full scan will be used, in general when your query references only the indexed columns and a full scan of the table can be avoided
by a fast full scan."

Why to wait, let's give a shot.

drop index idx_t_2;

create index idx_t_2 on t(object_id,object_name, object_type);

-- Make sure we have CBO

analyze table t compute statistics
for table
for all indexes
for all indexed columns;

-- For one record
select object_id, object_name, object_type
from t
where object_id = nvl(10,object_id);

Execution Plan
----------------------------------------------------------
Plan hash value: 813020976

--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 267 | 476 (2)| 00:00:06 |
|* 1 | INDEX FAST FULL SCAN| IDX_T_2 | 1 | 267 | 476 (2)| 00:00:06 |
--------------------------------------------------------------------------------

-- For all records
select object_id, object_name, object_type
from t
where object_id = nvl(null,object_id);

Execution Plan
----------------------------------------------------------
Plan hash value: 813020976

--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 267 | 476 (2)| 00:00:06 |
|* 1 | INDEX FAST FULL SCAN| IDX_T_2 | 1 | 267 | 476 (2)| 00:00:06 |
--------------------------------------------------------------------------------

Take a look at the difference in CPU cost between the full table scan and index scan.

Since idx_t_2 is a concatenated index, let's see what CBO does in these cases ...

Case 1 : If we just select the last column in the index.

select object_type
from t
where object_id = nvl(null,object_id);

Execution Plan
----------------------------------------------------------
Plan hash value: 813020976

--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 12 | 476 (2)| 00:00:06 |
|* 1 | INDEX FAST FULL SCAN| IDX_T_2 | 1 | 12 | 476 (2)| 00:00:06 |
--------------------------------------------------------------------------------

Case 2 : Select the columns in reverse order

select object_type, object_name, object_id
from t
where object_id = nvl(null,object_id);

Execution Plan
----------------------------------------------------------
Plan hash value: 813020976

--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 267 | 476 (2)| 00:00:06 |
|* 1 | INDEX FAST FULL SCAN| IDX_T_2 | 1 | 267 | 476 (2)| 00:00:06 |
--------------------------------------------------------------------------------

Case 3 : Change the where clause to use the last column in the index.

select object_type, object_name
from t
where object_type = 'TABLE'

Execution Plan
----------------------------------------------------------
Plan hash value: 813020976

--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2505 | 643K| 474 (1)| 00:00:06 |
|* 1 | INDEX FAST FULL SCAN| IDX_T_2 | 2505 | 643K| 474 (1)| 00:00:06 |
--------------------------------------------------------------------------------

Case 4 : Change the where clause to use the columns of the index in reverse order ....

Execution Plan
----------------------------------------------------------
Plan hash value: 813020976

--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 263 | 474 (1)| 00:00:06 |
|* 1 | INDEX FAST FULL SCAN| IDX_T_2 | 1 | 263 | 474 (1)| 00:00:06 |
--------------------------------------------------------------------------------

Case 5 : If we change where clause from : object_id = nvl(null,object_id);
to : object_id = :n;

CBO understands that we are looking for "a" particular value and uses Index Unique Scan.

Execution Plan
----------------------------------------------------------
Plan hash value: 1303508680

------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 267 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| T | 1 | 267 | 2 (0)| 00:00:01 |
|* 2 | INDEX UNIQUE SCAN | T_PK | 1 | | 1 (0)| 00:00:01 |
------------------------------------------------------------------------------------

Take a look at the CPU cost ... it's just super fast.

Case 6 : Since Index fast full scan doesn't sort the results by default, let's see what happens in this case :

select object_type, object_name
from t
where object_id = nvl(10,object_id)
order by object_name;

Execution Plan
----------------------------------------------------------
Plan hash value: 1771868443

---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 267 | 477 (2)| 00:00:06 |
| 1 | SORT ORDER BY | | 1 | 267 | 477 (2)| 00:00:06 |
|* 2 | INDEX FAST FULL SCAN| IDX_T_2 | 1 | 267 | 476 (2)| 00:00:06 |
---------------------------------------------------------------------------------

So, does a IFFS and then does a sort.

Wait a minute, why did I say "Index fast full scan doesn't sort the results by default" ? Coz, Index full scan does.

Let's see an example :

Case 7 : If you are ordering the results by primary key, then that extra step of "SORT ORDER BY" would be eliminated and CBO does a Index full scan.

select object_id, object_name, object_type from t order by object_id;

Execution Plan
----------------------------------------------------------
Plan hash value: 1399892806

------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 53541 | 13M| 2444 (1)| 00:00:30 |
| 1 | TABLE ACCESS BY INDEX ROWID| T | 53541 | 13M| 2444 (1)| 00:00:30 |
| 2 | INDEX FULL SCAN | T_PK | 53541 | | 177 (2)| 00:00:03 |
------------------------------------------------------------------------------------

If you notice, there is no extra step "SORT ORDER BY" and see the cost it took to do the scan : 177

Case 8 : What if we order by a non PK column ?

Then CBO will choose IFFS and then does a SORT order by.

Eg :

select object_id, object_name, object_type from t order by object_name;

Execution Plan
----------------------------------------------------------
Plan hash value: 1771868443

-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 53541 | 13M| | 3565 (1)| 00:00:43 |
| 1 | SORT ORDER BY | | 53541 | 13M| 28M| 3565 (1)| 00:00:43 |
| 2 | INDEX FAST FULL SCAN| IDX_T_2 | 53541 | 13M| | 474 (1)| 00:00:06 |
-----------------------------------------------------------------------------------------

Don't get carried away with the indexes now ..... based on the cost calculations if CBO thinks that going to the table directly, instead of
doing that extra step of going to the index, get the rowid and then go to the table, it will choose to do to a full table scan.

So once again recite : "Full Scans are not always evil and indexes are not always good".

Hope this helps !!

"Long live Tom Kyte".

r-a-v-i