Thursday, May 29, 2008

Oracle Analytics / Analytic Functions: first_value()

Oracle database version : 10g
Schema : Scott

Analytic Functions Syntax :
Function(arg1,..., argn) OVER ( [PARTITION BY <...>] [ORDER BY <....>] [] )

On the “emp” table, say the requirement is to “get the highest paid job in each department”.

select * from emp order by deptno asc, sal desc;

If you run the above select query, you will see that the result should be :

DEPTNO JOB
10 PRESIDENT
30 MANAGER
20 ANALYST

So, how do we get this result ?

Analytics just Rock & Roll !!!

Note : This is one of the very common requirements … based on the column1, get the column2 for the respective max(column3).

Here we go …

select distinct deptno, first_value(job) over (partition by deptno order by sal DESC ) job
from emp;


first_value() : Returns the first value in an ordered set of values. If the first value in the set is null, then the function returns NULL unless you specify IGNORE NULLS

Note : Google for last_value() function as well.

Following is an example from asktom.

Script to create the test table and data.

CREATE TABLE myTable (
Ship_date DATE NOT NULL
,Ship_type VARCHAR2(10) NOT NULL
,Shipment NUMBER(6,0) NOT NULL
)
/
INSERT INTO myTable VALUES('01-01-2000', 'SHIP1', 27);
INSERT INTO myTable VALUES('01-01-2000', 'SHIP1', 26);
INSERT INTO myTable VALUES('01-01-2000', 'SHIP1', 25);
INSERT INTO myTable VALUES('01-01-2000', 'SHIP2', 24);
INSERT INTO myTable VALUES('01-01-2000', 'SHIP2', 23);
INSERT INTO myTable VALUES('01-01-2000', 'SHIP2', 22);
INSERT INTO myTable VALUES('01-01-2000', 'SHIP3', 21);
INSERT INTO myTable VALUES('01-01-2000', 'SHIP3', 20);
INSERT INTO myTable VALUES('01-01-2000', 'SHIP3', 19);
INSERT INTO myTable VALUES('01-01-2001', 'SHIP1', 18);
INSERT INTO myTable VALUES('01-01-2001', 'SHIP1', 17);
INSERT INTO myTable VALUES('01-01-2001', 'SHIP1', 16);
INSERT INTO myTable VALUES('01-01-2001', 'SHIP2', 15);
INSERT INTO myTable VALUES('01-01-2001', 'SHIP2', 14);
INSERT INTO myTable VALUES('01-01-2001', 'SHIP2', 13);
INSERT INTO myTable VALUES('01-01-2001', 'SHIP3', 12);
INSERT INTO myTable VALUES('01-01-2001', 'SHIP3', 11);
INSERT INTO myTable VALUES('01-01-2001', 'SHIP3', 10);
INSERT INTO myTable VALUES('01-01-2002', 'SHIP1', 9);
INSERT INTO myTable VALUES('01-01-2002', 'SHIP1', 8);
INSERT INTO myTable VALUES('01-01-2002', 'SHIP1', 7);
INSERT INTO myTable VALUES('01-01-2002', 'SHIP2', 6);
INSERT INTO myTable VALUES('01-01-2002', 'SHIP2', 5);
INSERT INTO myTable VALUES('01-01-2002', 'SHIP2', 4);
INSERT INTO myTable VALUES('01-01-2002', 'SHIP3', 3);
INSERT INTO myTable VALUES('01-01-2002', 'SHIP3', 2);
INSERT INTO myTable VALUES('01-01-2002', 'SHIP3', 1);
COMMIT;

Requirement 1 : Select only those rows that have the greatest Shipment value, per Ship_type, per Ship_date.

Solution :

select *
from (select myTable.*,
max(shipment) over(partition by ship_date, ship_type) max_shipment
from myTable)
where shipment = max_shipment

Requirement 2 : select only those rows that have the greatest Shipment value, per Ship_type, per Ship_date, showing ONLY those records with greatest Ship_date per Ship_type.

The result should be :

SHIP_DATE SHIP_TYPE SHIPMENT
---------- ---------- ----------
01-01-2002 SHIP1 9
01-01-2002 SHIP2 6
01-01-2002 SHIP3 3

Solution :

select * from (
select a.*, max(ship_date) over(partition by ship_type) max_ship_date
from (select myTable.*, max(shipment) over(partition by ship_date,
ship_type) max_shipment from myTable
) a
where shipment = max_shipment
) where ship_date = max_ship_date;


Long live “Tom Kyte”.

Good Luck,
r-a-v-i

Insert with check option

One of the ways to validate data while insertion, directly at the database level is : INSERT …. WITH CHECK OPTION

Let’s take an example and see what it is.

Database version : Oracle 10g.

Schema : Scott

Let’s create a test table from the standard “emp” table and work on it.

create table test_emp as select * from emp;

Let us suppose that our requirement is something like this :

Hence forth, in Department 30 manager’s should have commission between 750 – 1000 (included).

For our requirement above, the select query would be :

select *
from test_emp e
where e.deptno = 30
and e.job = 'MANAGER'
and e.comm >= 750
and e.comm <= 1000;


Usually the developers take care of this validation on the front-end itself. If there is only one point of data entry to your database (Eg : Web front-end), it’s okay to validate at the entry point. But think of a situation where there are multiple point of data entries to your database (Eg : Web front-end, Feeds, Web Services, JMS, MDB’s …etc.,).

So,

a) You have to duplicate the validation (it is almost impossible to maintain a common code base between these many discrete systems. Even if you have a common code base, on system might just not invoke the validation at all.)

b) If a common code base for validations is not used, then there could be a bug or missing implementation in any of the systems.

So, how about having a validation just before inserting the data into the database ? PL/SQL strikes immediately right 

We can put the volition logic in a pl/sql procedure/function and let all the above discrete systems invoke it.

But…….. there is a performance overhead. We are adding another layer after data access layer. For simple/moderate systems, it is okay. What we have a high volume system.

Oracle “almost always” has an option.

Oracle prohibits any changes to the table or view that would produce the rows that are not included in the sub query.

Make the above select query as a sub query to insert() as shown below.

insert into
(select *
from test_emp e
where e.deptno = 30
and e.job = 'MANAGER'
and e.comm >= 750
and e.comm <= 1000) values (7935, 'New Guy', 'MANAGER', 7782, sysdate, 1600.00, 500, 30);


The data that we are trying to insert in this query is against our rules, as the commission is = 500.

If you execute the above insert, Oracle still inserts the record.

select * from test_emp e where e.empno = 7935;


1 7935 New Guy MANAGER 7782 5/29/2008 9:02:05 AM 1600.00 500.00 30

Provide “WITH CHECK OPTION” in the sub query as shown below :

insert into
(select *
from test_emp e
where e.deptno = 30
and e.job = 'MANAGER'
and e.comm >= 750
and e.comm <= 1000 with check option)
values
(7935, 'New Guy', 'MANAGER', 7782, sysdate, 1600.00, 500, 30);


When you try to execute the above insert, you will get an error message :

ORA-01402 : view WITH CHECK OPTION where-clause violation

So, no matter where the data is coming from, we could perform a validation directly on the insert itself.

Also, note that we are NOT performing an extra query, as the sub query is within the insert itself.

Useful links :

http://www.oracle.com/technology/oramag/oracle/04-mar/o24asktom.html
http://www.orafaq.com/node/55
http://www.psoug.org/reference/analytic_functions.html
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:12864646978683
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:3170642805938

Hope this helps.

Long Live “Tom Kyte”.

Good Luck,
r-a-v-i

Tuesday, May 6, 2008

Oracle Analytics/Analytic Functions : How to get number of rows(count ) returned by a given query, in the same query ?

Often we, the Java/Oracle developers run into the situation, where we need to find the number of rows (count) returned by a query.
You can just find out the count by firing 2 queries.
Let's take an example and see how it works.

Schema : scott/tiger

Analytic Functions Syntax :

Function(arg1,..., argn) OVER ( [PARTITION BY <...>] [ORDER BY <....>] [] )

Let's suppose that,we are dealing with the employees in departments 10, 20.

Query#1 :

SELECT empno, deptno
FROM emp
WHERE deptno IN (10, 20)
ORDER BY deptno, empno;

Let's suppose that the requirement is : if the number of employees in departments 10,20 is more than 10, then set their manager as SCOTT (EmpNo#7788).
(Some dummy requirement ...please don't bother about it).

So, now we need to find out the count(). Let's find out the count.

Query#2 :

SELECT sum(count(*))
FROM emp
WHERE deptno IN (10, 20)
group by deptno;

This query would return the count, but if you did notice, we don't have the "empno" column in the select list.

We can't have "empno" in the column list as it is not in the group by columns. So, to get the empno of each department,we have to
fire the Query#2.

So, the conclusion is, using this approach without firing 2 queries we can't get the count() of rows returned by Query#1.

In Java world, you can use ScrollableResultSet to get the count. But again, you are doing an extra operation there to get the count.

So....without doing any extra operation or firing an extra query, how to get the count ?

Oracle Analytics comes to rescue.

Query#3 :

SELECT empno, deptno, count(*) over() cnt
FROM emp
WHERE deptno IN (10, 20)
ORDER BY deptno, empno;

OVER(), acts on entire record set returned by the where clause. Here is the result of Query#3.

EMPNO DEPTNO CNT
1 7782 10 8
2 7839 10 8
3 7934 10 8
4 7369 20 8
5 7566 20 8
6 7788 20 8
7 7876 20 8
8 7902 20 8

If you notice, using this approach, we have 2 benefits :

1) We can find the count() of rows returned by a query
2) We can also include other columns in the select list which are not part of the where clause.
Since we are not using "group by", we are out of it's limitations.

Long live "Tom Kyte".

Hope this helps !!

Good Luck,
r-a-v-i