Friday, February 29, 2008

Oracle 10g : Use Anti-Join to find unmatched data

If you are using Oracle, to find un-matched data between two tables use Anti-Join. That performs better than using not in or not exists.

Okey ...okey ...,you want me to prove right ? Let's take a use case and see what happens.

Let's take 2 tables test_1 and test_2, where test_1 contains 20,000 rows and test_2 contains 18,000 rows.

So the requirement is, with this sample data, what is the best way to find out the 2000 rows (unmatched rows) from test_1 table ?

Using NOT IN

SELECT t1.*
FROM test_1 t1
WHERE t1.object_id NOT IN (SELECT object_id
FROM test_2 t2);

CPU Used : 73
Session logical reads : 40088


Statistics seems quite high. The immediate other alternative that strikes is : use "Not Exists"

Using NOT EXISTS

SELECT t1.*
FROM test_1 t1
WHERE NOT Exists (SELECT 1
FROM test_2 t2
WHERE t2.object_id = t1.object_id);

CPU Used : 74
Session logical reads : 40088


CPU used is almost same and there is no change in "Session Logical Reads".

It is still slower. What to do ? Oracle always has an answer :-)

Using Anti join:

SELECT t1.*
FROM test_1 t1, test_2 t2
WHERE t1.object_id = t2.object_id(+)
AND t2.object_id IS NULL;

Here are some interesting results when this test is done on Oracle 8i and Oracle 10g.



If you take a look at the CPU used and the number of session logical reads, on "8i " it is almost half for the anti-outer join either we have small no of un-matched rows or huge number of un-matched rows..

Another interesting point is, when we are trying to find huge number of un-matched records, anti-outer join gives very good performance as the number of session logical reads is only "483".

For anti-outer join, even the execution plan is different. ie., "Nested Loops Outer".

Whereas , on 10g, the query is optimized internally and statistics are same (approximately ). This is also a good test between 8i and 10g.

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Script to set up :

CREATE TABLE test_1 AS
SELECT object_id,object_type FROM All_Objects WHERE ROWNUM <= 20000;

ALTER TABLE test_1 ADD CONSTRAINT test_1_pk PRIMARY KEY (object_id);

CREATE TABLE test_2 AS
SELECT object_id,object_type FROM All_Objects WHERE ROWNUM <= 18000;

ALTER TABLE test_2 ADD CONSTRAINT test_2_pk PRIMARY KEY (object_id);

ALTER TABLE test_2 ADD CONSTRAINT test_2_fk Foreign KEY (object_id) REFERENCES test_1;

SELECT COUNT(*) FROM test_1;

SELECT COUNT(*) FROM test_2;

exec dbms_stats.gather_table_stats(ownname => '',tabname => 'Test_1',cascade => true);

exec dbms_stats.gather_table_stats(ownname => '',tabname => 'Test_2',cascade => true);

If you are on 10g, don't forget to gather statistics. It is an important step to use "CBO".

Note :
+ For doing the huge un-matched data test, drop the table test_2, change the number 18000 to 100 and recreate the table.

Good Luck !!

Long live "Tom Kyte".

r-a-v-i

1 comment:

Anonymous said...

Very good information. Lucky me I discovered your
website by chance (stumbleupon). I have saved as a favorite for later!
My page :: weekly options trading