NewsCustom SearchContact
Equipartitioning - solution for very large data transformations, but not only for them
    by Ludek Bob Jankovsky, 16-Aug-2011 (ORACLE STUFFS)
There are situations, you want to join two huge tables unselectively for bulk data operartion.
Using NESTED LOOP operation is not effective for big number of rows.
Using HASH JOIN operation is usually effective in such cases, but it has limits when nomber of rows exceeds tens of millions.
EQUIPARTITIONING is a way allowing to make huge bulk operations on hundreds of millions of rows.

Create table T1 (key Number, ID Varchar2(20), DS Varchar2(1000));

Create table T2 (key Number, ID Varchar2(20), DS Varchar2(1000));

Select --+use_hash(t1,T2)
From T1 
Join T2 on T2.KEY=T1.KEY;

What is equipartitioning about? When you join two huge tables by a HASH JOIN operation, there is hash created for join keys of both sets. In cases of huge number of rows that could be very expensive operation. Not as expensive as NESTED LOOP iterating seeks for every rows on master table, but enough expensive to make time of operation grow progressive way at growing size of operated data.
Equipartitioning is a method, allowing to split both sets of data the same way into partition by key, which can be used as part of join key and let Oracle to provide HASH JOIN itterativelly for each the pair of adequate partitions.

Create table T1 (key Number, ID Varchar2(20), DS Varchar2(1000))
partition by hash(key) partitions 10;

Create table T2 (key Number, ID Varchar2(20), DS Varchar2(1000))
partition by hash(key) partitions 10;

Select --+use_hash(t1,T2)
From T1 
Join T2 on T2.KEY=T1.KEY;

There are following ways how to use Equipqrtitionig in your ETL and analytics processes:

  • Use current partitioning assuring that the definition of partitions is equal for both (all) tables.
  • Newly partition or subpartition tables to support that.
  • Create stage tables or table partitioned in proper way to satisfy the operation.

Equipartitioning works:

  • Without hinting when join tables with same way defined partition (careful here, the sefinition should be the same, could be hash or interval partitioning or list partitioning, but the definition should be equal including names of partitions at list partitioning, otherwise it won't work well).
  • At partitions against subpartitions.
  • At partitions against subpartitioned partitions.
  • At subpartitions against subpartitioned partitions.
  • For two or more tables.
  • Paralelly when PARALEL is allowed.

Equipartitioning needs:
  • The same definition of partitioning.
  • Partitioning key as part of join condition.

Helping hints:
...for parallel processes

Ludek Bob Jankovsky
All Right Reserved © 2007, Designed by Bob Jankovsky