HomeArticlesSlideshowsAsk BobLogin or Register
NewsCustom SearchContactLinks
Simple ETL module generator (XMLMap)
ORACLE STUFFS
19-Aug-2008
There are several complex ways of generating PL/SQL ETL modules based on metadata. Now we have to avoid all the rules such as:
  • Open metadata interface
  • Custom - dynamic ETL patterns (templates)
  • ...
Following stuff is dedicated to be used in places with the minimum impact into infrastructure requirement.

XMLMap concept of generated mappings supposes generating of package body based on package specification enhanced by XML metadata as a comment in the end of specification. Solution requires two following packages:

Script: BWbuff_package.sql

CREATE OR REPLACE package BW_Buffer AUTHID CURRENT_USER as
  --------------------------------------------------------------
  --Purpose: Execute and Output buffer to store texts         --
  --Author:  Bob Jankovsky, copyleft 2005                     --
  --Version: 1.2 /09-FEB-2005                                 --
  --------------------------------------------------------------
   TYPE TA_CHR is table of VARCHAR2(32767) index by binary_integer;
   g_Buffer TA_CHR;
   g_append_pos    Integer;
   g_append_len    Integer;

...more
Script: bw_simple_gen.sql

CREATE OR REPLACE PACKAGE BW_SIMPLE_GEN AS
  --------------------------------------------------------------
  --Purpose: Module generates package body based on XML       --
  --         specification of mapping stored as comment       --
  --         in package specification.                        --
  --Author:  Bob Jankovsky, copyleft 2007                     --
  --Version: 1.0 /19-OCT-2007                                 --
  --Exceptions: 20006 ... Unsuccessfully compiled module      --
  --------------------------------------------------------------
Procedure GenPackageBody(

...more

Package body will be generated based on the package specification following way:

  • Create specification with mapping specification in the XML METADATA comment.
  • Generate Package body using following procedure:

BEGIN 
  BW_SIMPLE_GEN.GenPackageBody(<generated package name>);
END;
/

  • Check errors.

Specification of mapping metadata starts with the
/*METADATA:
comment. It finishes with end of comment mark:
*/

Following sample could be used as a template:

CREATE OR REPLACE PACKAGE ETL_PT_CUST_M IS
/* ===================================================
 Purpose:           Fill the PARTY table from CUSTOMERS
 Version:           1.0
 Change History:
 Date        Author         Change
 20.08.2008  Bob Jankovsky  New module (v 1.0)
*/
procedure main
 (p_process_key      in number default -1,
  p_effective_date   in date default to_date('10000101','YYYYMMDD'),
  p_load_partition   in varchar2 default null
 );

/*METADATA:
<?xml version="1.0"?>
<MODULE Pattern="DIFF_MERGE" TargetOwner="A2" TargetTable="PARTY">
<EXPR Name="N2CH">to_char(:1)</EXPR> 
<SRCTAB Owner="A2" Name="CUSTOMER" Alias="CUST" Type="FROM" Link="" />
<COLMAP TargetColumn="PT_SEQ" DirectFlag="Y" UpdateFlag="N">PT_SEQ.nextval</COLMAP>
<COLMAP TargetColumn="PT_SID" MatchFlag="Y">'SR1'</COLMAP>
<COLMAP TargetColumn="PT_ID" SourceAlias="CUST" SourceColumn="ID" Expr="N2CH" MatchFlag="Y" />
<COLMAP TargetColumn="PT_FIRST_NAME" SourceAlias="CUST" SourceColumn="CUST_FIRST_NAME" />
<COLMAP TargetColumn="PT_LAST_NAME" SourceAlias="CUST" SourceColumn="CUST_LAST_NAME" />
<COLMAP TargetColumn="PT_BIRTH_DATE" SourceAlias="CUST" SourceColumn="CUST_BIRTH_DATE" />
<COLMAP TargetColumn="PT_CODE" SourceAlias="CUST" SourceColumn="CUST_CODE" />
<COLMAP TargetColumn="PT_DELETED_FLAG" SourceAlias="CUST" SourceColumn="CUST_DELETED_FLAG" />
<COLMAP TargetColumn="UPD_EFF_DATE" Updateflag="Y">p_effective_date</COLMAP>
<COLMAP TargetColumn="INS_DT" Updateflag="N">sysdate</COLMAP>
<COLMAP TargetColumn="UPD_DT" Updateflag="Y">sysdate</COLMAP>
<COLMAP TargetColumn="INS_PROCESS" Updateflag="N">p_process_key</COLMAP>
<COLMAP TargetColumn="UPD_PROCESS" Updateflag="Y">p_process_key</COLMAP>
</MODULE>
*/
END ETL_PT_CUST_M;
/
BEGIN  
  BW_SIMPLE_GEN.GenPackageBody('ETL_PT_CUST_M'); 
END; 
/

[Download]


The MODULE element
is a root element of all the physcal mapping. It also specifies Target table and a module pattern (i.e what to do with data).

Syntax:
<?xml version="1.0"?>
<MODULE …module attributes…>
…module elements
</MODULE>

Attributes:

AttributeTypeDescription
PatternMandatoryA functional type of operation(DIFF_MERGE,FR_DELETE,INSERT… ), see bellow.
TargetOwnerMandatoryOwner of target table
TargetTableMandatoryName of target table
TargetLinkOptionalDB link into target database (push method)
TargetPartitionOptionalTarget partition (do not use it with a push method)
SourceHintOptionalOptimalization hint for extraction
TargetHintOptionalOptimalization hint for load


The SRCTAB element
describes source or lookup table. One table could be define several times (multiple lookup) identified by unique Alias of table.

Syntax:
<?xml version="1.0"?>
<MODULE …module attributes…>
<SRCTAB …srctab attributes…>
…srctab elements…
</SRCTAB>
<SRCTAB …srctab attributes…>
…srctab elements…
</SRCTAB>
…other module elements…
</MODULE>

Attributes:

AttributeTypeDescription
AliasMandatoryUnique alias for an instance of table in mapping.
OwnerMandatoryOwner schema of source table
NameMandatoryName of source table
JoinTypeMandatoryTable join type, one of FROM, JOIN, LEFT JOIN, nebo CROSS JOIN. In each module should be just one table of type FROM.
LinkOptionalName of DB-link of source table
PartitionOptionalPartition of local source table (do not use with Link specified).
EnclFlag{Y/N}Flag of subquery usage requirement
NoIndex{Y/N}Flag to suppress usage of indexes with the table

Sub-elements of the SRCTAB element:

  • SRCTAB/WHERE - specifies local filter condition associated with particular table. It's usefull to define it here instead of common FILTER cause of more efficient manipulation with matadata.
  • SRCTAB/ON - specifies join conditions.

Syntax:
<?xml version="1.0"?>
<MODULE …module attributes…>
<SRCTAB …srctab attributes…>
…srctab elements…
</SRCTAB>
<SRCTAB …srctab attributes…>
   <WHERE> podmínka </WHERE>
</SRCTAB>
…other module elements…
</MODULE>

There could be several sub-elements of the same type for one element:

<SRCTAB Alias=”STAB1”…srctab attributes…>
   <WHERE> STAB1.FLAG_1= ‘Y’ and STAB1.FLAG_2= ‘Y’</WHERE>
</SRCTAB>

means the same, as

<SRCTAB Alias=”STAB1”…srctab attributes…>
   <WHERE> STAB1.FLAG_1= ‘Y’</WHERE>
   <WHERE> STAB1.FLAG_2= ‘Y’</WHERE>
</SRCTAB>

The same for SRCTAB/ON

<?xml version="1.0"?>
<MODULE …module attributes…>
<SRCTAB …srctab attributes…>
…srctab elements…
</SRCTAB>
<SRCTAB …srctab attributes…>
   <ON> podmínka </ON>
</SRCTAB>
…other module elements…
</MODULE>


The COLMAP element
describes mapping of one target column. There are two basic types of definition:
  • Formalized (recommended, if it is possible)
  • Freestyle expression

Syntax - formalized:
<COLMAP TargetColumn=”TAB_ID” SourceAlias=”STAB1” SourceColumn=”TAB_ID”/>

Syntax - freestyle expression:
<COLMAP TargetColumn=”TAB_ID”>STAB1.TAB_ID</COLMAP>

Syntax - sequence usage example:
<COLMAP TargetColumn="PRODI_KEY" DirectFlag="Y" UpdateFlag="N">
A2.PRODUCT_INSTANCE_SEQ.NEXTVAL
</COLMAP>

Attributes:

AttributeTypeDescription
TargetColumnMandatoryUnique name of target column.
SourceAliasOptionalAlias of source table of column.
SourceAliasOptionalAlias of source table of column. It should be specified in some SRCTAB specification.
SourceColumnOptionalName of source column. It should be specified in pair with SourceAlias.
MatchFlag{Y|N}Flag of column used as a matching key.
UpdateFlag{Y|N}Flag of update could be used to suppress updating of some values specified just during insert. Logically it is unset for coumn with MatchFlag set.
DirectFlag{Y|N}Flag of direct usage of constant or independent value - useful for non-deterministic function such as sequences.
DummyFlag{Y|N}Flag of column used just for filtering or deduplication but for target modufication.
PartFlag{Y|N}Flag specific for CREEPING DEATH algorithm.
ValidFromFlag{Y|N}Flag specific for CREEPING DEATH algorithm. It specifies column with VALID_FROM meaning.
ValidToFlag{Y|N}Flag specific for CREEPING DEATH algorithm. It specifies column with VALID_THRU meaning.
NullableFlag{Y|N}Flag, that target column can be null.
ExprOptionalIdentifier of simple expression from the EXPR element.
UpdateExprOptionalIdentifier of simple expression from the EXPR element specific just for update.
InsertExprOptionalIdentifier of simple expression from the EXPR element specific just for insert.

Example of generated DIFF_MERGE mapping generated from the template above:

package body ETL_PT_CUST_M as
  --Procedure MAIN
  procedure main
   (p_process_key      in number default -1,
    p_effective_date   in date default to_date('10000101','YYYYMMDD'),
    p_load_partition   in varchar2 default null
   ) is
    --Audit variables
    v_processed_rows Integer:=0;
    v_Module_name Varchar2(100):='ETL_PT_CUST_M';
  Begin
    --Log info -- some logging procedure should be added here
    --Mapping statement
    MERGE INTO A2.PARTY TRG$$
    USING(
      SELECT
          'SR1' as PT_SID
         ,to_char(CUST.ID) as PT_ID
         ,CUST.CUST_FIRST_NAME as PT_FIRST_NAME
         ,CUST.CUST_LAST_NAME as PT_LAST_NAME
         ,CUST.CUST_BIRTH_DATE as PT_BIRTH_DATE
         ,CUST.CUST_CODE as PT_CODE
         ,CUST.CUST_DELETED_FLAG as PT_DELETED_FLAG
         ,p_effective_date as UPD_EFF_DATE
         ,sysdate as INS_DT
         ,sysdate as UPD_DT
         ,p_process_key as INS_PROCESS
         ,p_process_key as UPD_PROCESS
      FROM A2.CUSTOMER CUST
    ) SRC$$
    ON(
      TRG$$.PT_SID=SRC$$.PT_SID
      AND TRG$$.PT_ID=SRC$$.PT_ID
    )
    WHEN MATCHED THEN UPDATE SET
       TRG$$.PT_FIRST_NAME=SRC$$.PT_FIRST_NAME
      ,TRG$$.PT_LAST_NAME=SRC$$.PT_LAST_NAME
      ,TRG$$.PT_BIRTH_DATE=SRC$$.PT_BIRTH_DATE
      ,TRG$$.PT_CODE=SRC$$.PT_CODE
      ,TRG$$.PT_DELETED_FLAG=SRC$$.PT_DELETED_FLAG
      ,TRG$$.UPD_EFF_DATE=SRC$$.UPD_EFF_DATE
      ,TRG$$.INS_DT=SRC$$.INS_DT
      ,TRG$$.UPD_DT=SRC$$.UPD_DT
      ,TRG$$.INS_PROCESS=SRC$$.INS_PROCESS
      ,TRG$$.UPD_PROCESS=SRC$$.UPD_PROCESS
      WHERE (TRG$$.PT_FIRST_NAME!=SRC$$.PT_FIRST_NAME
      OR TRG$$.PT_LAST_NAME!=SRC$$.PT_LAST_NAME
      OR TRG$$.PT_BIRTH_DATE!=SRC$$.PT_BIRTH_DATE
      OR TRG$$.PT_CODE!=SRC$$.PT_CODE
      OR TRG$$.PT_DELETED_FLAG!=SRC$$.PT_DELETED_FLAG
      OR TRG$$.UPD_EFF_DATE!=SRC$$.UPD_EFF_DATE
      OR TRG$$.INS_DT!=SRC$$.INS_DT
      OR TRG$$.UPD_DT!=SRC$$.UPD_DT
      OR TRG$$.INS_PROCESS!=SRC$$.INS_PROCESS
      OR TRG$$.UPD_PROCESS!=SRC$$.UPD_PROCESS
      )--/differential part
    WHEN NOT MATCHED THEN INSERT
      (
         TRG$$.PT_SEQ
        ,TRG$$.PT_SID
        ,TRG$$.PT_ID
        ,TRG$$.PT_FIRST_NAME
        ,TRG$$.PT_LAST_NAME
        ,TRG$$.PT_BIRTH_DATE
        ,TRG$$.PT_CODE
        ,TRG$$.PT_DELETED_FLAG
        ,TRG$$.UPD_EFF_DATE
        ,TRG$$.INS_DT
        ,TRG$$.UPD_DT
        ,TRG$$.INS_PROCESS
        ,TRG$$.UPD_PROCESS
       )VALUES(
         PT_SEQ.nextval
        ,SRC$$.PT_SID
        ,SRC$$.PT_ID
        ,SRC$$.PT_FIRST_NAME
        ,SRC$$.PT_LAST_NAME
        ,SRC$$.PT_BIRTH_DATE
        ,SRC$$.PT_CODE
        ,SRC$$.PT_DELETED_FLAG
        ,SRC$$.UPD_EFF_DATE
        ,SRC$$.INS_DT
        ,SRC$$.UPD_DT
        ,SRC$$.INS_PROCESS
        ,SRC$$.UPD_PROCESS
      )
    ;
    v_processed_rows:=v_processed_rows+SQL%ROWCOUNT;
    --Log info -- some logging procedure should be added here
  End Main;
End ETL_PT_CUST_M;


The CONSTANT element
describes constants to be used in mapping. They could be defined as literal, result of function or result of query.

<CONSTANT Name="C_EFFECTIVE_MONTH" 
          Type="Date" 
          Value="MS_WCP_STD.GET_EFFECTIVE_MONTH (P_EFFECTIVE_DATE)"/>

<CONSTANT Name="C_CZK_CURR_KEY" Type="Integer">
  Select CURR_KEY from DWH_OWNER.CURRENCIES@DWH where CUR_ISO_CODE=’CZK’
</CONSTANT >

Attributes:

AttributeTypeDescription
NameMandatoryUnique name of constant.
TypeMandatoryData type of constant.
ValueOptionalValue specification (initial).


The EXPR element
describes simple expression such as cleansing function over simple values.

<EXPR Name="KEY">NVL(:1,-2)</EXPR>

...

<COLMAP TargetColumn="CURR_KEY" Expr="KEY">C_CZK_CURR_KEY</COLMAP>

Attributes:

AttributeTypeDescription
NameMandatoryUnique name of expression.


The FILTER element
describes various kind of filters valid over all mapping.
  • SOURCE - filtering condition valid for input set of data. It use input notation (source names and aliases)
  • TARGET - filtering condition valid for target set of data. It use target notation (target names and TRG$$ as alias)
  • SLURCE_OUTER - like SOURCE but in a target notation. It should be used for evaluation of used analytical functions
  • ON - additional contition for MERGE ON in a target notation
  • INSET - additional contition for INSERT WHERE in a target notation
  • UPDATE - additional contition for UPDATE WHERE in a target notation

<FILTER Type=”SOURCE”> STAB1.FLAG_1 = STAB2.FLAG_2 </WHERE>

<FILTER Type=”TARGET”> TRG$$.CURR_KEY = C_CZK_CURR_KEY</WHERE>

Attributes:

AttributeTypeDescription
TypeMandatoryType of filter, SOURCE is default. See above.

NOTE: I don't recommend to use mentioned solution as a system way of ETL architecture. If you have to solve that, contact me, I'll recommend you better way. Anyway, this way is useful for minimum impact requirements and for illustration and education cases.

See: Metadata driven ETL approach


Particular patterns used in a transformation phase of ETL and advanced options of this simple generator will be described in later in separate articles.

See: DIFF_MERGE - Differential merge ETL transformation pattern
See: INSERT ETL transformation patterns family
See: FULL_REFRESH_DELETE ETL Transformation pattern
See: CREEPING_DEATH algorithm ETL Transformation pattern
See: SELF_UPDATE ETL Transdormation pattern


The PREMAPPING/POSTMAPPING element
can contain PL/SQL code executed either before (PREMAPPING) or after (POSTMAPPING) mapping. It solves tasks such as truncate or create proper partition, calculate statistics etc.

<PREMAPPING Comment="Truncate stage table before">
  UTL_PKG.TRUNCATE_TABLE_PR('STG_PARTY','A2');
</PREMAPPING>

Attributes:

AttributeTypeDescription
CommentOptionalComment of task.


The EXECCONDITION element
can contain condition when the mapping has to be executed. It is evaluated before PREMAPPING.

<EXECCONDITION Comment="Do not work when the month was loaded yet">
  C_EFFECTIVE_MONTH&gt;C_LAST_WCP_MONTH
</EXECCONDITION>

Attributes:

AttributeTypeDescription
CommentOptionalComment of the condition.


The SRCTAB element - advanced options
Following options are possible using subelements of SRCTAB:
  • Deduplication (DEDUP_PARTITION,DEDUP_ORDER)
  • Aggregation (AGG)
  • Subquery (SUBJOIN)

The SRCTAB/DEDUP_PARTITION element Key attributes of deduplication set. The SRCTAB/DEDUP_ORDER element Order based preference for choise proper of duplicates to be used.
Example:

<SRCTAB Owner="A2" Name="PARTY_ADDR" Alias="PTADR" Type="JOIN">
  <ON>PTADR.PT_SEQ=PT.PT_SEQ</ON>
  <WHERE>P_EFFECTIVE_DATE between PTADR.VALID_FROM and PTADR.VALID_THRU</WHERE>
</SRCTAB> 

could be daduplicated following way:

<SRCTAB Owner="A2" Name="PARTY_ADDR" Alias="PTADR" Type="JOIN">
  <ON>PTADR.PT_SEQ=PT.PT_SEQ</ON>
  <WHERE>P_EFFECTIVE_DATE between PTADR.VALID_FROM and PTADR.VALID_THRU</WHERE>
  <DEDUP_PARTITION>PT_KEY</DEDUP_PARTITION>
  <DEDUP_ORDER>PTADR_TYPE desc,VALID_FROM desc</DEDUP_ORDER>
</SRCTAB> 

It deduplicates addresses prefering some types and more recent records.

The SRCTAB/AGG element allows specify an aggregation function used on source table.
For each aggregated column one tag. All unspecified columns used in mappings will be used in GROUP BY clause.
Example:

<SRCTAB Owner="DWH_OWNER" Name="CREDIT_CARD_FACTS" Alias="M_CC" Type="LEFT JOIN" Link="DWH">
<WHERE>M_CC.ACCFC_PERIOD =C_EFFECTIVE_MONTH</WHERE>
<AGG AggFunction="SUM" ColumnName="ACCFC_BALANCE_CZK"/>
</SRCTAB>

Attributes:

AttributeTypeDescription
ColumnNameMandatoryName of aggregated column.
AggFunctionMandatoryAggregating function such as MIN, MAX, AVG, COUNT.
DistinctFlag{Y|N}Flag of usage DISTINCT in COUNT(DISTINCT ...).
AliasOptionalAlias to rename result of aggregation when the column is used more than once (default ColumName).

Following example shows usage of aliases:

<SRCTAB Owner="DWH_OWNER" Name="CREDIT_CARD_FACTS" Alias="M_CC" Type="LEFT JOIN" Link="DWH">
<WHERE>M_CC.ACCFC_PERIOD =C_EFFECTIVE_MONTH</WHERE>
<AGG AggFunction="SUM" ColumnName="ACCFC_BALANCE_CZK"
Alias="SUM_ACCFC_BALANCE_CZK"/>
<AGG AggFunction="MIN" ColumnName="ACCFC_BALANCE_CZK"
Alias="MIN_ACCFC_BALANCE_CZK"/>
</SRCTAB>

Aliased columns are further named as: M_CC.SUM_ACCFC_BALANCE_CZK a M_CC.MIN_ACCFC_BALANCE_CZK.

Despite ColumName and AggFunction attributes are marked as mandatory, there is one exception. Following example realizes simple DISTINCT of the result set.

<SRCTAB Owner="DWH_OWNER" Name="CREDIT_CARD_FACTS" Alias="M_CC" Type="LEFT JOIN" Link="DWH">
<WHERE>M_CC.ACCFC_PERIOD =C_EFFECTIVE_MONTH</WHERE>
<AGG/>
</SRCTAB>

The SRCTAB/SUBJOIN element allows specify subquery
It allows 2 types of subelements:

  • WHERE
  • ON
They work similar way as at SRCTAB.
It is useful mostly in combination with AGG and DEDUP features
Example:

<SRCTAB Owner="DWH_OWNER" Name="PARTY_HISTORY" Alias="PTH" Type="FROM" Link="DWH">
<SUBJOIN Owner="DWH_OWNER" Name="PARTY_TYPES" Alias="PTTP" Type="JOIN" Link="DWH">
<WHERE>PTTP.PTTP_UNIFIED_ID='F'</WHERE>
<ON>PTH.PTTP_KEY=PTTP.PTTP_KEY</ON>
</SUBJOIN>
<WHERE>PTH.PTORI_KEY=10</WHERE>
<WHERE>P_EFFECTIVE_DATE between PTH.PTH_VALID_FROM and PTH.PTH_VALID_TO</WHERE>
</SRCTAB>

The SRCTAB/SUBJOIN/COL element allows specify outer aliases to get rid of duplicities. It is often unnecessary.
Example:

<SRCTAB Owner="DWH_OWNER" Name="PARTY_HISTORY" Alias="PTH" Type="FROM" Link="DWH">
<SUBJOIN Owner="DWH_OWNER" Name="PARTY_TYPES" Alias="PTTP" Type="JOIN" Link="DWH">
<WHERE>PTTP.PTTP_UNIFIED_ID='F'</WHERE>
<ON>PTH.PTTP_KEY=PTTP.PTTP_KEY</ON>
<COL ColumnName="PTTP_SOURCE_ID" ColumnAlias="PTTP_ID"/>
</SUBJOIN>
<WHERE>PTH.PTORI_KEY=10</WHERE>
<WHERE>P_EFFECTIVE_DATE between PTH.PTH_VALID_FROM and PTH.PTH_VALID_TO</WHERE>
</SRCTAB>

The PTTP.PTTP_SOURCE_ID column will be further named as PTH.PTTP_ID.

Attributes:

AttributeTypeDescription
ColumnNameMandatoryInner name of column.
AliasNameMandatoryFurther name - alias


LOOP and LOOP/VAR elements allows repeat execution of the same module several times with various values of defined constants. Execution is repeated including EXECCONDITION, PREMAPPING, POSTMAPPING. Every variable should be defined as a constant before.
Example:

<CONSTANT Name="C_ACCTP_KEY" Type="Integer"/>
<LOOP><VAR Name="C_ACCTP_KEY">101</VAR></LOOP>
<LOOP><VAR Name="C_ACCTP_KEY">201</VAR></LOOP>
<LOOP><VAR Name="C_ACCTP_KEY">301</VAR></LOOP>
<LOOP><VAR Name="C_ACCTP_KEY">501</VAR></LOOP>

Example for two variables:

<CONSTANT Name="C_ACCTP_KEY" Type="Integer"/>
<CONSTANT Name="C_PTORI_KEY" Type="Integer"/>
<LOOP><VAR Name="C_ACCTP_KEY">101</VAR><VAR Name="C_PTORI_KEY">1</VAR></LOOP>
<LOOP><VAR Name="C_ACCTP_KEY">201</VAR><VAR Name="C_PTORI_KEY">2</VAR></LOOP>
<LOOP><VAR Name="C_ACCTP_KEY">301</VAR><VAR Name="C_PTORI_KEY">3</VAR></LOOP>
<LOOP><VAR Name="C_ACCTP_KEY">501</VAR><VAR Name="C_PTORI_KEY">10</VAR></LOOP>

Attributes:

AttributeTypeDescription
NameMandatoryName of constant.

Ludek Bob Jankovsky
All Right Reserved © 2007, Designed by Bob Jankovsky