HomeArticlesSlideshowsAsk BobLogin or Register
NewsCustom SearchContactLinks
DIFF_MERGE - Differential merge ETL transformation pattern
ETL PATTERNS
28-Aug-2008
ETL Pattern means standard functional task defining what has to happen despite technical implementation.
DIFF_MERGE pattern task:
  • inserts new elements
  • updates changed existing elements
  • ignores unchanged existing elements

Some solutions uses MERGE pattern as standard instead of mentioned DIFF_MERGE, anyway there are following disadvantages:

  • Plenty of uneffective write operations and consequent logging.
  • Update of unchanged data makes audit date of last modification of record useless.
Therefore, usage of DIFF_MERGE pattern instead of clumsy MERGE one is an effective way.

Examples in the article have been built using Simple ETL module generator.

See: Simple ETL module generator

Metadata example:

<?xml version="1.0"?>
<MODULE Pattern="DIFF_MERGE" TargetOwner="A2" TargetTable="PARTY">
<EXPR Name="N2CH">to_char(:1)</EXPR> 
<SRCTAB Owner="A2" Name="CUSTOMER" Alias="CUST" Type="FROM" Link="" />
<COLMAP TargetColumn="PT_SEQ" DirectFlag="Y" UpdateFlag="N">PT_SEQ.nextval</COLMAP>
<COLMAP TargetColumn="PT_SID" MatchFlag="Y">'SR1'</COLMAP>
<COLMAP TargetColumn="PT_ID" SourceAlias="CUST" SourceColumn="ID" Expr="N2CH" MatchFlag="Y" />
<COLMAP TargetColumn="PT_FIRST_NAME" SourceAlias="CUST" SourceColumn="CUST_FIRST_NAME" />
<COLMAP TargetColumn="PT_LAST_NAME" SourceAlias="CUST" SourceColumn="CUST_LAST_NAME" />
<COLMAP TargetColumn="PT_BIRTH_DATE" SourceAlias="CUST" SourceColumn="CUST_BIRTH_DATE" />
<COLMAP TargetColumn="PT_CODE" SourceAlias="CUST" SourceColumn="CUST_CODE" />
<COLMAP TargetColumn="PT_DELETED_FLAG" SourceAlias="CUST" SourceColumn="CUST_DELETED_FLAG" />
<COLMAP TargetColumn="UPD_EFF_DATE" UpdateFlag="Y">p_effective_date</COLMAP>
<COLMAP TargetColumn="INS_DT" UpdateFlag="N">sysdate</COLMAP>
<COLMAP TargetColumn="UPD_DT" UpdateFlag="Y">sysdate</COLMAP>
<COLMAP TargetColumn="INS_PROCESS" UpdateFlag="N">p_process_key</COLMAP>
<COLMAP TargetColumn="UPD_PROCESS" UpdateFlag="Y">p_process_key</COLMAP>
</MODULE>

Generated source code example (technical implementation for Oracle 10g+ allows to use simplier algorithm analysing changes in WHEN MATCHED / WHERE clause of the MERGE statement. In Oracle 9i MINUS operation in source select used to be used.):

MERGE INTO A2.PARTY TRG$$
USING(
  SELECT
      'SR1' as PT_SID
     ,to_char(CUST.ID) as PT_ID
     ,CUST.CUST_FIRST_NAME as PT_FIRST_NAME
     ,CUST.CUST_LAST_NAME as PT_LAST_NAME
     ,CUST.CUST_BIRTH_DATE as PT_BIRTH_DATE
     ,CUST.CUST_CODE as PT_CODE
     ,CUST.CUST_DELETED_FLAG as PT_DELETED_FLAG
     ,p_effective_date as UPD_EFF_DATE
     ,sysdate as INS_DT
     ,sysdate as UPD_DT
     ,p_process_key as INS_PROCESS
     ,p_process_key as UPD_PROCESS
  FROM A2.CUSTOMER CUST
) SRC$$
ON(
  TRG$$.PT_SID=SRC$$.PT_SID
  AND TRG$$.PT_ID=SRC$$.PT_ID
)
WHEN MATCHED THEN UPDATE SET
   TRG$$.PT_FIRST_NAME=SRC$$.PT_FIRST_NAME
  ,TRG$$.PT_LAST_NAME=SRC$$.PT_LAST_NAME
  ,TRG$$.PT_BIRTH_DATE=SRC$$.PT_BIRTH_DATE
  ,TRG$$.PT_CODE=SRC$$.PT_CODE
  ,TRG$$.PT_DELETED_FLAG=SRC$$.PT_DELETED_FLAG
  ,TRG$$.UPD_EFF_DATE=SRC$$.UPD_EFF_DATE
  ,TRG$$.UPD_DT=SRC$$.UPD_DT
  ,TRG$$.UPD_PROCESS=SRC$$.UPD_PROCESS
  WHERE (TRG$$.PT_FIRST_NAME!=SRC$$.PT_FIRST_NAME
  OR TRG$$.PT_LAST_NAME!=SRC$$.PT_LAST_NAME
  OR TRG$$.PT_BIRTH_DATE!=SRC$$.PT_BIRTH_DATE
  OR TRG$$.PT_CODE!=SRC$$.PT_CODE
  OR TRG$$.PT_DELETED_FLAG!=SRC$$.PT_DELETED_FLAG
  OR TRG$$.UPD_EFF_DATE!=SRC$$.UPD_EFF_DATE
  OR TRG$$.UPD_DT!=SRC$$.UPD_DT
  OR TRG$$.UPD_PROCESS!=SRC$$.UPD_PROCESS
  )--/differential part
WHEN NOT MATCHED THEN INSERT
  (
     TRG$$.PT_SEQ
    ,TRG$$.PT_SID
    ,TRG$$.PT_ID
    ,TRG$$.PT_FIRST_NAME
    ,TRG$$.PT_LAST_NAME
    ,TRG$$.PT_BIRTH_DATE
    ,TRG$$.PT_CODE
    ,TRG$$.PT_DELETED_FLAG
    ,TRG$$.UPD_EFF_DATE
    ,TRG$$.INS_DT
    ,TRG$$.UPD_DT
    ,TRG$$.INS_PROCESS
    ,TRG$$.UPD_PROCESS
   )VALUES(
     PT_SEQ.nextval
    ,SRC$$.PT_SID
    ,SRC$$.PT_ID
    ,SRC$$.PT_FIRST_NAME
    ,SRC$$.PT_LAST_NAME
    ,SRC$$.PT_BIRTH_DATE
    ,SRC$$.PT_CODE
    ,SRC$$.PT_DELETED_FLAG
    ,SRC$$.UPD_EFF_DATE
    ,SRC$$.INS_DT
    ,SRC$$.UPD_DT
    ,SRC$$.INS_PROCESS
    ,SRC$$.UPD_PROCESS
  )
;

Following additional attributes of column mappings are important for DIFF_MERGE pattern:

  • Match flag - specifies matching key of transformation.
  • Update flag - allows suppress of later updating of specified columns.
  • Hash flag - allows exclude columns from evaluation, if record is changed.
  • Nullable flag - could be important for construction of comparison clause.

Ludek Bob Jankovsky
All Right Reserved © 2007, Designed by Bob Jankovsky