HomeArticlesSlideshowsAsk BobLogin or Register
NewsCustom SearchContactLinks
INSERT ETL transformation patterns family
ETL PATTERNS
28-Aug-2008
ETL Pattern means standard functional task defining what has to happen despite technical implementation.
Basic insert-wise patterns:
  • INSERT - just inserts records from source (simpliest)
  • DIFF_INSERT - inserts new records only, existing records (based on matching key) are ignored
  • SCD2_INSERT - inserts new and changed records. Changed records are inserted as duplicates. That way supports load of SCD2 (slowly changing dimensions).

Examples in the article have been built using Simple ETL module generator. SCD2_INSERT pattern is not directly supported by the generator. Worth workaround is usage of DIFF_INSERT pattern with Matchig flag over all meaning columns.

See: Simple ETL module generator

Metadata example (the same metadata will be reused for INSERT and DIFF_INSERT patterns):

<?xml version="1.0"?>
<MODULE Pattern="INSERT" TargetOwner="A2" TargetTable="PARTY">
<EXPR Name="N2CH">to_char(:1)</EXPR> 
<SRCTAB Owner="A2" Name="CUSTOMER" Alias="CUST" Type="FROM" Link="" />
<COLMAP TargetColumn="PT_SEQ" DirectFlag="Y" UpdateFlag="N">PT_SEQ.nextval</COLMAP>
<COLMAP TargetColumn="PT_SID" MatchFlag="Y">'SR1'</COLMAP>
<COLMAP TargetColumn="PT_ID" SourceAlias="CUST" SourceColumn="ID" Expr="N2CH" MatchFlag="Y" />
<COLMAP TargetColumn="PT_FIRST_NAME" SourceAlias="CUST" SourceColumn="CUST_FIRST_NAME" />
<COLMAP TargetColumn="PT_LAST_NAME" SourceAlias="CUST" SourceColumn="CUST_LAST_NAME" />
<COLMAP TargetColumn="PT_BIRTH_DATE" SourceAlias="CUST" SourceColumn="CUST_BIRTH_DATE" />
<COLMAP TargetColumn="PT_CODE" SourceAlias="CUST" SourceColumn="CUST_CODE" />
<COLMAP TargetColumn="PT_DELETED_FLAG" SourceAlias="CUST" SourceColumn="CUST_DELETED_FLAG" />
<COLMAP TargetColumn="UPD_EFF_DATE" Updateflag="Y">p_effective_date</COLMAP>
<COLMAP TargetColumn="INS_DT" Updateflag="N">sysdate</COLMAP>
<COLMAP TargetColumn="UPD_DT" Updateflag="Y">sysdate</COLMAP>
<COLMAP TargetColumn="INS_PROCESS" Updateflag="N">p_process_key</COLMAP>
<COLMAP TargetColumn="UPD_PROCESS" Updateflag="Y">p_process_key</COLMAP>
</MODULE>

Simple INSERT pattern implementation:

INSERT INTO A2.PARTY TRG$$
  (
     TRG$$.PT_SEQ
    ,TRG$$.PT_SID
    ,TRG$$.PT_ID
    ,TRG$$.PT_FIRST_NAME
    ,TRG$$.PT_LAST_NAME
    ,TRG$$.PT_BIRTH_DATE
    ,TRG$$.PT_CODE
    ,TRG$$.PT_DELETED_FLAG
    ,TRG$$.UPD_EFF_DATE
    ,TRG$$.INS_DT
    ,TRG$$.UPD_DT
    ,TRG$$.INS_PROCESS
    ,TRG$$.UPD_PROCESS
   )
   Select
     PT_SEQ.nextval
    ,SRC$$.PT_SID
    ,SRC$$.PT_ID
    ,SRC$$.PT_FIRST_NAME
    ,SRC$$.PT_LAST_NAME
    ,SRC$$.PT_BIRTH_DATE
    ,SRC$$.PT_CODE
    ,SRC$$.PT_DELETED_FLAG
    ,SRC$$.UPD_EFF_DATE
    ,SRC$$.INS_DT
    ,SRC$$.UPD_DT
    ,SRC$$.INS_PROCESS
    ,SRC$$.UPD_PROCESS
   from(
     SELECT
         'SR1' as PT_SID
        ,to_char(CUST.ID) as PT_ID
        ,CUST.CUST_FIRST_NAME as PT_FIRST_NAME
        ,CUST.CUST_LAST_NAME as PT_LAST_NAME
        ,CUST.CUST_BIRTH_DATE as PT_BIRTH_DATE
        ,CUST.CUST_CODE as PT_CODE
        ,CUST.CUST_DELETED_FLAG as PT_DELETED_FLAG
        ,p_effective_date as UPD_EFF_DATE
        ,sysdate as INS_DT
        ,sysdate as UPD_DT
        ,p_process_key as INS_PROCESS
        ,p_process_key as UPD_PROCESS
     FROM A2.CUSTOMER CUST
   ) SRC$$
;

Implementation of DIFF_INSERT pattern based on the same metadata:

MERGE INTO A2.PARTY TRG$$
USING(
  SELECT
      'SR1' as PT_SID
     ,to_char(CUST.ID) as PT_ID
     ,CUST.CUST_FIRST_NAME as PT_FIRST_NAME
     ,CUST.CUST_LAST_NAME as PT_LAST_NAME
     ,CUST.CUST_BIRTH_DATE as PT_BIRTH_DATE
     ,CUST.CUST_CODE as PT_CODE
     ,CUST.CUST_DELETED_FLAG as PT_DELETED_FLAG
     ,p_effective_date as UPD_EFF_DATE
     ,sysdate as INS_DT
     ,sysdate as UPD_DT
     ,p_process_key as INS_PROCESS
     ,p_process_key as UPD_PROCESS
  FROM A2.CUSTOMER CUST
) SRC$$
ON(
  TRG$$.PT_SID=SRC$$.PT_SID
  AND TRG$$.PT_ID=SRC$$.PT_ID
)
WHEN NOT MATCHED THEN INSERT
  (
     TRG$$.PT_SEQ
    ,TRG$$.PT_SID
    ,TRG$$.PT_ID
    ,TRG$$.PT_FIRST_NAME
    ,TRG$$.PT_LAST_NAME
    ,TRG$$.PT_BIRTH_DATE
    ,TRG$$.PT_CODE
    ,TRG$$.PT_DELETED_FLAG
    ,TRG$$.UPD_EFF_DATE
    ,TRG$$.INS_DT
    ,TRG$$.UPD_DT
    ,TRG$$.INS_PROCESS
    ,TRG$$.UPD_PROCESS
   )VALUES(
     PT_SEQ.nextval
    ,SRC$$.PT_SID
    ,SRC$$.PT_ID
    ,SRC$$.PT_FIRST_NAME
    ,SRC$$.PT_LAST_NAME
    ,SRC$$.PT_BIRTH_DATE
    ,SRC$$.PT_CODE
    ,SRC$$.PT_DELETED_FLAG
    ,SRC$$.UPD_EFF_DATE
    ,SRC$$.INS_DT
    ,SRC$$.UPD_DT
    ,SRC$$.INS_PROCESS
    ,SRC$$.UPD_PROCESS
  )
;

SCD2_INSERT pattern requires in our case modification of metadata.
Metadata of the workaroun - mention MatchFlag="Y" for all meaningful columns:

<?xml version="1.0"?>
<MODULE Pattern="DIFF_INSERT" TargetOwner="A2" TargetTable="PARTY">
<EXPR Name="N2CH">to_char(:1)</EXPR> 
<SRCTAB Owner="A2" Name="CUSTOMER" Alias="CUST" Type="FROM" Link="" />
<COLMAP TargetColumn="PT_SEQ" DirectFlag="Y" UpdateFlag="N">PT_SEQ.nextval</COLMAP>
<COLMAP TargetColumn="PT_SID" MatchFlag="Y">'SR1'</COLMAP>
<COLMAP TargetColumn="PT_ID" SourceAlias="CUST" SourceColumn="ID" Expr="N2CH" MatchFlag="Y" />
<COLMAP TargetColumn="PT_FIRST_NAME" SourceAlias="CUST" SourceColumn="CUST_FIRST_NAME" MatchFlag="Y" />
<COLMAP TargetColumn="PT_LAST_NAME" SourceAlias="CUST" SourceColumn="CUST_LAST_NAME" MatchFlag="Y" />
<COLMAP TargetColumn="PT_BIRTH_DATE" SourceAlias="CUST" SourceColumn="CUST_BIRTH_DATE" MatchFlag="Y" />
<COLMAP TargetColumn="PT_CODE" SourceAlias="CUST" SourceColumn="CUST_CODE" MatchFlag="Y" />
<COLMAP TargetColumn="PT_DELETED_FLAG" SourceAlias="CUST" SourceColumn="CUST_DELETED_FLAG" MatchFlag="Y" />
<COLMAP TargetColumn="VALID_FROM" MatchFlag="N">p_effective_date</COLMAP>
<COLMAP TargetColumn="VALID_THRU" MatchFlag="N">to_date('29991231')</COLMAP>
<COLMAP TargetColumn="UPD_EFF_DATE" UpdateFlag="Y">p_effective_date</COLMAP>
<COLMAP TargetColumn="INS_DT" UpdateFlag="N">sysdate</COLMAP>
<COLMAP TargetColumn="UPD_DT" UpdateFlag="Y">sysdate</COLMAP>
<COLMAP TargetColumn="INS_PROCESS" UpdateFlag="N">p_process_key</COLMAP>
<COLMAP TargetColumn="UPD_PROCESS" UpdateFlag="Y">p_process_key</COLMAP>
</MODULE>

Implementation of the SCD2_INSERT pattern (mention form of the merge usage):

MERGE INTO A2.PARTY TRG$$
USING(
  SELECT
      'SR1' as PT_SID
     ,to_char(CUST.ID) as PT_ID
     ,CUST.CUST_FIRST_NAME as PT_FIRST_NAME
     ,CUST.CUST_LAST_NAME as PT_LAST_NAME
     ,CUST.CUST_BIRTH_DATE as PT_BIRTH_DATE
     ,CUST.CUST_CODE as PT_CODE
     ,CUST.CUST_DELETED_FLAG as PT_DELETED_FLAG
     ,p_effective_date as VALID_FROM
     ,to_date('29991231') as VALID_THRU
     ,p_effective_date as UPD_EFF_DATE
     ,sysdate as INS_DT
     ,sysdate as UPD_DT
     ,p_process_key as INS_PROCESS
     ,p_process_key as UPD_PROCESS
  FROM A2.CUSTOMER CUST
) SRC$$
ON(
  TRG$$.PT_SID=SRC$$.PT_SID
  AND TRG$$.PT_ID=SRC$$.PT_ID
  AND TRG$$.PT_FIRST_NAME=SRC$$.PT_FIRST_NAME
  AND TRG$$.PT_LAST_NAME=SRC$$.PT_LAST_NAME
  AND TRG$$.PT_BIRTH_DATE=SRC$$.PT_BIRTH_DATE
  AND TRG$$.PT_CODE=SRC$$.PT_CODE
  AND TRG$$.PT_DELETED_FLAG=SRC$$.PT_DELETED_FLAG
)
WHEN NOT MATCHED THEN INSERT
  (
     TRG$$.PT_SEQ
    ,TRG$$.PT_SID
    ,TRG$$.PT_ID
    ,TRG$$.PT_FIRST_NAME
    ,TRG$$.PT_LAST_NAME
    ,TRG$$.PT_BIRTH_DATE
    ,TRG$$.PT_CODE
    ,TRG$$.PT_DELETED_FLAG
    ,TRG$$.VALID_FROM
    ,TRG$$.VALID_THRU
    ,TRG$$.UPD_EFF_DATE
    ,TRG$$.INS_DT
    ,TRG$$.UPD_DT
    ,TRG$$.INS_PROCESS
    ,TRG$$.UPD_PROCESS
   )VALUES(
     PT_SEQ.nextval
    ,SRC$$.PT_SID
    ,SRC$$.PT_ID
    ,SRC$$.PT_FIRST_NAME
    ,SRC$$.PT_LAST_NAME
    ,SRC$$.PT_BIRTH_DATE
    ,SRC$$.PT_CODE
    ,SRC$$.PT_DELETED_FLAG
    ,SRC$$.VALID_FROM
    ,SRC$$.VALID_THRU
    ,SRC$$.UPD_EFF_DATE
    ,SRC$$.INS_DT
    ,SRC$$.UPD_DT
    ,SRC$$.INS_PROCESS
    ,SRC$$.UPD_PROCESS
  )
;

Just another technical algorithm of DIFF_INSERT - DIFF_INSERT(MINUS). It's more efficient with transformation thru DB-link. The algorithm supports DirectFlag option for all non-meaningful columns.

<?xml version="1.0"?>
<MODULE Pattern="DIFF_INSERT(MINUS)" TargetOwner="A2" TargetTable="PARTY">
<EXPR Name="N2CH">to_char(:1)</EXPR> 
<SRCTAB Owner="A2" Name="CUSTOMER" Alias="CUST" Type="FROM" Link="" />
<COLMAP TargetColumn="PT_SEQ" DirectFlag="Y" UpdateFlag="N">PT_SEQ.nextval</COLMAP>
<COLMAP TargetColumn="PT_SID" MatchFlag="Y">'SR1'</COLMAP>
<COLMAP TargetColumn="PT_ID" SourceAlias="CUST" SourceColumn="ID" Expr="N2CH" MatchFlag="Y" />
<COLMAP TargetColumn="PT_FIRST_NAME" SourceAlias="CUST" SourceColumn="CUST_FIRST_NAME" />
<COLMAP TargetColumn="PT_LAST_NAME" SourceAlias="CUST" SourceColumn="CUST_LAST_NAME" />
<COLMAP TargetColumn="PT_BIRTH_DATE" SourceAlias="CUST" SourceColumn="CUST_BIRTH_DATE" />
<COLMAP TargetColumn="PT_CODE" SourceAlias="CUST" SourceColumn="CUST_CODE" />
<COLMAP TargetColumn="PT_DELETED_FLAG" SourceAlias="CUST" SourceColumn="CUST_DELETED_FLAG" />
<COLMAP TargetColumn="UPD_EFF_DATE" DirectFlag="Y" UpdateFlag="Y">p_effective_date</COLMAP>
<COLMAP TargetColumn="INS_DT" DirectFlag="Y" UpdateFlag="N">sysdate</COLMAP>
<COLMAP TargetColumn="UPD_DT" DirectFlag="Y" UpdateFlag="Y">sysdate</COLMAP>
<COLMAP TargetColumn="INS_PROCESS" DirectFlag="Y" UpdateFlag="N">p_process_key</COLMAP>
<COLMAP TargetColumn="UPD_PROCESS" DirectFlag="Y" UpdateFlag="Y">p_process_key</COLMAP>
</MODULE>

Implementation:

INSERT INTO A2.PARTY TRG$$
  (
     TRG$$.PT_SEQ
    ,TRG$$.PT_SID
    ,TRG$$.PT_ID
    ,TRG$$.PT_FIRST_NAME
    ,TRG$$.PT_LAST_NAME
    ,TRG$$.PT_BIRTH_DATE
    ,TRG$$.PT_CODE
    ,TRG$$.PT_DELETED_FLAG
    ,TRG$$.UPD_EFF_DATE
    ,TRG$$.INS_DT
    ,TRG$$.UPD_DT
    ,TRG$$.INS_PROCESS
    ,TRG$$.UPD_PROCESS
   )
   Select
     PT_SEQ.nextval
    ,SRC$$.PT_SID
    ,SRC$$.PT_ID
    ,SRC$$.PT_FIRST_NAME
    ,SRC$$.PT_LAST_NAME
    ,SRC$$.PT_BIRTH_DATE
    ,SRC$$.PT_CODE
    ,SRC$$.PT_DELETED_FLAG
    ,p_effective_date
    ,sysdate
    ,sysdate
    ,p_process_key
    ,p_process_key
   from(
     SELECT
         'SR1' as PT_SID
        ,to_char(CUST.ID) as PT_ID
        ,CUST.CUST_FIRST_NAME as PT_FIRST_NAME
        ,CUST.CUST_LAST_NAME as PT_LAST_NAME
        ,CUST.CUST_BIRTH_DATE as PT_BIRTH_DATE
        ,CUST.CUST_CODE as PT_CODE
        ,CUST.CUST_DELETED_FLAG as PT_DELETED_FLAG
     FROM A2.CUSTOMER CUST
     MINUS
       SELECT
        TRG$$.PT_SID
       ,TRG$$.PT_ID
       ,TRG$$.PT_FIRST_NAME
       ,TRG$$.PT_LAST_NAME
       ,TRG$$.PT_BIRTH_DATE
       ,TRG$$.PT_CODE
       ,TRG$$.PT_DELETED_FLAG
       FROM A2.PARTY TRG$$
   ) SRC$$
;

Ludek Bob Jankovsky
All Right Reserved © 2007, Designed by Bob Jankovsky