HomeArticlesSlideshowsAsk BobLogin or Register
NewsCustom SearchContactLinks
Domain analysis
ORACLE STUFFS
09-Aug-2008
Domain analysis is a task of data profiling allowing gather automatically some kind of statistical metadata of either data in their original database schema or data samples loaded into an Oracle database.

Simple module gathers data into own metadata tables. Creation of tables is managed from module as an initial step to simplyfy instalation of the module. There are two basic aims of usage of domain analysis:

  • Utilize time during bisiness analysis to gather automatically usefull information about data in current systems to remove gaps and mistakes in documentation abot them.
  • Use it as part of factory tests to catch some kind of issues in data
Following steps are required to install domain analysys:
  • Prepare database schema where you will install the BW_DOMAIN_ANALYSIS package (and metadata tables - they will be created by package). Following privileges are required:
    • Create procedure
    • Create table
    • Create sequence
    • Create trigger
    • Privileges to be able select data from analyzed structures.
  • Run the downloaded script to create the package.
  • Create metadata dictionary executing BW_DOMAIN_ANALYSIS.CREATEREPOSITORY method.

Script: domain_analysis.sql

CREATE OR REPLACE Package BW_DOMAIN_ANALYSIS as
------------------------------------------------------------
-- Domain analysis module 
-- CopyLeft, 2007, 2008, Bob Jankovsky
-- 20070814 Bob Jankovsky New module
-- 20081014 Bob Jankovsky Failover enhanced, remote analysis accelerated
------------------------------------------------------------
  Procedure CreateRepository( -- Creates repository for the domain analysis data
     P_Tablespace varchar2:=null  --tablespace of repository
    ,P_Index_tablespace varchar2:=null --tablespace of indexes

...more

Syntax for creating repository:

BEGIN 
  BW_DOMAIN_ANALYSIS.CREATEREPOSITORY('<data tablespace>','<index tablespace>');
END; 

Now you have to setup an analysis schema. Following example defines analysis of all tables in the DATA_OWNER schema excluding tables of the domain analysis repository. Module takes all tables of spcecified schema excluding tables, starting with 'BWDA', and computes domain analysis for them.

Insert into BWDA_SCH(SCH_NAME,SCH_OWNER,SCH_NOTLIKE_MASK,SCH_DESC)
  values('EXAMPLE1','DATA_OWNER','BWDA%','Example analysis');
Begin
  BW_DOMAIN_ANALYSIS.DOMAINANALYSIS('EXAMPLE1');
End;
/

Later I will describe other possibilities how to define schema, for example, how to list particular tables and partitions for analysis.

Repository
Following schema shows relationships between metadata repository tables.

BWDA_SCH Domain analysis schema
SCH_KEY NUMBER 1 Surrogate primary key of the BWDA_SCH table
SCH_NAME VARCHAR2(100) O Schema name
SCH_NOTLIKE_MASK VARCHAR2(100) O Schema dynamically gathered tables not like condition
SCH_OWNER VARCHAR2(100) O Schema default owner
SCH_ENUM_FLAG VARCHAR2(1) O Schema based on enumeration of tables (not gathered automatically)
SCH_LIKE_MASK VARCHAR2(100) O Schema dynamically gathered tables like condition
SCH_LAST_ANALYSIS DATE O Schema last analysis date
SCH_DESC VARCHAR2(2000) O Schema description description
SCH_DB_LINK VARCHAR2(30) O Schema default database link
BWDA_TAB Anayzed table
TAB_KEY NUMBER 1 Surrogate primary key of the BWDA_TAB table
TAB_PAR_KEY NUMBER O Parent table foreign key
TAB_REC_COUNT NUMBER O Number of records
TAB_DB_LINK VARCHAR2(30) O Table db-link if remote
SCH_KEY NUMBER O Schema foreign key
TAB_OWNER VARCHAR2(30) O Table owner
TAB_PARTITIONING_CONDITION VARCHAR2(2000) O Condition overriding partition predicate (for linked tables)
TAB_DESC VARCHAR2(2000) O Table description
TAB_NAME VARCHAR2(30) O Table name
TAB_CHILD_COUNT NUMBER O Number of child tables
TAB_PARTITION VARCHAR2(30) O Table partition or partition mask
TAB_LAST_ANALYSIS DATE O Table last analysis date
BWDA_COL Anayzed column
COL_KEY NUMBER 1 Surrogate primary key of the BWDA_COL table
COL_DEFAULT VARCHAR2(2000) O Column default
COL_MANDATORY VARCHAR2(1) O Mandatory flag
COL_DESC VARCHAR2(2000) O Column description
COL_VAL_COUNT NUMBER O Number of filled values
COL_LAST_ANALYSIS DATE O Column last analysis date
COL_DATA_LENGTH NUMBER O Data type - length
COL_DATA_SCALE NUMBER O Data type - scale
COL_DATA_PRECISION NUMBER O Data type - precision
COL_DATA_TYPE VARCHAR2(30) O Data type
COL_PK NUMBER O Primary key order
COL_CARDINALITY NUMBER O Number distinct values
TAB_KEY NUMBER O Table foreign key
COL_NAME VARCHAR2(30) O Column name
BWDA_COLFACT Anayzed column facts
COLFACT_KEY NUMBER 1 Surrogate primary key of the BWDA_COLFACT table
COLFACT_BY VARCHAR2(30) O value/occurance
COLFACT_COUNT NUMBER O Number of the values
COL_KEY NUMBER O Column foreign key
COLFACT_ORDER NUMBER O Order of the value
COLFACT_VALUE VARCHAR2(2000) O Value
COLFACT_DIRECTION VARCHAR2(30) O Top, Bottom, Max
COLFACT_METRIC VARCHAR2(30) O Measured value type
BWDA_PAR Domain analysis parameters
PAR_KEY NUMBER 1 Surrogate primary key of the BWDA_PAR table
PAR_TYPE VARCHAR2(100) O Parameter type
PAR_NAME VARCHAR2(100) O Parameter name
PAR_VALUE VARCHAR2(2000) O Parameter value
PAR_DATA_TYPE_MASK VARCHAR2(500) O Data type mask

Following example query allows you get partial result of domain analysis into simple tabular form.

Script: da_select01.sql

Select T.TAB_NAME table_name
 ,C.COL_NAME      column_name
 ,case when PT.TAB_CHILD_COUNT=1 then 'all_partitions(1)' 
       when T.TAB_CHILD_COUNT=0 then NVL(T.TAB_PARTITION,'not partitioned') 
       else 'all_partitions' 
  end  partition_name
 ,T.TAB_REC_COUNT record_count
 ,C.COL_CARDINALITY distinct_values
 ,to_char(round(C.COL_CARDINALITY/nullif(T.TAB_REC_COUNT,0)*100,2),'990D00')||'%' cardinality
 ,FNNULL.COLFACT_COUNT no_default_count

...more


Or more complex:

Script: DA_Select_02.sql

Select T.TAB_NAME table_name
 ,case when PT.TAB_CHILD_COUNT=1 then 'all_partitions(1)' 
       when T.TAB_CHILD_COUNT=0 then NVL(T.TAB_PARTITION,'not partitioned') 
       else 'all_partitions' 
  end  partition_name
 ,C.COL_NAME      column_name
 ,T.TAB_REC_COUNT record_count
 ,C.COL_CARDINALITY distinct_values
 ,to_char(round(C.COL_CARDINALITY/nullif(T.TAB_REC_COUNT,0)*100,2),'990D00')||'%' cardinality
 ,FNNULL.COLFACT_COUNT no_default_count

...more
Ludek Bob Jankovsky
All Right Reserved © 2007, Designed by Bob Jankovsky