HomeArticlesSlideshowsAsk BobLogin or Register
NewsCustom SearchContactLinks
Null-ability vs. rigid not null strategies in BI solutions
DATA ARCHITECTURE
19-Jan-2009
Once several Data warehousing methodologists proposed rigid not-null approach to store data. All the null values has been substituted by not null substitution values despite it is reasonable or not. Like all rigid rules there are both advantages and disadvantages of such approach. Lets see them.


Advantages of not null strategy:
  • Easier comparison of not null values.
  • Explicitly defined appearance of null attributes.
  • Possibility to define mode "undefined" values.
Disadvantages of not null strategy:
  • Hardships with definition of substitution value for particular domains (amounts...).
  • Not flexible appearance of null attributes (e.g. national specific).
  • Wrong result of standard aggregation functions.
  • Space consumption - unable to place rarely filled fields to end of list.

More detail description of not null strategy with particular domains follows.

Data domain type Example "Not available" substitution value "Error" substitution value NOT NULL usage
Identifiers COUNTRY_ID XNA XER Reasonable
Identifiers as foreign keys COUNTRY_ID XNA XER Reasonable if they associates synthetic records
Surrogate keys COUNTRY_KEY, COUNTRY_SEQ -1 -2 Reasonable
Surrogate foreign keys COUNTRY_KEY, COUNTRY_SEQ -1 -2 Reasonable if they associates synthetic records
Flags (Y/N) CURRENT_FLG X# May be
Flags (1/0) CURRENT_FLG -1 -2 May be
Codes CURRENCY_CODE XNA XER May be
Names FIRST_NAME XNA XER Better no
Lower limit date VALID_FROM 01-JAN-1000 Reasonable
Upper limit date VALID_THRU 31-DEC-2999 Reasonable
Date BIRTH_DATE 01-JAN-1000 Better no
Amounts, Counts LEDGER_BAL_AMT 0 Never


Identifiers (ID)

If they are used as natural keys, it is reasonable to use value like XNA for "undefined" synthetic record and XER for error synthetic record. It looks better to use XNA even with compound natural keys.
If they are used as foreign keys, it is reasonable to use XNA/XER to reference synthetic records.

Result: reasonable


Surrogate keys (KEY, SEQ)

It is reasonable to use value like -1 for "undefined" synthetic record and -2 for error synthetic record.
There is possible to use more synthetic errors for various maintained states.

Result: reasonable


Foreign keys (KEY, SEQ)

It is reasonable to use -1/-2 to reference synthetic records.

Result: reasonable


Flags (FLAG, FLG)

They are two versions of flag domain realization:
  • Y / N ... possible substitution value - X
  • 1 / 0 ... possible substitution value - -1
In this case it is possible to keep null value. In the case it is easier to manage to consider null values as Yes or as No. Null value behaves in most evaluations standardly.

Result: may be


Codes (CODE)

XNA / XER values are used standardly.
In this case it is possible to keep null value.

Result: may be


Names (NAME, NM)

XNA / XER values are used standardly. N/A could be used too, but it is not reasonable to use different substitutions for such classes.
In this case it is better to keep null value.

Result: keep away from reach of children


Dates (DATE, DT, DTM)

There are following variants of date values:
  • Standard date - just a date, it is not a lower or upper limit. In that case it is better to use null. Then you can decide how to maintain null values during comparison.
  • Upper date limit - it is cozy to use constant - Max date. (e.g. 31-DEC-2999).
  • Lower date limit - it is cozy to use constant - Min date. (e.g. 01-JAN-1000).

Result: reasonable for limit values


Amounts, Counts (AMT, CNT)

Sometimes 0 is used, anyway it is really not a value what could not be missed with real value. So it is reasonable allow null values.
In this case it is better to keep null value.

Result: nonsense

Ludek Bob Jankovsky
All Right Reserved © 2007, Designed by Bob Jankovsky