HomeArticlesSlideshowsAsk BobLogin or Register
NewsCustom SearchContactLinks
History in data
DATA ARCHITECTURE
02-Sep-2008
Most of our data is changing in time. Sometimes we need project that changes into our data. There are several ways how to realize that.


No history (Current data only) The method could be used either when data are mostly constant or when history of changes is not as important to solve it.


Natural history Some data are naturally tied to particular date. It is usual for logs and transaction tables. (It is a bit more complex for transaction because there are more than one date value - VALUE_DATE, POSTING_DATE - etc.) Data are stored like snapshot history. It is important to solve ageing of such data at least on the partitioning level.


Band history The most common way to store history of slowly changing data such as Slowly changing dimensions (teherfore SCD2). Each record contains VALID_FROM date and VALID_THRU date. Each element is represented by one or more records describing its history.
Band history in an original table
Simplier method does not allow create stable key of elements. Each record of element has its specific key so it is harder for former integration.
Band history extracted into history table
Entity contains two tables. The first one stores current state only like tables without history. The second one stores all records, both current and historical with addintional columns containing VALID_FROM date and VALID_THRU date.

See: CREEPING_DEATH algorithm ETL Transformation pattern
See: Ageing of data with band validity (SCD, History tables)


Horizontal history This specific method is useful to store historycal values related to some specific milestones. There are specific columns assigned to store values for each milestone, such as:
  • Opening_amount
  • Current_amount
etc... It is just a question if we have to consider such values as history:)


Logged history This specific method is useful to store historical values with minimum requirements for space such as version control systems. It is usefull when accessibility to historical values is not a major requirement. Log table contains just historical data, each record contains information about change of one column.


Snapshot history (Fact tables) The most common method used for fact tables. Each record contains SNAP_DATE - date of snapshot. SNAP_DATE can be used for partitioning to solve ageing of data.

Ludek Bob Jankovsky
All Right Reserved © 2007, Designed by Bob Jankovsky