ETL process scheduling - failover power
HEAP WORKFLOW
06-Nov-2008
There are various systems and methods for managing ETL processes. This article describes the failover capabilities of particular methods.

Systems for managing ETL processes can be evaluated from various points of view, as described in the following article:

See: ETL PROCESS SCHEDULING - SAMPLE SOLUTION FOR ORACLE

One of these points of view is failover: the behavior of the system after one or more processes fail. The following basic elements of a system help in this area:


  • Repeatedly executable modules - every module should either be able to run repeatedly, or there must be a "rollback" process that restores the previous state.
  • Savepoints of completed modules - after a failure, execution can continue from the last correct point.
  • Failover parallelism - the failure of one module does not break the execution of other modules that are independent of it.
  • Early warning checks - problems should be detectable and resolvable before the modules are executed.
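
The first two elements can be illustrated with a minimal sketch. It assumes an in-memory set as the savepoint store (a real scheduler would presumably keep it in a persistent table), and the module names `extract`, `transform` and `load`, as well as the function `run_with_savepoints`, are hypothetical and not taken from the article.

```python
from typing import Callable, Dict, List, Set


def run_with_savepoints(
    order: List[str],
    modules: Dict[str, Callable[[], None]],
    completed: Set[str],
) -> bool:
    """Run modules in `order`, skipping those already recorded in `completed`.

    Returns True if every module finished, False if one of them failed.
    """
    for name in order:
        if name in completed:
            continue  # savepoint hit: this module finished in an earlier run
        try:
            modules[name]()
        except Exception as exc:
            print(f"module {name} failed: {exc}")
            return False  # stop here; `completed` keeps the savepoint
        completed.add(name)  # record the savepoint only after success
    return True


# Usage: the second (hypothetical) module fails on the first attempt only.
log: List[str] = []
attempts = {"transform": 0}


def extract() -> None:
    log.append("extract")


def transform() -> None:
    attempts["transform"] += 1
    if attempts["transform"] == 1:
        raise RuntimeError("source table not ready")
    log.append("transform")


def load() -> None:
    log.append("load")


order = ["extract", "transform", "load"]
modules = {"extract": extract, "transform": transform, "load": load}
completed: Set[str] = set()

first_run = run_with_savepoints(order, modules, completed)   # stops at transform
second_run = run_with_savepoints(order, modules, completed)  # resumes at transform
```

The second call does not repeat `extract`, because its savepoint is already recorded. The failed module itself is simply executed again, which is why it has to be repeatedly executable.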


Failover parallelism
Unlike standard process parallelism, failover parallelism comes into play when particular processes fail.
The following example explains the difference. There are several processes whose mutual dependencies disallow parallel execution of any of them in the standard state.

In the standard state, all the processes could be serialized into a fixed sequence without any complex process management.

So what is wrong?
The difference appears when one of the processes fails.
Let's assume process No. 2 fails. Without savepoints, all the processes have to be restarted after the issue with the 2nd process is solved. With savepoints, 6 processes have to be restarted.

With fine-grained process management, the scenario of execution is as follows:
Only 4 processes are blocked by the failure of the 2nd one.
This gains time to react, and the issue with the 2nd process can be solved while processes 3 and 5 are executing.
It can save a lot of resources and makes it possible to improve the SLA, shorten the loading window, extend the available reaction time, etc.
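
The scenario can be sketched in a few lines. The dependency graph below is an assumption made for illustration, since the original diagram is not reproduced here; it was chosen only so that a failure of process 2 blocks four processes and leaves processes 3 and 5 runnable. The names `DEPENDS_ON`, `blocked_by` and `still_runnable` are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical dependency graph (an assumption, not the article's figure):
# each process maps to the processes it directly depends on.
DEPENDS_ON: Dict[int, List[int]] = {
    1: [],
    2: [1],
    3: [],
    4: [2, 3],
    5: [3],
    6: [4],
    7: [5, 6],
}
SERIAL_ORDER: List[int] = [1, 2, 3, 4, 5, 6, 7]


def blocked_by(failed: int, depends_on: Dict[int, List[int]]) -> Set[int]:
    """Return the failed process plus every process that transitively depends on it."""
    blocked = {failed}
    changed = True
    while changed:  # repeat until no further process becomes blocked
        changed = False
        for proc, deps in depends_on.items():
            if proc not in blocked and any(d in blocked for d in deps):
                blocked.add(proc)
                changed = True
    return blocked


def still_runnable(
    failed: int, depends_on: Dict[int, List[int]], serial_order: List[int]
) -> List[int]:
    """Processes scheduled after the failed one that do not depend on it."""
    blocked = blocked_by(failed, depends_on)
    position = serial_order.index(failed)
    return [p for p in serial_order[position + 1:] if p not in blocked]


blocked = blocked_by(2, DEPENDS_ON)
runnable = still_runnable(2, DEPENDS_ON, SERIAL_ORDER)
print(sorted(blocked))  # [2, 4, 6, 7]
print(runnable)         # [3, 5]
```

A scheduler that only knows the serialized sequence cannot make this distinction, because it has no record of which processes are actually independent of the failed one.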


Ratios of advantage of fine-grained process management,
i.e. the reduction in the number of processes remaining to be executed after the repair:
Failure in process 1: 33%
Failure in process 2: 33%
Failure in process 3: 0% .. the same result
Failure in process 4: 25%
Failure in process 5: 33%
Failure in process 6: 0% .. the same result
Failure in process 7: 0% .. the same result
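
One way to compute such a ratio is sketched below, assuming it is defined as the share of processes that no longer have to wait for the repair compared with a restart from the savepoint. The dependency graph is again hypothetical, because the original diagram is not reproduced here, so the output need not match every figure in the list above.

```python
from typing import Dict, List, Set

# Hypothetical dependency graph (an assumption, not the article's figure).
DEPENDS_ON: Dict[int, List[int]] = {
    1: [], 2: [1], 3: [], 4: [2, 3], 5: [3], 6: [4], 7: [5, 6],
}
SERIAL_ORDER: List[int] = [1, 2, 3, 4, 5, 6, 7]


def blocked_by(failed: int, depends_on: Dict[int, List[int]]) -> Set[int]:
    """Return the failed process plus every process that transitively depends on it."""
    blocked = {failed}
    changed = True
    while changed:
        changed = False
        for proc, deps in depends_on.items():
            if proc not in blocked and any(d in blocked for d in deps):
                blocked.add(proc)
                changed = True
    return blocked


def advantage_ratio(
    failed: int, depends_on: Dict[int, List[int]], serial_order: List[int]
) -> float:
    """Relative reduction of processes left to run after the repair."""
    position = serial_order.index(failed)
    # Savepoint restart: the failed process and everything scheduled after it.
    remaining_serial = serial_order[position:]
    # Fine-grained management: only the failed process and its dependents.
    blocked = blocked_by(failed, depends_on)
    remaining_fine = [p for p in remaining_serial if p in blocked]
    return 1 - len(remaining_fine) / len(remaining_serial)


for proc in SERIAL_ORDER:
    ratio = advantage_ratio(proc, DEPENDS_ON, SERIAL_ORDER)
    print(f"Failure in process {proc}: {ratio:.0%}")
```

For a failure in process 2, the savepoint restart leaves 6 processes and the fine-grained variant leaves 4, which gives the 33% quoted above.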

Real, complex systems show much more interesting ratios.
Let's list again the advantages of fine-grained process management in the area of failover parallelism:
  • reduced cost of short reaction times for operations and support - there is much more time to solve a problem without additional delays
  • a shorter time window needed for loads
  • a better SLA
And the disadvantages (there are always some disadvantages ;)
  • more complex process management metadata that is harder to define
  • the need for a system that allows fine-grained process management

Ludek Bob Jankovsky
All Rights Reserved © 2007, Designed by Bob Jankovsky