Software rejuvenation is an architectural mechanism in which
an application or software component gracefully terminates
its own execution and immediately restarts itself in a known,
clean, internal state.
Applications and software components often suffer from
random and transient failures due to latent defects that are
too costly to find and fix (a.k.a. Heisenbugs, named after
Werner Heisenberg’s uncertainty principle). The associated
failures often appear only after a significant amount of
execution time, and they are typically caused by defects such
as memory leaks, unreleased file locks, and data corruption.
These failures often prevent an application or component from
meeting its operational availability and reliability
requirements. A way is therefore needed to let the defects
remain in the application and software components while
preventing the associated failures from occurring.
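The following toy simulation illustrates the idea under stated assumptions. `LeakyComponent`, its `capacity` and `leak_per_request` parameters, and the `run` helper are hypothetical names, and the leak stands in for any aging defect (memory leakage, unreleased locks). The defect is never fixed: every fresh instance leaks at the same rate. Periodically replacing the aged instance with a clean one simply keeps the accumulated damage below the point of failure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LeakyComponent:
    """Hypothetical component with a latent defect: every request leaks resources."""
    capacity: int = 100         # assumed resource limit before the component fails
    leak_per_request: int = 3   # assumed amount leaked (never released) per request
    leaked: int = 0             # resources accumulated by the defect so far

    def handle_request(self) -> None:
        self.leaked += self.leak_per_request
        if self.leaked > self.capacity:
            raise MemoryError("resources exhausted by accumulated leaks")


def run(requests: int, rejuvenate_every: Optional[int] = None) -> int:
    """Serve `requests` requests; return how many succeeded before the first failure."""
    component = LeakyComponent()
    for i in range(requests):
        if rejuvenate_every and i > 0 and i % rejuvenate_every == 0:
            # Rejuvenation: discard the aged instance and restart in a clean state.
            component = LeakyComponent()
        try:
            component.handle_request()
        except MemoryError:
            return i
    return requests


print(run(1000))                       # no rejuvenation
print(run(1000, rejuvenate_every=20))  # periodic rejuvenation
```

With these assumed numbers, `run(1000)` fails after 33 requests, while `run(1000, rejuvenate_every=20)` serves all 1000 because no single instance lives long enough to exhaust its capacity. Rejuvenating only every 40 requests is too infrequent and still fails after 33.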
The typical objectives of software rejuvenation are to:
- Anticipate and avoid failures.
- Increase operational availability and reliability by
minimizing failures due to latent defects that are too
expensive to find and fix.
- Minimize testing costs.
Software rejuvenation can typically be described as
follows:
- Software is designed, programmed, and tested to
gracefully terminate an application and immediately restart
it in a known, clean, internal state.
- Termination and restart may be global or on a
process-by-process basis.
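The two granularities can be sketched as a small supervisor. All names here (`Worker`, `Supervisor`, `rejuvenate`, `rejuvenate_all`, `rolling_rejuvenate`) are hypothetical; the workers are plain objects standing in for operating-system processes, and `state` is an assumed stand-in for whatever degrades over time. The sketch also records how many workers remain running while each restart happens, since that is what determines availability.

```python
from typing import Dict, Iterable, List


class Worker:
    """Hypothetical worker standing in for a process whose state degrades over time."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.generation = 0          # number of times this worker has been restarted
        self.state: List[int] = []   # stand-in for leaked memory, stale locks, etc.
        self.running = True

    def do_work(self) -> None:
        self.state.append(len(self.state))  # internal state grows with execution time

    def stop(self) -> None:
        """Graceful termination: release what the worker holds, then stop."""
        self.state.clear()
        self.running = False

    def start(self) -> None:
        """Restart in a known, clean, internal state."""
        self.state = []
        self.generation += 1
        self.running = True


class Supervisor:
    def __init__(self, names: Iterable[str]) -> None:
        self.workers: Dict[str, Worker] = {n: Worker(n) for n in names}

    def running_count(self) -> int:
        return sum(w.running for w in self.workers.values())

    def rejuvenate(self, name: str) -> int:
        """Process-by-process: restart one worker; return how many kept running meanwhile."""
        worker = self.workers[name]
        worker.stop()
        still_running = self.running_count()
        worker.start()
        return still_running

    def rejuvenate_all(self) -> int:
        """Global: stop every worker, then restart them all; return the running
        count observed while everything was stopped."""
        for worker in self.workers.values():
            worker.stop()
        still_running = self.running_count()
        for worker in self.workers.values():
            worker.start()
        return still_running

    def rolling_rejuvenate(self) -> List[int]:
        """Restart the workers one at a time, recording availability during each restart."""
        return [self.rejuvenate(name) for name in list(self.workers)]
```

With three workers, `rolling_rejuvenate()` returns `[2, 2, 2]`: two workers keep serving during every restart, which is possible only because the workers are redundant. A global restart, by contrast, briefly leaves no worker running.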
The typical stakeholders of the software rejuvenation
mechanism are:
The software rejuvenation mechanism is typically developed
during the following phases:
The software rejuvenation mechanism can typically be started
if the following preconditions hold:
The typical inputs to the software rejuvenation mechanism
include:
- Work Products:
- Stakeholders:
Software rejuvenation is typically subject to the following
limitations:
- If high operational availability is required, then
software rejuvenation requires redundant hardware that can
continue execution while individual processes are
stopped and restarted.
- Software rejuvenation increases execution overhead.
- Software rejuvenation does not eliminate the underlying
defects; it merely allows the application or component to
avoid the associated failures.
- Software rejuvenation is a form of automated preventative
self-maintenance.
- It is important to identify the optimal interval between
successive applications of software rejuvenation so as to
balance the risk of failure against the cost of performing the
preventative maintenance.
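One way to reason about that interval, assumed here rather than taken from the source, is the classic age-replacement model from reliability theory: rejuvenate every T time units unless a failure occurs first, and choose T to minimize the long-run cost per unit time, (c_r * R(T) + c_f * (1 - R(T))) divided by the integral of R(t) from 0 to T, where R is the probability of surviving to time t. The Weibull lifetime (shape 2.0, i.e. a failure rate that rises with age; scale 100), the planned-rejuvenation cost c_r = 1 and the failure cost c_f = 10 are illustrative assumptions, recovery times are treated as negligible, and `reliability`, `cost_rate` and `optimal_interval` are hypothetical names.

```python
import math
from typing import Iterable


def reliability(t: float, shape: float, scale: float) -> float:
    """Weibull survival function R(t) = exp(-(t/scale)^shape); shape > 1 models aging."""
    return math.exp(-((t / scale) ** shape))


def cost_rate(interval: float, shape: float = 2.0, scale: float = 100.0,
              c_rejuv: float = 1.0, c_fail: float = 10.0, steps: int = 1000) -> float:
    """Long-run cost per unit time when rejuvenating every `interval` time units
    (age-replacement model): expected cost per cycle / expected cycle length."""
    h = interval / steps
    # Expected cycle length = integral of R(t) over [0, interval], trapezoidal rule.
    inner = sum(reliability(i * h, shape, scale) for i in range(1, steps))
    r = reliability(interval, shape, scale)
    cycle_length = h * (0.5 * reliability(0.0, shape, scale) + inner + 0.5 * r)
    # A cycle ends in planned rejuvenation with probability R(T), in failure otherwise.
    return (c_rejuv * r + c_fail * (1.0 - r)) / cycle_length


def optimal_interval(candidates: Iterable[float], **params) -> float:
    """Pick the candidate interval with the lowest cost rate (simple grid search)."""
    return min(candidates, key=lambda t: cost_rate(t, **params))


best = optimal_interval(range(1, 301))
print(best, round(cost_rate(best), 4))
```

With these assumed parameters the search settles on an interval of roughly a third of the Weibull scale, noticeably cheaper than rejuvenating very rarely (large T) or very often (small T). If the failure rate did not rise with age (shape 1, exponential lifetimes), this model would favor never rejuvenating, which mirrors the point that rejuvenation pays off only against aging-related defects.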