Reliability Requirements
A reliability requirement is a dependability requirement
that specifies a required amount of reliability, which is a
quality factor that is defined as follows:
- Reliability: the degree to which something operates without
failure under given conditions during a given time period.
The typical objectives of a reliability requirement are
to:
- Ensure that something will function properly for long
periods without failure.
- Thereby minimize any unintentional disruptions in
operation (i.e., unscheduled downtime).
Reliability requirements are typically specified in terms of
the following measurements:
- The mean time between failures (MTBF), whereby MTBF is
defined as the average period of time that the application
shall continue to function correctly without failure under
stated conditions.
- The maximum acceptable probability of failure during
a given time period.
- The maximum permitted number of failures per unit
time.
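As a rough illustration, the three measurements above can be estimated from an observed failure log. The sketch below is in Python; the function name `summarize_failures`, the hour-based units, and the constant-failure-rate (exponential) model used for the probability estimate are assumptions made for this example, not part of the definitions above.

```python
import math


def summarize_failures(observation_hours, failure_count, period_hours):
    """Estimate reliability measurements from an observation window.

    Assumes (for illustration only) that failures occur at a constant
    average rate, so the time to the next failure is exponentially
    distributed.
    """
    if observation_hours <= 0 or period_hours <= 0:
        raise ValueError("durations must be positive")
    if failure_count <= 0:
        raise ValueError("at least one failure is needed for a point estimate")

    # Failures per unit time (here: per hour of operation).
    failure_rate = failure_count / observation_hours
    # Mean time between failures is the reciprocal of the failure rate.
    mtbf_hours = observation_hours / failure_count
    # Probability of at least one failure during the given period,
    # under the exponential assumption stated above.
    p_failure = 1.0 - math.exp(-failure_rate * period_hours)

    return {
        "mtbf_hours": mtbf_hours,
        "failures_per_hour": failure_rate,
        "p_failure_in_period": p_failure,
    }


# Example: 4 failures seen during 8,760 hours (one year) of operation;
# estimate the chance of a failure in any 720-hour (30-day) period.
summary = summarize_failures(8760.0, 4, 720.0)
```

With these hypothetical inputs the estimated MTBF is 2,190 hours, and the estimated probability of a failure within a 30-day period is a little over one in four.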
The following are typical examples of reliability
requirements:
- “The application’s mean time between failures
shall be at least 1 month.”
- “The mean time between failures for the normal
paths through the purchase item use case shall be at least 4
months.”
- “The component’s mean time between failures
shall be at least 1 year.”
- “The probability that the component fails shall not
exceed 0.001% per year.”
- “The component shall not fail more than an average
of 3 times per year.”
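To compare requirements stated in different measurements, it can help to convert them to a common one. The following Python sketch converts the last two example requirements into an equivalent MTBF. The helper names are hypothetical, and the probability conversion assumes a constant failure rate (exponential model), which the requirements themselves do not state.

```python
import math


def mtbf_from_failures_per_year(max_failures_per_year):
    """MTBF (in years) implied by a cap on average failures per year."""
    return 1.0 / max_failures_per_year


def mtbf_from_annual_failure_probability(max_probability):
    """MTBF (in years) implied by a cap on the yearly failure probability.

    Assumes a constant failure rate, so
    P(failure within t years) = 1 - exp(-t / MTBF).
    """
    # log1p(-p) computes ln(1 - p) accurately for very small p.
    return -1.0 / math.log1p(-max_probability)


# At most an average of 3 failures per year:
mtbf_a = mtbf_from_failures_per_year(3)  # one third of a year, about 4 months

# Failure probability of at most 0.001% per year:
mtbf_b = mtbf_from_annual_failure_probability(0.001 / 100)  # roughly 100,000 years
```

Under these assumptions the two requirements differ by more than five orders of magnitude in implied MTBF, which is easy to miss when they are stated in different units.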
The following guidelines have been found to be useful when
producing reliability requirements:
- The scope of a reliability requirement can be:
- Reliability requirements can be identified and specified
in terms of the following:
| Component of Requirement | Possible Values |
|---|---|
| Type of Failure | Failure to handle an input; failure to produce an output; failure to produce a correct output; total failure |
| Scope of Failure | Total; localized to a functional area; localized to an external actor; localized to a user interface; localized to an API |
| State when Failure Occurs | Normal operations; degraded mode |
| Response Types | Notify selected users and other systems; log unscheduled downtime; shut down external server systems; switch to degraded mode; shut down affected capabilities; totally shut down |
| Response Measurements | Mean time between failures (MTBF); mean time to fix (MTTF); maximum acceptable probability of failure during a given time period; maximum acceptable number of failures per unit time |
- Reliability requirements should be specified
quantitatively.
- Because some capabilities of an application are more
critical than others, reliability requirements should usually
be restricted to its most important capabilities (e.g., use
cases or use case paths). For example, the reliability of
customer purchasing and credit card approvals should
typically be higher than that of allowing a customer to
change their password. Unfortunately, in practice, the most
time-critical capabilities are often the most complex and
thus the most likely to contain defects that lower their
reliability.
- Reliability requirements should specify the conditions
under which they apply:
  - The application or component capabilities whose
  reliability is being specified.
  - The application or component load (because reliability
  may decrease as the number of simultaneous transactions or
  the size of the databases increases).
- Reliability requirements are difficult to validate,
especially if they require high reliability.
  - One can use statistics to estimate the reliability of
  an application based on the reliability of its components
  (as well as certain assumptions about the independence of
  the failures).
  - One can estimate the reliability of an application or
  component based on ‘long term’ reliability
  testing prior to delivery, but the duration of the testing
  is often too limited by delivery schedules to allow very
  accurate estimation.
- Reliability is related to availability as follows:
  - High availability implies either high reliability or
  very low mean time to fix (MTTF).
  - Low availability may mean low reliability, high MTTF,
  or large amounts of scheduled downtime (e.g., for hardware
  or software upgrades).
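The guidelines on estimating application reliability from component reliability, and on the relationship between reliability and availability, can be made concrete with a little arithmetic. The Python sketch below is illustrative only. It assumes that component failures are independent and that every component is needed for the application to work (a series arrangement), and it uses the common steady-state formula availability = MTBF / (MTBF + mean time to fix), which ignores scheduled downtime. The function names are hypothetical.

```python
import math


def series_reliability(component_reliabilities):
    """Reliability of an application that needs all components to work.

    Assumes independent component failures, so probabilities multiply.
    """
    return math.prod(component_reliabilities)


def steady_state_availability(mtbf, mean_time_to_fix):
    """Fraction of time in operation, ignoring scheduled downtime.

    Both arguments must use the same time unit; MTBF is treated as the
    mean operating time between failures.
    """
    return mtbf / (mtbf + mean_time_to_fix)


# Three components, each 99% reliable over the period of interest.
app_reliability = series_reliability([0.99, 0.99, 0.99])  # about 0.970

# Modest reliability but very fast fixes: MTBF 100 h, 0.1 h to fix.
fast_fix = steady_state_availability(100.0, 0.1)

# Much higher reliability but slow fixes: MTBF 1,000 h, 50 h to fix.
slow_fix = steady_state_availability(1000.0, 50.0)
```

In this example the less reliable system ends up with the higher availability because it is fixed quickly, which is consistent with the guideline that high availability implies either high reliability or a very low mean time to fix.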