Operational Availability
- Operational Availability (a.k.a., Availability)
- the soundness quality factor representing the proportion of the time that a business enterprise, center, application, or component behaves properly
As illustrated in the preceding figure, Operational Availability is part of the following inheritance hierarchy:
- Type: Abstract
- Superclass: Soundness
- Subclasses:
- System Operational Availability
- Application Operational Availability
- Hardware Operational Availability
- Software Operational Availability
The typical responsibilities of Operational Availability are to:
- Model readiness (i.e., availability) for performing useful work.
- Support the analysis and specification of availability requirements.
- Provide a foundation for evaluating the quality of an architecture.
Operational availability is typically decomposed into the following aggregation hierarchy of subfactors:
Operational availability is typically measured in terms of:
- The mean percentage of the time that one or more functions/features/use cases/use case paths
operate without scheduled and/or unscheduled downtime under specified normal conditions.
- Mean-Time-To-Failure / (Mean-Time-To-Failure + Mean-Time-To-Repair)
- 1 - (Mean-Loss-Time / Mission-Duration)
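The two ratio measures above can be sketched in a few lines of Python. This is only an illustration: the function names and the choice of hours as the time unit are assumptions, not part of the original definitions, and both inputs to each function must use the same unit.

```python
def availability_from_mttf(mttf: float, mttr: float) -> float:
    """Mean-Time-To-Failure / (Mean-Time-To-Failure + Mean-Time-To-Repair)."""
    if mttf < 0 or mttr < 0 or (mttf + mttr) == 0:
        raise ValueError("MTTF and MTTR must be non-negative and not both zero")
    return mttf / (mttf + mttr)


def availability_from_loss(mean_loss_time: float, mission_duration: float) -> float:
    """1 - (Mean-Loss-Time / Mission-Duration)."""
    if mission_duration <= 0:
        raise ValueError("mission duration must be positive")
    if not 0 <= mean_loss_time <= mission_duration:
        raise ValueError("loss time must lie between 0 and the mission duration")
    return 1.0 - mean_loss_time / mission_duration


if __name__ == "__main__":
    # Hypothetical figures: 999 hours between failures, 1 hour to repair.
    print(f"{availability_from_mttf(999.0, 1.0):.4%}")
    # Hypothetical figures: 8.76 hours lost during an 8760-hour (one-year) mission.
    print(f"{availability_from_loss(8.76, 8760.0):.4%}")
```

Both hypothetical inputs describe roughly "three nines" of availability, which shows that the two measures agree when the loss time reflects the same failure and repair behavior.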
Typical mechanisms for implementing operational availability include:
- Redundancy, which eliminates single points of failure and enables failover:
- Data component redundancy (e.g., multiple synchronized
databases, regular data backups).
- Hardware component redundancy (e.g., dual networks,
dual servers, dual network connectivity devices such as
routers, uninterruptible power supplies, load balancing,
hot or cold component swapping, local data-center-internal
redundancy vs. distant multiple data center redundancy).
- Hardware subcomponent redundancy (e.g., multiple CPUs,
hard drives, disk drives, motherboards, and network cards).
- Software component redundancy (e.g., using multiple
implementations by different development teams using
different designs and implementation languages, voting with
the majority overruling minority solutions, software
application redundancy via installation disk backup).
- Data center redundancies (e.g., back-up data centers).
- Software failure monitoring and failover components.
- Regular hardware maintenance.
- Availability of hardware replacement components.
- Purchase of components certified to have high reliability
and operational availability by their vendor organizations or
independent testing laboratories.
- Regular software upgrades (e.g., operating system upgrades, bug
patches, current anti-virus definitions).
- Exception handling (e.g., based on assertions).
- Extensive testing and quality engineering.
- Proper disaster recovery.
- Dedicated technical staff 24/7.
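To illustrate why redundancy is listed first among these mechanisms, the following sketch combines component availabilities using the standard series and parallel formulas. It assumes that component failures are statistically independent and that failover is instantaneous and perfect, which real systems only approximate. The function names are hypothetical.

```python
from math import prod
from typing import Sequence


def _check(availabilities: Sequence[float]) -> None:
    if not availabilities:
        raise ValueError("at least one component is required")
    if any(not 0.0 <= a <= 1.0 for a in availabilities):
        raise ValueError("each availability must be between 0 and 1")


def series_availability(availabilities: Sequence[float]) -> float:
    """All components are required: the system is up only if every one is up."""
    _check(availabilities)
    return prod(availabilities)


def parallel_availability(availabilities: Sequence[float]) -> float:
    """Redundant components: the system is up if at least one is up.

    Assumes independent failures and perfect, instantaneous failover.
    """
    _check(availabilities)
    return 1.0 - prod(1.0 - a for a in availabilities)


if __name__ == "__main__":
    # Hypothetical example: two servers, each 99% available.
    print(f"chained (both needed): {series_availability([0.99, 0.99]):.4%}")
    print(f"redundant (either one): {parallel_availability([0.99, 0.99]):.4%}")
```

Under these assumptions, two redundant 99% components yield about 99.99%, whereas chaining the same two components so that both are required yields about 98.01%. Correlated failures (e.g., a shared power supply) would make the redundant figure lower in practice.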
The following guidelines have been found to be useful when
producing operational availability requirements:
- Because some capabilities of an application are more
critical than others, operational availability requirements
should usually be restricted to its most time-critical
capabilities (e.g., use cases or use case paths). For
example, the operational availability of customer purchasing
and credit card approvals should typically be higher than
that of allowing a customer to change their password.
Unfortunately, in practice, the most time-critical
capabilities are often the most complex and thus
the most likely to contain defects that lower their
operational availability.
- Consider specifying beginning and ending times (e.g., business hours) for the availability of in-house functionality.
- Operational availability is typically documented as
either a number of nines (e.g., 3 nines = 99.9% and 5 nines =
99.999%) or as continuous availability (i.e., absolutely no
downtime is allowed). The following table clarifies how
rapidly downtime decreases as operational availability increases:
| Percent Operationally Available | Number of Nines | Total Downtime Per Year |
|---|---|---|
| 90% | 1 | 36 Days 12 Hours |
| 95% | N/A | 18 Days 6 Hours |
| 98% | N/A | 7 Days 7 Hours |
| 99% | 2 | 3 Days 15 Hours |
| 99.9% | 3 | 8 Hours 46 Minutes |
| 99.99% | 4 | 52 Minutes 34 Seconds |
| 99.999% | 5 | 5 Minutes 15 Seconds |
| 99.9999% | 6 | 31.5 Seconds |
- Operational availability can be specified:
- On a minimum or average basis.
- With beginning and ending time periods.
- Over specific time periods (e.g., several hours, several days, a year).
- Difficulty and expense increase rapidly as operational availability increases.
- Operational availability is reduced by both:
- Scheduled downtime (e.g., due to preventative and perfective maintenance).
- Unscheduled downtime (e.g., corrective maintenance,
including failures plus the time to find, fix, and test the associated defects).
- Operational availability is directly related to several other quality
factors because high operational availability requires high:
- Correctability such as very low mean time to repair (MTTR).
- Correctness.
- Maintainability such as very low scheduled downtime
(e.g., due to upgrades and preventative maintenance).
- Reliability.
- Robustness.
- Security.
- Operational availability is inversely related to other quality factors.
For example, achieving high operational availability can reduce performance
(e.g., because of the overhead of redundancy and failure monitoring).
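The yearly downtime allowed by a given availability percentage, as tabulated in the guidelines above, can be derived with a short calculation. The sketch below assumes a 365-day year and rounds to the nearest second. The function names are illustrative only.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: non-leap, 365-day year


def downtime_seconds_per_year(availability_percent: float) -> float:
    """Total allowed downtime per year, in seconds, for a given availability."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be a percentage between 0 and 100")
    return (1.0 - availability_percent / 100.0) * SECONDS_PER_YEAR


def breakdown(seconds: float) -> tuple[int, int, int, int]:
    """Split a duration into (days, hours, minutes, seconds), rounded to 1 s."""
    total = round(seconds)
    days, rest = divmod(total, 24 * 60 * 60)
    hours, rest = divmod(rest, 60 * 60)
    minutes, secs = divmod(rest, 60)
    return days, hours, minutes, secs


if __name__ == "__main__":
    for percent in (90.0, 99.0, 99.9, 99.999):
        d, h, m, s = breakdown(downtime_seconds_per_year(percent))
        print(f"{percent}% -> {d} d {h} h {m} min {s} s")
```

Each additional nine divides the allowed downtime by ten, which is one way to see why difficulty and expense rise so quickly at the high end of the scale.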