Availability Requirements
- Availability Requirement (a.k.a. Operational Availability
Requirement): any dependability requirement that specifies a
minimum required amount of the quality factor availability.
The typical objectives of availability requirements are
to:
- Ensure that something is available for use when it is needed.
- Maximize the amount of time that it is available for use.
- Minimize both the scheduled and unscheduled down-time
during which it is not available for use.
- Enable users to start using it at arbitrary user-chosen times.
- Enable its users to continue using it once they start.
Availability requirements are typically specified in terms
of the following measurements:
- The minimum acceptable percentage of the time that
something is available.
- The minimum acceptable probability that something is
available:
- At any given time.
- During any given time interval.
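The first measurement above, the percentage of time that something is available, can be derived from a log of outages. The following is a minimal sketch, not a definitive implementation; the function name `availability_percent` and the representation of outages as non-overlapping (start, end) pairs are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def availability_percent(period_start, period_end, outages):
    """Return the percentage of the period that is not covered by outages.

    `outages` is a list of (start, end) datetime pairs.
    Assumption: the outages do not overlap one another.
    """
    total = (period_end - period_start).total_seconds()
    if total <= 0:
        raise ValueError("period must have positive length")
    down = 0.0
    for start, end in outages:
        # Clip each outage to the measurement period before counting it.
        clipped_start = max(start, period_start)
        clipped_end = min(end, period_end)
        if clipped_end > clipped_start:
            down += (clipped_end - clipped_start).total_seconds()
    return 100.0 * (total - down) / total


# Hypothetical example: one outage of 43.2 minutes in a 30-day period.
period_start = datetime(2024, 1, 1)
period_end = period_start + timedelta(days=30)
outage_start = datetime(2024, 1, 10, 12, 0)
outages = [(outage_start, outage_start + timedelta(minutes=43.2))]
measured = availability_percent(period_start, period_end, outages)
```

In this example the outage covers 0.1% of the 30-day period, so the measured availability is approximately 99.9%.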
The following are typical examples of availability
requirements:
- “Credit card authorizations shall have an
availability of 99.99%.”
- “Account maintenance functionality shall have a
minimum availability of 99% over every 3 hour period and a
minimum of 99.9% over every 48 hour period.”
- “Administrative reporting shall have an availability
of 99%.”
- “All accounting functionality (i.e., all accountant
use cases) shall have a minimum availability of 98% between
the hours of 7:30AM EST and 7:30PM PST Monday through
Friday.”
- “User access to persistent data shall have an
availability of 99.95%.”
- “At least one data center shall be available at least
99.999% of the time.”
- “The university website shall not have more than 5
hours of scheduled downtime per month and not more than an
average of 1 hour of unscheduled downtime per
month.”
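A requirement such as “99% over every 3 hour period” constrains the worst window, not the overall average. The sketch below checks such a requirement against a series of equally spaced up/down samples using a sliding window. The function names and the sampled representation are assumptions made for illustration, and a sampled check only approximates a requirement stated over continuous time.

```python
def min_window_availability(samples, window):
    """Return the lowest fraction of 'up' samples in any run of
    `window` consecutive samples.

    `samples` is a list of booleans (True = available), taken at a
    fixed interval such as once per minute.
    """
    if window <= 0 or window > len(samples):
        raise ValueError("window must be between 1 and len(samples)")
    up = sum(samples[:window])  # number of 'up' samples in the first window
    worst = up
    for i in range(window, len(samples)):
        # Slide the window one sample to the right.
        up += samples[i] - samples[i - window]
        worst = min(worst, up)
    return worst / window


def meets_requirement(samples, window, required_fraction):
    """True if every window of the given length meets the required fraction."""
    return min_window_availability(samples, window) >= required_fraction


# Hypothetical data: six hours of one-minute samples.
spread_out = [True] * 360
spread_out[10] = False   # one down minute early on
spread_out[300] = False  # another more than three hours later

clustered = [True] * 360
clustered[10] = False    # two consecutive down minutes
clustered[11] = False

spread_ok = meets_requirement(spread_out, 180, 0.99)
clustered_ok = meets_requirement(clustered, 180, 0.99)
```

Both series have the same total downtime (two minutes in six hours), but only the first satisfies a 99% requirement over every 180-minute window: the clustered outages leave one window with 178 of 180 minutes available, which is below 99%.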
The following guidelines have been found to be useful when
producing availability requirements:
- The scope of an availability requirement can be:
- Availability requirements can be identified and specified
in terms of the following:
| Component of Requirement | Possible Values |
| Type of Downtime | Scheduled downtime; Unscheduled downtime |
| Scope of Downtime | Total; Localized to a functional area; Localized to an external actor; Localized to a user interface; Localized to an API |
| State when Downtime Starts | Normal operations; Degraded mode |
| Response Types | Notify selected users and other systems; Log unscheduled downtime; Shut down external server systems; Switch to degraded mode; Shut down affected capabilities; Totally shut down |
| Response Measurements | Percentage of time when available; Time intervals when available; Time durations when available |
- Availability requirements should be specified
quantitatively.
- Although closely related, availability and reliability are
not the same. An application can be highly available even
though it is not very reliable if its mean time to repair
(MTTR) is very small. Conversely, an application can be quite
unavailable even though it is highly reliable if its MTTR is
very large.
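The distinction can be made concrete with the commonly used steady-state approximation, in which availability is the mean uptime between failures divided by the sum of that uptime and the mean time needed to fix a failure. The sketch below applies it to two hypothetical applications; the function name and all of the numbers are illustrative assumptions, not measurements.

```python
def steady_state_availability(mean_uptime_hours, mean_fix_hours):
    """Long-run fraction of time available, assuming alternating
    periods of uptime and repair with the given mean durations."""
    if mean_uptime_hours <= 0 or mean_fix_hours < 0:
        raise ValueError("uptime must be positive and fix time non-negative")
    return mean_uptime_hours / (mean_uptime_hours + mean_fix_hours)


# Unreliable but quickly fixed: fails about every 10 hours,
# and each failure is fixed in 0.01 hours (36 seconds).
frequent_but_brief = steady_state_availability(10.0, 0.01)

# Reliable but slow to fix: fails about every 1000 hours,
# and each failure takes 100 hours to fix.
rare_but_long = steady_state_availability(1000.0, 100.0)
```

Under these assumptions the first application fails 100 times as often as the second yet is available about 99.9% of the time, while the second is available only about 90.9% of the time.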
- Because some capabilities of an application are more
critical than others, availability requirements should
usually be restricted to its most time-critical capabilities
(e.g., functional areas, use cases or use case paths, and
interfaces). For example, the availability of customer
purchasing and credit card approvals should typically be
higher than that of allowing a customer to change their
password. Unfortunately in practice, the most time-critical
capabilities are often those that are most complex and thus
the most likely to contain defects that can lower their
availability.
- Availability is typically specified as either a number of
nines (e.g., 3 nines = 99.9% and 5 nines = 99.999%) or as
continuous availability (i.e., absolutely no downtime is
allowed). The following table clarifies how rapidly downtime
decreases as availability increases:
| Percent Operationally Available | Number of Nines | Total Downtime Per Year |
| 90% | 1 | 36 Days 12 Hours |
| 95% | N/A | 18 Days 6 Hours |
| 98% | N/A | 7 Days 7 Hours |
| 99% | 2 | 3 Days 15 Hours |
| 99.9% | 3 | 8 Hours 46 Minutes |
| 99.99% | 4 | 52 Minutes 34 Seconds |
| 99.999% | 5 | 5 Minutes 15 Seconds |
| 99.9999% | 6 | 31.5 Seconds |
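The downtime figures in the table follow from simple arithmetic: the allowed downtime is the unavailable fraction multiplied by the length of a year. The following sketch reproduces the calculation, assuming a 365-day year; the function name `downtime_per_year` is chosen here for illustration.

```python
from datetime import timedelta

HOURS_PER_YEAR = 365 * 24  # assumption: a non-leap year of 8760 hours


def downtime_per_year(percent_available):
    """Return the total downtime per year allowed by an availability
    percentage, as a timedelta."""
    if not 0 <= percent_available <= 100:
        raise ValueError("percent_available must be between 0 and 100")
    unavailable_fraction = (100.0 - percent_available) / 100.0
    return timedelta(hours=HOURS_PER_YEAR * unavailable_fraction)


# Print the allowed downtime for each availability level in the table.
for percent in (90, 95, 98, 99, 99.9, 99.99, 99.999, 99.9999):
    print(f"{percent}%: {downtime_per_year(percent)}")
```

Each additional nine divides the allowed downtime by ten, which is why the figures shrink from days to seconds over the range shown.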
- Availability can be specified:
- On a minimum or average basis.
- With beginning and ending time periods.
- Over specific time periods (e.g., several hours,
several days, a year).
- Difficulty and expense increase rapidly as availability
increases.
- Availability is lessened because of both:
- Scheduled downtime (e.g., due to preventative and
perfective maintenance)
- Unscheduled downtime (e.g., failures plus the corrective
maintenance time needed to find, fix, and test the
associated defects).
- Availability is directly related to several other quality
factors because high availability requires high:
- Correctability such as very low mean time to repair
(MTTR).
- Correctness.
- Maintainability such as very low scheduled downtime
(e.g., due to corrective, perfective, and preventative
maintenance).
- Reliability.
- Robustness.
- Security.
- Availability is
inversely related to other quality factors
because high availability means low:
- Availability requirements are different from and not to
be confused with the
architectural mechanisms that may be used to
implement them:
- Redundancy which eliminates single points of failure
and enables failover:
- Data component redundancy (e.g., multiple
synchronized databases, regular data backups).
- Hardware component redundancy (e.g., dual networks,
dual servers, dual network connectivity devices such as
routers, uninterruptible power supplies, load balancing,
hot or cold component swapping, local
data-center-internal redundancy vs. distant multiple data
center redundancy).
- Hardware subcomponent redundancy (e.g., multiple
CPUs, hard drives, disk drives, motherboards, and network
cards).
- Software component redundancy (e.g., using multiple
implementations by different development teams using
different designs and implementation languages, voting
with majority overruling minority solutions, software
application redundancy via installation disk backup,
etc.).
- Data center redundancies (e.g., back-up data
centers).
- Software failure monitoring and failover
components.
- Regular hardware maintenance.
- Availability of hardware replacement components.
- Purchase of components certified to have high
reliability and availability by their vendor organizations
or independent testing laboratories.
- Regular software upgrades (e.g., operating system upgrades,
bug patches, current anti-virus definitions).
- Exception handling (e.g., based on assertions).
- Extensive testing and quality engineering.
- Proper disaster recovery.
- Dedicated technical staff 24/7.
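To illustrate why redundancy is such a common mechanism for meeting a high availability requirement, the sketch below estimates the combined availability of several redundant copies of a component, any one of which is sufficient. It rests on a strong assumption that is often not met in practice: the copies fail independently of one another. The function names are chosen here for illustration.

```python
def parallel_availability(component_availability, copies):
    """Availability of `copies` redundant components when any single
    one suffices. Assumption: failures are statistically independent."""
    if not 0 < component_availability <= 1:
        raise ValueError("component_availability must be in (0, 1]")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    # The group is down only when every copy is down at the same time.
    return 1.0 - (1.0 - component_availability) ** copies


def copies_needed(component_availability, target):
    """Smallest number of redundant copies whose combined availability
    reaches `target` under the same independence assumption."""
    if not 0 <= target < 1:
        raise ValueError("target must be in [0, 1)")
    copies = 1
    while parallel_availability(component_availability, copies) < target:
        copies += 1
    return copies


# Hypothetical example: two data centers, each available 99.9% of the time.
two_data_centers = parallel_availability(0.999, 2)

# Hypothetical example: copies of a 99% component needed for five nines.
needed_for_five_nines = copies_needed(0.99, 0.99999)
```

Under the independence assumption, two 99.9% data centers together reach about 99.9999%, and three copies of a 99% component are enough for 99.999%. Shared causes of failure, such as a common power supply or a common software defect, reduce the benefit.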