Data Replication
Data replication is a design refactoring technique in which
persistent data is copied locally (i.e., replicated) in order to
increase performance.
Sometimes, persistent data is used significantly more often
than it is updated, under conditions in which the time required
to “unnecessarily” reobtain the data across a network is
prohibitive and may result in the failure to meet performance
requirements (e.g., response time and throughput requirements).
However, the locally replicated data may easily become
inconsistent with the original data stored in a database when
the original data is updated or deleted. Also, if the same data
is replicated in multiple locations, it must be updated in all
of these locations in a timely manner, and this can cause
performance problems with other tasks. If updates to persistent
data occur only infrequently, they can be scheduled for
periods of low transaction volume. Data
replication can reduce network communication volume and even
costs when users are charged for network transactions. Data
replication is much easier to implement when a database
management system (DBMS) directly supports data replication and
ensures the consistency of replicated data.
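The trade-off described above, fewer network fetches in exchange
for possibly stale local copies, can be illustrated with a minimal
sketch. It assumes the DBMS does not support replication itself.
The class LocalReplica, its fetch_remote callback, and the
max_age_seconds parameter are hypothetical names chosen for
illustration, not part of any particular product.

```python
import time
from typing import Callable, Dict, Generic, Hashable, Tuple, TypeVar

V = TypeVar("V")


class LocalReplica(Generic[V]):
    """Keeps local copies of remote records (hypothetical sketch).

    A copy is served locally until it is older than max_age_seconds,
    which bounds how long it can be inconsistent with the original.
    """

    def __init__(
        self,
        fetch_remote: Callable[[Hashable], V],
        max_age_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch_remote = fetch_remote  # stands in for the network fetch
        self._max_age = max_age_seconds
        self._clock = clock
        self._copies: Dict[Hashable, Tuple[V, float]] = {}  # key -> (value, fetch time)
        self.remote_fetches = 0  # counts trips across the network

    def get(self, key: Hashable) -> V:
        """Return the local copy if it is fresh enough, else refetch it."""
        now = self._clock()
        cached = self._copies.get(key)
        if cached is not None and now - cached[1] < self._max_age:
            return cached[0]
        value = self._fetch_remote(key)
        self.remote_fetches += 1
        self._copies[key] = (value, now)
        return value

    def invalidate(self, key: Hashable) -> None:
        """Discard the local copy, e.g., after the original was updated or deleted."""
        self._copies.pop(key, None)
```

In this sketch, a smaller max_age_seconds favors timeliness, a
larger one favors throughput and response time. Every location
holding a copy would have to call invalidate when the original
changes, which is the update cost noted above.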
The typical objectives of data replication are to:
- Increase performance.
- Remove performance bottlenecks.
- Meet performance requirements.
Data replication can typically begin when the following
preconditions hold:
- A performance problem exists that is due to the fetching
of persistent data across a network.
- This data is accessed much more frequently than it
is modified.
- Throughput and response time requirements are more
important than timeliness requirements.
- Data recalculation (as opposed to data replication) is
either:
- Impossible (e.g., data is not calculable) or
- Inappropriate (e.g., performance prohibitive due to
difficult and lengthy calculation).
Data replication is typically complete if the following
postconditions hold:
- Persistent data is replicated (i.e., copied locally) in
one or more locations in the system architecture.
- The performance problem(s) no longer exist.
When using the data replication technique, members of the
database team typically perform the following steps:
- Identify performance bottlenecks due to fetching
persistent data across a network (e.g., by manual design
evaluation, during execution with a profiler or performance
tuner).
- Determine access and update frequency.
- Determine relative importance of throughput, response
time, and timeliness performance.
- Determine the cost and feasibility of recalculating the
data as opposed to fetching it.
- Determine if the data should be replicated or
recalculated.
- Refactor the design to include local copies of replicated
persistent data.
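The decision made in the middle steps above can be sketched as a
small helper. This is only an illustration under stated
assumptions: the DataProfile fields, the recommend function, and
the default ratio of 100 reads per update are hypothetical and
would need to be calibrated against measurements from the actual
system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataProfile:
    """Measurements gathered for one kind of persistent data (hypothetical)."""

    reads: int                       # accesses observed during profiling
    updates: int                     # modifications observed during profiling
    fetch_seconds: float             # typical time to fetch across the network
    recalc_seconds: Optional[float]  # None when the data is not calculable
    timeliness_critical: bool        # True when stale data is unacceptable


def recommend(profile: DataProfile, min_read_update_ratio: float = 100.0) -> str:
    """Return "replicate", "recalculate", or "fetch" for the profiled data."""
    # Stringent timeliness requirements rule out local copies.
    if profile.timeliness_critical:
        return "fetch"
    # Recalculation is preferred when it is possible and cheaper than fetching.
    if profile.recalc_seconds is not None and profile.recalc_seconds < profile.fetch_seconds:
        return "recalculate"
    # Replicate only when reads dominate updates by the chosen (assumed) margin.
    ratio = profile.reads / max(profile.updates, 1)
    if ratio >= min_read_update_ratio:
        return "replicate"
    return "fetch"
```

For example, data read 10,000 times and updated 10 times during
profiling, with no way to recalculate it and no stringent
timeliness requirement, would be recommended for replication
under these assumed thresholds.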
Data replication typically results in the following work
products:
Data replication is typically subject to the following
limitations:
- Data replication is typically inappropriate when:
- Stringent timeliness requirements exist.
- Performance requirements are not stringent.
- All software is hosted locally so that there is no
network traffic.
- Although a design refactoring technique, data replication
often occurs during implementation and testing when
performance bottlenecks or failures become obvious. Do not
waste significant time trying to perform data replication
prior to implementation; at that stage, concentrate only on
situations where it is obvious that data replication will
help.
- Where possible, use database management systems that
automatically support data replication.