Database Clustering
Database clustering is a design refactoring technique in which
performance is improved by physically grouping (i.e., clustering)
persistent data that are commonly retrieved simultaneously.
Data that are logically related (e.g., via reference or
aggregation) are often retrieved and used at the same time. If
these data are physically grouped together (e.g., in the same
segments or pages of memory), the number of I/O operations can
be significantly reduced, because a single read then brings in
several items that are needed together.
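The effect on I/O can be illustrated with a small simulation. This is
only a sketch under stated assumptions: storage is modeled as
fixed-size pages, one page read counts as one I/O operation, and the
names PAGE_SIZE, assign_pages, and pages_read are hypothetical helpers
invented for this example rather than part of any database product.

```python
PAGE_SIZE = 4  # assumption: four records fit on one page


def assign_pages(record_ids, page_size=PAGE_SIZE):
    """Map each record id to a page number, following storage order."""
    return {rid: pos // page_size for pos, rid in enumerate(record_ids)}


def pages_read(layout, wanted):
    """Count the distinct pages that must be read to fetch `wanted`."""
    return len({layout[rid] for rid in wanted})


# Twelve records; records 0, 3, 6 and 9 are logically related
# (for example, an aggregate and its components).
related = [0, 3, 6, 9]

# Unclustered: records stored in id order, so the group is spread out.
unclustered = assign_pages(range(12))

# Clustered: the related records are stored contiguously on one page.
others = [r for r in range(12) if r not in related]
clustered = assign_pages(related + others)

print(pages_read(unclustered, related))  # 3
print(pages_read(clustered, related))    # 1
```

In this toy layout, fetching the related group touches three pages
before clustering and one page afterwards.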
The typical objectives of database clustering are to:
Database clustering can typically begin when the following
preconditions hold:
- A performance problem exists that is due to the access of
related persistent data.
- These data are logically related (e.g., by reference or
aggregation).
- These logical groupings of persistent data are
not physically grouped together.
Database clustering is typically complete if the following
postconditions hold:
- These logical groupings of persistent data are physically
grouped together.
- The performance problem no longer exists.
When using the database clustering technique, members of the
database team typically perform the following steps:
- Develop a common clustering strategy:
  - Store one object per segment if the objects are large,
    complex, expensive to transfer, and typically accessed
    individually.
  - Store an object together with all of its component
    objects when they tend to be accessed together.
  - Store all instances of a data type together if queries
    tend to involve searching all of the objects of that
    type.
  - Cluster together all objects that have the same value of
    a specific attribute if objects whose attribute satisfies
    a user selection criterion are often accessed together.
- Identify performance bottlenecks due to a lack of proper
database clustering.
- Determine the relative importance of throughput, response
time, and timeliness.
- Determine the cost and feasibility of recalculating the
data as opposed to fetching it.
- Determine if the data should be replicated or
recalculated.
- Refactor the design to include local copies of replicated
persistent data.
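The four strategy options listed under the first step can be sketched
as clustering keys: objects that map to the same key are placed in the
same segment. This is an illustrative sketch only. The Obj record, its
owner and region fields, and the helper functions are hypothetical
names chosen for the example, and a real database management system
would expose clustering through its own placement or storage
directives instead.

```python
from collections import defaultdict, namedtuple

# owner: oid of the aggregate that contains this object (None for a root).
# region: an example attribute that user selections filter on.
Obj = namedtuple("Obj", "oid type_name owner region")


def per_object(o):
    """One object per segment (large, individually accessed objects)."""
    return ("object", o.oid)


def with_components(o):
    """An aggregate stored together with its component objects."""
    return ("aggregate", o.oid if o.owner is None else o.owner)


def per_type(o):
    """All instances of a type together (type-wide searches)."""
    return ("type", o.type_name)


def per_attribute(o):
    """All objects sharing an attribute value together."""
    return ("region", o.region)


def build_segments(objects, key):
    """Group object ids into segments according to a clustering key."""
    segments = defaultdict(list)
    for o in objects:
        segments[key(o)].append(o.oid)
    return dict(segments)


objects = [
    Obj(1, "Order", None, "EU"),
    Obj(2, "Line", 1, "EU"),
    Obj(3, "Line", 1, "EU"),
    Obj(4, "Order", None, "US"),
    Obj(5, "Line", 4, "EU"),
]

for strategy in (per_object, with_components, per_type, per_attribute):
    print(strategy.__name__, build_segments(objects, strategy))
```

Comparing the printed segments against the access patterns actually
observed is one way to judge which key suits a given workload.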
Database clustering typically results in the following work
products:
Database clustering is typically subject to the following
limitations:
- Database clustering is typically inappropriate when:
  - Throughput and response time requirements are not
    stringent.
  - Logical groupings of persistent data are already
    physically grouped together.
- It is hard for programming languages and database
management systems to differentiate relatively static links
(e.g., aggregation relationships) from volatile links,
especially when the links are implemented by pointers as
opposed to references.
- Object serialization mechanisms often store all reachable
subobjects instead of just the true subobjects.
Unfortunately, this may result in the redundant storage of
shared subobjects and in the resulting consistency problems.
- Although it is a design refactoring technique, database
clustering often occurs during implementation and testing,
when performance bottlenecks or failures become obvious. Do
not waste significant time trying to perform database
clustering prior to implementation; at that stage,
concentrate only on situations where it is obvious that
database clustering will help.
- Where possible, use database management systems that
automatically support database clustering.
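
The serialization limitation noted in the list above (shared
subobjects being stored redundantly) can be reproduced with Python's
standard pickle module, used here only as one familiar example of a
serializer that follows every reachable reference. The customer and
address dictionaries are hypothetical sample data.

```python
import pickle

# Two customers reference the same address object (a shared subobject).
shared_address = {"city": "Oslo"}
alice = {"name": "Alice", "address": shared_address}
bob = {"name": "Bob", "address": shared_address}

# Serialized separately, each byte string carries its own copy of the
# address, so the sharing is lost when the objects are loaded again.
alice2 = pickle.loads(pickle.dumps(alice))
bob2 = pickle.loads(pickle.dumps(bob))
print(alice2["address"] is bob2["address"])  # False

# An update through one copy is not seen through the other.
alice2["address"]["city"] = "Bergen"
print(bob2["address"]["city"])  # Oslo

# Serialized together as one object graph, the address is written once
# and the sharing survives the round trip.
alice3, bob3 = pickle.loads(pickle.dumps((alice, bob)))
print(alice3["address"] is bob3["address"])  # True
```

Whether related objects are stored as one unit or as separate units
therefore affects consistency as well as performance.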