Introduction
A hash table is a common data structure interface that maps "keys"
to "values." A hash table usually stores its keys and
values in such a way that given a key the associated key-value pair(s)
in the hash table can be found (or not) very quickly. Depending
on the implementation, a hash table can often be accessed by many
threads in a single process or across multiple processes on the
local machine.
The RHT library takes this notion of thread cooperation one giant
step further and allows many threads connected through a network
to operate on one logical dictionary, a rht, in parallel.
RHT accomplishes this by replicating the key-value pairs of a RHT
into a local hash table copy (replicant) at each participating thread
(multiple threads within a process can share a replica).
Queries on a RHT are performed locally, while updates to a RHT
are "sent to the network" to be ordered in relation to
updates from other threads before being applied. Under normal conditions,
updates are performed in the same order at all of the replicas of
a rht. This means that every thread "sees" the exact same
key-value pairs in a given RHT as each update is applied -- this
is called virtual synchronization. RHT is not the only library offering
this kind of state replication service.
So how is it different from competing solutions? One major difference
lies in the fault tolerance of the solutions. Most replication
products have single points of failure. A common approach utilizes
some form of a single master, multiple slave design where the master
pushes updates out to the slaves. In this kind of system, if the
master crashes or cannot be reached, then updates to the state cannot
be made. The architecture of RHT, on the other hand, is a truly
robust, n-way distributed state replication engine. In RHT, each
of the replicas is a master: any set of the participating threads
can crash/recover, the network connectivity can fail, heal and/or
change between the participating threads in any manner and queries
and updates can continue to be made at every replica!
The architecture of RHT also allows for major performance gains:
queries are handled locally, so the throughput of queries a logical
RHT can handle increases linearly with the number of participating
processors! Threads that cannot communicate with one another obviously
cannot be guaranteed to be synchronized. So how does RHT handle
communication break downs? Simply put, threads that maintain stable
communication with each other for "long enough" periods
of time will perceive the same synchronized rht. Threads that cannot
stably communicate with each other will continue operating on their
own, different replicas of the RHT in question. When stable communication
is restored between such threads, they automatically resolve differences
between their replicas and then perceive the same virtually synchronized
RHT.
RHT resolves inconsistencies of such replicas by associating Lamport
Time Stamps (LTS) with each entry in the table. When two replicas
have different entries for equivalent keys, then the entry with
the later (higher) LTS replaces the "older" entry. Use
of LTSs also ensures that causality is maintained amongst cooperating
threads. RHT uses innovative algorithms developed by Spread Concepts,
LLC (http://www.spreadconcepts.com)
and the Spread Toolkit (http://www.spread.org)
to efficiently achieve virtual synchronization of RHT replicas across
the network.
Before using the RHT library, you must set up a Spread network
to allow replicas to communicate. For RHT to perform optimally,
you should run a Spread daemon on each of the machines where threads
might use RHT. The configuration and usage documentation of Spread
is available from http://www.spread.org/docs/docspread.html.
|