Reference Guide C API www.spreadconcepts.com


Programmer's Reference Guide
Replicated Hash Table
Version 2.0


Hash Table Operations

Create Comparison & Hash Functions Notification Callback Functions UNION vs INTERSECTION Cleanup Error_Handling Size Put Get Delete Snapshot

Create
rht_create
creates a local replica of a specified RHT and returns its per-process unique replica handle through r. All subsequent operations on this replica must be made using this integer handle. The unique name assigned to the the replicant in the replication system is copied to rpn if it is non-NULL. The parameter o allows the user to specify information about the RHT to which they are connecting and is declared as:

typedef int  (*rht_key_cmp_fcn)(rht_blob  k1,
                                  rht_blob  k2, 
                                  const void *kcntxt);

typedef long (*rht_key_hcode_fcn)(rht_blob  k, 
                                    const void *kcntxt);

typedef void (*rht_on_update_cb)(rht                    r, 
                                   const struct rht_pair *new_entry, 
                                   const struct rht_pair *old_entry);

typedef void (*rht_on_synch_diff_cb)(rht                    r, 
                                       const struct rht_nset *synced,
                                       rht_synch_type         t, 
                                       const struct rht_nset *diff,
                                       const struct rht_nset *synching);

typedef void (*rht_on_failure_cb)(rht      r, 
                                    rht_code err);

typedef void (*rht_on_trash_cb)(rht       r, 
                                  rht_code  err, 
                                  const void *key_cntxt);

typedef struct {
  const char *           daemon_name;
  rht_name             login_name;
  rht_name             rp_name;

  rht_key_cmp_fcn      key_cmp;
  rht_key_hcode_fcn    key_hcode;
  const void *           key_cntxt;

  rht_on_update_cb     on_update_cb;
  rht_on_synch_diff_cb on_synch_diff_cb;
  rht_on_failure_cb    on_failure_cb;
  rht_on_trash_cb      on_trash_cb;

  int                    flags;

} rht_options;

The member daemon_name must be a valid pointer to a null terminated C string. Furthermore, daemon_name must be a valid Spread daemon name that represents the daemon this replica should use for communication. Optimally, the Spread daemon used should reside on the same machine as the replica.

login_name must be a valid Spread login name, which will be used to generate the unique name of the replicant in the replication system. rp_name must be a valid Spread group name and represents the name of the logical replicated hash table (rht) to replicate locally.

Comparison and Hash Functions
The members key_cmp, key_hcode and key_cntxt specify how keys in the RHT should be compared with one another on the local machine. You must ensure that these comparison parameters are compatible with all the other threads that might subscribe to the RHT rp_name. Effectively, all threads subscribing to a given RHT must use equivalent comparison parameters and functions (except potentially for machine-dependent endian corrections, see below) or the RHT will exhibit undefined behavior. If key_cmp is NULL, then the default key comparison relation uses memcmp. If the two keys are of equal length, then memcmp is directly applied to the key buffers. If the two keys are of unequal length, then the shorter key is considered lesser than the longer key. If key_hcode is NULL then the default key hash code computation uses all n bytes of the key using the common heuristic:

long hcode = 0;

while (n-- != 0) {
  hcode = hcode * 33 + *key_ptr++;
}

return hcode;

If one or both of the default key comparison functions is not appropriate for the keys of a particular rht, then you must provide the necessary key comparison functions. The member key_cntxt will be passed to your key comparison functions for this replica. If you do not need such a context, then set key_cntxt to be NULL. There are several important issues to consider when constructing key comparison functions:

  • Your key_cmp relation must return zero if two keys are considered equivalent and non-zero if they are not.
  • Your key_cmp relation must be invariant, reflexive, symmetric and transitive.
  • You must ensure that key_cntxt remains valid for your key comparison functions during the lifetime (described below -- see on_trash_cb) of this replica. An easy way to accomplish this is to have any such context be allocated and initialized on program startup and never deallocated (e.g. - static allocation).
  • You must account for any endian differences amongst the replicant host machines. For example, if your key comparison functions consider endian dependent information (such as an int or a float) and your RHT network involves different endianness architectures, then you must always insert your keys in a consistent form (e.g. - network endian form) and your comparison functions will need to convert that information to a locally appropriate representation before comparing.

Additionally, for ordered dictionaries (coming soon):

  • Your key_cmp function must return an integer less than, equal to, or greater than zero if the first key is considered, respectively, to be less than, to be equivalent to or to be greater than the second key. For unordered dictionaries:
  • Your key_hcode mapping must be an invariant mapping from keys to the integer range [-2^31, 2^31). Any predictable structure in the mapping will be removed by the dictionary (i.e. - it is OK if your mapping exhibits strong structure; for example such as mapping all keys of a certain type only to even integers).
  • For best performance, a random key should have approximately 1 / 2^32 probability of mapping to any given value.
  • Your key_cmp relation and key_hcode mapping must be equality compatible: [key_cmp(key1, key2) == 0 => key_hcode(key1) == key_hcode(key2)]. Note that the reverse implication need not be true.

Notification Callback Functions
The members on_update_cb, on_synch_diff_cb, on_failure_cb and on_trash_cb are optional callback fcns that the user can register to be notified of different events. If you do not want one or more of these callbacks, then set the respective members to NULL. It is safe to call any rht_* function on a replica during a callback. However, any read operations have to be dirty reads (described below), because you are blocking any updates from being applied while in a callback. During callbacks the executing thread has a lock (see rht_grab) on the replica upon entry into the callback function. This lock should never be dropped during the callback. If it is, undefined behavior will occur.

on_update_cb is called every time an update is made to the hash table. new_entry is the new entry in the table and old_entry is the previous entry in the table at the same key. Either of these parameters can be NULL, which represents the presence of no entry. For example, if new_entry was non-NULL and old_entry was NULL, then this would be the case where a new entry is being inserted into the table where previously no matching key existed. Similarly, if new_entry was NULL, then this would mean a pre-existing key was being deleted. Note, that upon entry into the callback the hash table has already been structurally changed to reflect the update.

on_synch_diff_cb is called every time the level of synchronization of this replica with respect to other replicas changes. synced contains the names of the replicas in the replication system with which this replica is currently completely synchronized (including itself). synching contains the names of the other replicas in the replication system with which this replica is still attempting to synchronize (synching is disjoint from synced). The parameter t indicates the type of synchronization change causing this callback and the meaning of the contents of diff. t can be one of RHT_SYNCH_SUBTRACTIVE, RHT_SYNCH_NEUTRAL and RHT_SYNCH_ADDITIVE. These values indicate, respectively, a loss of communication with formerly synchronized replicas, a change solely in the set of actively synchronizing replicas (synching) and the completion of synchronization with formerly unsynchronized replicas. RHT_SYNCH_SUBTRACTIVE indicates that diff contains the names of the formerly synchronized replicas with which communication was lost. RHT_SYNCH_NEUTRAL indicates that diff is empty -- if you want to track changes in the synching set, you will need to track them yourself. RHT_SYNCH_ADDITIVE indicates that diff contains the names of the newly synchronized replicas. Note, that the synching set can change for all three types. t and diff are really only there to distinguish between types of changes with respect to completely synchronized replicas. Note that these name sets are only valid while in the current callback and references to them must not be kept.

on_failure_cb is called every time a permanent failure occurs that will not allow the replica to continue making progress (e.g. - out of memory). After such a callback, most of the fcns on that RHT will fail and rht_fini should eventually be called on it.

on_trash_cb is called when a replica is completely garbage collected. The time between a successful rht_create call and the on_trash_cb for that replica is the lifetime of the replica. The main purpose of this call is to give the user a chance to reclaim resources from their key_cntxt if necessary.

Union vs Intersection
The member flags is a bit field for specifying boolean, creation level options on the replica. Currently, this member can only be one of two values RHT_UNION (0 -- the default) or RHT_INTERSECT. This flag specifies how deletes should be handled in the face of network partitions. With union semantics, when an entry is deleted the entry is immediately and completely removed from the hash table. If this deletion occurred while the network was partitioned, then when the network heals, replicas on the other side of the partition may still have the old entry for that key, in which case the delete will be "overwritten" with the old value that was deleted on this side of the partition. Intersect semantics combats exactly this eventuality. With intersect semantics, when an entry is deleted a special "deleted entry" is left in its place. Therefore, if the above scenario occurred, the delete entry would "overwrite" the entry that existed on the other side of the partition and the entry would remain deleted.

rht_fini is used to "shut down" a replica, reclaim some of its resources and mark its handle as invalid when you are done with it. rht_fini forces the replica r to disconnect from Spread (if it has not already due to permanent errors) and causes any threads in RHT functions on this replica to return quickly -- some of these threads may return a RHT_INVALID_HANLDE error. At about the time the caller enters rht_fini, all subsequent RHT calls on this replica handle will fail with a RHT_INVALID_HANDLE error. You may call rht_fini on a replicant handle at any time and as many times as you wish! Only the first such call will have any effect. This can reduce the thread synchronization headache caused when multiple threads share a single replica: any/all of these threads can call rht_fini at any time and the other threads will "safely" receive RHT_INVALID_HANDLE errors, indicating that this replica is no longer valid. To reclaim ALL of the resources associated with a given replica you need to call rht_fini on the replicant handle AND rht_snap_fini on any existing valid snapshots taken of the replica. Once these conditions are met and all threads have left RHT procedures on the replica, its on_trash_cb will fire. Note, that key-value pair entries from a replica will continue to persist until you call rht_pair_fini on any rht_pair you still have from that replica. As you perform each of these operations the amount of resources consumed is reduced.

rht_error returns any permanent error that exists on a replica through err_ptr. A permanent error is an error from which the replica cannot recover. Most RHT calls (except for rht_fini) on the handle of such a replica will fail with the same permanent error code until rht_fini is called.

rht_size returns the number of key-value pairs contained in a RHT through sz. Note that this size can change in the next instant if the hash table is not locked.

rht_put inserts the key-value pair (k -> v) into the RHT r. If an equivalent key already exists in the rht, then the corresponding key-value pair will be overwritten. Note that if multiple updates to the same entry are performed by different threads around the same time, then the one with the highest LTS (Lamport Time Stamp) will eventually "win," and some of those updates may never be applied to the hash table at all as they will be seen as "out of date" by the replicas upon receipt.

rht_get performs a lookup of k on the RHT r. If an equivalent key exists in the rht, then a reference (a rht_pair pointer) to the corresponding key-value pair will be returned through p. This reference allows you to inspect the key, value and LTS of a key-value pair entry. Such a reference to a constant key-value pair will remain valid, and therefore in memory, until rht_pair_fini is called on it. Note that the data a key-value pair entry (rht_pair) references will never change, even if the entry is changed in the hash table. A rht_pair references constant data. If no matching entry is found, success is returned but p will be set to NULL.

rht_del deletes k from the RHT r. If an equivalent key exists in the rht, then the corresponding key-value pair will be removed. Success (the return code) does not depend on whether or not an equivalent key already exists in the rht. Remember that for RHT_UNION semantics no entry is left behind marking the key as deleted, whereas for RHT_INTERSECT semantics one is. Also, the same note above for rht_put applies to rht_del. If the delete of an entry is initiated around the same time as other updates to that entry, which update will eventually "win" and whether all those updates will be applied or not is determined by the LTSs (Lamport Time Stamp) of the updates.

rht_snapshot initializes snap to contain a snapshot of the current state of the RHT r. The replica is temporarily locked and references to all the current entries in the replica are inserted into a constant, local hash table (rht_snap). For more information on snapshots, see snapshot reference.



Copyright © 2004, Spread Concepts LLC, All rights reserved