Kademlia


Kademlia is a distributed hash table for decentralized peer-to-peer computer networks designed by Petar Maymounkov and David Mazières in 2002. It specifies the structure of the network and the exchange of information through node lookups. Kademlia nodes communicate among themselves using UDP. A virtual or overlay network is formed by the participant nodes. Each node is identified by a number, the node ID. The node ID serves not only as identification; the Kademlia algorithm also uses it to locate values.
In order to look up the value associated with a given key, the algorithm explores the network in several steps. Each step finds nodes that are closer to the key, until the contacted node returns the value or no closer nodes are found. This is very efficient: like many other distributed hash tables, Kademlia contacts only O(log n) nodes during the search, out of a total of n nodes in the system.
Further advantages are found particularly in the decentralized structure, which increases the resistance against a denial-of-service attack. Even if a whole set of nodes is flooded, this will have limited effect on network availability, since the network will recover itself by knitting the network around these "holes".
I2P's implementation of Kademlia is modified to mitigate Kademlia's vulnerabilities, such as Sybil attacks.

System details

By design, peer-to-peer networks are made of nodes, and the protocols these nodes use to communicate and locate information have become more efficient over time. The first generation of peer-to-peer file sharing networks, such as Napster, relied on a central database to coordinate lookups on the network. Second generation peer-to-peer networks, such as Gnutella, used flooding to locate files, searching every node on the network. Third generation peer-to-peer networks, such as BitTorrent, use distributed hash tables to look up files in the network. Distributed hash tables store resource locations throughout the network.
Kademlia uses a "distance" calculation between two nodes. This distance is computed as the exclusive or (XOR) of the two node IDs, taking the result as an unsigned integer. Keys and node IDs have the same format and length, so distance can be calculated among them in exactly the same way. The node ID is typically a large random number that is chosen with the goal of being unique for a particular node. It can and does happen that geographically distant nodes – from Germany and Australia, for instance – are "neighbors" if they have chosen similar random node IDs.
XOR was chosen because it acts as a distance function between all the node IDs. Specifically:
  • the distance between a node and itself is zero
  • it is symmetric: the "distances" calculated from A to B and from B to A are the same
  • it follows the triangle inequality: given A, B and C are vertices of a triangle, the distance from A to B is shorter than or equal to the sum of the distance from A to C and the distance from C to B.
These three conditions are enough to ensure that XOR captures all of the essential, important features of a "real" distance function, while being cheap and simple to calculate.
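As a minimal sketch, the metric can be expressed in a few lines of Python; the sample 4-bit IDs below are illustrative, not part of the protocol:

    def xor_distance(id_a: int, id_b: int) -> int:
        # The distance between two IDs is their bitwise XOR,
        # interpreted as an unsigned integer.
        return id_a ^ id_b

    # The three properties listed above, checked on sample 4-bit IDs:
    a, b, c = 0b1011, 0b1000, 0b0010
    assert xor_distance(a, a) == 0                                         # identity
    assert xor_distance(a, b) == xor_distance(b, a)                        # symmetry
    assert xor_distance(a, b) <= xor_distance(a, c) + xor_distance(c, b)   # triangle inequality

The triangle inequality holds because XOR never produces a carry: for any non-negative integers, x XOR y is at most x + y.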
Each Kademlia search iteration comes one bit closer to the target. A basic Kademlia search algorithm has a complexity of O(log2(n)) for a network of n nodes; for example, a network with 2^20 (about one million) nodes will take at most 20 steps to find a given node.

Fixed-size routing tables

Fixed-size routing tables were presented in the pre-proceedings version of the original paper and are used in the later version only for some mathematical proofs. An actual Kademlia implementation does not have a fixed-size routing table, but a dynamically sized one.
Kademlia routing tables consist of a list for each bit of the node ID. Every entry in a list holds the necessary data to locate another node. The data in each list entry is typically the IP address, port, and node ID of another node. Every list corresponds to a specific distance from the node. Nodes that can go in the nth list must have a differing nth bit from the node's ID; the first n-1 bits of the candidate ID must match those of the node's ID. This means that it is very easy to populate the first list, as 1/2 of the nodes in the network are far-away candidates. The next list can use only 1/4 of the nodes in the network, and so on.
With an ID of 128 bits, every node in the network will classify other nodes in one of 128 different distances, one specific distance per bit.
As nodes are encountered on the network, they are added to the lists. This includes store and retrieval operations and even helping other nodes to find a key. Every node encountered will be considered for inclusion in the lists. Therefore, the knowledge that a node has of the network is very dynamic. This keeps the network constantly updated and adds resilience to failures or attacks.
In the Kademlia literature, the lists are referred to as k-buckets. k is a system-wide number, such as 20. Every k-bucket is a list holding up to k entries; i.e., for a network with k = 20, each node will have lists containing up to 20 nodes for a particular bit (a particular distance from itself).
Since the possible nodes for each k-bucket decreases quickly, the lower bit k-buckets will fully map all nodes in that section of the network. Since the quantity of possible IDs is much larger than any node population can ever be, some of the k-buckets corresponding to very short distances will remain empty.
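The rule that assigns a contact to a list follows directly from the XOR distance: the index of the highest differing bit determines the bucket. A minimal sketch in Python (the helper name is illustrative):

    def bucket_index(own_id: int, other_id: int) -> int:
        # Bucket i holds contacts whose XOR distance d satisfies
        # 2**i <= d < 2**(i + 1), i.e. whose ID first differs from ours
        # at bit i, counting from the least significant bit.
        distance = own_id ^ other_id
        if distance == 0:
            raise ValueError("a node does not appear in its own routing table")
        return distance.bit_length() - 1

    # With 3-bit IDs, node 6 (0b110) places node 7 (0b111) in bucket 0,
    # nodes 4 and 5 in bucket 1, and nodes 0 through 3 in bucket 2.
    assert bucket_index(0b110, 0b111) == 0
    assert bucket_index(0b110, 0b100) == 1
    assert bucket_index(0b110, 0b001) == 2

The sample assertions anticipate the three-bit example network discussed below.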
Consider a simple network with three-bit IDs. The network size is 2^3, or eight maximum keys and nodes. Seven nodes are participating; node three is not. Take node six as the node under consideration. Each node in this network has three k-buckets. Nodes zero, one and two are candidates for the furthest k-bucket; nodes four and five go in the middle k-bucket; and the third k-bucket can only contain node seven. If the size of the k-bucket were two, then the furthest 2-bucket could only contain two of the three candidates. For example, if node six held nodes one and two in that 2-bucket, it would have to send a node ID lookup to these nodes to find the location of node zero. Each node knows its own neighbourhood well and has contact with a few nodes far away, which can help locate other nodes far away.
It is known that nodes which have been connected for a long time in a network will probably remain connected for a long time in the future. Because of this statistical distribution, Kademlia prefers to keep long-connected nodes stored in the k-buckets. This increases the number of known valid nodes at some time in the future and provides for a more stable network.
When a k-bucket is full and a new node is discovered for that k-bucket, the least recently seen node in the k-bucket is PINGed. If the node is found to be still alive, the new node is placed in a secondary list, a replacement cache. The replacement cache is used only if a node in the k-bucket stops responding. In other words: new nodes are used only when older nodes disappear.
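A sketch of this update rule, assuming a deque-based bucket and a caller-supplied ping function (the class and method names are illustrative):

    from collections import deque

    K = 20  # the system-wide bucket size mentioned above

    class KBucket:
        def __init__(self):
            self.entries = deque()            # least recently seen on the left
            self.replacement_cache = deque()

        def update(self, contact, ping):
            # ping(contact) -> bool, True if the contact answered a PING.
            if contact in self.entries:
                # Seen again: move to the most recently seen position.
                self.entries.remove(contact)
                self.entries.append(contact)
            elif len(self.entries) < K:
                self.entries.append(contact)
            elif ping(self.entries[0]):
                # The least recently seen node is still alive: keep it
                # (now most recently seen) and park the newcomer.
                self.entries.rotate(-1)
                self.replacement_cache.append(contact)
            else:
                # The old node is gone: evict it and admit the newcomer.
                self.entries.popleft()
                self.entries.append(contact)

Note the eviction bias: a live old contact is never displaced by a newcomer, which matches the stability argument above and also makes it harder to flood routing tables with freshly generated IDs.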

Protocol messages

The Kademlia protocol consists of four remote procedure call (RPC) message types:
  • PING — Used to verify that a node is still alive.
  • STORE — Stores a (key, value) pair in one node.
  • FIND_NODE — The recipient of the request will return the k nodes in its own buckets that are the closest ones to the requested key.
  • FIND_VALUE — Same as FIND_NODE, but if the recipient of the request has the requested key in its store, it will return the corresponding value.
Each RPC message includes a random value chosen by the initiator and echoed in the reply. This ensures that when the response is received, it corresponds to the request previously sent.
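A schematic rendering of the four messages and the per-request random value; the field names and the 160-bit identifier width are assumptions for illustration, not a wire format:

    import os
    from dataclasses import dataclass, field

    def new_rpc_id() -> bytes:
        return os.urandom(20)  # random value echoed back in the response

    @dataclass
    class Ping:
        rpc_id: bytes = field(default_factory=new_rpc_id)

    @dataclass
    class Store:
        key: bytes
        value: bytes
        rpc_id: bytes = field(default_factory=new_rpc_id)

    @dataclass
    class FindNode:
        target: bytes  # ID whose k closest known contacts are requested
        rpc_id: bytes = field(default_factory=new_rpc_id)

    @dataclass
    class FindValue:
        key: bytes     # like FindNode, but the value itself may be returned
        rpc_id: bytes = field(default_factory=new_rpc_id)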

Locating nodes

Node lookups can proceed asynchronously. The quantity of simultaneous lookups is denoted by α and is typically three. A node initiates a FIND_NODE request by querying the α nodes in its own k-buckets that are the closest ones to the desired key. When these recipient nodes receive the request, they will look in their k-buckets and return the k closest nodes to the desired key that they know. The requester will update a results list with the results it receives, keeping the k best ones that respond to queries. Then the requester will select these k best results, issue the request to them, and iterate the process. Because every node has better knowledge of its own surroundings than any other node has, each round of results will contain nodes ever closer to the searched key. The iterations continue until no nodes are returned that are closer than the best previous results. When the iterations stop, the best k nodes in the results list are the ones in the whole network that are the closest to the desired key.
The node information can be augmented with round trip times (RTT). This information is used to choose a time-out specific to every consulted node. When a query times out, another query can be initiated, never exceeding α simultaneous queries.
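The lookup can be sketched as follows. This simplified version queries nodes sequentially, whereas a real implementation issues up to α queries concurrently and applies the per-node time-outs described above; the Contact type and the find_node callback are assumptions standing in for the network layer:

    from typing import Callable, List, NamedTuple

    ALPHA = 3  # typical number of parallel queries
    K = 20     # size of the result set

    class Contact(NamedTuple):
        node_id: int
        address: str

    def iterative_find_node(
        target: int,
        own_contacts: List[Contact],
        find_node: Callable[[Contact, int], List[Contact]],
    ) -> List[Contact]:
        # Start from the closest contacts in our own k-buckets.
        shortlist = sorted(own_contacts, key=lambda c: c.node_id ^ target)[:K]
        queried = set()
        while True:
            candidates = [c for c in shortlist if c not in queried][:ALPHA]
            if not candidates:
                break  # no unqueried node is closer: the lookup has converged
            for contact in candidates:
                queried.add(contact)
                for found in find_node(contact, target):
                    if found not in shortlist:
                        shortlist.append(found)
            # Keep only the K closest results seen so far.
            shortlist.sort(key=lambda c: c.node_id ^ target)
            shortlist = shortlist[:K]
        return shortlist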

Locating resources

Information is located by mapping it to a key. A hash is typically used for the map. The storer nodes will have information due to a previous STORE message. Locating a value follows the same procedure as locating the closest nodes to a key, except the search terminates when a node has the requested value in its store and returns this value.
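A minimal sketch of the key mapping, assuming SHA-1 as the hash; whatever hash is chosen, keys and node IDs must share the same length, so the XOR distance between a key and a node ID is computed exactly as between two node IDs:

    import hashlib

    def resource_key(data: bytes) -> int:
        return int.from_bytes(hashlib.sha1(data).digest(), "big")

    key = resource_key(b"example file contents")
    # A STORE is then sent to the k nodes whose IDs are closest to key.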
The values are stored at several nodes to allow for nodes to come and go and still have the value available in some node. Periodically, a node that stores a value will explore the network to find the k nodes that are closest to the key and replicate the value onto them. This compensates for disappeared nodes.
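A schematic republish loop for this replication; the interval and the helper names (lookup_k_closest, send_store) are assumptions for illustration:

    import time

    REPUBLISH_INTERVAL = 3600  # seconds; a commonly cited choice is one hour

    def republish_loop(local_store, lookup_k_closest, send_store):
        while True:
            for key, value in local_store.items():
                # Find the k nodes currently closest to the key and
                # replicate the value onto them with STORE RPCs.
                for contact in lookup_k_closest(key):
                    send_store(contact, key, value)
            time.sleep(REPUBLISH_INTERVAL)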
Also, for popular values that might have many requests, the load on the storer nodes is diminished by having a retriever store the value on some node near, but outside of, the k closest ones. This new storing is called a cache. In this way the value is stored further and further away from the key, depending on the quantity of requests. This allows popular searches to find a storer more quickly. Since the value is returned from nodes further away from the key, this alleviates possible "hot spots". Caching nodes will drop the value after a certain time depending on their distance from the key.
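One plausible expiration rule, sketched only to illustrate the distance-dependent lifetime described above; the base lifetime and the halving rule are assumptions, not the paper's exact formula:

    BASE_TTL = 24 * 3600.0  # seconds kept at nodes in the key's neighbourhood

    def cache_ttl(buckets_from_key: int) -> float:
        # Halve the cached value's lifetime for every k-bucket the caching
        # node sits away from the key's closest neighbourhood.
        return BASE_TTL / (2 ** buckets_from_key)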
Some implementations have neither replication nor caching. The purpose of this is to remove old information quickly from the system. The node that is providing the file will periodically refresh the information onto the network. When all of the nodes having the file go offline, nobody will be refreshing its values and the information will eventually disappear from the network.