Special

Clearance Sale!

We've been publishing for over five years now and it's time to clear out our inventory of back issues, so we're slashing prices!

RBD Magazines

Check out this amazing clearance sale of all our past issues. Missing some issues? This is a great time to complete your RBD collection. Save up to 40% off the regular price of our printed back issue packages. These prices are only good until the end of the year May 2008 and supplies are limited, so place your order today.

Recent issues

Article Preview


Buy Now

PDF:

Algorithms

The Two Generals' Problem

The Fog Of Networking

Issue: 6.5 (July/August 2008)
Author: Charles Yeomans
Article Description: No description available.
Article Length (in bytes): 7,433
Starting Page Number: 40
RBD Number: 6514
Resource File(s): None
Related Link(s): None
Known Limitations: None

Excerpt of article text...

The Two Generals' Problem is a classic example of the problems encountered when trying to communicate over an unreliable channel. If you write networking software, you will typically use unreliable channels, so you should understand the problem, and how to deal with it. Suppose two generals want to coordinate an attack. The attack will fail unless both armies attack at the same time. The problem is how to agree on a time, given that they can only communicate by messenger. So General A sends a messenger to General B. When General B receives the message, he sends a message back to General A confirming the attack time. But the only way General B knows that his response was received is for General A to send a reply by messenger... The unreliability of this message channel is that the only way the sender knows that his message was received is to wait for a return message. And this unreliability cannot be removed by sending more messages. A 1975 paper entitled "Some Constraints and Tradeoffs in the Design of Network Communications" is generally cited as the first published statement of the problem. The authors present it as a problem of communication between groups of gangsters to illustrate the difficulty of achieving simultaneous agreement in a distributed system. But, difficult though it may be, it is a problem that we need to solve if we want to buy stuff over the internet. So let us change the problem to one we can solve, in the hope that the solution will be good enough. We have, say, a client and a server. The client needs to upload some chunk of data to the server, who will then do something with it. The client must know that the server received the data, and so it will keep trying to upload it until it receives an acknowledgment of receipt from the server. The server, on the other hand, must not process the same upload twice. Let's start with an algorithm in the manner of the two generals' problem. It will not work, but because we understand how it does not work, we can use it to find a solution that does work. The algorithm is this: client uploads the data to the server; server holds the data, and sends back an acknowledgment. The client receives the acknowledgment, and so knows that the server has the upload. It can then mark the data as sent, and send a message to the server to proceed with processing. Because the network is unreliable, this scheme is a restatement of the two generals' problem. From the client's perspective, there are two types of unreliability: either the upload fails, or it succeeded, but no acknowledgment is received. In either case, the client stores the upload data some place suitably persistent, and keeps trying to send it until it receives the acknowledgment from the server. Immediate resending is often a bad idea for various reasons, including, in my experience, subsequent disk-full errors caused by gigantic log files generated by millions of futile resends. Instead, some sort of truncated backoff algorithm is the way to do it. Essentially, the client waits an increasing amount of time after aborted uploads, and listens for an increasing period of time for acknowledgements, before trying again. As noted, one possible failure is that the upload succeeds, but the acknowledgment is never received. Since the client will attempt the upload again, we now need to prevent the server from processing the upload twice. For this, brute force will not suffice; we need to be more clever. A simple way to solve the duplicates problem is to assign to each upload a unique key. When the server receives the upload, it checks to see whether it has already handled an upload with this key. If so, it can send an acknowledgment of receipt just as if it were receiving the upload for the first time. This makes the upload operation idempotent; that is, the effect of performing the operation twice is the same as performing it once. But now we have a new problem: how to generate the key. A simple, unworkable approach would be to simply use the upload data itself as a key. This assumes that every upload is unique, an unlikely possibility in reality. A more efficient, but still unworkable idea is to use a hash function to generate a key from the upload. Although this suffers from the same assumption, comparisons would be faster. A simple, almost workable approach is to ask the server for a key. The server can guarantee uniqueness, at the cost of an extra message exchange. Presumably unique key generation is cheap, so it does not matter much if the server sends a key that the client never receives. But server-side key generation becomes a bottleneck if you need to move to multiple servers to add capacity or reliability. Attempting to get two servers to generate unique keys brings us right back to the two generals' problem. Introduction of a third key server introduces a single point of failure, and more unreliability. Universally unique identifiers, or UUIDs, solve the key generation problem with high probability. A UUID is a 128-bit string generated by an algorithm that makes it unlikely that the same two UUIDs will ever be generated. There are several UUID formats, but they reduce in kind to two: algorithms that take as input some local information like a MAC address and a timestamp, and randomly-generated UUIDs. When I have a choice, I always prefer the randomized choice, because random schemes can only be attacked by brute force. So a so-called version 4 UUID is a 128-bit string, six bits of which are used to identify its version, with the rest randomly generated. Assuming a decent source of randomness, the probability of generating the same UUID twice is quite low. For example, suppose you independently generate 100 UUIDs per second for one year. The probability of generating the same UUID twice is approximately 0. The use of UUIDs as keys means that the servers do not need to coordinate the distribution of keys. And, in fact, we can do even better. The probability of UUID collisions is so low that we do not need to depend on the servers to guarantee (with high probability) uniqueness; we can move the responsibility for key generation to the clients. Let us summarize our scheme for guaranteeing upload integrity. For each upload, the client generates a UUID to be used as the identifier for the upload. It then attempts to send the upload to the server until it receives an acknowledgment of receipt. When the server receives an upload, it checks to see whether it has already received an upload with the attached identifier. If not, it processes the upload. In either case, it sends an acknowledgment of receipt. The problem that we have actually solved is rather different from the two generals' problem. First, the server does not care whether the messages it sends are received by the client; only the client requires acknowledgment of its messages. There is no guarantee that the client will ever receive an acknowledgment. In practice, there is some reasonable chance that the client will receive an acknowledgment to a message, so we can reduce the probability of a total failure by resending until it is time to give up. And it is possible that a client will receive a false acknowledgement because it sends a message with a UUID that has already been used. But this is less likely than, say, a simultaneous failure of all data storage on the server side of the communication.

...End of Excerpt. Please purchase the magazine to read the full article.

Article copyrighted by REALbasic Developer magazine. All rights reserved.


 


|

 


Weblog Commenting and Trackback by HaloScan.com