Coping with complexity: Iteration
1. Design for iteration: easy to change
2. Document the assumptions
3. Take small steps
4. Don't rush to implementation
5. Plan for feedback: bug reports, etc.
6. Study failures rather than assign blame for them.
7. Constantly be on guard to keep the overall design clean despite iterations; this needs foresight
8. Adopt sweeping simplifications
Although the number of potential abstractions for computer system components is unlimited, remarkably the vast majority that actually appear in practice fall into one of
three well-defined classes: the memory, the interpreter, and the communication link.
These three abstractions are so fundamental that theoreticians compare computer algorithms in terms of the number of data items they must remember, the number of steps their interpreter must execute, and the number of messages they must communicate.
To meet the many requirements of different applications, system designers build
layers on this fundamental base, but in doing so they do not routinely create completely
different abstractions. Instead, they elaborate the same three abstractions,
rearranging and repackaging them to create...
One way to limit interactions between software modules is to organize systems as
clients and services. In the client/service organization, modules interact only by
sending messages. This organization has three main benefits:
1. Messages are the only way for a programmer to request that a module provide
a service. Limiting interactions to messages makes it more difficult for
programmers to violate the modularity conventions.
2. Messages are the only way for errors to propagate between modules. If clients
and services fail independently and if the client and the service check messages,
they may be able to limit the propagation of errors.
3. Messages are the only way for an attacker to penetrate a module. If clients and
services carefully check the messages before they act on them, they ca...
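The message-checking idea in the list above can be sketched in a few lines of Python. This is only an illustration under assumptions: the Message type, the CounterService class, and the "add"/"read" operations are hypothetical names invented for the example, not anything from the book. The point it shows is that the service's state is reachable only through handle(), and that every message is validated before the service acts on it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    """Hypothetical message format: an operation name plus its arguments."""
    op: str
    args: tuple


class CounterService:
    """Hypothetical service whose state can be reached only via handle()."""

    def __init__(self):
        self._count = 0

    def handle(self, msg):
        # Check the message before acting on it, so that a faulty or
        # malicious client cannot corrupt the service's state.
        if not isinstance(msg, Message):
            return Message("error", ("malformed message",))
        if msg.op == "add":
            if len(msg.args) != 1 or not isinstance(msg.args[0], int):
                return Message("error", ("add expects one integer",))
            self._count += msg.args[0]
            return Message("ok", (self._count,))
        if msg.op == "read" and msg.args == ():
            return Message("ok", (self._count,))
        return Message("error", ("unknown request",))
```

A client would call svc.handle(Message("add", (2,))) and inspect the op field of the reply; a rejected request leaves the counter untouched.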
Layered design has proven to be especially effective, and it is used in some form in
virtually every network implementation. The primary idea of layers is that each layer
hides the operation of the layer below from the layer above, and instead provides its own interpretation of all the important features of the lower layer. Every module is assigned to some layer, and interconnections are restricted to go between modules in adjacent layers.
Three layers of network:
• The link layer: moving data directly from one point to another.
• The network layer: forwarding data through intermediate points to move it to the
place it is wanted.
• The end-to-end layer: everything else required to provide a comfortable
application interface.
The terms frame, packet, segment, messa...
When techniques for at-least-once delivery (the persistent sender) and at-most-once
delivery (duplicate detection) are combined, they produce an assurance that is called
exactly-once delivery. This assurance is the one that would probably be wanted in an
implementation of the Remote Procedure Call protocol of Chapter 4. Despite its name,
and even if the sender is prepared to be infinitely persistent, exactly-once delivery is not
a guarantee that the message will eventually be delivered. Instead, it ensures that if the
message is delivered, it will be delivered only once, and if delivery fails, the sender will
learn, by lack of acknowledgment despite repeated requests, that delivery probably failed.
However, even if no acknowledgment returns, there is still a possibility that the message ...
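A minimal simulation may make the combination concrete. It assumes a lossy channel modeled with a seeded random number generator, and the names Receiver, send, loss_rate, and max_tries are hypothetical, chosen only for this sketch. The sender resends until it sees an acknowledgment or gives up; the receiver remembers nonces so that a resent copy is acknowledged but not delivered again.

```python
import random


class Receiver:
    """Duplicate detection: remember the nonce of every delivered message."""

    def __init__(self):
        self.seen = set()
        self.delivered = []

    def receive(self, nonce, payload):
        if nonce not in self.seen:
            self.seen.add(nonce)
            self.delivered.append(payload)
        # Acknowledge every copy, including duplicates, so that a sender
        # whose earlier acknowledgment was lost can stop resending.
        return nonce


def send(receiver, nonce, payload, rng, loss_rate=0.5, max_tries=20):
    """Persistent sender: resend until acknowledged or max_tries is reached."""
    for _ in range(max_tries):
        if rng.random() < loss_rate:
            continue  # the request was lost in the network
        ack = receiver.receive(nonce, payload)
        if rng.random() < loss_rate:
            continue  # the acknowledgment was lost; delivery did happen
        if ack == nonce:
            return True
    # No acknowledgment: delivery probably failed, but it may have succeeded.
    return False
```

Note that a False result does not prove the message was never delivered: the last loop iteration may have lost only the acknowledgment, which is the residual uncertainty the excerpt points to.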
The sliding window appears to eliminate the need to know the network round-trip
time, but this appearance is an illusion. The real challenge in flow control design is to
develop a single flow control algorithm that works well under all conditions, whether the bottleneck is the sender’s rate of generating data, the network transmission capacity, or the rate at which the receiver can accept data. When the receiver is the bottleneck, the goal is to ensure that the receiver never waits. Similarly, when the sender is the bottleneck, the goal is to ensure that the sender never waits. When the network is the bottleneck, the goal is to keep the network moving data at its maximum rate. The question is what window size will achieve these goals.
The answer, no matter where the bottleneck is located,...
This chapter has introduced a lot of concepts and techniques for designing and dealing with data communication networks. A natural question arises: “Is all of this stuff really needed?”
The answer, of course, is “It depends.” It obviously depends on the application,
which may not require all of the features that the various network layers provide. It also depends on several lower-layer aspects.
For example, if at the link layer the entire network consists of just a single
point-to-point link, there is no need for a network layer at all. There may still be a requirement
to multiplex the link, but multiplexing does not require any of the routing function of a
network layer because everything that goes in one end of the link is destined for whatever is attached at the other end. In addition,...
As the number of clients increased, the length of the queue increased accordingly.
With enough clients, the queue would grow long enough that some requests would time out before the server got to them. Those clients, upon timing out, would repeat their requests. In due course, the server would handle the original request of a client that had timed out, send a response, and that client would go away happy. But that client’s duplicate request was still in the server’s queue. The stateless NFS server had no way to tell that it had already handled the duplicate request, so when it got to the duplicate it would go ahead and handle it again, taking the same time as before, and sending an unneeded response. The client ignored this response, but the time spent by the server handling the duplicate ...
The second difference between ordinary procedure calls and RPCs is that RPCs
introduce a new failure mode, the “no response” failure. When there is no response
from a service, the client cannot tell which of two things went wrong: (1) some failure
occurred before the service had a chance to perform the requested action, or (2) the
service performed the action and then a failure occurred, causing just the response
to be lost. Most RPC designs handle the no-response case by choosing one of three implementation strategies:
1. At-least-once RPC. If the client stub doesn’t receive a response within some
specific time, the stub resends the request as many times as necessary until it
receives a response from the service.
This implementation may cause the service to execute a request more than onc...
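The hazard described for at-least-once RPC can be illustrated with a small sketch. Everything here is an assumption made for the example: Account and its deposit operation stand in for any service action that is not idempotent, and response_lost_on simulates the timeouts by naming the attempts whose response is dropped.

```python
class Account:
    """Hypothetical service with a non-idempotent operation."""

    def __init__(self):
        self.balance = 0

    def deposit(self, amount):
        self.balance += amount
        return self.balance


def at_least_once_call(operation, arg, response_lost_on):
    """Client stub sketch: resend the request until a response arrives.

    response_lost_on is a set of attempt numbers (starting at 1) for which
    the simulated network drops the response after the service has acted.
    """
    attempt = 0
    while True:
        attempt += 1
        result = operation(arg)  # the service performs the action
        if attempt in response_lost_on:
            # Timeout: the stub cannot tell whether the request or the
            # response was lost, so it resends.
            continue
        return result, attempt
```

Calling at_least_once_call(acct.deposit, 10, {1}) on a fresh account performs the deposit twice, because the first response is lost after the service has already acted. The stub sees one successful call, but the balance reflects two.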
One way to design a reliable system would be to build it entirely of components that are
individually so reliable that their chance of failure can be neglected. This technique is
known as fault avoidance. Unfortunately, it is hard to apply this technique to every
component of a large system. In addition, the sheer number of components may defeat the
strategy. If all N of the components of a system must work, the probability of any one
component failing is p, and component failures are independent of one another, then the probability that the system works is (1 - p)^N. No matter how small p may be, there is some value of N beyond which this probability becomes too small for the system to be useful.
The alternative is to apply various techniques that are known collectively by the name
...
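The (1 - p)^N argument in the excerpt above is easy to check numerically. The sketch below assumes, as the excerpt does, that failures are independent and that every component must work; the function names and the sample values of p and N are illustrative choices, not figures from the book.

```python
import math


def system_works_probability(p, n):
    """Probability that all n independent components work, each failing with probability p."""
    return (1.0 - p) ** n


def largest_n(p, target):
    """Largest n for which the system still works with probability >= target.

    Solves (1 - p)^n >= target for n; log1p keeps precision when p is tiny.
    """
    return math.floor(math.log(target) / math.log1p(-p))


if __name__ == "__main__":
    p = 1e-6  # assumed per-component failure probability
    for n in (10, 1_000, 1_000_000):
        print(n, system_works_probability(p, n))
    print("largest N for 99% reliability:", largest_n(p, 0.99))
```

With p fixed, the probability decays roughly like e^(-pN), so once N approaches 1/p the chance that everything works has already fallen to about 37 percent.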
In dealing with active faults, the designer of a module can provide one of several
responses:
• Do nothing. The error becomes a failure of the module, and the larger system or
subsystem of which it is a component inherits the responsibilities both of
discovering and of handling the problem. The designer of the larger subsystem
then must choose which of these responses to provide. In a system with several
layers of modules, failures may be passed up through more than one layer before
being discovered and handled. As the number of do-nothing layers increases,
containment generally becomes more and more difficult.
• Be fail-fast. The module reports at its interface that something has gone wrong.
This response also turns the problem over to the designer of the next higher-level
system, but in ...
Incidentally, the strategy of employing multiple design teams can also be applied to
hardware replicas, with a goal of increasing the independence of the replicas by reducing the chance of replicated design errors and systematic manufacturing defects. Much of software engineering is devoted to a different approach: devising specification and programming techniques that avoid faults in the first place and test techniques that systematically root out faults so that they can be repaired once and for all before deploying the software. This approach, sometimes called valid construction, can dramatically reduce the number of software faults in a delivered system, but because it is difficult both to completely specify and to completely test a system, some faults inevitably remain. Valid construct...
8.6.1 Design Strategies and Design Principles
Standing back from the maze of detail about redundancy, we can identify and abstract
three particularly effective design strategies:
• N-modular redundancy is a simple but powerful tool for masking failures and
increasing availability, and it can be used at any convenient level of granularity.
• Fail-fast modules provide a sweeping simplification of the problem of containing
errors. When containment can be described simply, reasoning about fault
tolerance becomes easier.
• Pair-and-compare allows fail-fast modules to be constructed from commercial,
off-the-shelf components.
Standing back still further, it is apparent that several general design principles are
directly applicable to fault tolerance. In the formulation of the fault-tolerance desi...
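Two of the strategies listed above, pair-and-compare and N-modular redundancy, can be sketched with replicas modeled as plain Python functions. This is a simplification under assumptions: the names ReplicaMismatch, pair_and_compare, and nmr_vote are hypothetical, replica outputs are assumed to be hashable and directly comparable, and replicas that crash rather than return a wrong value are not handled.

```python
from collections import Counter


class ReplicaMismatch(Exception):
    """Raised at the module interface when replicas disagree (fail-fast)."""


def pair_and_compare(replica_a, replica_b, x):
    """Run two replicas and report an error instead of a possibly wrong result."""
    a, b = replica_a(x), replica_b(x)
    if a != b:
        raise ReplicaMismatch("replicas disagree: %r vs %r" % (a, b))
    return a


def nmr_vote(replicas, x):
    """N-modular redundancy: mask failures by returning the majority output."""
    outputs = [replica(x) for replica in replicas]
    value, votes = Counter(outputs).most_common(1)[0]
    if votes <= len(outputs) // 2:
        raise ReplicaMismatch("no majority among %r" % (outputs,))
    return value
```

The difference in intent shows in the failure case: pair_and_compare only detects a disagreement and stops, which makes the module fail-fast, while nmr_vote with three or more replicas can mask a single faulty replica and keep producing correct output.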
We now have seen examples of two forms of atomicity: all-or-nothing and before-or-after.
These two forms have a common underlying goal: to hide the internal structure of
an action. With that insight, it becomes apparent that atomicity is really a unifying
concept:
An action is atomic if there is no way for a higher layer to discover the internal structure
of its implementation.
This description is really the fundamental definition of atomicity. From it, one can
immediately draw two important consequences, corresponding to all-or-nothing
atomicity and to before-or-after atomicity:
1. From the point of view of a procedure that invokes an atomic action, the atomic
action always appears either to complete as anticipated, or to do nothing. This
consequence is the one that makes atomic action...
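One way to picture both forms of atomicity together is the small sketch below. It is an illustration under assumptions, not a technique taken from the excerpt: the Accounts class and its transfer operation are hypothetical, all-or-nothing is obtained by working on a private copy and installing it with a single assignment, and before-or-after is obtained with a lock so that concurrent callers never see a half-finished transfer.

```python
import threading


class Accounts:
    """Hypothetical in-memory store with an atomic transfer operation."""

    def __init__(self, balances):
        self._balances = dict(balances)
        self._lock = threading.Lock()

    def transfer(self, src, dst, amount):
        with self._lock:  # before-or-after: one transfer at a time
            shadow = dict(self._balances)  # work on a private copy
            shadow[src] -= amount
            if shadow[src] < 0:
                # The copy is discarded, so the action appears to do nothing.
                raise ValueError("insufficient funds")
            shadow[dst] += amount
            self._balances = shadow  # single commit point

    def snapshot(self):
        with self._lock:
            return dict(self._balances)
```

A caller of transfer sees either the completed transfer or the original balances, and a caller of snapshot never observes the intermediate state in which money has left one account but not yet arrived in the other.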