
To ask about something that is unclear or was not understood, insert your question and prefix each addition with %q% (q as in question). This is the "question-style". For example:

* %q% What kind of problems could have decentralized nature?
  • What kind of problems could have decentralized nature?

If you want to answer a question or add a comment, please put a %a% in front. This is the "answer-style" (a as in answer). An example:

* %a% This is an addition to something that I consider important.
  • This is an addition to something that I consider important.

For citations or references to the slides of Prof. Suri, please add the lecture and slide number in parentheses: (<lecture>.<slide>).

Please make sure that you enter an author name; otherwise your changes will not be saved!

Exercise 6


  1. Try to formulate a definition of dependability of a general computer-based system. What is the dependability of a car? Of an Operating System? Of a space shuttle?
  2. Revisited: Explain the difference between Faults, Errors and Failures.
  3. What is the difference between a fault model and a failure model? Name and explain a few failure models that can be useful in a distributed system.
  4. What does a designer mean if he/she says that faults (of a certain type) are assumed to be independent? What does this mean for the design at hand?
  5. What is an SPF?
  6. Dependability is achieved using one or more of these general techniques. Explain each of them using examples:
    1. Fault removal
    2. Fault forecasting
    3. Fault prevention
    4. Fault avoidance
  7. What is a definition of a fault tolerant system? In what way does it differ from a dependable system?
  8. There are many attributes of dependability. Explain the difference (and relation) between the following attributes:
    1. Reliability
    2. Maintainability
    3. Safety
    4. Performability
    5. Security
  9. Faults can be classified as being either omissive or assertive. Give examples of faults in a modern PC belonging to the different groups.
  10. What are arbitrary (Byzantine) faults?
  11. For dependable systems one often hears the word "coverage". What does it mean that a system has 100% coverage? Can 100% coverage even be achieved?
  12. Revisited: Fault-Tolerance is achieved through redundancy. What are the types of redundancy available? Give examples of designs where one or the other is suitable.
  13. Most of the time you want to achieve exact agreement on a result. But sometimes only inexact agreement is possible. Give some examples of applications where this is the case.
  14. Explain the principle behind the following convergence functions: Fault-Tolerant Midpoint and Fault-Tolerant Average. When are they appropriate, when not?
  15. What is the difference between forward and backwards recovery?
  16. Distributed systems are usually considered suitable for implementing highly dependable systems. Why do you think this is so? Which properties are of importance when designing a highly dependable distributed system?
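
As a starting point for the question on convergence functions, here is a minimal sketch under the usual textbook assumption that at most f of the collected values are faulty: both functions sort the values and discard the f smallest and the f largest before combining the rest. The function names and the sample readings are illustrative and are not taken from the lecture slides.

```python
def _trim(values, f):
    """Sort the values and drop the f smallest and the f largest."""
    if len(values) <= 2 * f:
        raise ValueError("need more than 2f values to discard f from each end")
    ordered = sorted(values)
    return ordered[f:len(ordered) - f]


def fault_tolerant_midpoint(values, f):
    """Midpoint of the smallest and largest value that survive trimming."""
    kept = _trim(values, f)
    return (kept[0] + kept[-1]) / 2


def fault_tolerant_average(values, f):
    """Arithmetic mean of all values that survive trimming."""
    kept = _trim(values, f)
    return sum(kept) / len(kept)


# Hypothetical readings from five nodes; the last one is wildly wrong.
readings = [10.0, 10.2, 9.9, 10.1, 500.0]
print(fault_tolerant_midpoint(readings, f=1))
print(fault_tolerant_average(readings, f=1))
```

With f=1 the outlier 500.0 is discarded together with the smallest reading, so both results stay inside the range of the plausible values. The two functions differ only in how they combine what is left: the midpoint looks at the two surviving extremes, the average at every surviving value.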


  1. Why do we only talk about error detection and failure detection and not about fault detection?
  2. Give some examples of local failure/error detectors.
  3. Local failure detectors don't work as well on the system level. Why is this the case? Tip: What are the assumptions usually made for system level failure detection in distributed systems?
  4. Someone tells you that his/her system needs 2f + 1 nodes to correctly detect f failures. Which failure model do you think the system assumes? What is the model if 3f + 1 nodes are required?
  5. What is a perfect failure detector? Why don't we always use them?
  6. Explain the concepts of weak/strong accuracy/completeness.
  7. What is an eventually weak failure detector?
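
To make the questions on accuracy and completeness more concrete, the following is a rough sketch of a timeout-based (heartbeat) failure detector. It assumes that every monitored node sends periodic heartbeats and that time is passed in explicitly; the class name, the method names and the timeout value are hypothetical. Note that the detector only ever suspects a node, and that a suspicion is withdrawn as soon as a late heartbeat arrives.

```python
class HeartbeatDetector:
    """Suspects every node whose last heartbeat is older than `timeout`."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}  # node name -> time of the most recent heartbeat

    def heartbeat(self, node, now):
        """Record that a heartbeat from `node` arrived at time `now`."""
        self.last_seen[node] = now

    def suspects(self, now):
        """Return the set of nodes currently suspected to have failed."""
        return {node for node, seen in self.last_seen.items()
                if now - seen > self.timeout}


detector = HeartbeatDetector(timeout=3)
detector.heartbeat("a", now=0)
detector.heartbeat("b", now=0)
detector.heartbeat("a", now=2)
print(detector.suspects(now=4))  # b has been silent for longer than the timeout
detector.heartbeat("b", now=5)   # b was only slow, not crashed
print(detector.suspects(now=5))  # the suspicion of b is withdrawn
```

A crashed node stops sending heartbeats and is therefore suspected eventually, which is the completeness side. A slow node or a delayed message can cause a wrong suspicion, which is why accuracy is the hard part when no bound on delays is known.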

Consensus and Group Membership

  1. What is network partitioning and how does one detect it? What do you do to "mend" a partitioned network such that progress can be made?
  2. What is the consensus problem? How would you formulate the fault tolerant consensus problem?
  3. Give an algorithm that solves the fault tolerant consensus problem when the coordinator in a system can fail silently.
  4. What is the difference between uniform consensus and non-uniform consensus?
  5. What is a group membership service?
  6. Define a group membership algorithm using distributed agreement.
  7. What happens to the group view when one of the members fails?
  8. What does it mean that a membership service implements linear membership?

Fault-Tolerant Communication

  1. Which failures do you have to consider in a distributed setting as opposed to a centralized one?
  2. Explain how reliable delivery can be implemented if links can fail by not delivering messages.
  3. What makes an asynchronous system more difficult to design than a synchronous one with respect to reliable delivery? (Tip: both can fail by not sending messages.)
  4. How do you achieve reliable multicast as opposed to point-to-point sending of messages?
  5. How do you tolerate value faults in messages? Find out how this is done in TCP/IP over Ethernet.
  6. Revisited: Define the Byzantine agreement problem.
  7. Give an example of an algorithm that solves the Byzantine agreement problem.
  8. Why do you need 3f + 1 nodes to achieve agreement with Byzantine errors? Why is a majority not enough?
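
For the question on reliable delivery over links that may lose messages, one common approach is acknowledgement plus retransmission, with sequence numbers so that the receiver can discard duplicates. The sketch below simulates this for a single message. Everything in it (the `Receiver` class, `send_reliably`, and the `drops` iterator that scripts which transmissions and acknowledgements are lost) is a hypothetical model for illustration, not a real network API.

```python
class Receiver:
    """Delivers each sequence number at most once and acknowledges every copy."""

    def __init__(self):
        self.seen = set()
        self.delivered = []

    def receive(self, seq, payload):
        if seq not in self.seen:      # suppress duplicates caused by retransmission
            self.seen.add(seq)
            self.delivered.append(payload)
        return seq                    # the acknowledgement carries the sequence number


def send_reliably(seq, payload, receiver, drops, max_attempts=10):
    """Retransmit until an acknowledgement arrives; return the number of attempts.

    `drops` yields one (message_lost, ack_lost) pair per attempt and stands in
    for a lossy link; a lost message or a lost ack both look like a timeout.
    """
    for attempt in range(1, max_attempts + 1):
        message_lost, ack_lost = next(drops, (False, False))
        if message_lost:
            continue                  # timeout: nothing reached the receiver
        ack = receiver.receive(seq, payload)
        if ack_lost:
            continue                  # timeout: receiver has it, sender cannot know
        if ack == seq:
            return attempt
    raise TimeoutError("no acknowledgement after %d attempts" % max_attempts)


receiver = Receiver()
link = iter([(True, False), (False, True), (False, False)])
attempts = send_reliably(1, "hello", receiver, link)
print(attempts, receiver.delivered)
```

In this run the first transmission is lost, the second arrives but its acknowledgement is lost, and the third succeeds. The receiver still delivers "hello" only once, because the second copy is recognised by its sequence number. The sketch relies on a timeout to notice a loss, which is exactly the step that becomes problematic in an asynchronous system.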

Implementing Ordering

  1. What is a causal hole and how do you avoid it when implementing a causal ordering protocol?
  2. Give an example of how you could implement a total ordering protocol.
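
One standard way to implement causal ordering is to timestamp messages with vector clocks and to hold back any message whose causal predecessors have not been delivered yet. The sketch below assumes that a "causal hole" means such a missing predecessor, and it models only the receiving side; the class and method names are hypothetical.

```python
class CausalReceiver:
    """Holds back messages until everything they causally depend on is delivered."""

    def __init__(self, n):
        self.clock = [0] * n   # clock[k] = number of messages delivered from process k
        self.pending = []      # hold-back queue of (sender, stamp, payload)
        self.delivered = []

    def _deliverable(self, sender, stamp):
        # Next message in sequence from the sender, and no missing message
        # from any other process that the sender had already delivered.
        return (stamp[sender] == self.clock[sender] + 1 and
                all(stamp[k] <= self.clock[k]
                    for k in range(len(stamp)) if k != sender))

    def receive(self, sender, stamp, payload):
        self.pending.append((sender, stamp, payload))
        progress = True
        while progress:        # one delivery may unblock queued messages
            progress = False
            for msg in list(self.pending):
                s, st, p = msg
                if self._deliverable(s, st):
                    self.pending.remove(msg)
                    self.clock[s] = st[s]
                    self.delivered.append(p)
                    progress = True


# Process 0 sends m1; process 1 delivers m1 and then sends m2.
# Process 2 happens to receive m2 before m1.
p2 = CausalReceiver(n=3)
p2.receive(sender=1, stamp=[1, 1, 0], payload="m2")  # held back: m1 is missing
p2.receive(sender=0, stamp=[1, 0, 0], payload="m1")  # delivers m1, then m2
print(p2.delivered)
```

The hold-back queue is what prevents a delivery across the gap: m2 stays queued until m1, on which it depends, has been delivered. A total order would need an additional mechanism, for example a sequencer that assigns global sequence numbers.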

Replication Management

  1. Explain how a state machine can help in implementing replication schemes that avoid partitioning by design.
  2. What do you do if parts of the application are non-deterministic, for example because they depend on user input?
  3. Explain, using examples, the differences between the following replication approaches. What are the advantages/disadvantages of each approach?
    1. Active replication
    2. Semi-Active replication
    3. Passive replication

Last modified on 04 March 2005 at 12:34 by chrschn