Lecture 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12? | 13? | Index


To ask a question about something that is unclear, insert the question and prefix it with %q% (q for question). This is the "question style". For example:

* %q% What kind of problems could have decentralized nature?
  • What kind of problems could have decentralized nature?

To answer a question or add a comment, put %a% in front of it. This is the "answer style" (a for answer). For example:

* %a% This is an addition to something that I consider important.
  • This is an addition to something that I consider important.

For citations or references to the slides of Prof. Suri, please add the lecture and slide number in parentheses: (<lecture>.<slide>).

Please make sure that you enter an author name; otherwise your changes will not be saved!


Lecture 1 - Introduction to Dependability and Distributed Systems

Definitions

  • Dependability (1.5)
  • Fault tolerance/Reliability (1.5, 1.10)
  • Distributed system (1.5) (see lecture 2)
    • "A distributed system is the one preventing you from working because of the failure of a machine that you never heard of." (Leslie Lamport) (2.3)
    • "A distributed system is a collection of independent computers that appears to its users as a single coherent system." (Andrew Tanenbaum) (2.4)
  • Availability (1.11, 1.14)
  • Reliability (1.13)
  • Redundancy (1.16)
    • Physical/spatial redundancy: add resources (HW/SW)
      => Duplex, TMR (triple modular redundancy) (1.17)
    • Temporal redundancy: redo task (1.17)
    • Combinations of the above
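
The two redundancy styles above can be sketched in a few lines. This is a minimal illustration, not taken from the slides: the function names `tmr_vote` and `retry` are hypothetical, replica outputs are assumed to be hashable values, and the retry loop assumes the task signals a fault by raising an exception.

```python
from collections import Counter


def tmr_vote(a, b, c):
    """Spatial redundancy (TMR): return the majority of three replica outputs.

    Raises ValueError if all three replicas disagree, since no majority exists.
    """
    value, count = Counter([a, b, c]).most_common(1)[0]
    if count < 2:
        raise ValueError("no majority: all three replicas disagree")
    return value


def retry(task, attempts=3):
    """Temporal redundancy: redo the task until it succeeds or attempts run out.

    Assumption: a failed run raises an exception; the last one is re-raised.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return task()
        except Exception as exc:  # a transient fault may vanish on the next run
            last_error = exc
    raise last_error
```

TMR masks a single faulty replica at the cost of three times the resources; retrying costs time instead and only helps against faults that do not persist.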

Fault, Error, Failure

  • Cause and effect relationships
    • Fault ==> Error ==> Failure (1.9)
  • Fault models
    • a fault model only makes sense if errors can be detected (1.24)
    • Byzantine faults (1.26)
      • a node delivers different values to different nodes, but all values are within the correct data range.
      • = a node can lie
  • Fault nature (1.24)
    • Data faults (e.g. out of range)
    • Timing faults (e.g. too late)
  • Fault duration (1.24)
    • permanent
    • transient
    • intermittent = occurs now and then, not necessarily periodic
  • Failure semantics, fault severity (1.24, 1.26, 1.27) (see lecture 6)
    • fail-stop
    • fail-omission
    • fail-safe
    • fail-silent
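
The difference between a plain data fault and a Byzantine fault, as described above, can be made concrete with a small sketch. Everything here is assumed for illustration: the range `VALID_RANGE`, the receiver names, and the function names `in_range` and `byzantine_send` do not come from the slides.

```python
VALID_RANGE = (0, 100)  # assumed valid data range, for illustration only


def in_range(value, valid_range=VALID_RANGE):
    """Local check that catches data faults such as out-of-range values."""
    low, high = valid_range
    return low <= value <= high


def byzantine_send(receivers):
    """A lying node: every receiver gets a different, but in-range, value."""
    return {name: 10 * (index + 1) for index, name in enumerate(receivers)}


received = byzantine_send(["A", "B", "C"])

# Each receiver's local range check passes, so no single node detects the fault.
all_pass_range_check = all(in_range(value) for value in received.values())

# Only a comparison across receivers reveals that the sender was inconsistent.
receivers_agree = len(set(received.values())) == 1
```

Here every local range check succeeds while the receivers still disagree, which is why Byzantine faults need agreement between nodes rather than per-node plausibility checks.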

DS coordination (1.31)

  • Asynchronous
    • Two phase commit protocols (2PC) (1.31ff)
      • Time lag/delay: not suitable for e.g. control applications
      • Depends on reliable communication
      • Possibility of livelock or deadlock
  • Synchronous
    • How to achieve FT synchronization?
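
As a rough illustration of the two-phase commit idea mentioned above, the following in-memory sketch shows the voting phase and the decision phase. It is a simplified sketch under strong assumptions: communication is modelled as direct, reliable method calls, there are no timeouts or crashes, and the names `Participant` and `two_phase_commit` are hypothetical.

```python
class Participant:
    """One node taking part in the commit; `can_commit` models its local vote."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        # Phase 1: vote yes (ready to commit) or no (abort locally).
        self.state = "ready" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Coordinator logic: commit only if every participant votes yes."""
    # Phase 1: collect votes from all participants.
    votes = [p.prepare() for p in participants]
    # Phase 2: broadcast the global decision.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"
```

Because the coordinator must hear from every participant before deciding, the protocol adds at least two message rounds of delay, and a lost message or crashed node would leave the others waiting; the sketch ignores this by assuming reliable calls.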


Last modified on 15 March 2005 at 14:12 by chrschn