Taxonomy/1. Quality/09. Fault Tolerance


Taxonomy
of Computer Science
Hanno Wupper
Hans Meijer
Angelika Mader
Stijn Hoppenbrouwers
Mieke Boon





How can an artefact suddenly "go wrong"?

As explained in Formal Methods, Correctness Theorems, and Verification and validation, we can be sure that an artefact does what it is supposed to do if the right parts are assembled in the right way, conforming to a blueprint that satisfies the right specification.

Causes of faults
  • wrong specification
  • incorrect blueprint
  • incorrectly assembled artefact
  • faulty parts

By contraposition, one can find out why an artefact might be faulty, i.e. lack certain intended properties. This gives rise to a taxonomy of failures.

  • It may be that the specification does not state the intended property, in which case one should repair the specification (by ‘declaring the bug a feature’ or otherwise).
  • Otherwise, assuming that the specification precisely states the desired properties, we call the artefact 'faulty' (i.e. it does not implement its specification).
    • Then either the blueprint must be incorrectly designed (and not proven to satisfy the specification anyway), in which case one should develop a correct blueprint,
    • or the artefact cannot be a correct realisation of the blueprint, in which case
      • it must be assembled in the wrong way,
      • or else at least one of the parts itself is faulty, in which case
        • it either can be replaced by a better part
        • or not.
Faulty parts
  • hardware faults (aging and wearing)
  • unexpected interaction
  • accidents
Only the last alternative deserves the term "fault tolerance": we have the right specification, a blueprint that satisfies it, and we can realise an artefact conforming to that blueprint - only some parts refuse to implement their specifications, and we cannot find parts that do, because all available parts are inherently faulty.
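The case analysis above can be read as a decision procedure. The following Python sketch is only an illustration: the names `Cause` and `classify` and the four boolean questions are hypothetical, chosen here to mirror the bullets, and it assumes the artefact has already been observed to lack an intended property.

```python
from enum import Enum, auto


class Cause(Enum):
    """Leaves of the failure taxonomy (names are illustrative, not from the text)."""
    WRONG_SPECIFICATION = auto()
    INCORRECT_BLUEPRINT = auto()
    INCORRECT_ASSEMBLY = auto()
    REPLACEABLE_FAULTY_PART = auto()
    INHERENTLY_FAULTY_PART = auto()  # the only case that calls for fault tolerance


def classify(spec_states_property: bool,
             blueprint_satisfies_spec: bool,
             assembled_per_blueprint: bool,
             better_part_available: bool) -> Cause:
    """Walk the taxonomy top-down for an artefact that lacks an intended property."""
    if not spec_states_property:
        return Cause.WRONG_SPECIFICATION
    if not blueprint_satisfies_spec:
        return Cause.INCORRECT_BLUEPRINT
    if not assembled_per_blueprint:
        return Cause.INCORRECT_ASSEMBLY
    # Specification, blueprint and assembly are all right: a part must be faulty.
    if better_part_available:
        return Cause.REPLACEABLE_FAULTY_PART
    return Cause.INHERENTLY_FAULTY_PART


def needs_fault_tolerance(cause: Cause) -> bool:
    """Only inherently faulty parts justify the term 'fault tolerance'."""
    return cause is Cause.INHERENTLY_FAULTY_PART
```

Each question is asked only once all earlier ones have been answered favourably, which is why the order of the checks follows the nesting of the bullets.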

The reason can lie in the part itself, in the structure of the artefact, or in its environment. In the first case we must distinguish between hard- and software.

tentative below this line

Hardware faults

Even in the absence of construction faults, hardware will never be perfect. Think of electric bulbs with a limited lifetime, pneumatic tyres that will go flat sooner or later due to normal wear, or hard disks that will crash at the most unsuitable moment.

These failures often follow a statistical distribution and may depend on the lifetime of the piece of hardware. If they cannot be neglected, there is only one solution: if a piece of hardware cannot be an implementation of its specification and you cannot change it, change the specification!
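One way to "change the specification" is to make it probabilistic: instead of "the part works", require "the part survives a given operating time with at least a given probability". The Python sketch below illustrates this under two simplifying assumptions that are not from the text: a constant failure rate (exponentially distributed lifetime, so ageing is ignored) and independent failures of redundant copies. All function and parameter names are hypothetical.

```python
import math


def reliability(failure_rate: float, t: float) -> float:
    """Probability that a part survives until time t.

    Assumes a constant failure rate, i.e. R(t) = exp(-failure_rate * t).
    """
    return math.exp(-failure_rate * t)


def parallel_reliability(r: float, n: int) -> float:
    """Probability that at least one of n redundant copies survives.

    Assumes the copies fail independently, each surviving with probability r.
    """
    return 1.0 - (1.0 - r) ** n


def meets_spec(survival_probability: float, required: float) -> bool:
    """Check a probabilistic specification: survive with probability >= required."""
    return survival_probability >= required


# Example: a part with one expected failure per 1000 hours, used for 1000 hours.
single = reliability(failure_rate=0.001, t=1000.0)
tripled = parallel_reliability(single, n=3)
```

With these numbers a single part survives with probability exp(-1), which is below one half, while three independent copies together do better. Whether that is enough depends on the required probability stated in the changed specification.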

... Reliability Theory ...

language confusion

Software "crashes", even if they seem to occur statistically, are inherently different from hardware failures.

A "blue screen" from DOS or a "freezing" Windows, however, does not belong in this interesting category. It is tempting to treat a notoriously freezing piece of ill-designed software like the tungsten filament in a vacuum bulb, which even in the absence of construction errors is inherently subject to thinning. But ...

Unexpected interaction inside the artefact

... wrong abstraction: abstracted from something that leads to interference between system components ...

Unexpected influences from the environment