[clean-list] Exceptions handling again

Fri, 12 Dec 2003 01:03:43 -0000

It's often useful to look into what other FPLs do, for reference.
Erlang, for example, puts a premium not only on fault-avoidance,
but also on fault-tolerance: something will go wrong eventually
if you make the software/hardware combination complex and 
long-running enough (such as the backbone of the UK phone 
system). _When_ it does, it's no good to shut down everything, 
sit in the corner with red lights blinking and flashing the message 
"this should never have happened"..

Erlang does support exception handling, but a favourite line is
actually that no process should do its own exception handling:
instead, there'll be processes doing the processing, and other
processes (preferably on separate machines) that do the supervising.
If a supervised process encounters a situation it wasn't programmed
to deal with, it is _not_ supposed to try and make up something
sensible, but it is supposed to crash (that takes some getting used
to, but remember that there are supposed to be supervisors to
deal with such local crashes;-). Only that process will crash, not 
the whole system, and its supervisor process will be notified and 
can then decide whether to log the problem and "try again", or 
to do something else entirely.
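
To make the division of labour concrete, here is a rough sketch in Python
rather than Erlang. It is only a single-process simulation: real Erlang
supervisors watch separate processes, ideally on separate machines, and
none of the names below (supervise, make_flaky_worker, GaveUp) come from
Erlang or its libraries; they are made up for illustration. The point is
just that the worker contains no recovery code at all, while the
supervisor logs each crash and decides whether to try again.

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


class GaveUp(Exception):
    """Raised by the supervisor once the restart budget is exhausted."""


def supervise(worker: Callable[[], T], max_restarts: int = 3) -> Tuple[T, List[str]]:
    """Run `worker`, restarting it after a crash, at most `max_restarts` times.

    Returns the worker's result together with a log of the crashes seen.
    """
    crash_log: List[str] = []
    for attempt in range(max_restarts + 1):
        try:
            return worker(), crash_log
        except Exception as exc:
            # Only the supervisor reacts to the crash: log it, then try again.
            crash_log.append(f"attempt {attempt}: {type(exc).__name__}: {exc}")
    raise GaveUp(crash_log)


def make_flaky_worker(failures: int) -> Callable[[], int]:
    """Hypothetical worker that crashes `failures` times, then succeeds."""
    state = {"calls": 0}

    def worker() -> int:
        state["calls"] += 1
        if state["calls"] <= failures:
            # No attempt to make up something sensible: just crash.
            raise RuntimeError("situation I was not programmed for")
        return 42

    return worker


if __name__ == "__main__":
    result, log = supervise(make_flaky_worker(failures=2))
    print(result)
    for line in log:
        print(line)
```

With failures=2 and the default budget of three restarts, the worker
succeeds on its third run and the log holds two crash entries. If the
budget runs out, the supervisor itself gives up by raising GaveUp, which
in a real system would be its own supervisor's problem.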

Instead of trying to repeat the arguments here, I'll just refer to
Joe Armstrong's recent PhD thesis, in which he tries to explain
much of his Erlang philosophy, and how it informed the design
of Erlang and its libraries (the latter encapsulate most of the
fault-tolerance, error-handling, and even concurrency issues,
so that Erlang application programmers scarcely need to program
these things explicitly - they simply instantiate generic design
patterns with application-specific code):

    Making reliable distributed systems in the presence of software errors
    http://www.sics.se/~joe/index.html
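
As a very loose illustration of "instantiating a generic pattern with
application-specific code", here is a small Python sketch. It is not
Erlang's actual library API; run_server and counter_handler are
hypothetical names, and the real libraries do far more (concurrency,
supervision, and so on). The generic part owns the loop and the state
threading, and the application supplies only a handler function.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

State = TypeVar("State")
Request = TypeVar("Request")
Reply = TypeVar("Reply")


def run_server(
    initial: State,
    handle: Callable[[Request, State], Tuple[Reply, State]],
    requests: Iterable[Request],
) -> Tuple[List[Reply], State]:
    """Generic part: feed each request to `handle`, threading the state through."""
    state = initial
    replies: List[Reply] = []
    for request in requests:
        reply, state = handle(request, state)
        replies.append(reply)
    return replies, state


def counter_handler(request: str, count: int) -> Tuple[int, int]:
    """Application-specific part: a counter that understands 'incr' and 'get'."""
    if request == "incr":
        return count + 1, count + 1
    if request == "get":
        return count, count
    # Unknown request: no local recovery, just crash.
    raise ValueError(f"unknown request: {request!r}")


if __name__ == "__main__":
    replies, final = run_server(0, counter_handler, ["incr", "incr", "get"])
    print(replies, final)
```

Note that the handler does not catch anything either; an unknown request
simply crashes it, and what happens next is left to whoever is supervising.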

Sounds like what you were looking for, doesn't it?-) Now if 
only someone would give us the best aspects of Clean, Haskell, 
and Erlang, combined in a single, elegant language design..

Cheers,
Claus

> Marco Kesseler wrote to me on 12 March 2003 (translation from Dutch):
> "Say processor A assigns evaluation of an expression to processor B and
> then the communication line between the two dies:
> - what keeps processor A from deciding after all to evaluate the expression
> itself, or assign it to processor C?
> - what keeps processor B from halting evaluation of this expression and
> just throwing it away? B could even use a timeout interval (the way
> webservers handle sessions)."
.. 
> I have seen too many examples of software that gives up too early, mostly
> because it was too expensive to program for the alternatives. I am
> perfectly aware that in the end there will be a moment when all conceivable
> alternatives have been tried in vain. Then the program is allowed to quit.
> But not earlier. Even if a program is waiting for an Oracle database that
> died and needs operator attention (don't ask me why, but it happens all the
> time where I work) the program should not quit. Instead I can imagine an
> OS-message to the operator, who restarts the database, after which my
> program just resumes its evaluation. Same with communication lines. Even
> with database hardware failures: the operator could back up on a different
> machine, rerun the logged transactions and then resume my program.
.. 
> My conclusion: yes we may need classical exceptions handling for a few
> years to come, but that should not be the end of it. A lot more automatic
> recovery support could be developed so that an applications programmer
> would scarcely need to explicitly program exceptions handling.