[clean-list] Joe Armstrong's thesis

fzuurbie@inter.nl.net
Tue, 23 Dec 2003 12:11:13 UT


Dear all,

On 12 December Claus Reinke pointed me to Joe Armstrong's thesis about Erlang, "Making reliable systems in the presence of software errors" (http://www.sics.se/~joe/index.html). I promised not to talk about exception handling until I had read it. So I read it.

Erlang's starting points are quite alien to me. Armstrong says Erlang is strict, but he does not say why. John Hughes in his famous paper "Why Functional Programming Matters" points out the contribution of lazy evaluation to program correctness.

On page 87 Armstrong writes: "The reason for this [EZ: concentrate concurrency handling in a few modules] is that concurrent code cannot be written in a side-effect free manner, and as such, is more difficult to understand and analyse than purely sequential side-effect free code." This may be true for Erlang, but had he known about Marco Kesseler's implementation of Clean on Transputers, he would have known this does not hold in general, but only for applications that have inherent non-deterministic aspects.

Erlang is dynamically typed and features side-effects. Armstrong does not reflect on these design decisions, in particular on their effect on program correctness. He does not give the impression he is even aware of static typing and other compile-time program analysis methods to further correctness.

These observations lead me to believe that everything has been done in the design and implementation of the Erlang language and compiler to make sure that programming errors will be made, if only to prove that systems will function nonetheless. His programming guidelines also reflect this. Page 126: "Rule: 1 - The program should be isomorphic to the specification. The program should faithfully follow the specification. If the specification says something silly then the program should do something silly. The program must faithfully reproduce any errors in the specification." Armstrong does not even consider the option to go back to the customer and question the specifications.

I confess there are cases where I have not the faintest idea what a compiler error message means. Particularly Clean's uniqueness coercion messages can give me a hard time. In such cases I may dream of an executable that does run-time checks and shows me what goes wrong. But for production quality software, Erlang's and Clean's roads to reliability are very different.

One point is interesting, although it does not sound completely new: the way to handle exceptions such as heap and stack overflows. Armstrong describes supervisors: processes that supervise other processes. As processes are isolated, a supervisor can generally continue operations when a supervised process suffers a heap or stack overflow.
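
The supervisor idea can be sketched in a few lines of Python. This is only an illustration under loud assumptions: Erlang's processes are lightweight and live inside one runtime, whereas this sketch uses separate operating-system processes as a stand-in for isolation, and the names run_isolated, supervise, CRASHING_WORKER and HEALTHY_WORKER are hypothetical, not taken from the thesis.

```python
import subprocess
import sys

# Hypothetical workers, given as source text so each runs in a fresh interpreter.
# The first recurses without bound and dies with a RecursionError (a stack overflow
# as far as Python is concerned); the second finishes normally.
CRASHING_WORKER = "def f(n): return f(n + 1)\nf(0)"
HEALTHY_WORKER = "print('done')"


def run_isolated(source):
    """Run `source` in a separate Python process and return its exit code."""
    result = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,  # keep the worker's traceback out of our own output
        text=True,
    )
    return result.returncode


def supervise(source, max_restarts=2):
    """Start a worker and restart it after each failure, up to max_restarts times.

    Returns (attempts, succeeded). The supervisor itself is never affected by
    the worker's crash, because the worker lives in its own process.
    """
    attempts = 0
    while attempts <= max_restarts:
        attempts += 1
        if run_isolated(source) == 0:
            return attempts, True
    return attempts, False


if __name__ == "__main__":
    print(supervise(HEALTHY_WORKER))
    print(supervise(CRASHING_WORKER, max_restarts=2))
```

The point of the sketch is only that the overflow happens in the worker and the supervisor keeps running; what the supervisor should do after the last restart is a separate policy question.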

Explicit message passing as the only way to parallelism, Erlang's view on exception handling and (other) side-effecting operations completely destroy referential transparency, thereby denying the programmer another road to program correctness.

All in all, I think reading the thesis was a waste of time, apart from one thing: Armstrong's work reassures me in thinking that exception handling may be implemented in a way that retains referential transparency / purity, at least on some level. In the implementations I know of, after an exception has occurred, the programmer is completely free to specify what the program should do next. I think that liberty should be dropped. TCP/IP does that: a time-out exception (an acknowledgement message for a packet was not received in time) is followed by a resend, ad infinitum. This is completely outside the programmer's control. [I am aware that programmers also have the means to break this cycle, but that is not what I am talking about here.] Anyway, I believe that in case of an exception, the program should not 'step aside' (do just anything) but 'freeze' (until the user loses his patience) or, better still, 'go forward' (try other ways to achieve the same effect).

I hinted at other ways to achieve 'the same effect' in an earlier posting: in case of an integer overflow, redo the calculation with unlimited precision; if a connection times out, try a different connection; if a database machine does not respond, post an OS message to the operator and resume activities when the problem is fixed. When retrying does not seem to solve a problem, the operating system could even halt operations, have the bug fixed and resume after that. I am well aware that in this latter case the semantics of the program may change, so you can hardly say that the program will have 'the same effect'. This shows that this approach should start out as a sincere intention. In a later stage that could be backed up by static analysis. So initially, you could prove a program to be only conditionally correct: IF the semantics of an exception handler is in a way equivalent to the semantics of the non-exceptional case.
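
To make the 'go forward' idea concrete, here is a minimal sketch in Python, using the integer overflow case. Everything in it is an assumption made for illustration: the 32-bit limit, and the names add_int32, add_unbounded, go_forward and handler_agrees are hypothetical. The last function is a crude, test-based stand-in for the conditional correctness mentioned above, not a proof and not static analysis.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def add_int32(a, b):
    """Fixed-width addition: raises OverflowError instead of wrapping around."""
    result = a + b
    if not INT32_MIN <= result <= INT32_MAX:
        raise OverflowError(f"{a} + {b} does not fit in 32 bits")
    return result


def add_unbounded(a, b):
    """Arbitrary-precision addition (Python integers do not overflow)."""
    return a + b


def go_forward(alternatives, *args):
    """Try each alternative in order and return the first result obtained.

    The caller cannot 'step aside': the only thing that happens after a
    failure is the next attempt at the same effect. If every alternative
    fails, the last error is raised again.
    """
    if not alternatives:
        raise ValueError("at least one alternative is required")
    last_error = None
    for attempt in alternatives:
        try:
            return attempt(*args)
        except (OverflowError, ConnectionError, TimeoutError) as error:
            last_error = error
    raise last_error


def handler_agrees(primary, handler, samples):
    """Check on sample inputs that the handler gives the same result as the
    primary wherever the primary succeeds (the 'IF' in conditional correctness)."""
    for args in samples:
        try:
            expected = primary(*args)
        except OverflowError:
            continue  # the primary is undefined here, so there is nothing to compare
        if handler(*args) != expected:
            return False
    return True


if __name__ == "__main__":
    print(go_forward([add_int32, add_unbounded], 1, 2))
    print(go_forward([add_int32, add_unbounded], INT32_MAX, 1))
    print(handler_agrees(add_int32, add_unbounded, [(1, 2), (-5, 5), (INT32_MAX, 1)]))
```

The same go_forward function would accept a list of connection attempts instead of additions; only the alternatives change, not the policy.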

All this should have the effect that when an application does not respond, the user will have confidence that the operating system is aware of this and that all feasible options are being tried to achieve the desired result.

Regards,
Erik Zuurbier
