[clean-list] exceptions again (a bit long, but this might actually be correct)

Tue, 27 Jan 2004 22:01:51 +0100

Hi,

since a nice hole was found in my earlier proposal for handling
exceptions, the following thoughts have popped up. Let me first
recapitulate a bit:

-- Overview --

Suppose we have a function definition of the form:

f a
  = g a
  catch e = h e a

In my earlier proposal, this could loosely be interpreted as: f a = g a,
unless an exception "e" occurs in the evaluation of g a. In that case, f
a = h e a. "g a" however, does NOT have any value then. So "f" is not
"g".

Here is the hole: f (x + y) may not equal f (y + x), because the (lazy)
evaluation of (x + y) within f may throw a different exception than the
evaluation of (y + x). And for different exceptions, h a e may differ.
This is a serious problem that destroys referential transparency.

This problem however, _only_ occurs for lazily evaluated arguments (as
far as I can see). If f is strict in its argument a, exceptions can only
occur during the evaluation of g. The latter is no problem, as long as
the compiler ensures that g always raises the same exception for a
particular argument a within a single run of the program (let's assume
that this is possible, for the moment).

-- A Variation --

Back to the problematic lazily evaluated argument "a". Isn't it weird
that f encounters exceptions that arise during evaluation of its
argument "a"? The computation of "a" can involve anything, and maybe f
should _not_ be made responsible for handling any exceptions in it.

So suppose that we define that catch statements can only catch
exceptions that arise from definitions that are explicitly contained
_in_ that function definition? Either directly or indirectly. In other
words: if an exception occurs while evaluating the argument "a", it will
_never_ be caught in f, but propagated outward. Only exceptions that
originate in g will be caught by f.

-- So what? --

Well, that may be enough to solve the problem mentioned above: if (x +
y) raises an exception, then f (x + y) will _not_ have a function value
either, but it will also lead to an exception. And so will f (y + x),
regardless whether f contains a catch statement or not.

-- Mindset --

If you want to catch exceptions in arguments that you pass into another
function there are two options:
* handle the exceptions in the arguments, so they are not visible
outside the arguments.
* handle the exceptions yourself

In other words: you cannot transfer your own trouble to another.

-- Higher Order Functions --

Partial function applications will need to be treated like closures. If
one has functions "g = \x y -> x + y" and "h = \x y -> y + x" then "f g"
should not differ from "f h". The evaluation of "g" and "h" may throw
different exceptions, so f must not be able to catch them.

-- Implementation --

Basically, I'd expect that stack trimming would be needed, in
combination with a way to determine whether closures are "foreign" or
not.

One way to do this, is to introduce the possibility to have nodes
created in separate compartments in the heap (more about this below).
When one enters a function f with a catch statement, a new compartment
is made in which new nodes will be created. Functions without catch
statements remain evaluating in the "current" compartment.

When f evaluates a "foreign" closure however, i.e. one that resides in
another compartment, this closure will create new nodes in its _own_
compartment until the closure evaluation returns.

When f returns, its compartment gets combined with the one from which it
was called. 

When an exception occurs, one has to remember the compartment that it
occurred in. The exception can only be caught by a catch statement in
its own compartment or in an "parent" compartment in case the handler
lets the exception "pass".

There's one thing that needs to be relatively fast in this scheme: when
evaluating a closure, one should be able to quickly detect whether it is
part of the callers compartment, or whether it has its own.

There are ways to do this, without having to add a "compartment
identifier to each closure. The node address itself could be used to
compute an index into some compartment descriptor, assuming one divides
the heap up into chunks of say, 32K.

Yes, it will take some clock ticks to do this. Well, that's life
evaluating closures anyway.

--

If any of you find a hole in this proposal, I am sure you let me know,
and I'll try crawling through some dust.

regards,
Marco