world as value

Peter Achten peter88@cs.kun.nl
Wed, 14 Jul 1999 11:50:33 +0200


At 08:46 PM 7/13/99 +0100, Adrian Hey wrote:

>Q1: Is 'Concurrent Clean' concurrent or not?
>--------------------------------------------
>When I think about concurrency I'm thinking about concurrent IO performing
>processes. I think this is what most other people mean by concurrency also.
>But the current Clean manual documentation on concurrency annotations actually
>appears to be talking about concurrent graph reduction (what I would call
>parallelism). In particular, it makes the point that concurrency annotations
>_do_not_ introduce non-determinism.
>
>> The scheme is deterministic: given a Clean program and a particular World
>> value, one can in principle completely trace and predict the outcome of
>> the program.
>
>How so? Does this forbid IO functions which get the current time? If these
>are allowed then I presume that programs running on machines with different
>speeds might well behave differently, even if they started with the same
>'world state' (although since the world state is a thoroughly opaque black
>box, I can't think of any way we could ever be sure the two programs did
>start with the same world state). This seems like a situation which physical
>scientists would deplore. Arguments which cannot (in principle) be resolved
>by experiment are usually regarded as futile.
>
>The paper about Concurrent Haskell (which Simon Peyton Jones mentioned)
>makes the point that all concurrent languages are non-deterministic (third
>page).
>So we have 3 contradictory 'facts'..
> 1-  Clean is a concurrent programming language.
> 2-  All Clean programs are deterministic.
> 3-  Every concurrent programming language is non-deterministic.
>
>So who's right? I guess the real problem here is that I am (we are) using
>terms and concepts without properly defining them first. E.G. We're not
>distinguishing between a Clean (source) program and a compiled one which is
>actually executing and interacting with a (non-deterministic?) *World.
>
>Actually, it is possible to implement deterministic concurrent programs for
>some systems. I do a lot of DSP work where all events are strictly periodic
>and synchronous and the time taken to compute the response to an event is
>fixed. But, for systems in general, we don't know when events will occur
>or how long it will take to compute the response. So it's usually
>necessary to accept non-determinism in order to prevent all IO activity
>being blocked by lengthy computation. 

A1: 'Concurrent Clean' is concurrent.
-------------------------------------
For the sake of clarity: _Clean_ is a sequential, lazy, pure functional programming language. _Concurrent Clean_ is the extension of Clean with concurrency annotations. However, these annotations only influence the internal evaluation (or graph reduction) of a program, but not its outcome. This is also what I meant with being deterministic: the outcome of the program is fixed. 

The claim you refer to in the paper 'Concurrent Haskell' that "every concurrent language is non-deterministic" is not true in all cases. It does hold for concurrent languages that can't access independent parts of the world but do support full concurrency. The Clean World does consist of independent sub environments! These can be modified concurrently, safely, and deterministically. Maybe an example can clarify this:

doit2 :: *World -> *World
doit2 world
  # (file1,world) = openafile world
  # (file2,world) = openanotherfile world
  # file1         = dosomething file1
  # file2         = dosomethingelse file2
  # world         = fclose file1 world
  # world         = fclose file2 world
  = world

This program first opens two files in consecutive order, changes them, and the closes them in consecutive order. However, the file modifications can be evaluated concurrently: writing something in file1 can't have an effect in file2 and vice versa. So dosomething and dosomethingelse can be safely evaluated concurrently.

The Clean World is an opaque black box (to programmers) but it has structure. Good design gives concurrent access to independent components of the World. In Clean this has been realised by virtue of the uniqueness type system, but it probably could also have been done with other linear type systems, or costly run-time checks.

>> The scheme is deterministic: given a Clean program and a particular World
>> value, one can in principle completely trace and predict the outcome of
>> the program.
>
>How so? Does this forbid IO functions which get the current time? 
No, time can be modelled as an infinite sequence of increasing numbers which we happen to recognize as the 'real world time'. The point here is that _if_ we would know all input in advance (and getting 'current time' is also input), _then_ the outcome of the program is fixed. 


>Q2: Concurrency via multiple environments?
>------------------------------------------
>Consider somethimg like..
>
>doit :: *World -> *World
>doit world =
> let (world1,file1)= openafile world
>     file2         = writesomedata file1
>     world2        = dosomethingelse world1
>     world3        = closeafile file2 world2
> in  world3
>
>If my understanding of Clean semantics is correct, then we could say that
>because world1 and file1 are independent environments then writesomedata and
>dosomethingelse can be performed concurrently (perhaps with additional
>concurrency annotations).
>
>I think this could be regarded as a form of deterministic concurrency, but it
>seems to suffer from the same problem as the 'symFork' construct (mentioned
>in the Concurrent Haskell paper). The problem is that all IO activity after
>doit will be blocked (unnecessarily) until both writesomedata and
>dosomethingelse have completed.
>
>One problem I can see with this is that the collection of independent
>environments is effectively determined by the Clean run time system, and
>there is no way I can introduce new (IO) environments of my own using Clean.
>This seems a little inflexible to me. The best I could do would be to dust
>down my trusty C compiler and hack it. Correct?
>
>Another (minor) problem I can see is that this whole approach only seems
>viable if the world can be broken down into a 'tree like' data structure
>of independent environments. What if this isn't the case? I suspect a data
>structure which reflected real world environment dependencies would be
>more 'graph like', with shared environments and cycles. I'm not sure if
>this is a real problem or not, but I fear that admitting this possibility
>will make the whole problem of deciding what can and can't be done
>concurrently much more complex.


A2: Multiple environments.
--------------------------
As you can see, the first part of your second question has been dealt with in my first answer. 

>I think this could be regarded as a form of deterministic concurrency, but it
>seems to suffer from the same problem as the 'symFork' construct (mentioned
>in the Concurrent Haskell paper). The problem is that all IO activity after
>doit will be blocked (unnecessarily) until both writesomedata and
>dosomethingelse have completed.
In a sense this is true, but that happens because that's the program that you've written. If you want another behaviour, then you should specify another program. In every programming language, also concurrent ones, you can specify fully sequential programs. To avoid that you must take advantage of the specific tools that are provided to circumvent such problems. In the case of Clean this is the use of independent subenvironments. 

>One problem I can see with this is that the collection of independent
>environments is effectively determined by the Clean run time system, and
>there is no way I can introduce new (IO) environments of my own using Clean.
>This seems a little inflexible to me. The best I could do would be to dust
>down my trusty C compiler and hack it. Correct?
No, you can easily add your own environments. An elaborate example is the IOSt data structure that plays a central role in the GUI library. Although it is an abstract data type for programmers, it is a complete Clean record data structure. Ofcourse, when you are building an interface to some new OS components, you probably have to do some hacking (b.t.w: 'hacking' is synonymous to smart programming). 

>Another (minor) problem I can see is that this whole approach only seems
>viable if the world can be broken down into a 'tree like' data structure
>of independent environments. What if this isn't the case? I suspect a data
>structure which reflected real world environment dependencies would be
>more 'graph like', with shared environments and cycles. I'm not sure if
>this is a real problem or not, but I fear that admitting this possibility
>will make the whole problem of deciding what can and can't be done
>concurrently much more complex.
I don't think this a problem: the real world _is_ very much like a tree if you look at it at the right level. So one should not treat the file system of a single computer as one monolithic object, but one that can broken down in smaller units, the individual files. 

In any case, the Clean Object I/O library also allows you to structure a program as a collection of explicit interactive processes. These processes can share common resources in the same way as a Clean program shares resources with other programs. 


>Q3: What is a value?
>--------------------
>
>> The major point of the World, and any other, environment is that it
>> is a _value_. 
>
>But unlike any other value, in that it changes all by itself, not just in
>response to what the Clean program does (as Martin Wierich observed).
>Are values which do this really values? I suppose there's no law against
>calling such things values, if you like. But I could call the time on my watch
>a constant if I like. It's different every time I look at it, but I say it's
>still a constant (or at least it would be if I didn't keep looking at it). :-)
>
>But I don't think a philosophical debate about what is or isn't a value will
>get us very far. In pragmatic terms, it is my humble opinion that the world
>(or environment) value is simply a trick to fool the Clean (or Haskell) graph
>reducer to invoke side effect causing operations in some specified order.
>
>I have no problem with this. In a lazy functional language we need some way
>to fool the graph reducer into performing operations in the order we want them
>performed, and world as value is as good as any other way I've seen so far.
>(I count IO monads and continuation passing as 'world as value' IO also).
>
>My problem with this is: What if we don't care in what order the operations
>are performed? We still have to pick an order and pass the environment
>value between operations, thereby making the operations strictly sequential.
>For an environment like a file, this is probably exactly what we want, but
>is this true for environments in general?

A3: Values are static.
----------------------
I think we get to a crucial point here. When I talk about environments (such as the World, files and the file system, event stream) then what I mean are values that represent the state of the real world (sub) environment that they represent at some point. This is a true value in the common sense of functional programming: it is a constant. Modification of an environment is simply the application of a function which creates a _new_ value, because modification is a state transition. This is normal functional behaviour. 

The real world seems to change by itself. This is modeled in the Clean World in the way as I explained in my previous posting. The only way (as far as I can see now) to model such a behaviour is by introducing processes. As I argued, a process can be modeled by a value: namely its state (a data structure) and its transition (a function, which in a functional language is also a value). 

Environments are not a trick, but a simple result of sticking to functional programming. You have to be explicit about the arguments and results. This gives you a clear view of the World for free!

>My problem with this is: What if we don't care in what order the operations
>are performed? We still have to pick an order and pass the environment
>value between operations, thereby making the operations strictly sequential.
If you don't care you shouldn't be bothered to specify an evaluation order. This is exactly what we try to do by introducing sub environments. 


>Q4: What does a model prove?
>----------------------------
>> Now it's not difficult to explain what happens when a functional Clean
>> program tries to open a file: the fopen function gets the current World
>> value as argument and first evaluates an 'arbitrary' number of other
>> programs. This models context switching. Note that 'arbitrary' can indeed
>> be easily modeled by an infinite sequence of random numbers, also to be
>> placed in the World. Then the file is being opened. This either succeeds or
>> fails, depending on the fact whether any other program already closed or
>> opened that particular file. The (dummy) file and result boolean are passed
>> as result of fopen, as well as the new World value. [In his e-mail d.d.
>> 6-15-1999, Fergus Henderson mentions a similar scheme, as well as its 
>> antiquity]
>
>There's that word again. I can understand that a (deterministic) model
>(a.k.a. simulation?) of a non-deterministic system might well be a useful
>way of answering some questions about that systems behaviour, if the
>apparent non-determinism arises from (pseudo) random numbers.
>
>But what does the existance of such a model prove? If it's supposed to prove
>that the system being modeled really is deterministic (and can therefore be
>implemented as a purely functional program), I don't find this argument very
>persuasive :-(
>
>I suppose part of the problem is I'm unclear about what 'level' we're talking
>about when we say a system is deterministic or not. It could be.....
>
>1- A high level 'non-deterministic' concurrent programming language.
>2- A logic gate level 'deterministic' finite state machine.
>3- A 'non-deterministic' quantum mechanical electron and semiconductor
>   physics level.
>
>Certainly, a deterministic computer program could accurately model option 2,
>and could legitimately be regarded as an implementation of the
>'non-deterministic' option 1, but in reality, who would want to go to
>such lengths? It would be easier to write the program in assembler.
>
>If concurrency (and implied non-determinism) is required then people are
>much more likely to chose a concurrent (not purely functional) programming
>language, I think.


A4: Functional programming is complete.
---------------------------------------
On 6-19-1999 you wrote:
>The reason I ask is that I don't much like the idea of using external 'helper'
>applications, operating systems etc which couldn't be written in my chosen
>language. As a pragmatic solution I can accept such things, but ideally I
>would like to be able to describe my entire system in the same language using
>consistent concepts.

Now you ask:
>But what does the existance of such a model prove? If it's supposed to prove
>that the system being modeled really is deterministic (and can therefore be
>implemented as a purely functional program), I don't find this argument very
>persuasive :-(

This is what we are trying to achieve. The existence of such a model proves that you can do I/O and handle processes and have concurrency in a pure functional setting. You don't have to give up any of the advantages of functional programming. I don't mean to say that we have 'the ultimate answer' and also many other people are working on this, but we want to know how far one can go in this scheme. 

I don't know how many programs exist 'out there' that require non-determinism on a fundamental level, but I do know that many programs exist that can be suitably programmed within a functional framework.


Cheers, Peter