world as value

P.R. Serrarens pascalrs@cs.kun.nl
Mon, 19 Jul 1999 10:49:47 +0200


Adrian Hey wrote:
 
> I am no expert on database design, but I think there is a (partial) solution
> to your problem with Clean as it stands today. If it were me, I would design
> my system so that only one process controlled the file itself, and other
> processes accessed and/or modified the database contents using some suitably
> defined messages/protocol. This process could then ensure that if one
> process had requested control of a record, no other process could access
> it until control had been surrendered.
 
This is in principle possible, but it gives rather high overhead, which in
the case of arrays is often undesirable. Moreover, I think one needs either
the Object I/O system or Concurrent Clean to implement that.

> At the moment, you couldn't implement this idea in quite this way
> because Clean doesn't support independent processes communicating
> non-deterministically via a messaging system (I think this is what Pascal
> is working on). But what you could do is implement a 'file system within a
> file system' using Clean as it is at present (without any C hacking, because
> no new hardware devices or OS routines are involved). By this I mean that
> you could treat a record as a 'mini file' and define a record type to
> represent it (which need not necessarily be abstract or unique).
 
Yes, that is possible, but it is a rather specialised solution. Suppose one
also wants to update one field of a record while keeping references to the
other fields for reading; then we need another layer. Maybe you want to
update a collection of records?

> > (BTW, I have read about
> > the approach in Peter Achten's PhD where he treats the file system as a
> > server.)
> 
> Oh, that seems to be the way I would do it. I don't think it would be possible
> to do it any other way. What you're asking for would need mutable (non-unique)
> variables and some kind of locking mechanism. I don't think the Clean
> designers would be too keen on this (because we lose referential transparency).
 
Note that Peter's file system is not implemented; it is a functional
description of the system, showing that the file system is purely functional.
The real file functions access the OS directly: this is much more efficient.

In some cases one does not need locking: when we know that two accesses can
never happen at the same time, because of dependencies. For example, the Object
I/O library is a sequential program, so it will never access the file system
more than once at the same time. Locking is only needed in concurrent systems,
and semaphores are acceptable there.

> I'm not sure what the limits of uniqueness and Concurrent Clean (rather than
> ordinary Clean) are, but I don't think it should be a problem. I haven't
> tried it, but as long as separate threads get different parts of a unique
> structure I can't see any problem, because that's the only form of
> concurrency Clean allows.

That is true when one only uses the process annotations.

> If other forms of concurrency (like independent
> processes sharing 'unique' values) are introduced then I think we have a
> problem, but I don't think that's going to be legal even when Pascal's
> system is included (because any such values won't be unique, obviously).

Uniqueness does not say much here; it only enforces single-threaded access. For
example, I could have exactly two references to the same datastructure by
making both of these references unique.

Multiple processes (I call them threads) sharing the same value and treating
it as unique is possible in Concurrent Clean extended with message
passing. Take for example the file system: one can view it as a thread
managing file access and serving requests through message-passing channels.
This is a different version of the example you had above. And yes, this system
is non-deterministic! But wasn't it also non-deterministic when we had a
collection of (not necessarily Clean) programs and the operating system? You
see: there is not much difference between programs and threads; keep that in mind.
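To make the picture concrete, here is a minimal sketch of such a file-server thread, in Python rather than Clean (the class name `FileServer`, the message shapes, and the dict standing in for the file are all my own invention, not any real Clean API):

```python
import queue
import threading

class FileServer:
    """One thread owns the 'file' (here just a dict); every other thread
    talks to it through a request channel. Because the owning thread
    handles one message at a time, access is serialised and the data
    itself needs no lock."""

    def __init__(self):
        self.requests = queue.Queue()
        self.store = {}  # stands in for the real file contents
        threading.Thread(target=self._serve, daemon=True).start()

    def _serve(self):
        while True:
            op, key, value, reply = self.requests.get()
            if op == "read":
                reply.put(self.store.get(key))
            elif op == "write":
                self.store[key] = value
                reply.put(None)  # acknowledge the write

    # Client side: send a message, then wait on a private reply channel.
    def read(self, key):
        reply = queue.Queue()
        self.requests.put(("read", key, None, reply))
        return reply.get()

    def write(self, key, value):
        reply = queue.Queue()
        self.requests.put(("write", key, value, reply))
        reply.get()

server = FileServer()
server.write("record1", "hello")
print(server.read("record1"))  # -> hello
```

Note that when several client threads send requests at once, the order in which the server dequeues them is exactly the non-determinism discussed above; it is confined to the message channel, just as it is confined to the operating system in the multi-program case.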

> So if Pascal's system only allows inter-process communication via messaging,
> we're back to the server approach to managing unique mutable data structures.

The trouble is that I have to care about concurrency. Suppose we have multiple
'safe' references to the same datastructure: what happens when a new reducer
(using a process annotation) is created on a different processor and wants to
update part of the datastructure in parallel? One has to copy part of the
datastructure. This might be possible, but it can get rather messy. I have to
think about that.

> Regarding arrays, I have often thought that it would be useful to split
> the array data type into 2 components, a 'physical' 1D array (where data is
> actually stored), and a 'logical' array, which is just a formula that
> converts (multi-dimensional) logical indexes to (1D) physical indexes.
> You could then perform a lot of useful operations, like taking slices,
> transposing, splitting into even and odd halves, just by changing the
> formula. This seems so obvious I suppose it's already been done. I
> think something like this could be useful for the problem you mentioned,
> and it seems to fit quite nicely with uniqueness typing, provided
> each operation produced logical sub-arrays which did not overlap in the
> physical array.

This is a really great idea; I had it three years ago and wrote half a paper
about it. An array was basically a function of type :: Int -> a, which may
be partial (ordinary arrays) or total (unbounded arrays!). The biggest
problem is sharing: when the array function is shared by two consumers, we end
up calculating the elements twice. This problem can only be solved by
memoisation, remembering the results of functions you already computed, but I
haven't come across a cost-effective and clean way of doing this.
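The array-as-function idea, plus the memoisation it needs under sharing, can be sketched like this (again Python, not Clean, so the laziness and the memo table are explicit; the name `FnArray` is mine):

```python
class FnArray:
    """An array represented as an index formula plus a memo table.
    Without the memo table, two consumers sharing this array would
    compute each element twice."""

    def __init__(self, f):
        self.f = f        # the index formula :: Int -> a
        self.memo = {}    # remembered results
        self.calls = 0    # counts how often f actually runs

    def __getitem__(self, i):
        if i not in self.memo:
            self.calls += 1
            self.memo[i] = self.f(i)
        return self.memo[i]

squares = FnArray(lambda i: i * i)  # a total ("unbounded") array
a = squares[10]                     # computed
b = squares[10]                     # remembered, not recomputed
print(a, b, squares.calls)          # -> 100 100 1
```

The memo table is exactly the part that is hard to make cost-effective: in a real implementation it keeps every computed element alive, and deciding when it may be discarded is the open problem mentioned above.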

Another drawback of this approach is that it is very inefficient: all
intermediate functions are kept in memory until an element is needed, and
computing every element is costly, because we have to walk the whole tree of
functions for every element, instead of doing it once for the complete result.

Good efficiency can be achieved by doing these rewrites at compile time, which
is the approach I took: all operations on these arrays were macros and were
therefore rewritten at compile time. I remember that it did not work very
well then, somehow; I think I also needed a kind of common subexpression
elimination for it. But in principle it was possible.
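For completeness, the logical/physical split Adrian suggested can also be sketched directly, at run time rather than via compile-time macros (illustrative only; `View` and its methods are invented names):

```python
class View:
    """A 'logical' array: a 1-D physical store plus an index formula.
    Slicing and even/odd splitting only change the formula; the
    physical data is never copied."""

    def __init__(self, store, formula, length):
        self.store = store      # the physical 1-D array
        self.formula = formula  # logical index -> physical index
        self.length = length

    def __getitem__(self, i):
        return self.store[self.formula(i)]

    def slice(self, start, stop):
        # shift the formula; same physical store underneath
        return View(self.store, lambda i: self.formula(i + start),
                    stop - start)

    def evens(self):
        # every second logical element
        return View(self.store, lambda i: self.formula(2 * i),
                    (self.length + 1) // 2)

store = list(range(10))            # physical array 0..9
whole = View(store, lambda i: i, 10)
mid = whole.slice(3, 7)            # logical view of elements 3..6
print(mid[0], mid[3])              # -> 3 6
print(whole.evens()[2])            # -> 4
```

Note that `evens` and a corresponding `odds` view would map to disjoint physical indexes, which is exactly the non-overlap condition that would let such views coexist under uniqueness typing.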

I now wonder why I did not finish the paper. Right now I do not have the time
for it...
 
Back to the thesis.

-- 
Pascal Serrarens
Concurrent Clean developer
University of Nijmegen, NL