[clean-list] Re:Clean Wish List: My old major wish...

Wed, 16 Oct 2002 22:59:15 +0200

Erik wrote:
>Valery wrote:
>> However, I think there should be a clear understanding that both the
>> Pentium and PCs based on it (as an excellent representative of today's
>> architectures) are far from being a natural solution concerning
>> hardware architecture for graph rewriting.
>> ...especially the parallel kind.
>
>Functional programming developments once concentrated also on devising
>totally new hardware configurations. That stopped when important
>breakthroughs took place in compiler design, was it in the 1980s?
>They opened up the perspective of acceptable performance for
>functional language programs on ordinary hardware.

It is true that somewhere towards the end of the 1980s, research 
groups started working on serious compilers for functional languages 
for common processor architectures. It turned out that research 
groups building specialised hardware could not compete with large 
companies like Intel, building ever faster general-purpose 
processors. The switch was a pragmatic decision.

This does not mean that it is technically impossible to build 
hardware that would perform faster graph rewriting than any Pentium 
or PowerPC today. The academic research groups simply do not have the 
resources.

>What we now need are practical solutions, so they should be based on available
>and affordable hardware. I think the Clean team are doing a good job. When the
>need for implicit parallelism becomes ever more clear, and one bright
>researcher finds a way to implement it, we'll get it. Until then, 
>we'll dream on...

I have done a little bit of searching on the Pentium atomic XCHG 
(exchange) instruction and found no definite answer on its cost, let 
alone its cost relative to graph rewriting. I have seen game writers 
avoid it, and multi-processing groups use it. I have seen remarks 
that it does not lock the bus on newer processors for cached memory 
locations.

Apart from the costs, it seems rather trivial to lock nodes with this 
instruction. So what if it slows down Clean by 10 percent?
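
To make the idea concrete, here is a minimal sketch in C11 of locking a 
node with an atomic exchange. The node layout and the function names are 
my own illustration, not how the Clean run-time system represents nodes. 
On x86, compilers typically translate atomic_exchange into an XCHG with a 
memory operand, which is the instruction I mean above.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical graph node: the layout and names are illustrative only,
   not Clean's actual node representation. */
typedef struct node {
    atomic_int lock;   /* 0 = free, 1 = claimed by a reducer */
    int value;         /* stand-in for the node's contents */
} node;

/* Try to claim a node. atomic_exchange stores 1 and returns the previous
   value, so exactly one caller sees 0 and wins the lock. */
static bool node_try_lock(node *n) {
    return atomic_exchange(&n->lock, 1) == 0;
}

/* Release a node that the caller has claimed. */
static void node_unlock(node *n) {
    atomic_store(&n->lock, 0);
}

/* Overwrite a node in place, but only if no other reducer holds it.
   Returns false without touching the node when it is already locked. */
static bool node_rewrite(node *n, int new_value) {
    if (!node_try_lock(n)) {
        return false;
    }
    n->value = new_value;
    node_unlock(n);
    return true;
}
```

A reducer that gets false back from node_rewrite could retry, or go and 
reduce some other redex first. The sketch says nothing about what the 
exchange costs per rewrite step, which is exactly the open question.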

Valery wrote:
>I'd postpone discussing hardware issues until the moment when
>we can see (invisible) forking implemented in Clean. Before this 
>there's not much sense in discussing hardware.

The reality of today is that the Intel platform has become the main 
platform for Clean. I do not see how one can see forking implemented 
in Clean without discussing the Intel hardware.

Apart from that, I use Macintoshes, so this reality does bug me a 
bit. Apparently I do not have {P} and {I} on my PowerPC because of 
bad Pentium design.

>What's the problem? 
>
>1. Just deliver *every* graph reduction step to a new CPU and/or thread 
> and/or process.
>
>2. Then you'll see your performance degrade vastly on classical 
> machines, and then you could ask for new hardware

You could, but who is going to give this new machine to you? It is 
more likely that the rest of the world is going to say "see, these 
functional languages will never be efficient enough".

Apart from that, people probably want a machine that can run 
different kinds of languages well.

>3. And meanwhile you could make some optimizers which would allow
> delivering not every graph reduction step to another CPU but only the
> needed ones (like those compile-time optimizers that try to use
> registers instead of memory as much as possible)

Or you could start without parallelism and let the programmer 
introduce some. This is what {P} and {I} were about.

regards,
Marco