[clean-list] The Clas library
    Siegfried Gonzi 
    siegfried.gonzi@kfunigraz.ac.at
    Wed, 24 Oct 2001 11:01:21 +0200
    
    
  
John van Groningen wrote:
> You can make the forward- and backward substitutions faster by replacing
> the function 'dot' in Clas1.icl by:
> 
> dot :: !.Vector !.Vector -> Real
> dot x y
>         #! s = min (size x) (size y)
>         # r = dot_i 0 0.0
>                 with
>                 dot_i :: !Int !Real -> Real
>                 dot_i i acc
>                         | i<>s
>                                 = dot_i (inc i) (acc + (x.[i] * y.[i]))
>                                 = acc
>         = r
> 
> and adding the ! for the second argument in the type of 'dot' in Clas1.dcl.
> 
> On my Macintosh your program executes about 1.6 times as fast after this modification.
On my old Macintosh there is no noticeable performance boost. I am
fairly sure that memory access is again the limiting factor.
I dug up my Matlab Student Version 5.0 and installed it on my Mac.
The Student Version is limited to 128x128 array dimensions. Solving
a system with a 128x128 random-number array and a 128x128 known array
takes 0.7 sec.
To my great surprise, Clean also takes 0.7 sec for a 128x128 array, but
interestingly I got this time only 5 times; in more than 80% of the
trials the time is about 1.8 sec. By the way, Yorick also takes 0.7 sec
to solve a 128x128 array.
[It is often said that Matlab is slow, but I beg to differ: Matlab is
only slow when one uses loops. I know of only two interpreted languages
that are very fast at loops: IDL and Yorick.]
On the old Suns there is also no performance boost when I change the
"dot" code. On the newer ones it is even stranger: for a
500x500 array Clean takes 17 sec and Yorick takes 4 sec. But I do not
know whether the fact that unboxed arrays are not available on Unix
changes the execution time noticeably.
The reason I am looking at execution times is as follows. I want to do a
Kriging interpolation on a 2-dimensional grid. For, let's say, a
200x200 array A:
 A X = Y 
I have to solve for 40000 vectors (which are stored as rows in X).
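A rough sketch of why the substitutions (and therefore 'dot') matter so
much here. It assumes the system is solved by one LU factorisation
followed by a forward and a backward substitution per right-hand side,
and it uses the usual textbook operation counts (about 2/3 n^3 for the
factorisation, about 2 n^2 for the two substitutions). The module and
function names are made up for illustration and are not part of Clas:

```clean
module flops

import StdEnv

// Approximate floating-point operations for one LU factorisation
// of an n x n matrix (textbook estimate, assumed here).
factorCost :: !Real -> Real
factorCost n = 2.0 * n * n * n / 3.0

// Approximate operations for one forward plus one backward
// substitution on an n x n factorised system.
solveCost :: !Real -> Real
solveCost n = 2.0 * n * n

// (cost of factorising once, cost of all the substitutions)
Start :: (Real, Real)
Start = (factorCost n, rhsCount * solveCost n)
where
    n        = 200.0    // dimension of A
    rhsCount = 40000.0  // number of vectors to solve for
```

Under these assumptions the factorisation costs roughly 5.3 million
operations, while the 40000 substitutions cost about 3.2 billion, so
nearly all of the time would be spent in the inner dot products.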
> By using the 'fmadd' instruction, instead of 'fmul' and 'fadd', the program
> executes even faster, about 20 percent. Clean 1.3.3 does not generate these
> instructions but the current development version of the compiler can
> (but this is not yet supported by the IDE).
I am quite sure Matlab and Yorick use some tuned code, but I have not
been able to figure it out yet; or perhaps they access memory in a
different order.
S. Gonzi