# [clean-list] Matrix timings

Sven-Bodo Scholz sbs@informatik.uni-kiel.de
Wed, 31 Oct 2001 15:14:59 +0100

```On Wed, Oct 31, 2001 at 08:33:53AM +0100, Siegfried Gonzi wrote:
>
> >I don't have the Clean timings handy for linear system solution. But
> >here are the timings for matrix-matrix multiplication instead:
>
> >Clean's best timings for 300x300 was 1.98seconds vs. 0.16seconds for
> >Scilab (so maybe that ATLAS BLAS got linked in after all). The C code
> >(on top of the Boehm gc) timing was 0.85 seconds. So the Clean code was
> >about 2.3 times slower than "gcc -O3 -funroll-all-loops" in this case
> >(which was not quite fair to C; 2.5 was more likely to be the case).
>
> Be aware that in case of you are using Zoerner's
> matrix-matrix-multiply-algorithm that there has been posted (on this
> list by Groningen) a faster one (factor 1.5 or so), one or two years
> ago.
>
> I can remember that on my Macintosh a 512x512
> matrix-matrix-multiplication takes 40sec (in the best case of the
> measurements) to 50sec. An equivalent C code takes the same time. But
> here then memory access (the speed) is a limiting factor.

talking about matrix-multiplication in different languages.....
Here, a comparison SAC (Single Assignment C) vs. Clean:

SAC-solution: (complete version at the end of this posting)

double[*] matmul( double[*] A, double[*] B)
{
BT = transpose( B);

C = with( . <= [i,j] <= .)
genarray( [ shape(A)[[0]], shape(B)[[1]] ], sum( A[[i]] * BT[[j]]));
return( C);
}

Clean-solution: John's solution posted Nov 2000 (complete version at the end of this posting)

all timings on a SUN UltraSPARC 450MHz:

size     Clean-time (s)        SAC-time (s)

300x300      0.8                  0.3
1000x1000   63.5                 33.4

>
> Lest play hardball: Newer processors include parallelism in the
> processor itself. The new G4 processor highly parallise tasks. C and
> Fortran can always exploit this feature. I once red an article by a NASA
> magnetohydrodynamics-guy about G4 parallelism and how one can exploit it
> with C or Fortran (as long a the compiler-vendor supports it).

letting the SAC compiler exploit implicit concurrency via POSIX-threads
(same source code / compiler option "-mt" enabled)
yields on the same (4 procs) machine (size 1000x1000):

# of threads used   SAC-time (s)   speedup

1               33.5            1
2               19.7            1.7
3               13.8            2.4
4               12.5            2.7

Sorry if this is leading slightly out of track, but I just could not resist since
- usually - we do not consider matrix multiplication to be one of the strongest
examples in favor of SAC 8-)))

Cheers,
Sven-Bodo

Full Clean version:
-------------------------------------------------------------------------------------
module matrix

import StdEnv
// import Clas3

transpose_ :: !{#.{#Real}} !Int -> {#.{#Real}}
transpose_ a n = { { a.[j,i] \\ j<-[0..n-1]} \\ i<-[0..n-1]}

multiply_matrix_t :: !{#.{#Real}} !{#.{#Real}} !Int -> {#.{#Real}}
multiply_matrix_t a b n
= { let
row = a.[i];
in {
let
mul_vector_row :: !Int !Real !{#Real} !{#Real}-> Real
mul_vector_row k v row col
| k>=n
= v
= mul_vector_row (k+1) (v+row.[k]*col.[k]) row col
in
mul_vector_row 0 0.0 row b.[j] \\ j<-[0..n-1]} \\ i<-[0..n-1]}

matrix :: !Int -> {#.{#Real}}
matrix n = {createArray n 1.0 \\ i<-[0..n-1]}

repeat_mul c
| c==0
= 0
# n = 300;
| size (multiply_matrix_t (matrix n) (transpose_ (matrix n) n) n)>=0
//      | size ((matrix n) *** (matrix n))>=0
= repeat_mul (c-1);

Start
= repeat_mul 1;
-------------------------------------------------------------------------------------
Full SAC version:
-------------------------------------------------------------------------------------
#define SIZE 1000

import Array:all;
import StdIO:all;

inline double[.,.] transpose( double[.,.] A)
{
res = with( . <= [i,j] <= .) {
} genarray( [shape(A)[[1]], shape(A)[[0]]], A[[j,i]]);
return( res);
}

double[*] matmul( double[*] A, double[*] B)
{
BT = transpose( B);

C = with( . <= iv=[i,j] <= .)
genarray( [ shape(A)[[0]], shape(B)[[1]] ], sum( A[[i]] * BT[[j]]));
return( C);
}

int main()
{
A = genarray( [SIZE,SIZE], 1.0);

B = matmul( A, A);

printf( "B[0,0] = %f\n", B[0,0]);

return(0);
}

-------------------------------------------------------------------------------------

--
Sven-Bodo Scholz                        University of Kiel
email: sbs@informatik.uni-kiel.de       Department of Computer Science
http://www.informatik.uni-kiel.de/~sbs/ Herman-Rodewald-Strasse 3, 24118 Kiel
Phone: +49-431-880-4482              	Germany

```