[clean-list] Matrix timings

Sven-Bodo Scholz sbs@informatik.uni-kiel.de
Wed, 31 Oct 2001 15:14:59 +0100


On Wed, Oct 31, 2001 at 08:33:53AM +0100, Siegfried Gonzi wrote:
> 
> >I don't have the Clean timings handy for linear system solution. But 
> >here are the timings for matrix-matrix multiplication instead:
> 
> >Clean's best timing for 300x300 was 1.98 seconds vs. 0.16 seconds for 
> >Scilab (so maybe that ATLAS BLAS got linked in after all). The C code 
> >(on top of the Boehm gc) timing was 0.85 seconds. So the Clean code was 
> >about 2.3 times slower than "gcc -O3 -funroll-all-loops" in this case 
> >(which was not quite fair to C; 2.5 was more likely to be the case).
> 
> Be aware that, in case you are using Zoerner's
> matrix-matrix-multiply algorithm, a faster one (by a factor of 1.5 or
> so) was posted on this list by Groningen one or two years ago.
> 
> I can remember that on my Macintosh a 512x512
> matrix-matrix multiplication takes 40 sec (in the best case of the
> measurements) to 50 sec. Equivalent C code takes the same time. But
> here memory access speed is a limiting factor.

Talking about matrix multiplication in different languages.....
Here, a comparison of SAC (Single Assignment C) vs. Clean:

SAC solution (complete version at the end of this posting):

double[*] matmul( double[*] A, double[*] B)
{
  BT = transpose( B);

  C = with( . <= [i,j] <= .)
      genarray( [ shape(A)[[0]], shape(B)[[1]] ], sum( A[[i]] * BT[[j]]));
  return( C);
}

Clean solution: John's solution posted in Nov 2000 (complete version at the end of this posting)

All timings on a SUN UltraSPARC, 450 MHz:

size     Clean-time (s)        SAC-time (s)

300x300      0.8                  0.3
1000x1000   63.5                 33.4

> 
> Let's play hardball: Newer processors include parallelism in the
> processor itself. The new G4 processor highly parallelises tasks. C and
> Fortran can always exploit this feature. I once read an article by a NASA
> magnetohydrodynamics guy about G4 parallelism and how one can exploit it
> with C or Fortran (as long as the compiler vendor supports it).

Letting the SAC compiler exploit implicit concurrency via POSIX threads
(same source code, compiler option "-mt" enabled)
yields, on the same machine (4 processors, size 1000x1000):

# of threads used   SAC-time (s)   speedup

      1               33.5            1
      2               19.7            1.7
      3               13.8            2.4
      4               12.5            2.7


Sorry if this is leading slightly off track, but I just could not resist, since
- usually - we do not consider matrix multiplication to be one of the strongest
examples in favor of SAC 8-)))

Cheers,
   Sven-Bodo




Full Clean version:
-------------------------------------------------------------------------------------
module matrix

import StdEnv
// import Clas3

transpose_ :: !{#.{#Real}} !Int -> {#.{#Real}}
transpose_ a n = { { a.[j,i] \\ j<-[0..n-1]} \\ i<-[0..n-1]}

multiply_matrix_t :: !{#.{#Real}} !{#.{#Real}} !Int -> {#.{#Real}}
multiply_matrix_t a b n
    = { let
           row = a.[i];
        in {
            let
                mul_vector_row :: !Int !Real !{#Real} !{#Real}-> Real
                mul_vector_row k v row col
                    | k>=n
                        = v
                        = mul_vector_row (k+1) (v+row.[k]*col.[k]) row col
            in
                mul_vector_row 0 0.0 row b.[j] \\ j<-[0..n-1]} \\ i<-[0..n-1]}

matrix :: !Int -> {#.{#Real}}
matrix n = {createArray n 1.0 \\ i<-[0..n-1]}

// run the multiplication c times; the size test forces evaluation of the result
repeat_mul c
    | c==0
        = 0
        # n = 300;
        | size (multiply_matrix_t (matrix n) (transpose_ (matrix n) n) n)>=0
//      | size ((matrix n) *** (matrix n))>=0
            = repeat_mul (c-1);

Start
    = repeat_mul 1;
-------------------------------------------------------------------------------------
Full SAC version:
-------------------------------------------------------------------------------------
#define SIZE 1000

import Array:all;
import StdIO:all;


inline double[.,.] transpose( double[.,.] A)
{
  res = with( . <= [i,j] <= .)
        genarray( [shape(A)[[1]], shape(A)[[0]]], A[[j,i]]);
  return( res);
}

double[*] matmul( double[*] A, double[*] B)
{
  BT = transpose( B);

  C = with( . <= iv=[i,j] <= .)
      genarray( [ shape(A)[[0]], shape(B)[[1]] ], sum( A[[i]] * BT[[j]]));
  return( C);
}


int main()
{
  A = genarray( [SIZE,SIZE], 1.0);

  B = matmul( A, A);

  printf( "B[0,0] = %f\n", B[0,0]);

  return(0);
}

-------------------------------------------------------------------------------------

-- 
Sven-Bodo Scholz                        University of Kiel            
email: sbs@informatik.uni-kiel.de       Department of Computer Science
http://www.informatik.uni-kiel.de/~sbs/ Herman-Rodewald-Strasse 3, 24118 Kiel  
Phone: +49-431-880-4482              	Germany