[clean-list] Matrix-Matrix-Multiplication and Clean's memory management

John van Groningen johnvg@cs.kun.nl
Fri, 3 Nov 2000 13:04:29 +0000


Dear Siegfried,

You wrote:
>...
>The Console output is set to 'not'. The heap memory size is around
>30000K.
>My computer is a Performa 5300 (PowerPC RISC 603e with 48MB of RAM).
>Virtual memory in MacOS 8.6 was set to 49MB (48+1MB).
>
>If I run the above program, I have to wait (in the best case) 725sec
>(the garbage collection takes about 5sec).
>
>If I run the above program, compiled as a stand-alone application on the
>Performa, on my Powerbook 5300 (RISC 603e 100MHz) with 24MB of RAM and the
>virtual memory of MacOS 8.6 set to 50MB, I have to wait (no typing
>error): 1184sec + 525sec (for garbage collection). The memory of the
>program was set to 40MB (30MB is the minimum, as with compilation on the
>Performa).
>
>What is *wrong* with the memory management? When I start the program on
>the Powerbook, the disc works loudly, on and on.

I don't think anything is wrong with the memory management.

>So someone could now say that the speed degradation is due to the lack
>of enough built-in memory in the Powerbook (the Powerbook has 24MB
>compared to 48MB in the Performa). But I cannot believe that.

I do think that this is the reason for the degradation in performance.
The program uses more than 25 MB of memory, which is more
than the amount of real memory in your Powerbook.
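A rough count of the matrix data alone shows why. Assuming the 1024 x 1024
matrices of 8-byte Reals from the test program at the end of this message
(the size used in your own program is not quoted above), each matrix needs
8 MB, and while multiply_matrix_t runs, at least the first matrix, the
transposed second matrix and the result are live at the same time:

```latex
\[
1024 \times 1024 \times 8\ \text{bytes} = 8\,388\,608\ \text{bytes} = 8\ \text{MB per matrix}
\]
\[
\underbrace{8\ \text{MB}}_{a} + \underbrace{8\ \text{MB}}_{b^{T}} + \underbrace{8\ \text{MB}}_{\text{result}} = 24\ \text{MB}
\]
```

This ignores array headers and whatever headroom the garbage collector
needs, so the real figure is higher, and every additional transposed copy
adds another 8 MB.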

>...
>And, *now*, when I did the same on my Powerbook, Yorick (memory was set
>to 40MB) *also* only needed 500sec for the matrix multiplication. Only
>during the first and last minutes did I hear the disc working (Clean kept
>the disc working all the time).
>
>Something is strange with the memory management. If I calculate a matrix
>multiplication with only 256x256 elements, Clean takes the same time
>(10sec) on both computers (Performa and Powerbook). But when it comes to
>large arrays...
>
>Why can Yorick manage memory so much better? Yorick itself is only an
>interpreter (but Fast Fourier transforms and all array manipulations are
>built-in functions, and they run at compiled speed). I do not want to
>discuss here which is better, Yorick or Clean, but I cannot understand why
>Yorick manages memory better, because Yorick only "interprets" code and
>Clean compiles code to stand-alone native PowerPC code.

The matrix multiplication function of the CLAS library uses 'transpose'
twice during the multiplication. This appears to increase the memory
use of the program.
I have written a matrix multiplication that uses transpose only once.
On my PowerMac G4 this version is 1.5 times as fast as the matrix
multiplication in CLAS 0.7 (about 1 minute instead of 1.5 minutes).
It also allocates less memory and probably uses less memory.
I have included this program at the end of this e-mail message.

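I think that this multiplication can be made at least twice as fast by
using blocking, i.e. splitting the loops into tiles so that parts of the
matrices are reused while they are still in the cache.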
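A minimal, untested sketch of what blocking could look like in Clean
follows; it is not a version from this message. It tiles only the i and j
loops of multiply_matrix_t: each tile computes block_size x block_size dot
products, so the same block_size rows of the transposed matrix are reused
for every row of the tile, instead of the whole transposed matrix being
read once per result row. The names block_size, dot_rows, tile, band,
band_rows, multiply_blocked and test_matrix are hypothetical, block_size = 32
is an untuned guess, and n is assumed to be a multiple of block_size.

```clean
module blocked_matrix

import StdEnv

// Hypothetical tile size: an untuned guess, not a measured value.
// This sketch assumes n is a multiple of block_size.
block_size :== 32

// Dot product of row and col over indices k..n-1 (same loop as mul_vector_row).
dot_rows :: !Int !Int !Real !{#Real} !{#Real} -> Real
dot_rows k n acc row col
    | k >= n    = acc
    | otherwise = dot_rows (k+1) n (acc + row.[k]*col.[k]) row col

// One block_size x block_size tile, stored row by row: entry
// (i-ib)*block_size + (j-jb) is the dot product of a.[i] and bt.[j].
// For each row i of the tile, the same block_size rows of bt are reused.
tile :: !{#{#Real}} !{#{#Real}} !Int !Int !Int -> {#Real}
tile a bt n ib jb
    = {dot_rows 0 n 0.0 a.[i] bt.[j] \\ i <- [ib..ib+block_size-1], j <- [jb..jb+block_size-1]}

// All tiles for the band of rows ib..ib+block_size-1, one per column block.
band :: !{#{#Real}} !{#{#Real}} !Int !Int -> {#{#Real}}
band a bt n ib
    = {tile a bt n ib jb \\ jb <- [0,block_size..n-1]}

// Copy the tiles of one band back into block_size ordinary result rows.
band_rows :: !{#{#Real}} !Int -> [{#Real}]
band_rows tiles n
    = [{tiles.[j / block_size, r*block_size + j rem block_size] \\ j <- [0..n-1]}
      \\ r <- [0..block_size-1]]

// Blocked counterpart of multiply_matrix_t; bt is b already transposed.
multiply_blocked :: !{#{#Real}} !{#{#Real}} !Int -> {#{#Real}}
multiply_blocked a bt n
    = {row \\ ib <- [0,block_size..n-1], row <- band_rows (band a bt n ib) n}

// n x n test matrix with varied entries, so that index mistakes would show.
test_matrix :: !Int !Int -> {#{#Real}}
test_matrix n s = {{toReal ((i*s + k) rem 11) \\ k <- [0..n-1]} \\ i <- [0..n-1]}

// Compare every entry with an unblocked dot product; should evaluate to True.
Start = and [c.[i,j] == dot_rows 0 n 0.0 a.[i] bt.[j] \\ i <- [0..n-1], j <- [0..n-1]]
where
    n  = 64
    a  = test_matrix n 3
    bt = test_matrix n 5
    c  = multiply_blocked a bt n
```

Because every entry is computed by the same dot_rows call in the same order,
the comparison in Start only checks that the tiles are put back in the right
places. A full version would probably also tile the k loop, and whether any
of this is actually faster on a 603e would have to be measured.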

Regards,

John van Groningen

------------------------------------------------------------------------------
module matrix

import StdEnv
import Clas3

// Transpose of the n x n matrix a: element [i,j] of the result is a.[j,i].
transpose_ :: !{#.{#Real}} !Int -> {#.{#Real}}
transpose_ a n = { { a.[j,i] \\ j<-[0..n-1]} \\ i<-[0..n-1]}

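// Product of a and b, where b must already be transposed: element [i,j]
// of the result is the dot product of row i of a and row j of b.
multiply_matrix_t :: !{#.{#Real}} !{#.{#Real}} !Int -> {#.{#Real}}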
multiply_matrix_t a b n
    = { let
           row = a.[i];
        in {
            let
                // Adds row.[k]*col.[k] for k, k+1, ..., n-1 to the accumulator v.
                mul_vector_row :: !Int !Real !{#Real} !{#Real} -> Real
                mul_vector_row k v row col
                    | k>=n
                        = v
                        = mul_vector_row (k+1) (v+row.[k]*col.[k]) row col
            in
                mul_vector_row 0 0.0 row b.[j] \\ j<-[0..n-1]} \\ i<-[0..n-1]}

// n x n matrix with every element equal to 1.0.
matrix :: !Int -> {#.{#Real}}
matrix n = {createArray n 1.0 \\ i<-[0..n-1]}

// Computes the product of two 1024 x 1024 matrices c times.
repeat_mul :: !Int -> Int
repeat_mul c
    | c==0
        = 0
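        # n = 1024;   // matrix size
        // The size test is always true; it only makes the program compute the product.
        | size (multiply_matrix_t (matrix n) (transpose_ (matrix n) n) n)>=0
//      | size ((matrix n) *** (matrix n))>=0    (CLAS 0.7 version, for comparison)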
            = repeat_mul (c-1);

Start
    = repeat_mul 1;