[clean-list] Matrix timings
John van Groningen
johnvg@cs.kun.nl
Fri, 2 Nov 2001 17:57:26 +0100
Sven-Bodo Scholz wrote:
>...
>Clean-solution: John's solution posted Nov 2000 (complete version at the end of this posting)
>
>all timings on a SUN UltraSPARC 450MHz:
>
>size Clean-time (s) SAC-time (s)
>
>300x300 0.8 0.3
>1000x1000 63.5 33.4
>...
I have written a faster matrix multiplication in Clean. I have included the code at the end of the message.
The results on a 333 Mhz UltraSPARC: (1000 x 1000)
old version: 77 seconds
new version: 37 seconds
The result on a 450 Mhz PowerPC G4: (1000 x 1000)
old version: 52 seconds
new version: 19 seconds
new version: 13 seconds with code generator that generates fmadd instructions.
>letting the SAC compiler exploit implicit concurrency via POSIX-threads
>(same source code / compiler option "-mt" enabled)
>yields on the same (4 procs) machine (size 1000x1000):
>
># of threads used SAC-time (s) speedup
>
> 1 33.5 1
> 2 19.7 1.7
> 3 13.8 2.4
> 4 12.5 2.7
So a single 450 Mhz PowerPC G4 using Clean now multiplies a 1000x1000 matrix nearly as fast as four 450 Mhz SPARC processors with SAC.
Regards,
John van Groningen
The program:
module matrix;
import StdEnv;
:: Matrix :== {#{#Real}};
mul_matrix a b n r
= m1 0 n 0 n 0 n 0 n a b r;
Threshold :== 32;
m1 :: !Int !Int !Int !Int !Int !Int !Int !Int !Matrix !Matrix !*{#*{#Real}} -> *{#*{#Real}};
m1 ib ie jb je icb ice jcb jce a b r
| ie-ib<=Threshold
= m3 ib ie jb je icb ice jcb jce a b r
# im = (ib+ie+1)>>1;
# jm = (jb+je+1)>>1;
# r = m2 ib im jb jm icb ice jcb jce a b r;
# r = m2 ib im jm je icb ice jcb jce a b r;
# r = m2 im ie jb jm icb ice jcb jce a b r;
# r = m2 im ie jm je icb ice jcb jce a b r;
= r;
m2 :: !Int !Int !Int !Int !Int !Int !Int !Int !Matrix !Matrix !*{#*{#Real}} -> *{#*{#Real}};
m2 ib ie jb je icb ice jcb jce a b r
| ice-icb<=Threshold
= m3 ib ie jb je icb ice jcb jce a b r
# icm = (icb+ice+1)>>1;
# jcm = (jcb+jce+1)>>1;
# r = m1 ib ie jb je icb icm jcb jcm a b r;
# r = m1 ib ie jb je icm ice jcm jce a b r;
= r;
m3 :: !Int !Int !Int !Int !Int !Int !Int !Int !Matrix !Matrix !*{#*{#Real}} -> *{#*{#Real}};
m3 ib ie jb je icb ice jcb jce a b r
= l1 ib jb je ie icb ice r;
{
l1 i jb je ie icb ice r
| i+1<ie
# r=l2 jb icb ice r;
= l1 (i+2) jb je ie icb ice r;
with {
row1=a.[i];
row2=a.[i+1];
l2 j icb ice r
| j+1<je
# (v1,v2,v3,v4) = mul_vector_row22 icb jcb ice 0.0 0.0 0.0 0.0 row1 row2 b.[j] b.[j+1];
# i1=i+1;
# j1=j+1;
# (s1,r)=r![i,j];
# (s2,r)=r![i1,j];
# (s3,r)=r![i,j1];
# (s4,r)=r![i1,j1];
# r={r & [i,j]=s1+v1,[i1,j]=s2+v2,[i,j1]=s3+v3,[i1,j1]=s4+v4 };
= l2 (j+2) icb ice r;
| j<je
# (v1,v2) = mul_vector_row21 icb jcb ice 0.0 0.0 row1 row2 b.[j];
# (s1,r)=r![i,j];
# (s2,r)=r![i+1,j];
= {r & [i,j]=s1+v1,[i+1,j]=s2+v2};
= r;
}
| i<ie
# r=l2 jb icb ice r;
= l1 (i+2) jb je ie icb ice r;
with {
row1=a.[i];
l2 j icb ice r
| j+1<je
# (v1,v2) = mul_vector_row12 icb jcb ice 0.0 0.0 row1 b.[j] b.[j+1];
# (s1,r)=r![i,j];
# (s2,r)=r![i,j+1];
# r={r & [i,j]=s1+v1,[i,j+1]=s2+v2}
= l2 (j+2) icb ice r;
| j+1<je
# v1 = mul_vector_row11 icb jcb ice 0.0 row1 b.[j];
# (s1,r)=r![i,j];
= {r & [i,j]=s1+v1};
= r;
}
= r;
}
mul_vector_row22 :: !Int !Int !Int !Real !Real !Real !Real !{#Real} !{#Real} !{#Real} !{#Real} -> (!Real,!Real,!Real,!Real);
mul_vector_row22 ic jc n2 v1 v2 v3 v4 row1 row2 column1 column2
| ic<n2
# r1_k=row1.[ic];
# r2_k=row2.[ic];
# c1_k=column1.[jc];
# c2_k=column2.[jc];
# v1=v1+r1_k*c1_k;
# v2=v2+r2_k*c1_k;
# v3=v3+r1_k*c2_k;
# v4=v4+r2_k*c2_k;
= mul_vector_row22 (ic+1) (jc+1) n2 v1 v2 v3 v4 row1 row2 column1 column2;
= (v1,v2,v3,v4);
mul_vector_row21 :: !Int !Int !Int !Real !Real !{#Real} !{#Real} !{#Real} -> (!Real,!Real);
mul_vector_row21 ic jc n2 v1 v2 row1 row2 column1
| ic<n2
# c1_k=column1.[jc];
# v1=v1+row1.[ic]*c1_k;
# v2=v2+row2.[ic]*c1_k;
= mul_vector_row21 (ic+1) (jc+1) n2 v1 v2 row1 row2 column1;
= (v1,v2);
mul_vector_row12 :: !Int !Int !Int !Real !Real !{#Real} !{#Real} !{#Real} -> (!Real,!Real);
mul_vector_row12 ic jc n2 v1 v2 row1 column1 column2
| ic<n2
# r1_k=row1.[ic];
# v1=v1+r1_k*column1.[jc];
# v2=v2+r1_k*column2.[jc];
= mul_vector_row12 (ic+1) (jc+1) n2 v1 v2 row1 column1 column2;
= (v1,v2);
mul_vector_row11 :: !Int !Int !Int !Real !{#Real} !{#Real} -> Real;
mul_vector_row11 ic jc n2 v1 row1 column1
| ic<n2
# v1=v1+row1.[ic]*column1.[jc];
= mul_vector_row11 (ic+1) (jc+1) n2 v1 row1 column1;
= v1;
transpose :: !Matrix !Int -> Matrix;
transpose a n = {{a.[j,i] \\ j<-[0..n-1]} \\ i<-[0..n-1]};
matrix n = { createArray n 1.0 \\ i<-[0..n-1] };
nul_matrix n = { createArray n 0.0 \\ i<-[0..n-1] };
Start
# n = 1000;
# m = mul_matrix (matrix n) (transpose (matrix n) n) n (nul_matrix n);
= m.[0,0];