[clean-list] Matrix timings

John van Groningen johnvg@cs.kun.nl
Fri, 2 Nov 2001 17:57:26 +0100


Sven-Bodo Scholz wrote:
>...
>Clean-solution: John's solution posted Nov 2000 (complete version at the end of this posting)
>
>all timings on a SUN UltraSPARC 450MHz:
>
>size     Clean-time (s)        SAC-time (s)
>
>300x300      0.8                  0.3
>1000x1000   63.5                 33.4
>...

I have written a faster matrix multiplication in Clean. I have included the code at the end of the message.

The results on a 333 Mhz UltraSPARC: (1000 x 1000)

old version: 77 seconds
new version: 37 seconds

The result on a 450 Mhz PowerPC G4: (1000 x 1000)

old version: 52 seconds
new version: 19 seconds
new version: 13 seconds with code generator that generates fmadd instructions.

>letting the SAC compiler exploit implicit concurrency via POSIX-threads
>(same source code / compiler option "-mt" enabled)
>yields on the same (4 procs) machine (size 1000x1000):
>
># of threads used   SAC-time (s)   speedup
>
>      1               33.5            1
>      2               19.7            1.7
>      3               13.8            2.4
>      4               12.5            2.7

So a single 450 Mhz PowerPC G4 using Clean now multiplies a 1000x1000 matrix nearly as fast as four 450 Mhz SPARC processors with SAC.

Regards,

John van Groningen



The program:



module matrix;

import StdEnv;

:: Matrix :== {#{#Real}};

mul_matrix a b n r
	= m1 0 n 0 n 0 n 0 n a b r;

Threshold :== 32;

m1 :: !Int !Int !Int !Int !Int !Int !Int !Int !Matrix !Matrix !*{#*{#Real}} -> *{#*{#Real}};
m1 ib ie jb je icb ice jcb jce a b r
	| ie-ib<=Threshold
		= m3 ib ie jb je icb ice jcb jce a b r
		# im = (ib+ie+1)>>1;
		# jm = (jb+je+1)>>1;
		# r = m2 ib im jb jm icb ice jcb jce a b r;
		# r = m2 ib im jm je icb ice jcb jce a b r;
		# r = m2 im ie jb jm icb ice jcb jce a b r;
		# r = m2 im ie jm je icb ice jcb jce a b r;
		= r;

m2 :: !Int !Int !Int !Int !Int !Int !Int !Int !Matrix !Matrix !*{#*{#Real}} -> *{#*{#Real}};
m2 ib ie jb je icb ice jcb jce a b r
	| ice-icb<=Threshold
		= m3 ib ie jb je icb ice jcb jce a b r
		# icm = (icb+ice+1)>>1;
		# jcm = (jcb+jce+1)>>1;
		# r = m1 ib ie jb je icb icm jcb jcm a b r;
		# r = m1 ib ie jb je icm ice jcm jce  a b r;
		= r;

m3 :: !Int !Int !Int !Int !Int !Int !Int !Int !Matrix !Matrix !*{#*{#Real}} -> *{#*{#Real}};
m3 ib ie jb je icb ice jcb jce a b r
	= l1 ib jb je ie icb ice r;
{
	l1 i jb je ie icb ice r
		| i+1<ie
			# r=l2 jb icb ice r;
			= l1 (i+2) jb je ie icb ice r;
			with {
				row1=a.[i];
				row2=a.[i+1];
				
				l2 j icb ice r
					| j+1<je
						# (v1,v2,v3,v4) = mul_vector_row22 icb jcb ice 0.0 0.0 0.0 0.0 row1 row2 b.[j] b.[j+1];
						# i1=i+1;
						# j1=j+1;
						# (s1,r)=r![i,j];
						# (s2,r)=r![i1,j];
						# (s3,r)=r![i,j1];
						# (s4,r)=r![i1,j1];
						# r={r & [i,j]=s1+v1,[i1,j]=s2+v2,[i,j1]=s3+v3,[i1,j1]=s4+v4 };
						= l2 (j+2) icb ice r;
					| j<je
						# (v1,v2) = mul_vector_row21 icb jcb ice 0.0 0.0 row1 row2 b.[j];
						# (s1,r)=r![i,j];
						# (s2,r)=r![i+1,j];
						= {r & [i,j]=s1+v1,[i+1,j]=s2+v2};
						= r;
				}
		| i<ie
			# r=l2 jb icb ice r;
			= l1 (i+2) jb je ie icb ice r;
			with {
				row1=a.[i];
				
				l2 j icb ice r
					| j+1<je
						# (v1,v2) = mul_vector_row12 icb jcb ice 0.0 0.0 row1 b.[j] b.[j+1];
						# (s1,r)=r![i,j];
						# (s2,r)=r![i,j+1];
						# r={r & [i,j]=s1+v1,[i,j+1]=s2+v2}
						= l2 (j+2) icb ice r;
					| j+1<je
						# v1 = mul_vector_row11 icb jcb ice 0.0 row1 b.[j];
						# (s1,r)=r![i,j];
						= {r & [i,j]=s1+v1};
						= r;
				}
			= r;
}

mul_vector_row22 :: !Int !Int !Int !Real !Real !Real !Real !{#Real} !{#Real} !{#Real} !{#Real} -> (!Real,!Real,!Real,!Real);
mul_vector_row22 ic jc n2 v1 v2 v3 v4 row1 row2 column1 column2
	| ic<n2
		# r1_k=row1.[ic];
		# r2_k=row2.[ic];
		# c1_k=column1.[jc];
		# c2_k=column2.[jc];
		# v1=v1+r1_k*c1_k;
		# v2=v2+r2_k*c1_k;
		# v3=v3+r1_k*c2_k;
		# v4=v4+r2_k*c2_k;
		= mul_vector_row22 (ic+1) (jc+1) n2 v1 v2 v3 v4 row1 row2 column1 column2;
		= (v1,v2,v3,v4);

mul_vector_row21 :: !Int !Int !Int !Real !Real !{#Real} !{#Real} !{#Real} -> (!Real,!Real);
mul_vector_row21 ic jc n2 v1 v2 row1 row2 column1
	| ic<n2
		# c1_k=column1.[jc];
		# v1=v1+row1.[ic]*c1_k;
		# v2=v2+row2.[ic]*c1_k;
		= mul_vector_row21 (ic+1) (jc+1) n2 v1 v2 row1 row2 column1;
		= (v1,v2);

mul_vector_row12 :: !Int !Int !Int !Real !Real !{#Real} !{#Real} !{#Real} -> (!Real,!Real);
mul_vector_row12 ic jc n2 v1 v2 row1 column1 column2
	| ic<n2
		# r1_k=row1.[ic];
		# v1=v1+r1_k*column1.[jc];
		# v2=v2+r1_k*column2.[jc];
		= mul_vector_row12 (ic+1) (jc+1) n2 v1 v2 row1 column1 column2;
		= (v1,v2);

mul_vector_row11 :: !Int !Int !Int !Real !{#Real} !{#Real} -> Real;
mul_vector_row11 ic jc n2 v1 row1 column1
	| ic<n2
		# v1=v1+row1.[ic]*column1.[jc];
		= mul_vector_row11 (ic+1) (jc+1) n2 v1 row1 column1;
		= v1;

transpose :: !Matrix !Int -> Matrix;
transpose a n = {{a.[j,i] \\ j<-[0..n-1]} \\ i<-[0..n-1]};

matrix n = { createArray n 1.0 \\ i<-[0..n-1] };

nul_matrix n = { createArray n 0.0 \\ i<-[0..n-1] };

Start
	# n = 1000;
	# m = mul_matrix (matrix n) (transpose (matrix n) n) n (nul_matrix n);
	= m.[0,0];