[clean-list] Before I die (or the Clas library)

Sun, 28 Oct 2001 11:38:55 +0100

Enclosed a speed improved version for solving linear systems. 
The lu-factorisation is the same as the one in the Clas-library
from Thorsten Zoerner. J.v. Groningen posted a few days ago a tuned
version for calculating the dot product. But there is better way to
improve the performance: it is the forward- and back-substitution
itself.

The original Clas-library calculates for every row in both of the back-
and forward-substitution codes always the dot-product traversing all
elements. But you can save multiplications, when you start (at the back-
and forward-substitution) with dot-product 1, then 2, then 3 and so on; 
walking up this way until one is at the dimension of the array. This
saves
quite a lot computation compared to calculating every time the full dot-
product.

The code is based on a version retrieved from the Numerical-Recipes.
This
book often gets very bad comments, but I think for every day tasks it is
quite
okay and reliable. Be aware: Numerical Recipes C code is completely
insane,
because they use there 1-offset (this can be confusing, if you want to
pass a
pointer). In Clean there is zero-offset (I did take this into account).
It is not exactly the same version as the Numerical Recipes one; they
are
doing the pivoting in companion with forward- and back-substitution. My
code
does it seperate. There would be a critical performance difference, if
one
has to deal with arrays whith many zeros (then you should adapt the
Numerical
Recipes code; the modification should be easy).

Before we turn to the timings: Caveat, Caveat, screaming:

Clean  its strict (*) error messages are working very hard against 
the programmer. Using arrays in Clean is similar as one is hoping for a
victory 
in lottery. A Russian-roulette would be more pleasing to experience.
There 
are so many different array versions. Why is it not possible to have
(only 
boxed arrays in this case; except that the Unix Clean version is in debt
to 
strict arrays but in general it is not clear  why it should be not
enough 
to have only one version: unboxed):

Matrix: {#{#Real,Int}}
Vector: {#Real,Int} 

Matrix unique:  *{#{#Real,Int}}
Vector unique: *{#Real,Int}

but no, the reallity is:

Matrix: {{#Real,Int}}, {#{#Real,Int}}
Vector: {#Real,Int}

Matrix unique: *{# *{#Real,Int}}, *{*{#Real,Int}}, {# *{#Real,Int}}, *{#
{#Real,Int}}
Vector unique: *{#Real,Int}, 

As an irrelevant person I do not really see the need for that. Sorry, my
name is not
Mr. Siegfried Compiler.
We know all this sign: "#". This mark is depicted by arrested guys in
Hollywood films in
order to show the sentinentels that the day will come and the bad guys
may leave the cage.
You can also paint this sign for the number of users which are
intimidated by Clean by means
of array types.

But the array concept itself is a pleasure. Especially the access to the
rows without any need
of pointers or copying.

Timings (on a Macintosh 100MHz, 603e processor, 64MB RAM, 16K L1, 256K
L2):

Beeing anti-verbose:

Roughly speaking: takes for a 300x300 array and a vector set of array
dimension 300x300
about 20sec.

The comparable Yorick execution time is about 15sec.

This sounds now acceptable. The Yorick version is based on a compiled
lin-algebra
routine and underlines the fact that Clean is in average 1.4 time slower
as a
comparable C version. This IS really very astonishing and impressive.
The next 
time when I am drunken I will laugh and turn all Python-programmers 
into 
ridiculousness for their behavior to call C functions when they are
greeding for speed.

But it is highly required to bug-fix the Unix Clean version as fast as
possible. THIS 
should be a high priority on your bug-list.
With my new version the difference to Yorick is about a factor 2 and
surely due to
strict/unboxed arrays on Unix.

Digression: It will be no harm, if you consider to make a demarcation
line between Clean 1.3.3 numerically intense tasks and Clean 2.0 for the
other topics. You can take the idea similar to C for numerics and C++
for so called higher order task (but only in the C++ eye of the
beholder).

Thank you Mr. Gonzi for contributions to a better world.
S. Gonzi (lined-up with the Red-White-Red flag)

//The program comes with no warranty
//Due to Sunday I decided this program should be freeware
//The program is based on the Clas-library from Thorsten
//Zoerner; you get it from the Clean homepage
//
//If you are an Unix user you must change all occurences
//of '#' to '!' in the type definitions in the Clas-library 
//due to an unboxed array bug on the Suns
//The back- and forward-substitution algorithm is an
//implementation by myself after an idea based on the code
//in the Numerical Recipes
//
//For generating random numbers you must download the
//MersenneTwister library from the Clean homepage (installation
//then is straightforward)
//
//I DID NOT MAKE ANY SUPERIOR ERROR CHECKS. I am not your mother,
//take care what you do and feed the program not with strings,
globular-clusters,
//nuts or your girlfriend when the program is actually expecting
//properly dimensionised arrays
//
//Tested on a Macintosh (Clean 1.3.3) and a Sun workstation (Clean
1.3.3)
//
// Siegfried Gonzi 2001

module linAlg
import StdEnv,Clas3, MersenneTwister

// If you work on the Sun, please change it
// to strict versions: ! instead of #
:: Vec :== {#Real}
:: VecT :== {#Int}
:: MatS :== {# .{#Real}}
:: Mat :== {# {#Real}}

//Start:: !Real
Start:: !Mat
//Start = TakeOne( solveLSPartPivG  a b )   		
Start = solveLSPartPivG a b
where
	a:: *MatS
	a =
{{0.0,23.45,2.34,0.623},{3.45,0.0,0.0,0.645},{9.23,3.45,12.34,3.45},{0.8,5.6,6.89,4.67}}
	//a = makeRandMat 300 300
	b:: *MatS
	//b = makeRandMat 300 300
	b =
{{0.0,1.0,1.0,0.0},{0.0,1.0,1.0,12.0},{1.0,134.0,1.0,1.0},{1.0,1.0,1.0,1.0}}

makeRandMat:: !Int !Int -> !*MatS
makeRandMat i j 
	#! mat = { { 0.0 \\ j <- [0..j-1] } \\ j <- [0..i-1] }
	= makeMat 0 i j mat
where
	makeMat:: !Int !Int !Int !*MatS -> !*MatS
	makeMat n i j mat
		| n == (size mat) = mat
		#! el = taken j (genRandReal (43+n) ) 
		#! ell = { (elm+1.0) \\ elm <- el }
	 	= makeMat (n+1) i j { mat & [n] = ell }
	//
	taken:: !Int ![!Real] -> ![!Real]
	taken n [] = []
	taken n [x:r] 
		| n == 0 = []
 		= [x:taken (n-1) r]

TakeOne:: !Mat -> !Real
TakeOne a = a.[1,1]

forwardSubstUnitG ::  !Mat !*Vec  -> !*Vec
forwardSubstUnitG  w b  
	#! sizeVec = size w.[0]
	= forwardSubst_i 0 0 sizeVec w b
where
	forwardSubst_i :: !Int !Int !Int !Mat !*Vec -> !*Vec
	forwardSubst_i  i ii sizeVec wA =: { [i] = wAi } bVec =: { [i] = bVeci
}
		| i == ( sizeVec  ) = bVec
		| ii == 1
			#! summe = dot_sum (dec ii) i wAi  bVec bVeci
			= forwardSubst_i (inc i) (ii) sizeVec wA { bVec & [i] = summe   }
		#! summe = bVec.[i]
		= forwardSubst_i (inc i)  (1) sizeVec wA { bVec & [i] = summe }
		where
				dot_sum:: !Int !Int !Vec !Vec !Real -> !Real
				dot_sum j stop wRow =: { [j] = wRowj } bVec =: { [j] = bVecj } summe 
					| j == ( stop ) = summe
					= dot_sum (inc j) stop wRow bVec  ( summe - (wRowj*bVecj)  )

backwardSubstG ::  !Mat !*Vec  -> !*Vec
backwardSubstG w b 
	#! sizeVec = size w.[0]
	= backwardSubst_i (sizeVec - 1) sizeVec w b
where
	backwardSubst_i:: !Int !Int !Mat !*Vec -> !*Vec
	backwardSubst_i i sizeVec wA =: { [i] = wAi } bVec =: { [i] = bVeci }
		| i == (-1) = bVec
		| i <= (sizeVec - 2)
			#! summe = dot_sum  (inc i) sizeVec wAi bVec bVeci
			#! wAii = wA.[i,i]
			= backwardSubst_i (dec i) sizeVec wA { bVec & [i] = summe / wAii }
		#! summe = bVec.[i]
		= backwardSubst_i (dec i) sizeVec wA { bVec & [i] = summe / wA.[i,i] }
		where
			dot_sum :: !Int !Int !Vec !Vec !Real -> !Real
			dot_sum  j n wRow =: { [j] = wRowj } bVec =: { [j] = bVecj } summe
				| j == (n  ) = summe
				= dot_sum  (inc j) n wRow bVec ( summe - (wRowj*bVecj) )

solveLSPartPivG ::  !*MatS !*MatS -> !*Mat
solveLSPartPivG   a x               
	#! (lu, p) = luPartPiv a
	= solveLSPartPiv_i 0 lu p x
where
	solveLSPartPiv_i :: !Int !*Mat !*VecT !*Mat -> !*Mat
	solveLSPartPiv_i  n lu p x=:{ [n] = x_row }
		| (n == size x) = x
		#! pv = perVec p x_row
		#! y = forwardSubstUnitG lu   pv 
		#! yb = backwardSubstG lu y 
		= solveLSPartPiv_i (inc n) lu p {x & [n] = yb}

//---------------End of program and thank you for stopping
by--------------------