NN code, further results
Richard A. O'Keefe
ok@atlas.otago.ac.nz
Tue, 16 Mar 1999 12:26:37 +1300 (NZDT)
I've now got C, Clean, and Haskell versions of the NN code all running,
and was interested in speeding up the Haskell version.
I decided to focus on a smaller task instead: multiplying two
100x100 double-precision matrices.
The tests were done on an 84 MHz SPARCstation 5 using Clean 1.3.1
and GHC 3.02 (yes, I know there is a later version of GHC, but this
was the latest one with *prebuilt binaries* available for the Sparc).
I tried three versions in each language:
List-based
The obvious list of lists.
Ursl-based.
An "Ursl" is an UnRolled Strict List. Pronounce it like the
Irish word "ursul". Here's the type declaration:
data (Eval a) => Ursl a
= U0
| U1 !a
| U2 !a !a
| U3 !a !a !a
| U4 !a !a !a !a !(Ursl a)
This was adapted from some studies I did for Prolog and concurrent
logic programming several years ago. There's no Haskell equivalent
of [!a], you have to make your _own_ strict data structures. It
also gives you, if you have a compiler that is seriously competent
at inlining, or if you twist your code to macros, the effect of
unrolling. It saves space and time compared with a list. The
only down-side is that Ursls don't lend themselves to incremental
extension; append is costly and so is cons.
The worst thing about this is that when I declared the mapping
functions for Ursls as macros, the way the Clean team say I should,
the compiler crashes. So they don't inline because they've got
macros, but if I use macros, the compiler crashes. Not a win.
Array-based.
The Haskell version used standard Haskell arrays. This made it
possible to say that something is a two-dimensional array. It
also made the code reasonably pretty. The Clean code uses a
vector of vectors, and it was a nightmare to get right: if the
runtime system tells me it has detected a cycle in a spine, it
would be really great to have some idea *where* or in *what*.
I had to rewrite that part of the code twice before it would go.
RESULTS
Compiler List Ursl Array
GHC 3.02 5 sec 3 sec 25 sec
Clean 1.3.1 5 sec 4 sec 0.8 sec
C 0.2 sec
We see that
- Clean and GHC produce similar results for lists. That's quite
encouraging.
- the effect of not inlining and not letting me use macros either
really does show up.
- Although this code makes NO use of uniqueness or array update,
Clean's arrays really really shine. WELL DONE CLEAN TEAM!
Multiplying two 100x100 matrices costs a million adds and a million
multiplies. The SPARCstation 5 has a superscalar CPU that can start
a new FPU operation every cycle. Interestingly, the Clean/C ratio
for this problem is the same as the Clean/C ratio for the original
NN problem: 4.