NN code, further results

Tue, 16 Mar 1999 12:26:37 +1300 (NZDT)

I've now got C, Clean, and Haskell versions of the NN code all running,
and was interested in speeding up the Haskell version.

I decided to focus on a smaller task instead:  multiplying two
100x100 double-precision matrices.

The tests were done on an 84 MHz SPARCstation 5 using Clean 1.3.1
and GHC 3.02 (yes, I know there is a later version of GHC, but this
was the latest one with *prebuilt binaries* available for the Sparc).

I tried three versions in each language:

 List-based
    The obvious list of lists.

 Ursl-based.
    An "Ursl" is an UnRolled Strict List.  Pronounce it like the
    Irish word "ursul".  Here's the type declaration:
    data (Eval a) => Ursl a
     = U0
     | U1 !a
     | U2 !a !a
     | U3 !a !a !a
     | U4 !a !a !a !a !(Ursl a)
    This was adapted from some studies I did for Prolog and concurrent
    logic programming several years ago.  There's no Haskell equivalent
    of [!a], you have to make your _own_ strict data structures.  It
    also gives you, if you have a compiler that is seriously competent
    at inlining, or if you twist your code to macros, the effect of
    unrolling.  It saves space and time compared with a list.  The
    only down-side is that Ursls don't lend themselves to incremental
    extension; append is costly and so is cons.

    The worst thing about this is that when I declared the mapping
    functions for Ursls as macros, the way the Clean team say I should,
    the compiler crashes.  So they don't inline because they've got
    macros, but if I use macros, the compiler crashes.  Not a win.

 Array-based.
    The Haskell version used standard Haskell arrays.  This made it
    possible to say that something is a two-dimensional array.  It
    also made the code reasonably pretty.  The Clean code uses a
    vector of vectors, and it was a nightmare to get right:  if the
    runtime system tells me it has detected a cycle in a spine, it
    would be really great to have some idea *where* or in *what*.
    I had to rewrite that part of the code twice before it would go.

RESULTS
    Compiler       List    Ursl   Array
    GHC 3.02       5 sec   3 sec  25 sec
    Clean 1.3.1    5 sec   4 sec   0.8 sec
    C                              0.2 sec

We see that
 - Clean and GHC produce similar results for lists.  That's quite
   encouraging.
 - the effect of not inlining and not letting me use macros either
   really does show up.
 - Although this code makes NO use of uniqueness or array update,
   Clean's arrays really really shine.  WELL DONE CLEAN TEAM!

Multiplying two 100x100 matrices costs a million adds and a million
multiplies.  The SPARCstation 5 has a superscalar CPU that can start
a new FPU operation every cycle.  Interestingly, the Clean/C ratio
for this problem is the same as the Clean/C ratio for the original
NN problem:  4.