NN code: some progress

Richard A. O'Keefe ok@atlas.otago.ac.nz
Fri, 5 Mar 1999 13:47:54 +1300 (NZDT)


John van G's advice that Clean doesn't inline things gave me an
important clue about making my NN code run faster.

I had stuff like this:

    Number :== !Real
    Vector :== ![Number]

    vred2 :: !(Number Number Number -> Number) Number Vector Vector -> Number
    vred2 f a [] [] = a
    vred2 f a [x:xs] [y:ys] = vred2 f (f a x y) xs ys

    vdot :: Vector Vector -> Number
    vdot xs ys = vred2 (\a x y -> a+x*y) 0 xs ys

With that kind of code, I was getting roughly 10560 seconds.
Getting the uniqueness annotations right on my random generator state
and sticking everything in one file (so all the strictness information
was visible) gave me about a 7% performance boost.

Ripping out the higher-order functions and writing

    vdot :: Vector Vector -> Number
    vdot xs ys = loop xs ys 0
      where loop [] [] s = s
            loop [x:xs] [y:ys] s = loop xs ys (s+x*y)

plus adding a couple of (quite minor) uniqueness annotations
got the time down to 2248 seconds.

Now, one of the oldest optimisations in the functional programming
community is expanding mapping functions in-line.  That goes back to some
of the earliest Lisp compilers (maybe not Lisp 1.5, but shortly after).
It's reasonably easy to characterise mapping functions, or at least the
relevant set of functions:

    - function arguments are not put into data structures or returned
    - no body recursion (tail recursion, yes, in the Prolog sense)
    - function is small

It is rather startling to find a performance boost of BETTER THAN A FACTOR
OF FOUR by avoiding higher-order functions.
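To make the cost concrete, here is a sketch in C (my own hypothetical names, not anyone's benchmark): a generic reduction that receives its combining function through a pointer, next to the hand-specialised loop that inline expansion should produce.  Without inlining, every element pays an indirect call; after substituting the combining function's body at the call site, the loop is a handful of instructions.

    #include <assert.h>

    /* Generic reduction: the combining function arrives as a
       pointer, so each element costs an indirect call unless
       the compiler can inline it. */
    static double vred2(double (*f)(double, double, double),
                        double a, int n,
                        const double *x, const double *y) {
        while (n-- != 0)
            a = f(a, *x++, *y++);
        return a;
    }

    static double dotstep(double a, double x, double y) {
        return a + x * y;
    }

    /* What inline expansion should yield: dotstep's body
       substituted directly into the loop. */
    static double vdot(int n, const double *x, const double *y) {
        double a = 0.0;
        while (n-- != 0)
            a = a + *x++ * *y++;
        return a;
    }

Both compute the same dot product; the whole transformation is just substituting dotstep's body at the call site, which is exactly what the three criteria above are meant to license.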

Is that enough to make me use macros?  No.  If I have to write the code
in a somewhat contorted way, e.g.

    vred2 :: !(Number Number Number -> Number) Number Vector Vector -> Number
    vred2 f a xs ys =
      loop a xs ys
      where loop a [] [] = a
            loop a [x:xs] [y:ys] = loop (f a x y) xs ys

then I'll do it.  If, in addition, I have to write a pragma

    :: pragma :: f inline
    :: pragma :: f macro

or something weird, I'll do that too.  But if I have to use a special
macro syntax in order to get good performance, I might as well use C.

#define VRED2(nm,fn) \
  double nm(int n, double const *x, double const *y, double a) { \
    while (n-- != 0) a = fn(a,*x++,*y++); \
    return a; \
  }
#define DOTMAC(a,x,y) ((a)+(x)*(y))
VRED2(vdot, DOTMAC)
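For what it's worth, the macro-generated function can be sanity-checked like this (the vectors are just made-up test values; the macros are repeated so the snippet stands alone):

    #include <assert.h>

    #define VRED2(nm,fn) \
      double nm(int n, double const *x, double const *y, double a) { \
        while (n-- != 0) a = fn(a,*x++,*y++); \
        return a; \
      }
    #define DOTMAC(a,x,y) ((a)+(x)*(y))
    VRED2(vdot, DOTMAC)

    /* vdot(3, x, y, 0.0) with x = {1,2,3}, y = {4,5,6}
       gives 1*4 + 2*5 + 3*6 = 32. */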

The C code is even the same size as the Clean code.

I'm especially reluctant to use macros since the manual does not describe
the semantics of macros clearly and explicitly warns that error messages
may be cryptic if macros are used.

I've been telling students that one of the reasons functional languages
are good is that it's easier for the compiler to make inlining decisions
and to do inlining than in languages with all sorts of side effects and
order dependencies.  In the future, do I have to tell them "provided the
compiler writer is Jeff Siskind"?  I really want to *honestly* tell
these students that functional programming is elegant *and* practical.
If that means switching to Mercury instead of Clean, I'll do that too.

I do appreciate that the Clean team has limited time, limited manpower,
and limited money, and that there are all _sorts_ of things they need
to do yesterday.  But my C compilers (UNIX, MacOS) make inlining decisions
without any help ('inline' was added to C9x at about the time it
stopped being useful!), the Eiffel compiler I have makes inlining
decisions without any help, and the Ada compiler I use just needs a
pragma (and can probably manage without it).  If it only made a few percent
difference, who'd need it?  But a factor of better than four?