Execution speed

John van Groningen johnvg@cs.kun.nl
Wed, 6 Jan 1999 11:24:00 +0100


Cliff Nelson wrote:

>I am too new at Clean programming to guess how to speed up the following program. Start = test n
>will work correctly when n is a prime number if 2 is a quadratic non-residue of n (none of the
>second powers of the integers 1 to n-1 (mod n) is congruent to 2). divide x x is supposed to give
>all ones except the last coordinate.
>
>I am trying to convert the stuff I did in Mathematica to Clean.
>
>See:
>
>http://forum.swarthmore.edu/epigone/geometry-research/brydilyum
>
>Any pointers about how to speed the program up would be appreciated.

99 % of the execution time is spent in the function 'innerab' of times:

>::B :== {#Int}
>
>times:: B B -> B
>times a b =  {(inn r) \\ r <- [0..size a -1]}
>where
> inn r = innerab 0 1
> where
>  innerab acc j 
>  | j > (size a) =  acc
>  | otherwise = innerab  (acc bitxor ((a.[(size a+r-j)mod(size a)])bitand(b.[j-1]))) (j+1)

You can speed up this function by:

- making b strict by adding a '!' in the type of times:

  times:: B !B -> B

- compute size a once:

  # size_a = size a
  | j > size_a =  acc
  | otherwise = innerab  (acc bitxor (a.[(size_a+r-j) mod size_a] bitand b.[j-1])) (j+1)

- not using 'mod' inside the loop. The modulo and divide instructions of processors are slow. Divide usually takes about 20 (PPC 604 and G3,amd K6) or 40 (Pentium,PPC 601 and 603) clock cycles.

times :: !B !B -> B
times  a b =  {inn r \\ r <- [0..size a -1]}
where
 size_a = size a
 inn r = innerab 0 1 (r mod size_a)
 where
  innerab :: !Int !Int !Int -> Int;
  innerab acc j r_mod_s
    | j > size_a
       = acc
       # i = if (r_mod_s-j>=0) (r_mod_s-j) (size_a+r_mod_s-j);
       = innerab  (acc bitxor (a.[i] bitand b.[j-1])) (j+1) r_mod_s

The first 2 optimisations make the program about 20 % faster. After also removing 'mod', the program is 2.6 times as fast on a 266 MHz PowerPC G3.

John van Groningen