[clean-list] Fastest shuffle with Clean?

Thu, 13 Sep 2001 15:50:15 +0200

Dear John,

> Fabien, Thank you very much for your help.

You're welcome.

> You've answered several questions, and (as is sometimes the case when
questions are answered) raised a few more. :)

Oops...

> A typical simulation results in about 1 billion shuffles of this array,
and the shuffle takes roughly 30% of the simulation computations...

Are you implementing some simulated annealing variant ?

> > Now, I am not sure the compiler does agree with your way of swapping the
> > elements of the array. When you write :
> > 
> > >        #! rnd_elem = ar.[r]
> > >        #! top_elem = ar.[top]
> > 
> > The array ar is no more single-threaded, there is a duplicated reference
to
> > it.

> The compiler did not generate an error message, and it appeared to have
the desired result, i.e. the elements were swapped. But maybe I'm still
trying > to solve this problem procedurally in a functional language?

No you don't, I am mistaken and the compiler is right, but I am baffled, I
will have to review the language report.

> > The following swapping function does however perform the swapping of two
> > array items while keeping the array single-threaded :
> > 
> >   swap a ( i , j )
> >     # ( x , a ) = uselect a i
> >     # ( y , a ) = uselect a j
> >     # a         = update a i y
> >     # a         = update a j x
> >     = a  

> OK, this makes sense. However, the Clean Book never mentions the "uselect"
function, and Array updates are only described with the "{a &" notation. 

Hmm, I am not sure these are documented in any other place than the StdArray
definition module... However you're certainly right about the use of the { a
& ... notation.

> > Admitedlly, uniqueness typing is somehow confusing at the beginning.

> Yes, at least for me.
> It seems like a small door into which I can squeeze some procedural ideas
which I probably shouldn't.
> Then it appears the door is shut. My brain is still trying hard to think
in the FP way. 

Why shouldn't you, some computations are certainly expressed naturally in a
stateless functional manner, but some are not.

> >
> > [snipped some good example code]
> >
> > take care, however, not to define the infiniteRandomNumberList as a
global
> > variable, for it would then be kept in memory while the whole
computation is
> > running, creating an unbearable space leak.

> OK, now I'm confused again. It's a lazy evaluation, right?
> So how does defining it locally differ from defining it globally if it's
only used as needed? 

Well, to see what is going on you need to have a look at the operational
semantics of Clean. That is, computations are carried out by rewriting
rooted graphs whose nodes are, roughly speaking, function calls with
outgoing edges to the function arguments. Even your whole program is
internally represented as a graph being rewritten - or 'reduced' - by the
Clean graph reduction machinery. Reducing a node representing a function
call, or an unevaluated expression, if you prefer, means computing the
explicit representation of the value. When, during the course of the
computation, some nodes have become unreachable from the roots of the
computation, the memory space they occupy is reclaimed by the garbage
collector.

This being said, consider now that you define your infinite random number
list as some global expression, say. When you perform one shuffling of your
array, you will need the explicit N first random numbers stored in the
infinite list. This will be accomplished by Clean rewriting the unevaluated
infinite random list by a list with the first N numbers...and the remaining
unevaluated infinite list of the following random numbers. As the rewritten
list is attached to a global variable, that is, somehow, the root of the
whole program, it will be garbage-collected eventually only when the whole
program terminates. Of course, when you perform again one shuffling, the
list is still explicitly represented and won't be unfolded again. If, on the
other hand, you define your infinite random number list as a local
expression of the shuffling expression, it will eventually be
garbage-collected when the shuffling expression has been completely reduced.

This way you may trade off space for time. The global expression will save
you repeated computations, at the cost of the memory required to store the
explicit representation. The local expression will be recomputed each time
and repeatedly garbage-collected.

Have a look at the draft book 'Functional Programming in Clean', you will
get some better explanations and nice drawings about the graph-rewriting
semantics of Clean program. If you're not faint of heart, you can also
download Pr. Rinus Plasmeijer's Book 'Functional Programming and Parallel
Graph Rewriting' where everything is uncovered and explained at length.

The above are very rough explanations that any Clean guru could go
nitpicking about :)

> I could not figure out how to incorporate a random number generator the
way I had the shuffle written.
> My "randomInt" was a dummy Linear Congruential function that just
performed the same number of operations as a "true" LCM generator
> so I could test the array shuffling speed, but returned the same result
each time since I could not find a way to update the seed. 

Beware that in pure functional programming destructive assignment are
banned, so that 'updating' a state cannot be implemented in the standard
manner. Instead, you have to thread the state you want to update as a
parameter, and in the returned values, of your functions. A common FP
technique illustrating that state-threading style is dubbed "using
accumulator parameters". Ponder the following imperative and functional
versions of the famous factorial function.

  function fac(n:Integer):Integer ;
  var
    i , f : Integer ; // The state that is being updated.
  begin
    // Initial state value.
    f := 1 ;
    i := 0 ;
    // Invariant : f = i!, initially true, preserved by the loop body.
    while ( i < n ) do begin
       // State update.
      i := i + 1 ;
      f := f * i ;
    end ;
    return f ; // Termination of the loop implies i = n, hence, from the
invariant we get f=n!.
  end ;

  fac :: Int -> Int
  fac n
    = accumulator 1 0 // The initial value of the state passed as an
accumulator argument.
  where
    accumulator f i | i < n = accumulator i1 f1 where i1 = i + 1 ; f1 = f *
i1 ; // State update by passing new accumulator value.
    accumulator f _         = f

> The new shuffle works (not completely, I still need to figure out a way to
update the seed passed to the generator),
> but is very slow compared to the first version, actually too slow to use.
> My original version did 1 million shuffles in about 4 seconds on my
computer.
> The new version takes about 24 seconds to do 100,000.

> Here is the new version (genRandReal is from the MersenneTwister library):

> shuffleArray :: *{#Int} !Int !Int -> *{#Int}
> shuffleArray ar top seed
>    = foldl swap ar indexPairList
>    where
>        indexPairList
>            = zip2 [0..top] (map (\ n -> toInt (n * z)) (take top
(genRandReal seed)))
>
>        z = toReal (top + 1)
>
>        swap a (i, j)
>            # ( x , a ) = uselect a i
>            # ( y , a ) = uselect a j
>            # a         = update a i y
>            # a         = update a j x
>            = a  

With the program below I get similar figures to yours on a PentiumIII800
machine, with either commented or uncommented version of swap. Notice that I
thread around the lazy list of random numbers to iterate the shuffleOnce
procedure. 

SEED  :==      1
SIZE  :==    300 // Array size.
SIZE1   = toReal ( SIZE - 1 )
N     :== 100000 // Number of shuffle iterations.

randomIndexList = map ( \ x -> entier ( x * SIZE1 ) ) ( genRandReal SEED )

shuffleOnce array randomIndexList
  = shuffleAt 0 array randomIndexList
where
  shuffleAt i array randomIndexList
    | i < SIZE
    #! randomIndex     = hd randomIndexList
    #! randomIndexList = tl randomIndexList
    = shuffleAt ( i + 1 ) ( swap array i randomIndex ) randomIndexList
  shuffleAt _ array randomIndexList
    = ( array , randomIndexList )
//	  swap a i j
//	    #! ( x , a ) = uselect a i
//	    #! ( y , a ) = uselect a j
//	    #! a         = update  a i y
//	    #! a         = update  a j x
//	    = a  
  swap a i j
    #! x = a.[ i ]
    #! y = a.[ j ]
    = { a & [ i ] = y , [ j ] = x }

shuffleN array
  = shuffle 0 array randomIndexList
where
  shuffle i array randomIndexList
    | i < N
    #! ( array , randomIndexList ) = shuffleOnce array randomIndexList
    = shuffle ( i + 1 ) array randomIndexList
  shuffle _ array _
    = array

Start = shuffleN a
where
  a :: *{#Int}
  a = { i \\ i <- [ 1..SIZE ] }

However, experimenting with :

randomIndexList = repeat 0

I get a computation time of about 2.5s !

That seems to indicate the MersenneTwister generator is eating up much of
the computation time. Now, If ultimate performance is needed, maybe you
would be better off seeing how to extract the logic of the MersenneTwister
so as not to use a lazy list. You may for example explicitly thread around
the state of the random number generator instead of iterating by induction
on a lazy list.

For example, your random number generator without lazy lists could have the
interface below :

:: * Generator // Unique abstract data type.

generatorNew     :: ! Int -> * Generator // Initializing a new unique
generator.
generatorNextInt :: ! * Generator -> * ( Int , * Generator ) // Getting the
next item out of the generator.

Voilà for the idea...

Hope it helps.

Best regards, Fabien TODESCATO