[clean-list] Fastest shuffle with Clean?

CaptnJamesKirk@aol.com CaptnJamesKirk@aol.com
Wed, 12 Sep 2001 12:52:49 EDT


Fabien,

Thank you very much for your help. You've answered several questions, and (as is sometimes the case when questions are answered) raised a few more. :)

> > Is this the most efficient (read: fastest) way to shuffle an array in
> > Clean?
> 
> I can't make out what you mean, you should maybe state clearly what are your
> requirements about the suffling you're talking about.

I'm sorry. The array size is about 300 elements (but can be as few as 50 for certain simulation runs). A typical simulation results in about 1 billion shuffles of this array, and the shuffle takes roughly 30% of the simulation computations. The easiest way to speed up the simulation is to speed up the shuffle. The array is the central data element of the simulation, so the integrity of the simulation is completely dependent on the integrity of the shuffle, and it is very important that it be done properly. This also relates to my concerns about the random number generator and why I would like to use the MersenneTwister.

> Now, I am not sure the compiler does agree with your way of swapping the
> elements of the array. When you write :
> 
> >        #! rnd_elem = ar.[r]
> >        #! top_elem = ar.[top]
> 
> The array ar is no more single-threaded, there is a duplicated reference to
> it.

The compiler did not generate an error message, and it appeared to have the desired result, i.e. the elements were swapped. But maybe I'm still trying to solve this problem procedurally in a functional language? 

> The following swapping function does however perform the swapping of two
> array items while keeping the array single-threaded :
> 
>   swap a ( i , j )
>     # ( x , a ) = uselect a i
>     # ( y , a ) = uselect a j
>     # a         = update a i y
>     # a         = update a j x
>     = a  

OK, this makes sense. However, the Clean Book never mentions the "uselect" function, and Array updates are only described with the "{a &" notation. Well, maybe it's mentioned in the "case studies" of Part II, which I haven't gotten to yet. :( 

> Admitedlly, uniqueness typing is somehow confusing at the beginning.

Yes, at least for me. It seems like a small door into which I can squeeze some procedural ideas which I probably shouldn't. Then it appears the door is shut. My brain is still trying hard to think in the FP way. 

>
> [snipped some good example code]
>
> take care, however, not to define the infiniteRandomNumberList as a global
> variable, for it would then be kept in memory while the whole computation is
> running, creating an unbearable space leak.

OK, now I'm confused again. It's a lazy evaluation, right? So how does defining it locally differ from defining it globally if it's only used as needed? Or maybe I'm confusing "list variable" with the function that creates the list. I rewrote the shuffle using your suggestions (thank you very much, btw!) and tried using the global genRandReal function from the MersenneTwister library and everything seemed to work fine (heap use remained constant regardless of how many times the shuffle was called).

Also, each time shuffleArray is called, won't infinteRandomNumberList be regenerated from the beginning each time? The generator is called with a seed value, so each time shuffleArray is called a new seed value will have to be passed to the generator or else it will return the same list each time. This seems to defeat it's ability to generate an infinte list from one seed.

I could not figure out how to incorporate a random number generator the way I had the shuffle written. My "randomInt" was a dummy Linear Congruential function that just performed the same number of operations as a "true" LCM generator so I could test the array shuffling speed, but returned the same result each time since I could not find a way to update the seed. 

The new shuffle works (not completely, I still need to figure out a way to update the seed passed to the generator), but is very slow compared to the first version, actually too slow to use. My original version did 1 million shuffles in about 4 seconds on my computer. The new version takes about 24 seconds to do 100,000.

Here is the new version (genRandReal is from the MersenneTwister library):

shuffleArray :: *{#Int} !Int !Int -> *{#Int}
shuffleArray ar top seed
    = foldl swap ar indexPairList
    where
        indexPairList
            = zip2 [0..top] (map (\ n -> toInt (n * z)) (take top (genRandReal seed)))

        z = toReal (top + 1)

        swap a (i, j)
            # (x, a ) = uselect a i
            # (y, a ) = uselect a j
            # a       = update a i y
            # a       = update a j x
            = a  

--
Regards,
John