[clean-list] Transform to CPS
    Fergus Henderson 
    fjh@cs.mu.OZ.AU
    Thu, 6 Feb 2003 23:09:59 +1100
    
    
  
On 06-Feb-2003, Simon Peyton-Jones <simonpj@microsoft.com> wrote:
> [Fergus Henderson:]
> | The Melbourne Mercury compiler has had support for deforestation since
> | Mercury 0.8, released on 18 November 1998.  It supports a quite general
> | form of deforestation that is capable of deforesting intermediate
> | data structures of user-defined types such as trees.
> 
> I didn't know that -- I'm well impressed.  One the hardest aspects, I
> found, was to guarantee that the transformation would never make things
> worse.  For example, transforming a function that consumes a data
> structure directly, to use a fold instead, makes it less efficient.
> Ditto on the 'production' side.  So there has to be a way to make sure
> that when fusion does not happen, the program is no less efficient than
> before.  Is this the case in Mercury?
We don't use short-cut style (e.g. fold/build) deforestation.
We don't convert code to use `fold', for example.
Instead, we explicitly look for code which contains two procedure calls,
one of which produces a data structure, and the other of which switches on
the data structure produced, and replaces this with a new procedure whose
body combines the two.  This can slightly reduce efficiency by increasing
the code size, and also by increasing the number of arguments (and
hence increasing register pressure).  However, the optimization is only
applied if the new procedure's body can be simplified.  It is these
simplifications which improve efficiency.
In particular, we try to push the switch that originally came from the
consumer procedure into each branch of the producer procedure's body.
This is more code duplication, which again only reduces efficiency.
But if in some branch of the original producer procedure's body, the
intermediate data structure is bound to a term with a known top-level
functor, then both the construction and the switch can be eliminated
in that branch, instead replacing the switch with the matching case.
That's where we get a big win.  The compiler also identifies call pairs
in the new procedure which call the original producer and consumer
procedures, and replaces them with recursive calls to the new procedure.
Since we are apply the transformation to paired consumers/producers,
rather than to individual consumers or individual producers, it's easy
enough to ensure that we only apply it in cases when it leads to
fusion or other simplifications.
However, even when fusion does occur, it's not easy to be sure that
it is going to improve performance overall.  At this point we resort
to some heuristics.  We have a simple cost model for the cost of each
operation, and we compare the cost and size of the original code with
the cost and size of the transformed code.  We only use the transformed
code if there is an improvement in cost which our heuristics judge is
sufficient to justify the increase in code size, if any.
(If we had more information, e.g. profiling feedback, it would in
theory be possible to use this information in the cost modelling
to get a more accurate heuristic.)
So, because the heuristic is not perfect, when the transformation is applied,
it could potentially make the program a little less efficient.  However, any
such efficiency loss would only be a small constant factor, I think,
and in most cases it is a win, sometimes a major one.
>  Is the MSc thesis you refer to available online?
Yes, sorry, I forgot to paste in the URL:
<http://www.cs.mu.oz.au/research/mercury/information/papers.html#stayl_hons>.
Note that it is a BSc(Hons) thesis, not an MSc thesis.
-- 
Fergus Henderson <fjh@cs.mu.oz.au>  |  "I have always known that the pursuit
The University of Melbourne         |  of excellence is a lethal habit"
WWW: <http://www.cs.mu.oz.au/~fjh>  |     -- the last words of T. S. Garp.