[clean-list] Data intensive programming and Clean

Richard O'Keefe ok at cs.otago.ac.nz
Fri Dec 16 02:34:04 MET 2011


On 15/12/2011, at 10:57 PM, Groenouwe, C. wrote:

> I'm trying to find out:
> - How, in general, functional programming languages perform on data-intensive tasks (manipulation of large datasets, e.g.: doing some statistical analysis on a table with 100.000 instances and 30 columns) (regarding speed and memory usage)

Consider that Haskell implementations range from interpreters (Hugs is basically dead, but
ghci is very much alive) to bytecode compilers to native-code compilers to native-code
compilers that do mind-boggling optimisations, including user-specified fusion laws.
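
To make "user-specified fusion laws" concrete, here is a minimal sketch using GHC's RULES pragma. The function myMap and the rule name "myMap/myMap" are made up for illustration; the rule only has a chance of firing when compiling with optimisation (-O), and the program's result is the same either way.

```haskell
module Main where

-- A user-written rewrite rule: two traversals are fused into one.
-- "myMap/myMap" is just a label (hypothetical name) shown in GHC's rule diagnostics.
{-# RULES
"myMap/myMap" forall f g xs. myMap f (myMap g xs) = myMap (f . g) xs
  #-}

-- A hand-rolled map, so the rule does not interfere with the Prelude's own rules.
myMap :: (a -> b) -> [a] -> [b]
myMap _ []       = []
myMap f (x : xs) = f x : myMap f xs
{-# NOINLINE myMap #-}  -- keep calls to myMap visible so the rule can match them

main :: IO ()
main = print (sum (myMap (+ 1) (myMap (* 2) [1 .. 10 :: Int])))  -- prints 120
```

GHC does not check that such a rule is semantically valid; that is the programmer's responsibility.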

Consider also that the Haskell 98 and Haskell 2010 *language* reports don't give you much
you'd want to use for heavy-duty number crunching, but the libraries in the Haskell Platform
include tolerably good arrays of unboxed numbers of several kinds, bindings to BLAS, &c.
That's not to mention support for concurrency...

So for one functional programming language, the answer could be anywhere from
"really really bad, don't go there" to "surprisingly good, actually" depending on
the compiler and libraries.

If you want to do your own benchmarks, expect to find GHC, UHC, and JHC giving you
not just different numbers but different spectrums.

When it comes to memory usage, there is the amount of memory that must be *held* in data
structures.  Languages like Haskell and Clean and SML with support for arrays of unboxed
native floats shouldn't need much more space than C would.  There is also the amount of
memory that is *turned over*, which is going to depend on compiler issues (SML/NJ
heap-allocates stack frames; I believe MLton does not).
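
As a rough illustration of the *held* memory, the table in the question (100,000 rows by 30 columns) stored as unboxed doubles is 100000 * 30 * 8 bytes, about 24 MB, much as it would be in C. The sketch below uses Data.Array.Unboxed from the array package that ships with GHC; the names table and columnMean and the fill pattern are made up for illustration, and a real program would read the data from a file rather than build it from a list.

```haskell
import Data.Array.Unboxed (UArray, listArray, (!))
import Data.List (foldl')

-- Table dimensions taken from the question.
rows, cols :: Int
rows = 100000
cols = 30

-- One flat, row-major array of unboxed Doubles (roughly rows * cols * 8 bytes).
-- The contents are dummy data: every entry in column c holds the value c.
table :: UArray Int Double
table = listArray (0, rows * cols - 1)
                  [fromIntegral (i `mod` cols) | i <- [0 .. rows * cols - 1]]

-- Mean of column c, accumulated with a strict left fold over the row indices.
columnMean :: Int -> Double
columnMean c =
  foldl' (\acc r -> acc + table ! (r * cols + c)) 0 [0 .. rows - 1]
    / fromIntegral rows

main :: IO ()
main = mapM_ (print . columnMean) [0 .. cols - 1]
```

How much memory gets *turned over* while this runs is a separate matter, and depends on how well the compiler eliminates the intermediate lists.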

> - Which functional language performs best?

That question has no answer.
It's a question about *compilers*, not languages.

Clean has long been famed as having a good compiler.

> Additional question: which functional languages exploits (hardware) parallelism running on a multi core CPU best? (Or more CPU's)?

Erlang does that easily.  (But it does not have unboxed numbers or mutable arrays.)
Haskell does it fairly easily.  The results can be surprising, though.
Clean was originally called >Concurrent< Clean for good reason,
but those concurrency features have since been removed from the documentation.
This is another area which is subject to change.  I'm expecting to see
Clean come back with something great in that area.
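
For Haskell, a minimal sketch of spreading work over cores with nothing but base looks something like this. It assumes GHC, compiled with -threaded and run with +RTS -N; without those, getNumCapabilities reports 1 and everything runs on one core. The names chunkSum and chunks and the workload (a sum of squares) are made up for illustration.

```haskell
import Control.Concurrent (forkIO, getNumCapabilities)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Data.List (foldl')

-- Sum of squares over one inclusive range, computed with a strict fold.
chunkSum :: (Int, Int) -> Double
chunkSum (lo, hi) =
  foldl' (\acc i -> acc + fromIntegral i * fromIntegral i) 0 [lo .. hi]

-- Split 1..n into k contiguous inclusive ranges (trailing ones may be empty).
chunks :: Int -> Int -> [(Int, Int)]
chunks k n = [(i * size + 1, min n ((i + 1) * size)) | i <- [0 .. k - 1]]
  where size = (n + k - 1) `div` k

main :: IO ()
main = do
  k <- getNumCapabilities                  -- number of OS-level workers GHC may use
  boxes <- mapM spawn (chunks k 3000000)   -- one lightweight thread per chunk
  total <- sum <$> mapM takeMVar boxes     -- wait for every worker and combine
  print total
  where
    spawn range = do
      box <- newEmptyMVar
      -- ($!) forces the result inside the worker thread, not in the consumer.
      _ <- forkIO (putMVar box $! chunkSum range)
      return box
```

Dropping the ($!) is one easy way to get a surprising result: laziness then hands the consumer an unevaluated thunk, and all the real work happens on the main thread.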

One language that *doesn't* exploit multiple cores is OCaml, which is a pity, because it has an
excellent compiler and used to be the performance champion amongst ML-like systems.
(Mind you, I could never bring myself to put up with the language; sometimes
performance isn't worth the pain.)

F# should be as good at concurrency as any other .NET language.
There may be something about that at the Flying Frog web site.



