[clean-list] FFT and Clean

Fri, 24 Aug 2001 10:47:54 +0200

[This message is posted to comp.lang.functional and also to the Clean
mailing-list]

Enclosed please find an implementation of the one dimensional FFT in
Clean. The version is based on the
Danielson-Lanczos Lemma scheme. You can find a C version in the
"Numerical Recipes".

Therefore a few comments:

- C is unprecedented, completely insane
- You should be aware that the Numerical Recipes C code is using
one-offset (as Fortran) and not zero-offset. This is
also accounted for when you pass the vector to the fft subroutine, then
you have to pass it with pointer beeing
decremented by 1. Completely mad.

- Clean is zero-offset but thanks to God there are no foolish pointer
arithmetic involved. C is fuck.

- If you miss "cosine" in the core loop you should consider the
following relation: 2*sin(x)*sin(x) = 1- cos(2x)

- Numerical Recipes code is using the sign for the forward and backward
transformation in reverse order. But maybe
nobody on earth know what the right sign is. With the sign you only
change the sign between real and imaginary part.

- Execution times:

On my old Mac (100 MHz 603e processor and native Apple MPW C/C++
compiler) the C version takes 11sec for a 2^18 complex
problem and the equivalent Clean version takes 15 sec. This means the C
version is only a factor about 1.4 faster. The Clean version shows
absolutely no garbage-collection thanks to destructive update. This is
quite impressive. Even
when compared to my last experience with the FFT and Clean. I used a FFT
code from a Clean tutorial (Brazil) and then even the tuned version took
100sec on my old Mac.

Sure there exists faster FFT implementations. IDL-demo-version take on
my Mac for the above problem size about 6sec
(but IDL is very, very expensive [USD 1000.-] and therefore they have
implemented a very, very tuned FFT code). And I
also have a small utillity which calculates a 2^18 FFT in 3sec, but this
utillity is based on 3500lines (no joke!) of highly tuned C code for the
PowerPC.

My Yorick takes also 15sec for the above problem. And a Forth (Mops for
the Mac) version takes also 15sec.

I didn't test it with smaller values (e.g 1024). I do not trust the
timer when it comes to very small execution times.

But I plan to implement the FFT also for the 2 dimensional case. Maybe I
will implement also the Cooley-Tukey code (this is a more natural one)
and that the real and imaginary part are placed in two distinct arrays.

And now let us relax and bother: "Why is this damn Java popular?". Clean
is in the league with a high end Lisp compiler. Why Java? I think after
drinking a bottle of Cognac I will get the enlightenment. Assured, I
will report what the truth is then.

Enclosed the Clean version and the C version. Forget this fucking C
version. You can copy and paste the Clean version (in order to use
pretty-printing you should have installed Thorsten Zoerner's Clas
library. You get it from the Clean homepage).

If you find the formatting strange please let me know (it is often a
problem to transfer formatted code).

Friede sei mit Euch,
S. Gonzi

-------Here is the Clean code------------------------
module fft
import StdEnv,SampleVec

//*********************************************************************
// Implementation of the one dimensional FFT Danielson-Lanczos Lemma
// scheme
// You can find a C or Fortran77/90 code in the book "Numerical Recipes"
// or elsewhere
// Siegfried Gonzi 2001
// This code is freeware under the following restrictions:
// - You must not sold it
// - You must not use it in order to conduct animal-experiments 
// (e.g. in medical-tests)
//
// The code is not scaled for N, if you like you can divide it yourself
//
// The ARRAY must be a power of 2. This is not checked for!
// You pass the array in form: FFT (BitRev ARRAY) isign
// Where you assign: N:== number of elements (= 2*dimension)
//
// e.g. a complex 16 element dimension array has 32 members:
// 0,1i,2,3i,...(N-1)
//
// Tested and developed on a Macintosh and Clean version 1.3.3
//
************************************************************************

FFT::   *{#Real} !Real  -> *{#Real}
FFT marray isign = LOOP1 2  marray
where
 LOOP1:: !Int  *{#Real} -> *{#Real}
 LOOP1 mmax  marray
  | mmax>=N = marray
  = LOOP1 (mmax*2)  (LOOP2 1 marray 1.0 0.0)
 where
  istep = mmax*2
  theta = isign*(6.2831853071959/toReal(mmax))
  wtemp = sin(0.5*theta)
  wpr = (-2.0)*wtemp*wtemp
  wpi = sin(theta)
  LOOP2:: !Int *{#Real} !Real !Real -> *{#Real}
  LOOP2  m marray  wr wi
   | m>=(mmax) = marray
   #! wtemp = wr
   = LOOP2  (m+2) (LOOP3 m marray)
              (wr+wtemp*wpr-wi*wpi)(wi+wi*wpr+wtemp*wpi)
  where
   LOOP3:: !Int *{#Real} -> *{#Real}
   LOOP3 i marray
    | i>=N = marray
     #! xi = marray.[i-1]
     #! yi = marray.[i]
     #! xj = marray.[i-1+mmax]
     #! yj = marray.[i+mmax]
     #! tempr = wr*xj - wi*yj
     #! tempi = wr*yj + wi*xj
     = LOOP3 (i+istep) { marray &
                        [i-1+mmax] = (xi - tempr),
                        [i+mmax] = (yi - tempi),
                        [i-1] = (xi + tempr),
                        [i] = (yi + tempi)  }

BitRev::  *{#Real} -> *{#Real}
BitRev marray = BitShuffling 0 0 marray
where
 BitShuffling:: !Int !Int *{#Real} -> *{#Real}
 BitShuffling i j marray
  | i>=(N-1) = marray
  | j>i
   #! xj = marray.[j]
   #! xi =marray.[i]
   #! yi = marray.[i+1]
   #! yj = marray.[j+1]
   = BitShuffling (i+2) m1{marray & [i]=xj,[j]=xi,[j+1]=yi,[i+1=yj}
  | otherwise = BitShuffling (i+2) m1 marray
 where
   m1 = (LOOP2 (N/2) j )
   LOOP2:: !Int !Int -> !Int
   LOOP2 m j
   | m <2 || j<m  = (j+m)
   = LOOP2 (m/2) (j-m)

//*************************** End of FFT code *********************

//---TESTING OF THE FFT
//---YOU MAY HAVE TO DE-COMMENT IN ORDER TO PERFORM THE TESTS
//---The index is running from: 0..(N-1), the first element is 
//---the real part,
//---the second element is the accompanying imaginary part and so forth

//--- N is used as global variable in the computations!

//---Problem size: 2^4 this means N:== 32
//---16 + 16 = 32 (real + imaginary)
//---isign = 1.0 foreward transformation
MyArray:: *{#Real}
MyArray = { toReal(x) \\ x <- [0..(N-1)] }
N:==32
Start:: String
Start = prettyRowVector (FFT (BitRev  MyArray) (1.0))

//---Yorick calculates the following values for the above problem:
//
// [240+256i,-96.4374+64.4374i,-54.6274+22.6274i,-39.9457+7.94569i,
//  -32+0i, -26.6909-5.30914i,-22.6274-9.37258i,-19.1826-12.8174i,
//  -16-16i, -12.8174-19.1826i,-9.37258-22.6274i,-5.30914-26.6909i,
// 0-32i,7.94569-39.9457i, 22.6274-54.6274i,64.4374-96.4374i]

//---Problem size: 2^10 this means N:== 2048
//--- isign = -1.0 backward transformation
//MyArray:: *{#Real}
//MyArray = { toReal(x) \\ x <- [0..(N-1)] }
//N:==2048
//Start::!Real
//Start = array.[0]
//where
// array:: *{#Real}
// array = FFT  (BitRev  MyArray) (-1.0)

//---Problem size: 2^16 this means N:== 131072
//--- isign = 1.0 forward transformation
//MyArray:: *{#Real}
//MyArray = { toReal(x) \\ x <- [0..(N-1)] }
//N:==131072
//Start::!Real
//Start = array.[0]
//where
// array:: *{#Real}
// array = FFT  (BitRev  MyArray) (1.0)

//---Problem size: 2^18 this means N:== 524288
//--- isign = 1.0 forward transformation
//MyArray:: *{#Real}
//MyArray = { toReal(x) \\ x <- [0..(N-1)] }
//N:==524288
//Start::!Real
//Start = array.[0]
//where
// array:: *{#Real}
// array = FFT  (BitRev  MyArray) (1.0)

-------Here ends the Clean code and we are going closer to the hell:

------------------------Devil it is your turn:

#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <stddef.h
#define SWAP(a,b) tempr=(a);(a)=(b);(b)=tempr

/*Allocate a float vector with subscript range v[n1...nh]*/
/*Referenz: "Numerical Recipes in C" */
float *fvector(int nh)
{
 float *pVec;

 pVec = (float *)malloc((size_t) (nh*sizeof(float)));
 if (!pVec) printf("allocation failure in float-vector()");

return (pVec);
}

int main()
{
 int i,j;
 float *pvec, fi;
 pvec = fvector(524288);

 fi = 0.0;

 for (i=0; i<524288; i++)
 {
  pvec[i] = i;
  //fi = fi + 1.0;
 }
 printf("Ende Allokierung");

  printf("%f\n",pvec[100]);
 fft_1d(pvec-1, 262144, 1);

 printf("%f\n",pvec[100]);

return (0);
}

/*Die Daten data[1..2*nn] werden zu den Werten der diskreten Fourier
Transform*/
/*umgewandelt; isign bezeichnet das Vorzeichen: normale (isign=1)
Transformation*/
/*oder inverse (isign=-1)*/
/*Die geraden Werte sind die realen Eingabe/Ausgabewerte; die ungeraden
Werte sind*/
/*die imaginaeren Eingabe/Ausgabewerte; daher hat ein komplexer
Datensatz mit*/
/*z.B. N=16 ein nn Wert von 32*/
/*Die Ausgabewerte sind folgendermassen angeordnet: f(0),
f(1),..f(Nq),..-f(N-1),-f(N)*/
/*Die ersten Werte gehoeren zu niedrigen Frequenzen; die Nyquistfrequenz
ist die*/
/*hoechste Frequenz; danach faellt die Frequenz wieder ab und nimmt ein
negatives*/
/*Vorzeichen an*/
/*Der Eingabedatensatz muss eine Potenz von 2 sein*/

/*Der nachfolgende Code ist eine Uebertragung des Codes von den
Numerical Recipies*/

void fft_1d(float *data, unsigned long nn, int isign)
{
 unsigned long n, mmax, m, j, istep, i;
 double wtemp, wr, wpr, wpi, wi, theta;  /*komplexe Koeffizienten*/
 float tempr, tempi;                    /*W^k Fk*/

 n = nn << 1;
 j = 1;
 for (i = 1; i < n; i += 2)   /*Bit Umkehr*/
 {
  if (j > i)
  {
   SWAP(data[j],data[i]);
   SWAP(data[j+1],data[i+1]);
  }
  m = n >> 1;
  while (m >= 2 && j > m)
  {
   j -= m;
   m >>= 1;
  }
  j += m;
 }

 /*Hier beginnt der Danielson-Lanczos Code*/
 mmax = 2;
 while (n > mmax)   /*nlog(n) mal durchlaufen*/
 {
  istep = mmax << 1;
  theta = isign*(6.2831853071959/mmax);
  wtemp = sin(0.5*theta);
  wpr = -2.0*wtemp*wtemp;     /*2 sin(x)*sin(x) = 1- cos(2x)*/
  wpi = sin(theta);
  wr = 1.0;
  wi = 0.0;
  for (m = 1; m < mmax; m += 2)
  {
   for (i = m; i <= n; i += istep)
   {
    j = i + mmax;   /*gerade, ungerade*/
    tempr = wr*data[j] - wi*data[j+1];
    tempi = wr*data[j+1] + wi*data[j];
    data[j] = data[i] - tempr;
    data[j+1] = data[i+1] - tempi;
    data[i] += tempr;
    data[i+1] += tempi;
   }
   wr = wr + (wtemp=wr)*wpr - wi*wpi;
   wi = wi + wi*wpr + wtemp*wpi;
  }
  mmax = istep;
 }
}
----------End of fucking C code------------------------------