bug in statistics_montecarlo?

Thu Jul 5 13:57:51 CEST 2007

Dear Robert, Eric and other fellows of the Fieldtrip team,

I'm just trying out your excellent software, in specific your extremely
useful and well designed statistics functions. Many thanks for this
great and generous gift to the neuroscience community!

I would like to point you to an inconsistency I seem to have found  in
the implementation of the statistics_montecarlo function. When I
compared the probabilities computed with statistics_analytic and
statistics_montecarlo for the depsamplesT test, the probabilities
determined via the randomization test turned out to be too low by a
factor of about 1/2, see code and Matlab output below.

-------------------------

N = 50;
M = 0.25;
alpha = 0.05;
tail = 0;     % 'both'

% construct a normally distributed difference vector
% with unit variance and mean M
d = randn(1,N);
d = d ./ sqrt(var(d,1,2));
d = d -mean(d) + M;

% compute a  paired t-test
[H,P,CI,stat] = ttest(d,0,alpha,tail)

H =

     0

P =

    0.0864

CI =

   -0.0371    0.5371

stat =

    tstat: 1.7500
       df: 49
       sd: 1.0102

% generate two sample vectors with their difference given by d
x =randn(1,N);
y = x + d;

% generate the design matrix for testing with fieldtrip
design = [
  1:N              1:N                                % subject number
  repmat(1,[1 N])  repmat(2,[1 N])     % condition number
];

cfg = [];
cfg.statistic        = 'depsamplesT';
cfg.method           = 'analytic';
cfg.correctm         = 'no';
cfg.alpha            = 0.05;
cfg.tail             = 0;
cfg.uvar             = 1;
cfg.ivar             = 2;

 [stat_analytic, cfg] = statistics_analytic(cfg,[x y],design)

stat_analytic =

       stat: -1.7500
         df: 49
    critval: [-2.0096 2.0096]
       prob: 0.0864
       mask: 0

cfg.numrandomization = 5000;
[stat_montecarlo, cfg] = statistics_montecarlo(cfg,[x y],design)

using "statfun_depsamplesT" for the single-sample statistics
constructing randomized design
total number of measurements     = 100
total number of variables        = 2
number of independent variables  = 1
number of unit variables         = 1
number of control variables      = 0
number of within-block variables = 0
repeated measurement in variable 1 over 50 levels
number of repeated measurements in each level is 2 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
estimated time per randomization is 0 seconds
not performing a correction for multiple comparisons

stat_montecarlo =

    prob: 0.0394
    mask: 1
    stat: -1.7500

--------------------

Maybe I'm missing something. But I think you should replace

stat.prob = min(prb_neg, prb_pos);
 by
stat.prob = 2 *min(prb_neg, prb_pos);

in line 344 of statistics_montecarlo(),  for the two-tailed test.

Maybe it would be even better to compare the absolute values of the
observed and the randomized statistics, i.e. doing something like this

for i=1:Nrand,
....
if tail == 0,
prb_pos = prb_pos + (abs(statobs)<abs(statrand));
end
...
end
if tail == 0,
stat.prob = prb_pos./Nrand;
end

What do you think?

Many thanks in advance for your comments.
Best regards,
--Maren

--
Max Planck Institute for Human Cognitive and Brain Sciences
Dipl.-Phys. Maren Grigutsch     mailto:grigu at cbs.mpg.de
Stephanstr.1a                   http://www.cbs.mpg.de
04103 Leipzig, Germany          phone/fax: +49 341 9940-136/-260

----------------------------------
The aim of this list is to facilitate the discussion between users of the FieldTrip  toolbox, to share experiences and to discuss new ideas for MEG and EEG analysis. See also http://listserv.surfnet.nl/archives/fieldtrip.html and http://www.ru.nl/fcdonders/fieldtrip.