[FieldTrip] calculating behavioural-power correlation -- follow-up questions

Thu Oct 22 00:33:13 CEST 2015

Thanks for clarifying, Xiaoming.

It seems indeed that a systematic offset (in averages) across the two
sample populations biases the randomization distribution. I'd need to sort
this out, but schematically would expect it to look as follow:

normal:
~1000 ~1000 ~1000 ~1000
~1       ~1       ~1       ~1
corr =  ~0

post shuffling:
~1000 ~1       ~1000 ~1000
~1       ~1000 ~1       ~1
corr =  negative, that is an anti-correlation

This implies a bootstrapping approach is more correct for testing for
statistical significance of a correlation (in case of a systematic offset
across conditions).

Yours,
Arjen

2015-10-21 14:28 GMT-07:00 Xiaoming Du <XDu at mprc.umaryland.edu>:

> Hi Martin,
>
> Thanks Martin to make it more clear. However, my understanding is that the
> problem is caused by different means of permutation for paired-sample t
> test and correlation. I am very curious how you solve the problem by
> customizing statfun. Do you mind sending me a copy of that? I will really
> appreciate it.
>
> Hi Arjen,
>
> I think the 'ft_statfun_correlationT' works fine. Using the code in your
> previous email, the correlation result (stat.rho) is same as using MATLAB
> function 'corr' with Spearman method. However,  In your code, the design
> matrix for correlation is defined in the same way as for within-UO design
> (e.g., paired-sample t test). I think Martin's and my concern is that the
> permutation method for correlation and within-UO design (e.g.,
> paired-sample t test) should be different.
>
> For example, there are 10 subjects (patients). Each of them finished a
> task before (A) and after (B) the treatment . Based on the different null
> hypothesis, the permutation methods for paired-sample t test and
> correlation should be different. For paired-sample t test (or within-UO
> design), the null hypothesis is that there is no different before or after
> the treatment. In other words, for each subjects, switch the results before
> and after treatment won't matter. This is exactly how the permutations are
> done when cfg.design is defined as in your code . On the other hand, the
> null hypothesis for correlation is that: there is no relationship between
> the results before and after treatment. Therefore, permutation should be
> done on after-treatment results *among subjects* while keep
> before-treatment results same (or permute before-treatment and keep
> after-treatment). In terms of permutation, correlation and paired-sample t
> test (within-UO design) seems treated equally in fieldtrip.
>
>         Original matrix                                 permutation for
> paired t test                     permutation for correlation
>
>           before (A)      after (B)                            before
> (A)      after (B)                      before (A)
>  after (B)
> subj1        A1               B1                    subj1
> A1               B1                   subj1        A1               B2
> subj2        A2               B2                    subj2
> B2               A2                   subj2        A2               B4
> subj3        A3               B3                    subj3
> B3               A3                   subj3        A3               B3
> subj4        A4               B4                    subj4
> A4               B4                   subj4        A4               B1
> subj5        A5               B5                    subj5
> A5               B5                   subj5        A5               B10
> subj6        A6               B6                    subj6
> B6               A6                   subj6        A6               B8
> subj7        A7               B7                    subj7
> B7               A7                   subj7        A7               B9
> subj8        A8               B8                    subj8
> A8               B8                   subj8        A8               B6
> subj9        A9               B9                    subj9
> B9               A9                   subj9        A9               B5
> subj10      A10             B10                  subj10
> A10             B10                 subj10       A10             B7
>
> As follows, I made minor changes in your previous code to show this
> issue.
> data_brain ranges from 0 to 1;
> data_behav ranges from 1000 to 1001;
> There is no relationship between data_brain and data_behav, because the
> values are randomly assigned to them.
> This is the results I got by using the code below:
>
> stat =
>
>        prob: 9.9900e-04
>     cirange: 0.0020
>        mask: 1
>        stat: 0.2941
>         ref: -10.7655
>         rho: 0.0297
>      dimord: 'chan_time'
>       label: {'1'}
>        time: 1
>         cfg: [1x1 struct]
>
> stat.rho (0.0297) is the same as using 'corr'  with Spearman option in
> matlab; However, stat.prob obtained by using permutation test is so
> small.  The reason (I guess) are 1, two-tail test was used. 2, the rho
> distribution of permutation is biased to negative direction (meaning the
> rho after each permutation was biased to strong negative correlation. e.g.,
> -0.8 etc ), so the real rho fell into the 5% range of rho distribution of
> permutations. In other words, real rho (0.0297) is smaller (or larger) than
> almost all rho from permutation, so that stat.prob is very small
> (0.000999).
>
> Please let me know if this makes sense to you or if I misunderstood
> anything.
>
> Thanks.
>
> Xiaoming
>
>
>
>
>
>
>
>
>
> data_brain = [];
>
> data_behav = [];
>
> for j=1:100
>
> data_brain{j}.avg = rand; % increasing
>
> data_brain{j}.dimord = 'chan_time';
>
> data_brain{j}.time = 1;
>
> data_brain{j}.label = {'1'};
>
>
> data_behav{j} = data_brain{j};
>
> data_behav{j}.avg = rand + 1000; % add scaling difference
>
> end
>
>
> % compute statistics with correlationT
>
> cfg = [];
>
> cfg.method = 'montecarlo';
>
> cfg.statistic = 'ft_statfun_correlationT';
>
> cfg.numrandomization = 1000;
>
> n1 = 100; % n1 is the number of subjects
>
> design = zeros(2, n1 * 2);
>
> design(1,1:n1) = 1;
>
> design(1,(n1 + 1):(n1 * 2)) = 2;
>
> design(2, :) = [1:n1 1:n1];
>
> cfg.design = design;
>
>
> cfg.ivar = 1;
>
> cfg.uvar = 2;
>
> stat = ft_timelockstatistics(cfg, data_brain{:}, data_behav{:});
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> >>> Arjen Stolk <a.stolk8 at gmail.com> 10/21/2015 2:22 PM >>>
> Hi Martin,
>
> Thanks for thinking along. I've briefly tried to replicate/simplify the
> situation depicted by you using the code below. It works as I would expect,
> no matter whether one variable is scaled differently. But perhaps I'm not
> fully capturing the issue, and something still goes awry. This is a
> possibility because ft_statfun_correlationT has only recently been
> implemented for a specific case, and was never really tested within
> different situations (hence it's not well-documented on the wiki). Do you
> think you could use this example code to replicate the situation you are
> experiencing?
>
> Yours,
> Arjen
>
> % simulate simple multiple subjects timelock structures
>
> data_brain = [];
>
> data_behav = [];
>
> for j=1:10
>
> data_brain{j}.avg = j; % increasing
>
> data_brain{j}.dimord = 'chan_time';
>
> data_brain{j}.time = 1;
>
> data_brain{j}.label = {'1'};
>
> data_behav{j} = data_brain{j};
>
> data_behav{j}.avg = data_brain{j}.avg*-1000+50; % add scaling difference
>
> end
>
> % compute statistics with correlationT
>
> cfg = [];
>
> cfg.method = 'montecarlo';
>
> cfg.statistic = 'ft_statfun_correlationT';
>
> cfg.numrandomization = 100;
>
> n1 = 10; % n1 is the number of subjects
>
> design = zeros(2, n1 * 2);
>
> design(1,1:n1) = 1;
>
> design(1,(n1 + 1):(n1 * 2)) = 2;
>
> design(2, :) = [1:n1 1:n1];
>
> cfg.design = design;
>
> cfg.ivar = 1;
>
> cfg.uvar = 2;
>
> stat = ft_timelockstatistics(cfg, data_brain{:}, data_behav{:});
>
> assert(isequal(stat.rho, -1));
>
> 2015-10-21 10:54 GMT-07:00 Krebber, Martin <martin.krebber at charite.de>:
>
>> Hi Xiaoming, hi Arjen,
>>
>> I've been encountering the same problem. I believe Xiaoming is right when
>> he points out that the permutaion step shuffles data across conditions and
>> that this introduces a negative bias in the distribution. I found the same
>> thing when I correlated RT data with TFRs (absolute power). My distribution
>> was shifted strongly to the left and, thus, not a single negative cluster
>> was significant, but every positive one was.
>>
>> Xiaomings explanation made a lot of sense to me when I thought about it
>> graphically: Imagine correlating two data vectors, one (x) ranging between
>> .5 and1, the other (y) between 50 and and 100. When plotting this, one gets
>> a cloud of dots on the upper left corner of the diagram. When you then
>> switch the variable assignment of half of the data points (which is what
>> the permutation step seems to do), these dots will now be be shifted to the
>> lower right corner of the diagram. So no matter what the correlation in the
>> original data, chances are that (given different scaling) after permutaion,
>> you get a negative correlation.
>>
>> I am not 100% sure about this, so please let me know if I made a mistake.
>>
>> What I tried instead of the 'ft_statfun_correlationT' was using a custom
>> made statfun in which I pass the RTs via the design matrix. With this, my
>> results looked much better. I am not sure, but I guess this is because
>> there is no shuffling between the two variables in this case.
>>
>> I would really like to know, what is the right way of doing this using
>> just the FieldTrip functions. Is there a way to permute data within
>> variables? I tried cfg.resampling = 'bootstrap', but this is not a
>> permutation, as far as I know.
>>
>>
>> Thanks!
>> Martin
>>
>>
>> ------------------------------
>> *Von:* fieldtrip-bounces at science.ru.nl [fieldtrip-bounces at science.ru.nl]"
>> im Auftrag von "Arjen Stolk [a.stolk8 at gmail.com]
>> *Gesendet:* Dienstag, 20. Oktober 2015 08:03
>> *An:* FieldTrip discussion list
>> *Betreff:* Re: [FieldTrip] calculating behavioural-power correlation --
>> follow-up questions
>>
>> Hey Xiaoming,
>>
>> It's still pretty hard, for me, to guess on basis of that matlab output
>> what is going on here and what you mean with 'shuffling design matrices',
>> and how that shuffling 'biases the cluster distribution'. As you mention
>> yourself, it could be due to various reasons, and you're open to
>> suggestions and increasing your understanding. I'd therefore suggest to try
>> to funnel the number of potential explanations by simulating what you're
>> doing (using input data for which you know how it should behave), after
>> you've read more about what the design matrix and monte carlo statistics
>> are supposed to do. Perhaps the statistics section at the bottom of this
>> page provides a good starting point:
>> http://www.fieldtriptoolbox.org/walkthrough
>>
>> Hope that helps,
>> Arjen
>>
>> 2015-10-19 15:56 GMT-07:00 Xiaoming Du <XDu at mprc.umaryland.edu>:
>>
>>> For example, our power values ranged from 1 to 3 (after log transform);
>>> my behavioral data ranged from 20 to 90;
>>> by using above mentioned script, there are 14 negative clusters were
>>> reported in variable stat.
>>> stat =
>>> prob: [30x50 double]
>>> posclusters: []
>>> posclusterslabelmat: [30x50 double]
>>> posdistribution: [1x1000 double]
>>> negclusters: [1x14 struct]
>>> negclusterslabelmat: [30x50 double]
>>> negdistribution: [1x1000 double]
>>> cirange: [30x50 double]
>>> mask: [30x50 logical]
>>> stat: [30x50 double]
>>> ref: [30x50 double]
>>> rho: [30x50 double]
>>> dimord: 'chan_freq'
>>> freq: [1x50 double]
>>> label: {30x1 cell}
>>> time: 2.5000
>>> cfg: [1x1 struct]
>>> However, the p values of those clusters (i.e., stat.negclusters.prob)
>>> are all ones. The smallest value in stat.negdistribution is way larger than
>>> the largest negative cluster t-sum. This could be real. However, it is more
>>> likely due to the shuffle between power and behavioral group. For example,
>>> design matrix [1 1 1 1 2 2 2 2; 1 2 3 4 1 2 3 4] was shuffled to [1 2 2 1 2
>>> 2 1 1; 1 2 3 4 1 2 3 4]. After each permutation, for some subjects, their
>>> power data was labeled as behavioral data and vice versa. Because of the
>>> scale difference between power and behavioral data, large negative
>>> correlations were generated by permutation. This further biased the cluster
>>> distribution.
>>> My limited understanding is that, for correlation, each permutation
>>> should fix cfg.ivar and only shuffle half of the cfg.uvar. For example,
>>> permute design matrix [1 1 1 1 2 2 2 2; 1 2 3 4 1 2 3 4] to [1 1 1 1 2 2 2;
>>> 1 2 3 4 4 2 3 1]. THerefore, after permutation, one subject's power data
>>> corresponds to another subject's behavioral data.
>>> I am not good at statistics. It will be really appreciated if you have
>>> any suggestions or comments.
>>> Xiaoming
>>>
>>>
>>> >>> Arjen Stolk <a.stolk8 at gmail.com> 10/19/2015 6:01 PM >>>
>>> Hey Xiaoming,
>>>
>>> Not sure if I understand, but shouldn't the directions of the
>>> correlations be independent of the scaling of the two variables? Looking at
>>> the code of ft_statfun_correlationT it doesn't seem the conversion from
>>> correlation to T value (tstat = rho*(sqrt(max(nunits)-2))/sqrt((1-rho^2)))
>>> would result in a direction change either. Perhaps you could try to first
>>> manually calculate a correlation between signal power and behavioral power,
>>> and see whether anything is behaving unexpectedly?
>>>
>>> Yours,
>>> Arjen
>>>
>>> 2015-10-19 14:25 GMT-07:00 Xiaoming Du <XDu at mprc.umaryland.edu>:
>>>
>>>> Dear FieldTrip users,
>>>> This is Xiaoming from University of Maryland Baltimore. My current
>>>> project requires to calculate behavioral-power correlation across subjects.
>>>> Similar topic was discussed here early this year.
>>>> http://mailman.science.ru.nl/pipermail/fieldtrip/2015-February/008953.html
>>>> According to the suggestions in above mentioned thread, I duplicate my
>>>> power dataset and replace the power values at each time-frequency point
>>>> with behavioral data. Therefore, those two datasets have same structure and
>>>> dimension. I used the following script to test if there are significant
>>>> clusters of correlations.
>>>> cfg = [];
>>>> cfg.parameter = 'powspctrm';
>>>> cfg.method = 'montecarlo';
>>>> cfg.statistic = 'ft_statfun_correlationT';
>>>> ...
>>>> etc
>>>> ...
>>>> design = zeros(2, n1 * 2); % n1 is the number of subjects.
>>>> design(1,1:n1) = 1;
>>>> design(1,(n1 + 1):(n1 * 2)) = 2;
>>>> design(2, :) = [[1:n1 ] [1 : n1]];
>>>> cfg.design = design;
>>>>
>>>> cfg.ivar = 1;
>>>> cfg.uvar = 2;
>>>> stat = ft_freqstatistics(cfg, dataBeh{:}, dataDX1{:});
>>>> However, it seems when each time the design matrix is permuted,
>>>> FieldTrip is using the same method as for 'ft_statfun_depsamplesT', meaning
>>>> cfg.uvar remains the same while cfg.ivar (1 or 2) is randomly assigned to
>>>> each subject in design matrix. Although I confirmed this by uncommenting
>>>> line 313 (i.e., tmpdesign = design(:,resample(i,:))) in
>>>> ft_statistics_montecarlo.m which allows to display the permuted design
>>>> matrix in command line, please correct me if this is not the case.
>>>> In my mind, this kind of permutation will cause trouble when dealing
>>>> with correlation. For example, in my case, the behavioral data and power
>>>> data have different scales. The power data are much larger than behavioral
>>>> data in general. When assigning behavioral data into power group or vice
>>>> versa, it will induce huge negative correlations between power and
>>>> behavioral measurement. Therefore, no negative clusters will survive from
>>>> permutation test.
>>>> Please let me know if I have mis-understanding or if I did anything
>>>> wrong. Any suggestions will be highly appreciated!
>>>> Thanks.
>>>> Xiaoming
>>>>
>>>> _______________________________________________
>>>> fieldtrip mailing list
>>>> fieldtrip at donders.ru.nl
>>>> http://mailman.science.ru.nl/mailman/listinfo/fieldtrip
>>>>
>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.science.ru.nl/pipermail/fieldtrip/attachments/20151021/8690a104/attachment.htm>