[FieldTrip] calculating behavioural-power correlation -- follow-up questions

Xiaoming Du XDu at mprc.umaryland.edu
Wed Oct 21 23:28:19 CEST 2015


Hi Martin,
 
Thanks, Martin, for making this clearer. My understanding is that the problem is caused by the different permutation schemes required for a paired-sample t-test and for a correlation. I am very curious how you solved the problem by customizing the statfun. Would you mind sending me a copy? I would really appreciate it.
 
Hi Arjen,
 
I think 'ft_statfun_correlationT' works fine. Using the code in your previous email, the correlation result (stat.rho) is the same as that from the MATLAB function 'corr' with the Spearman method. However, in your code the design matrix for the correlation is defined in the same way as for a within-UO design (e.g., a paired-sample t-test). Martin's concern, and mine, is that the permutation schemes for a correlation and for a within-UO design should be different.
 
For example, suppose 10 subjects (patients) each performed a task before (A) and after (B) treatment. Because the null hypotheses differ, the permutation schemes for the paired-sample t-test and for the correlation should differ too. For the paired-sample t-test (within-UO design), the null hypothesis is that there is no difference before and after treatment; in other words, swapping each subject's before- and after-treatment results does not matter. This is exactly how the permutations are done when cfg.design is defined as in your code. For the correlation, on the other hand, the null hypothesis is that there is no relationship between the before- and after-treatment results. The permutation should therefore shuffle the after-treatment results across subjects while keeping the before-treatment results fixed (or shuffle the before-treatment results and keep the after-treatment results fixed). In terms of permutation, however, FieldTrip seems to treat the correlation and the paired-sample t-test (within-UO design) identically.
 
Original matrix:

          before (A)   after (B)
subj1        A1           B1
subj2        A2           B2
subj3        A3           B3
subj4        A4           B4
subj5        A5           B5
subj6        A6           B6
subj7        A7           B7
subj8        A8           B8
subj9        A9           B9
subj10       A10          B10

Permutation for the paired t-test (swap A and B within random subjects):

          before (A)   after (B)
subj1        A1           B1
subj2        B2           A2
subj3        B3           A3
subj4        A4           B4
subj5        A5           B5
subj6        B6           A6
subj7        B7           A7
subj8        A8           B8
subj9        B9           A9
subj10       A10          B10

Permutation for the correlation (shuffle B across subjects, keep A fixed):

          before (A)   after (B)
subj1        A1           B2
subj2        A2           B4
subj3        A3           B3
subj4        A4           B1
subj5        A5           B10
subj6        A6           B8
subj7        A7           B9
subj8        A8           B6
subj9        A9           B5
subj10       A10          B7
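
To make the correlation-style permutation concrete, here is a minimal sketch in plain MATLAB (illustration only, not FieldTrip code; the variable names are mine):

% permutation test for a correlation: keep A fixed, shuffle the subject
% order of B, and recompute rho for each permutation
A = rand(10, 1);           % before-treatment values
B = rand(10, 1) + 1000;    % after-treatment values (different scale)
rhoObs  = corr(A, B, 'type', 'Spearman');
nperm   = 1000;
rhoPerm = zeros(nperm, 1);
for i = 1:nperm
  rhoPerm(i) = corr(A, B(randperm(10)), 'type', 'Spearman');
end
p = mean(abs(rhoPerm) >= abs(rhoObs));  % two-tailed Monte Carlo p-value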
 
Below, I made minor changes to your previous code to demonstrate this issue:
data_brain ranges from 0 to 1;
data_behav ranges from 1000 to 1001.
There is no relationship between data_brain and data_behav, because their values are assigned randomly.
This is the result I got with the code below:
 
stat = 

       prob: 9.9900e-04
    cirange: 0.0020
       mask: 1
       stat: 0.2941
        ref: -10.7655
        rho: 0.0297
     dimord: 'chan_time'
      label: {'1'}
       time: 1
        cfg: [1x1 struct]
 
stat.rho (0.0297) is the same as the value from MATLAB's 'corr' with the Spearman option; however, stat.prob obtained from the permutation test is extremely small. My guess at the reasons: (1) a two-tailed test was used; (2) the permutation distribution of rho is biased in the negative direction (i.e., the rho after each permutation tends toward a strong negative correlation, e.g., -0.8), so the observed rho falls into the extreme 5% of the permutation distribution. In other words, the observed rho (0.0297) is larger than almost every permuted rho, which is why stat.prob is so small (0.000999).
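
To illustrate the suspected bias, this sketch mimics (by my assumption, not FieldTrip's actual code) what a within-UO permutation does to two differently scaled variables:

% swap A and B within a random half of the subjects, as a within-UO
% permutation would do
A = rand(100, 1);          % ~[0, 1]
B = rand(100, 1) + 1000;   % ~[1000, 1001]
swap     = rand(100, 1) > 0.5;    % subjects whose A/B labels get swapped
Ap       = A;  Bp = B;
Ap(swap) = B(swap);
Bp(swap) = A(swap);
corr(Ap, Bp, 'type', 'Spearman')  % strongly negative, around -0.75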
 
Please let me know if this makes sense to you or if I misunderstood anything. 
 
Thanks.
 
Xiaoming
 
% simulate data with no relationship between brain and behaviour
data_brain = [];
data_behav = [];
for j = 1:100
  data_brain{j}.avg    = rand;       % random values between 0 and 1
  data_brain{j}.dimord = 'chan_time';
  data_brain{j}.time   = 1;
  data_brain{j}.label  = {'1'};

  data_behav{j}     = data_brain{j};
  data_behav{j}.avg = rand + 1000;   % random values between 1000 and 1001
end

% compute statistics with correlationT
cfg                  = [];
cfg.method           = 'montecarlo';
cfg.statistic        = 'ft_statfun_correlationT';
cfg.numrandomization = 1000;

n1 = 100; % n1 is the number of subjects
design = zeros(2, n1 * 2);
design(1, 1:n1)          = 1;
design(1, (n1+1):(n1*2)) = 2;
design(2, :)             = [1:n1 1:n1];
cfg.design = design;

cfg.ivar = 1;
cfg.uvar = 2;
stat = ft_timelockstatistics(cfg, data_brain{:}, data_behav{:});
 
>>> Arjen Stolk <a.stolk8 at gmail.com> 10/21/2015 2:22 PM >>>
Hi Martin,

Thanks for thinking along. I've briefly tried to replicate/simplify the situation you describe using the code below. It works as I would expect, no matter whether one variable is scaled differently. But perhaps I'm not fully capturing the issue, and something still goes awry. That is a real possibility, because ft_statfun_correlationT was only recently implemented for a specific case and has never been thoroughly tested in other situations (hence it is not well documented on the wiki). Do you think you could use this example code to replicate the situation you are experiencing?

Yours,
Arjen 


% simulate simple multiple-subjects timelock structures
data_brain = [];
data_behav = [];
for j = 1:10
  data_brain{j}.avg    = j; % increasing
  data_brain{j}.dimord = 'chan_time';
  data_brain{j}.time   = 1;
  data_brain{j}.label  = {'1'};

  data_behav{j}     = data_brain{j};
  data_behav{j}.avg = data_brain{j}.avg*-1000+50; % add scaling difference
end

% compute statistics with correlationT
cfg                  = [];
cfg.method           = 'montecarlo';
cfg.statistic        = 'ft_statfun_correlationT';
cfg.numrandomization = 100;

n1 = 10; % n1 is the number of subjects
design = zeros(2, n1 * 2);
design(1, 1:n1)          = 1;
design(1, (n1+1):(n1*2)) = 2;
design(2, :)             = [1:n1 1:n1];
cfg.design = design;

cfg.ivar = 1;
cfg.uvar = 2;
stat = ft_timelockstatistics(cfg, data_brain{:}, data_behav{:});

assert(isequal(stat.rho, -1));

2015-10-21 10:54 GMT-07:00 Krebber, Martin <martin.krebber at charite.de>:


Hi Xiaoming, hi Arjen,

I've been encountering the same problem. I believe Xiaoming is right when he points out that the permutation step shuffles data across conditions and that this introduces a negative bias in the distribution. I found the same thing when I correlated RT data with TFRs (absolute power): my distribution was shifted strongly to the left, and thus not a single negative cluster was significant, but every positive one was.

Xiaoming's explanation made a lot of sense to me when I thought about it graphically: imagine correlating two data vectors, one (x) ranging between .5 and 1, the other (y) between 50 and 100. When plotting this, one gets a cloud of dots in the upper left corner of the diagram. When you then switch the variable assignment for half of the data points (which is what the permutation step seems to do), those dots are shifted to the lower right corner of the diagram. So no matter what the correlation in the original data is, chances are that (given the different scaling) you get a negative correlation after permutation.

I am not 100% sure about this, so please let me know if I made a mistake.

Instead of 'ft_statfun_correlationT', I tried a custom-made statfun in which I pass the RTs via the design matrix. With this, my results looked much better. I am not sure, but I guess this is because there is no shuffling between the two variables in this case; see the sketch below.
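
Roughly, the statfun I mean looks like this (a simplified sketch with made-up names, not my exact code):

function s = statfun_behavcorr(cfg, dat, design)
% sketch: correlate every data point with a behavioural vector carried
% in the design matrix, so a permutation that reorders the design
% columns shuffles behaviour relative to the data, within one variable
behav  = design(cfg.ivar, :)';                 % one RT per subject
rho    = corr(dat', behav, 'type', 'Spearman');
df     = size(dat, 2) - 2;
s.stat = rho .* sqrt(df) ./ sqrt(1 - rho.^2);  % rho -> t, as in correlationT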

I would really like to know what the right way of doing this is using just the FieldTrip functions. Is there a way to permute data within variables? I tried cfg.resampling = 'bootstrap', but that is not a permutation, as far as I know.


Thanks!
Martin


From: fieldtrip-bounces at science.ru.nl [fieldtrip-bounces at science.ru.nl] on behalf of Arjen Stolk [a.stolk8 at gmail.com]
Sent: Tuesday, 20 October 2015 08:03
To: FieldTrip discussion list
Subject: Re: [FieldTrip] calculating behavioural-power correlation -- follow-up questions

Hey Xiaoming, 

It's still pretty hard, for me, to guess on the basis of that MATLAB output what is going on here, what you mean by 'shuffling design matrices', and how that shuffling 'biases the cluster distribution'. As you mention yourself, it could be due to various reasons, and you're open to suggestions. I'd therefore suggest narrowing down the number of potential explanations by simulating what you're doing (using input data for which you know how it should behave), after reading up on what the design matrix and Monte Carlo statistics are supposed to do. Perhaps the statistics section at the bottom of this page provides a good starting point: http://www.fieldtriptoolbox.org/walkthrough

Hope that helps,
Arjen 

2015-10-19 15:56 GMT-07:00 Xiaoming Du <XDu at mprc.umaryland.edu>:


For example, our power values ranged from 1 to 3 (after log transform), while the behavioral data ranged from 20 to 90.
Using the above-mentioned script, 14 negative clusters were reported in the variable stat.
stat = 
                   prob: [30x50 double]
            posclusters: []
    posclusterslabelmat: [30x50 double]
        posdistribution: [1x1000 double]
            negclusters: [1x14 struct]
    negclusterslabelmat: [30x50 double]
        negdistribution: [1x1000 double]
                cirange: [30x50 double]
                   mask: [30x50 logical]
                   stat: [30x50 double]
                    ref: [30x50 double]
                    rho: [30x50 double]
                 dimord: 'chan_freq'
                   freq: [1x50 double]
                  label: {30x1 cell}
                   time: 2.5000
                    cfg: [1x1 struct]
However, the p-values of those clusters (i.e., stat.negclusters.prob) are all ones. The smallest value in stat.negdistribution is far larger than the largest negative cluster t-sum. This could be real, but it is more likely due to the shuffling between the power and behavioral groups. For example, the design matrix [1 1 1 1 2 2 2 2; 1 2 3 4 1 2 3 4] was shuffled to [1 2 2 1 2 2 1 1; 1 2 3 4 1 2 3 4]. After each permutation, some subjects' power data were labeled as behavioral data and vice versa. Because of the scale difference between the power and behavioral data, the permutations produced large negative correlations, which biased the cluster distribution.
My limited understanding is that, for correlation, each permutation should keep cfg.ivar fixed and shuffle only half of the cfg.uvar entries. For example, permute the design matrix [1 1 1 1 2 2 2 2; 1 2 3 4 1 2 3 4] to [1 1 1 1 2 2 2 2; 1 2 3 4 4 2 3 1]. After such a permutation, one subject's power data corresponds to another subject's behavioral data; see the sketch below.
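In code, the permutation I have in mind would look something like this (illustrative only):

% keep row 1 (cfg.ivar) fixed and shuffle the subject indices
% (cfg.uvar) of the second condition only
design = [1 1 1 1 2 2 2 2; 1 2 3 4 1 2 3 4];
dperm  = design;
dperm(2, 5:8) = design(2, 4 + randperm(4));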
I am not good at statistics, so any suggestions or comments would be greatly appreciated.
Xiaoming


>>> Arjen Stolk <a.stolk8 at gmail.com> 10/19/2015 6:01 PM >>>
Hey Xiaoming, 

Not sure if I understand, but shouldn't the direction of the correlation be independent of the scaling of the two variables? Looking at the code of ft_statfun_correlationT, the conversion from correlation to T value (tstat = rho*(sqrt(max(nunits)-2))/sqrt((1-rho^2))) shouldn't cause a direction change either. Perhaps you could first manually calculate a correlation between signal power and behavior, and see whether anything behaves unexpectedly?
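
For instance, a quick check of that conversion with made-up numbers shows that the sign is preserved:

% rho-to-t conversion as used in ft_statfun_correlationT
rho   = -0.5;  n = 10;                        % made-up example values
tstat = rho * sqrt(n - 2) / sqrt(1 - rho^2)   % -1.63, same sign as rho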

Yours,
Arjen

2015-10-19 14:25 GMT-07:00 Xiaoming Du <XDu at mprc.umaryland.edu>:


Dear FieldTrip users,
This is Xiaoming from the University of Maryland, Baltimore. My current project requires calculating behavioral-power correlations across subjects. A similar topic was discussed here earlier this year: http://mailman.science.ru.nl/pipermail/fieldtrip/2015-February/008953.html
Following the suggestions in that thread, I duplicated my power dataset and replaced the power values at each time-frequency point with the behavioral data, so that the two datasets have the same structure and dimensions. I then used the script further below to test whether there are significant clusters of correlations.
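The duplication step itself looks roughly like this (a sketch; 'behav' is a hypothetical 1 x n1 vector with one behavioral score per subject, and dataDX1 holds the power structures):

% copy each subject's freq structure and overwrite every
% channel/frequency/time point with that subject's behavioral score
for i = 1:n1
  dataBeh{i} = dataDX1{i};
  dataBeh{i}.powspctrm(:) = behav(i);
end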
cfg = [];
cfg.parameter = 'powspctrm';
cfg.method = 'montecarlo';
cfg.statistic = 'ft_statfun_correlationT';
...
etc
...
design = zeros(2, n1 * 2); % n1 is the number of subjects.
design(1,1:n1) = 1;
design(1,(n1 + 1):(n1 * 2)) = 2;
design(2, :) = [1:n1 1:n1];
cfg.design = design;

cfg.ivar = 1;
cfg.uvar = 2;
stat = ft_freqstatistics(cfg, dataBeh{:}, dataDX1{:});
However, it seems that each time the design matrix is permuted, FieldTrip uses the same method as for 'ft_statfun_depsamplesT': cfg.uvar remains the same while cfg.ivar (1 or 2) is randomly reassigned within each subject in the design matrix. I confirmed this by uncommenting line 313 (i.e., tmpdesign = design(:,resample(i,:))) in ft_statistics_montecarlo.m, which displays the permuted design matrix in the command line; please correct me if this is not the case.
To my mind, this kind of permutation causes trouble for correlations. In my case, for example, the behavioral data and the power data have different scales: the power values are much larger overall. Assigning behavioral data to the power group, or vice versa, induces huge negative correlations between the power and behavioral measurements, so no negative cluster will survive the permutation test.
Please let me know if I have misunderstood something or done anything wrong. Any suggestions will be highly appreciated!
Thanks.


Xiaoming 

_______________________________________________
fieldtrip mailing list
fieldtrip at donders.ru.nl
http://mailman.science.ru.nl/mailman/listinfo/fieldtrip





