[FieldTrip] Question about cluster-based permutation tests on linear mixed models

Maris, E.G.G. (Eric) e.maris at donders.ru.nl
Thu Oct 27 13:57:07 CEST 2016


Dear colleagues,

@alik :
1. The approach you propose is a so-called fixed-effects approach, whose outcome may depend on just a few subjects (provided the number of trials is high). Some neuroscientists consider a fixed-effects approach insufficient to support a scientific claim; the neuroimaging community as a whole, for example, takes this position.
2. Your approach is actually a genuine permutation test, in which the LMM-derived t-statistics are only used for thresholding (and not for inference).

@david :
1. There is nothing wrong with using FDR correction, if you think the false discovery rate is the quantity that one should control. Others may disagree, arguing that the stricter family-wise error rate is the relevant quantity.
2. FDR correction assumes that the sample-specific (sample = a channel-time-frequency triplet) p-values are unbiased. Because the unbiasedness of these p-values depends on auxiliary assumptions, there may be good reasons not to trust them. This is supported by the recent Eklund et al. paper on the inflated Type I error rate in neuroimaging studies.

best,
Eric Maris







From: David Groppe <david.m.groppe at gmail.com>
Subject: Re: [FieldTrip] Question about cluster-based permutation tests on linear mixed models
Date: 26 October 2016 at 19:35:03 GMT+2
To: FieldTrip discussion list <fieldtrip at science.ru.nl>, <alik.widge at gmail.com>
Reply-To: FieldTrip discussion list <fieldtrip at science.ru.nl>


P.S. If you want to explore using FDR control to correct for multiple comparisons, I would not recommend limiting yourself to FieldTrip's FDR correction code (fdr.m). It only implements the Benjamini-Yekutieli FDR control procedure, which is guaranteed to control the FDR at or below the desired level but tends to be overly conservative in practice. The more popular FDR control algorithm by Benjamini & Hochberg is not always guaranteed to control the FDR at or below the desired level, but it is much less conservative and tends to control the FDR accurately in practice. Here is some code for the Benjamini & Hochberg algorithm:

https://www.mathworks.com/matlabcentral/fileexchange/27418-fdr-bh

MATLAB's mafdr function, part of the Bioinformatics Toolbox, also implements the Benjamini & Hochberg algorithm.
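For reference, the Benjamini & Hochberg step-up procedure itself is only a few lines. A minimal, untested MATLAB sketch (the function name bh_fdr is made up here just for illustration; the fdr_bh function at the link above is more complete and better tested):

% Minimal sketch of the Benjamini & Hochberg (1995) step-up FDR procedure.
% pvals : vector of uncorrected p-values
% q     : desired FDR level, e.g. 0.05
% h     : logical mask, true for tests that survive the correction
function h = bh_fdr(pvals, q)
  p        = pvals(:);
  m        = numel(p);
  [ps, ix] = sort(p);                        % p-values in ascending order
  crit     = (1:m)' * q / m;                 % BH critical values i*q/m
  k        = find(ps <= crit, 1, 'last');    % largest rank i with p_(i) <= i*q/m
  h        = false(size(pvals));
  if ~isempty(k)
      h(ix(1:k)) = true;                     % reject the hypotheses with the k smallest p-values
  end
end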



On Tue, Oct 25, 2016 at 3:28 PM, David Groppe <david.m.groppe at gmail.com> wrote:
I would definitely recommend running some simulations.

It might be simpler to use bootstrap samples rather than permutations to generate your null distribution. Bootstrapping is also asymptotically accurate.
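As a very rough, untested sketch of that idea for a single regression coefficient at one voxel/timepoint (the variables y and X are hypothetical placeholders; centering the resampled estimates on the observed one is what turns them into a null distribution):

% y : trial-wise dependent variable (e.g. power), X : trial-by-predictor design matrix
nboot   = 1000;
ntrials = numel(y);
b_obs   = X \ y;                                 % observed OLS coefficient estimates
b_boot  = zeros(nboot, size(X, 2));
for k = 1:nboot
    idx          = randi(ntrials, ntrials, 1);   % resample trials with replacement
    b_boot(k, :) = (X(idx, :) \ y(idx))';
end
null_b = bsxfun(@minus, b_boot, b_obs');         % center on the observed estimate -> null distribution
p_boot = mean(abs(null_b(:, 2)) >= abs(b_obs(2)));  % two-sided p-value for, say, the second predictor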
    -David



On Tue, Oct 25, 2016 at 1:29 PM, Alik Widge <alik.widge at gmail.com> wrote:
Thanks, that was super interesting! Was not aware of those.

Have been meditating this afternoon on this and related Anderson papers. What's interesting is that he appears to think my suggestion below *would* be asymptotically acceptable -- *if* one specifically permutes the dependent variable (power/ERP observation) rather than permuting each column of the independent variables separately (i.e., if one preserves any correlational structure that exists between the independent variables). That's the Manly (1997) method, and it appears that the only reason it sometimes breaks down is an outlier in the independent variable. This could presumably be a problem in the ecological sciences, for which he's writing, where one can't control things like the temperature in a season or the number of eels that swim past a given sensor. In cognitive neuroscience, where the predictor/independent variables are usually dummy-coded properties of the trial, we seem to be on firmer ground.
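In code terms the distinction is tiny; a sketch with hypothetical variable names, just to make the point concrete:

% Manly-style permutation: shuffle the dependent variable as a whole and keep the
% design matrix X intact, so any correlation among the predictors is preserved.
yperm = y(randperm(numel(y)));    % permute the power/ERP observations
% ... re-fit the regression on (X, yperm) and recompute the statistic ...

% What this argues against: permuting each column of X separately, which destroys
% the correlational structure among the independent variables.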

Opinion based on reading and reasoning, of course, and not to be trusted until and unless I or someone else were to back it up by doing some simulated-data experiments...


Alik Widge
alik.widge at gmail.com
(206) 866-5435


On Tue, Oct 25, 2016 at 11:30 AM, David Groppe <david.m.groppe at gmail.com> wrote:
Hi Elisabeth and Alik,
    Permutation methods applied to multiple regression models are not generally guaranteed to be accurate, because testing individual terms in such models (e.g., partial correlation coefficients) requires accurate knowledge of other terms in the model (e.g., the slope coefficients for all the other predictors in the multiple regression). Because such parameters have to be estimated from the data, permutation tests are only "asymptotically exact" for such tests (Anderson, 2001; Good, 2005). There are special cases, though (e.g., a two-factor ANOVA with two levels of each factor), where permutation methods do guarantee accuracy.
    In lieu of permutation testing, you might want to try using one of Benjamini and colleagues' false discovery rate (FDR) control algorithms to control for multiple comparisons. In my tests on simulated ERP data (Groppe et al., 2011), FDR correction was nearly as powerful as cluster-based permutation testing for detecting a very broadly distributed effect (e.g., a P300-like effect) and it was far more sensitive than cluster-based testing for an effect with a very limited distribution (e.g., an N170-like effect). FDR correction is also very computationally efficient.
      hope this is helpful,
         -David


Refs:
Anderson, M. J. (2001). Permutation tests for univariate or multivariate analysis of variance and regression. Canadian Journal of Fisheries and Aquatic Sciences, 58(3), 626-639.

Good, P. I. (2005). Permutation, Parametric and Bootstrap Tests of Hypotheses: A Practical Guide to Resampling Methods for Testing Hypotheses.

Groppe, D. M., Urbach, T. P., & Kutas, M. (2011). Mass univariate analysis of event‐related brain potentials/fields II: Simulation studies. Psychophysiology, 48(12), 1726-1737.


On Fri, Oct 21, 2016 at 1:38 PM, Elisabeth May <elisabethsusanne.may at gmail.com> wrote:
Dear Eric and Alik,

thanks a lot for your helpful responses!

I will have a close look at the FAQs, Eric, and test the approaches you outlined. In any case, I am curious how different the results will be for simple regressions compared to the multilevel results of the linear mixed models.

Like Alik, I am also curious about other people's opinions on the general question of whether there are theoretical reasons against combining the approaches as Alik suggested. We also thought about this approach but haven't fully tested it yet because of the very long calculation times.

Thanks again and have a nice weekend!
Elisabeth

2016-10-20 12:49 GMT+02:00 Alik Widge <alik.widge at gmail.com>:
Eric, I don't think I understand why you would say "I do not see how these models could be combined with permutation-based inference; they are just different statistical frameworks". As you somewhat hint, the (G)LMM is a regression, and the beta coefficient for the independent variable of interest at each voxel/vertex/sensor x timepoint can be interpreted as "how much does the independent variable explain the brain activity?" In that framework, it seems to me that one could do the following:

for n=1:1000
   1) Permute the condition labels (within subjects) of the individual trials
   2) Re-fit the LMM at each (voxel,timepoint), creating a beta map and corresponding t-map
   3) Threshold and construct cluster mass statistic as usual
end
4) Identify clusters in the original (unpermuted) analysis and report their p-values relative to the permutation distribution of the cluster statistic built in steps 1-3


Now, the main thing that has come up when we've tried to do this is that re-fitting a (voxel x time) set of (G)LMMs 1000 times with the standard iterative maximum-likelihood engines is remarkably slow. In FieldTrip, I can imagine it would require rewriting at least a statfun, maybe other pieces of the code. (We had an idea that, since the betas should likely vary smoothly over time and space, one could use the output of one fit as the seed for the next, which would speed up convergence.) So it still does not seem like a good idea in practice, but based on the above, is there actually a *theoretical* reason it wouldn't work?
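Just to make steps 1) and 2) concrete for a single (voxel, timepoint): an untested sketch using MATLAB's fitlme (the table and variable names tbl, power, condition, subject are hypothetical; the real analysis would wrap this in loops over voxels, timepoints, and ~1000 permutations, which is exactly where the cost blows up):

% One permutation at one (voxel, timepoint).
% tbl is a table with one row per trial and variables: power, condition, subject.
tblperm = tbl;
subs    = unique(tbl.subject);
for s = 1:numel(subs)
    rows = find(tbl.subject == subs(s));
    tblperm.condition(rows) = tbl.condition(rows(randperm(numel(rows))));  % permute labels within subject
end
lme   = fitlme(tblperm, 'power ~ condition + (1|subject)');  % re-fit the mixed model
tstat = lme.Coefficients.tStat(2);                           % t-statistic for the condition effect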


Alik Widge, MD, PhD
Director, Translational NeuroEngineering Laboratory
Division of Neurotherapeutics, Massachusetts General Hospital
Assistant Professor of Psychiatry, Harvard Medical School
Clinical Fellow, Picower Institute for Learning & Memory (MIT)
awidge at partners.org
http://scholar.harvard.edu/awidge/
617-643-2580

Alik Widge
alik.widge at gmail.com
(206) 866-5435


On Thu, Oct 20, 2016 at 6:08 AM, Maris, E.G.G. (Eric) <e.maris at donders.ru.nl> wrote:
Note: this is the second time I am posting this reply; the reason is that I forgot to add an appropriate Subject (for findability) to my email (shame on me…(-;)

From: Elisabeth May <elisabethsusanne.may at gmail.com>
Subject: [FieldTrip] Question about cluster-based permutation tests on linear mixed models
Date: 27 September 2016 at 14:46:55 GMT+2
To: <fieldtrip at science.ru.nl>
Reply-To: FieldTrip discussion list <fieldtrip at science.ru.nl>


Dear FieldTripers,

I have a question about the potential use of cluster-based permutation tests for results obtained using linear mixed models.

We are working with data from a 10 min EEG experiment, analyzed on the source level, with the aim of quantifying the relationship of brain activity in different frequency bands with continuous perceptual ratings across 20 subjects in different experimental conditions. Thus, we have 10 min time courses of brain activity and ratings for each voxel for different conditions and want to test a) whether there are significant relationships in the single conditions and b) whether these relationships differ between two conditions. To this end, I have calculated linear mixed models in R using the lme4 package. For both the single-condition relationships and the condition contrasts, they result in a single t-value (and a corresponding p-value), which is based on information on both the single-subject and the group level (i.e., we perform a multi-level analysis). However, with more than 2000 voxels, we have a lot of t-values and are wondering if there is a way to apply cluster-based tests to correct for multiple comparisons.

The main problem I see is that I only have one multilevel t-value for the effect across all subjects, i.e. I don't have single-subject values, which I could then, e.g., randomize between conditions as is normally done in cluster-based permutation tests. (Or rather, I would be able to extract single-subject values but would then lose the advantage of the multi-level analysis.)

I found an old thread in the mailing list archive where it was suggested to flip the signs of the t-statistic for cluster-level correction (https://mailman.science.ru.nl/pipermail/fieldtrip/2012-July/005375.html). I understand that, in our case, I would do this randomly for all voxels in each randomization and then build spatial clusters on the resulting (partly flipped) t-values. However, I am not sure whether that is a valid approach given the null hypotheses that there are no significant relationships in my single conditions (a) or no significant relationship differences in my condition contrasts (b).

For the condition contrasts, I would be able to permute the condition labels as is normally done in cluster-based permutation tests, I think, but would then have to recalculate the linear mixed models for all voxels in every permutation. This would result in a very high computational load.

Does anyone have any experience with this kind of analysis? Would the flipping of t-values be a valid approach (and if yes, is there anything to keep in mind in particular)? Can you think of other ways to combine linear mixed models with a multiple comparison correction on the cluster level?


Hi Elisabeth,

I’m not an expert on linear mixed modelling, at least not with respect to the different ways in which these models can be used to deal with correlated observations (typically, time series). However, from a theoretical point of view, I do not see how they could be combined with permutation-based inference; they are just different statistical frameworks. That said, it IS possible to answer your questions ("we have 10 min time courses of brain activity and ratings for each voxel for different conditions and want to test a) whether there are significant relationships in the single conditions and b) whether these relationships differ between two conditions") within the framework of cluster-based permutation tests.

Question b) is the most straightforward, because it amounts to a cluster-based permutation test using the depsamplesT statfun applied to the regression coefficients in each of the two conditions. Answering question a) requires that you bin your ratings into a number of categories, calculate the trial-averaged EEG data for each of the categories, and test the difference between them using a cluster-based permutation test with the depsamplesregrT statfun. Both of these approaches have been described previously on this discussion list; for the depsamplesregrT statfun (your question a), it was Vladimir Litvak who used it first (actually, I implemented it for him). The approach for question b) is a variant of the general approach for testing interactions using cluster-based permutation tests.

Have a look here:
http://www.fieldtriptoolbox.org/faq/how_can_i_test_for_correlations_between_neuronal_data_and_quantitative_stimulus_and_behavioural_variables
and
http://www.fieldtriptoolbox.org/faq/how_can_i_test_an_interaction_effect_using_cluster-based_permutation_tests

These tutorials provide all the necessary concepts, although they do not answer your question in a recipe-like fashion.
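For question b), an untested sketch of what such a call could look like, assuming you have one source structure of regression coefficients per subject and condition (here called beta_cond1 and beta_cond2; the exact fields and data organization depend on your pipeline):

% Cluster-based permutation test contrasting the per-subject regression
% coefficients between the two conditions (dependent-samples T across subjects).
cfg                  = [];
cfg.method           = 'montecarlo';
cfg.statistic        = 'ft_statfun_depsamplesT';
cfg.correctm         = 'cluster';
cfg.clusteralpha     = 0.05;
cfg.clusterstatistic = 'maxsum';
cfg.numrandomization = 1000;

nsubj      = numel(beta_cond1);
cfg.design = [ones(1, nsubj), 2*ones(1, nsubj); 1:nsubj, 1:nsubj];
cfg.ivar   = 1;   % row of the design coding the condition
cfg.uvar   = 2;   % row of the design coding the subject (unit of observation)

stat = ft_sourcestatistics(cfg, beta_cond1{:}, beta_cond2{:});

For question a), the analogous call would use the depsamplesregrT statfun on the binned-rating averages, with cfg.ivar coding the rating bin instead of the condition.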

best,
Eric Maris


_______________________________________________
fieldtrip mailing list
fieldtrip at donders.ru.nl
https://mailman.science.ru.nl/mailman/listinfo/fieldtrip





