[FieldTrip] Question about cluster-based permutation tests on linear mixed models

David Groppe david.m.groppe at gmail.com
Sat Oct 29 20:20:05 CEST 2016


@Eric
1. Note that FDR control actually provides "weak" control of the family-wise
error rate (FWER). This is exactly the same degree of FWER control provided
by cluster-based permutation tests. If you want strong control of the FWER,
you will need something like Bonferroni-Holm or max-statistic (i.e.,
non-cluster-based) permutation tests.
2. It's true that FDR correction assumes that the individual test p-values
are accurate. You can, however, derive these p-values via non-parametric
techniques such as non-cluster-based permutation tests, so the underlying
statistical assumptions will be the same as if you had run a cluster-based
permutation test. Moreover, if you have good reason to make parametric
assumptions, FDR will let you exploit them.
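
As a concrete illustration of that second point, here is a minimal MATLAB
sketch (not FieldTrip code; the variable names and the paired sign-flip
design are assumptions made for the example): compute sample-wise
permutation p-values first, then apply the Benjamini-Hochberg step-up
procedure to them.

% "diffs" is assumed to be an nSubj x nSamples matrix of per-subject
% condition differences (one row per subject, one column per sample).
[nSubj, nSamp] = size(diffs);
nPerm = 2000;
tObs = mean(diffs) ./ (std(diffs) ./ sqrt(nSubj));  % observed t per sample
exceed = zeros(1, nSamp);
for iPerm = 1:nPerm
    flips = sign(rand(nSubj, 1) - 0.5);             % random +1/-1 per subject
    permD = bsxfun(@times, diffs, flips);           % sign-flipped data
    tPerm = mean(permD) ./ (std(permD) ./ sqrt(nSubj));
    exceed = exceed + (abs(tPerm) >= abs(tObs));
end
pVals = (exceed + 1) / (nPerm + 1);                 % two-sided permutation p-values
% Benjamini-Hochberg step-up procedure at q = 0.05
q = 0.05;
[pSorted, order] = sort(pVals);
k = find(pSorted <= (1:nSamp) / nSamp * q, 1, 'last');
sig = false(1, nSamp);
if ~isempty(k), sig(order(1:k)) = true; end         % significant samples

The same pVals could instead be handed to an FDR function such as MATLAB's
mafdr; the point is only that the per-sample p-values need not come from a
parametric test.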

@Alik
1. I simulated data in that 2011 paper by randomly flipping the polarity of
real EEG data to generate zero-mean, realistic ERP noise. The noise was
subject-specific, but the simulated ERP effects were exactly the same for
all simulated participants. So it couldn't be used to address the question
you're after.
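
(For readers unfamiliar with that simulation scheme, a rough sketch with
hypothetical variable names; see the 2011 paper for the actual procedure.
"trials" is an nTrials x nSamples matrix of real single-trial EEG epochs
and "simEffect" is a 1 x nSamples simulated ERP effect that is identical
for every simulated participant.)

flips = sign(rand(size(trials, 1), 1) - 0.5);   % flip each trial's polarity at random
noise = bsxfun(@times, trials, flips);          % zero-mean noise with realistic structure
simERP = mean(noise, 1) + simEffect;            % one simulated participant average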








On Thu, Oct 27, 2016 at 7:57 AM, Maris, E.G.G. (Eric) <e.maris at donders.ru.nl
> wrote:

> Dear colleagues,
>
> @alik :
> 1. The approach you propose is a so-called fixed-effects approach, whose
> outcome may depend on just a few subjects (provided the number of trials
> is high). Some neuroscientists consider a fixed-effects approach
> insufficient to support a scientific claim; the neuroimaging community as
> a whole does, for example.
> 2. Your approach is actually a genuine permutation test, in which the
> LMM-derived t-stats are only used for thresholding (and not for inference).
>
> @david :
> 1. There is nothing wrong with using FDR correction, if you think the
> false discovery rate is the quantity one should control. Others may
> disagree, though, arguing that the stricter family-wise error rate is the
> relevant quantity.
> 2. FDR correction assumes that the sample-specific (sample = a
> channel-time-frequency triplet) p-values are unbiased. Because the
> unbiasedness of these p-values depends on auxiliary assumptions, there may
> be good reasons not to trust them. This is supported by the recent Eklund
> et al. paper on the inflated type I error rate in neuroimaging studies.
>
> best,
> Eric Maris
>
>
>
>
>
>
>
> *From: *David Groppe <david.m.groppe at gmail.com>
> *Subject: **Re: [FieldTrip] Question about cluster-based permutation
> tests on linear mixed models*
> *Date: *26 October 2016 at 19:35:03 GMT+2
> *To: *FieldTrip discussion list <fieldtrip at science.ru.nl>, <
> alik.widge at gmail.com>
> *Reply-To: *FieldTrip discussion list <fieldtrip at science.ru.nl>
>
>
> P.S. If you want to explore using FDR control to correct for multiple
> comparisons, I would not recommend limiting yourself to FieldTrip's FDR
> correction code (fdr.m). It only implements the Benjamini-Yekutieli FDR
> control procedure, which is guaranteed to control the FDR at or below the
> desired level but tends to be overly conservative in practice. The more
> popular FDR control algorithm by Benjamini & Hochberg is not always
> guaranteed to control the FDR at or below the desired level, but it is
> much less conservative and tends to control the FDR accurately in
> practice. Here is some code for the Benjamini & Hochberg algorithm:
>
> https://www.mathworks.com/matlabcentral/fileexchange/27418-fdr-bh
>
> MATLAB's mafdr.m function, which is part of the Bioinformatics Toolbox,
> also implements the Benjamini & Hochberg algorithm.
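>
> (For illustration, assuming a vector pVals of uncorrected p-values, the
> Bioinformatics Toolbox call would look something like this; for fdr_bh,
> check the help on the File Exchange page for its exact output arguments.)
>
> adjP = mafdr(pVals, 'BHFDR', true);   % Benjamini-Hochberg adjusted p-values
> sig  = adjP < 0.05;                   % samples surviving FDR control at q = 0.05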
>
>
>
> On Tue, Oct 25, 2016 at 3:28 PM, David Groppe <david.m.groppe at gmail.com>
> wrote:
>
>> I would definitely recommend running some simulations.
>>
>> It might be simpler to use bootstrap samples rather than permutations to
>> generate your null distribution. Bootstrapping is also asymptotically
>> accurate.
>>     -David
>>
>>
>>
>> On Tue, Oct 25, 2016 at 1:29 PM, Alik Widge <alik.widge at gmail.com> wrote:
>>
>>> Thanks, that was super interesting! Was not aware of those.
>>>
>>> Have been meditating this afternoon on this and related Anderson papers.
>>> What's interesting is that he appears to think my suggestion below *would*
>>> be asymptotically acceptable -- *if* one specifically permutes the
>>> dependent variable (power/ERP observation) rather than permuting each
>>> column of the independent variables separately (i.e., if one preserves any
>>> correlational structure that exists between the independent variables).
>>> That's the Manly (1997) method, and it appears that the only reason it
>>> breaks down sometimes is if there's an outlier in the independent variable.
>>> This could presumably be a problem in the ecological sciences, for which
>>> he's writing, where one can't control things like temperature in a season
>>> or numbers of eels that swim past a given sensor. In cognitive
>>> neuroscience, where the predictor/independent variables are usually
>>> dummy-coded properties of the trial, it seems we might be on firmer ground.
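>>>
>>> (A hypothetical illustration of the distinction, with y an nTrials x 1
>>> dependent variable and X an nTrials x nPredictors design matrix:)
>>>
>>> yManly = y(randperm(numel(y)));     % Manly (1997): permute only the DV, keep X intact
>>> Xbroken = X;
>>> for k = 1:size(X, 2)                % by contrast, shuffling each IV column
>>>     Xbroken(:, k) = X(randperm(size(X, 1)), k);  % separately destroys the IV correlations
>>> end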
>>>
>>>
>>> Opinion based on reading and reasoning, of course, and not to be trusted
>>> until and unless I or someone else were to back it up by doing some
>>> simulated-data experiments...
>>>
>>>
>>> Alik Widge
>>> alik.widge at gmail.com
>>> (206) 866-5435
>>>
>>>
>>> On Tue, Oct 25, 2016 at 11:30 AM, David Groppe <david.m.groppe at gmail.com
>>> > wrote:
>>>
>>>> Hi Elisabeth and Alik,
>>>>     Permutation methods applied to multiple regression models are not
>>>> generally guaranteed to be accurate because testing individual terms in
>>>> such models (e.g., partial correlation coefficients) requires accurate
>>>> knowledge of other terms in the model (e.g., the slope coefficients for all
>>>> the other predictors in the multiple regression). Because such parameters
>>>> have to be estimated from the data, permutation tests are only
>>>> "asymptotically exact" for such tests (Anderson, 2001; Good, 2005).
>>>> There are, however, special cases (e.g., a two-factor ANOVA with two
>>>> levels per factor) where permutation methods do guarantee accuracy.
>>>>     In lieu of permutation testing, you might want to try using one of
>>>> Benjamini and colleagues' false discovery rate (FDR) control algorithms to
>>>> control for multiple comparisons. In my tests on simulated ERP data (Groppe
>>>> et al., 2011), FDR correction was nearly as powerful as cluster-based
>>>> permutation testing for detecting a very broadly distributed effect (e.g.,
>>>> a P300-like effect) and it was far more sensitive than cluster-based
>>>> testing for an effect with a very limited distribution (e.g., an N170-like
>>>> effect). FDR correction is also very computationally efficient.
>>>>       hope this is helpful,
>>>>          -David
>>>>
>>>>
>>>> Refs:
>>>> Anderson, M. J. (2001). Permutation tests for univariate or
>>>> multivariate analysis of variance and regression. *Canadian journal of
>>>> fisheries and aquatic sciences*, *58*(3), 626-639.
>>>>
>>>> Good, P. I. (2005). Permutation, Parametric and Bootstrap Tests of
>>>> Hypotheses: A Practical Guide to Resampling Methods for Testing Hypotheses.
>>>>
>>>> Groppe, D. M., Urbach, T. P., & Kutas, M. (2011). Mass univariate
>>>> analysis of event‐related brain potentials/fields II: Simulation studies.
>>>>  *Psychophysiology*, *48*(12), 1726-1737.
>>>>
>>>>
>>>> On Fri, Oct 21, 2016 at 1:38 PM, Elisabeth May <
>>>> elisabethsusanne.may at gmail.com> wrote:
>>>>
>>>>> Dear Eric and Alik,
>>>>>
>>>>> thanks a lot for your helpful responses!
>>>>>
>>>>> I will have a close look at the FAQs, Eric, and test the approaches
>>>>> you outlined. In any case, I am curious how different the results of
>>>>> simple regressions will be compared to the multilevel results of the
>>>>> linear mixed models.
>>>>>
>>>>> Like Alik, I am also curious about other people's opinions on the
>>>>> general question of whether there are theoretical reasons against
>>>>> combining the approaches as Alik suggested. We also thought about this
>>>>> approach but haven't fully tested it yet because of the very long
>>>>> calculation times.
>>>>>
>>>>> Thanks again and have a nice weekend!
>>>>> Elisabeth
>>>>>
>>>>> 2016-10-20 12:49 GMT+02:00 Alik Widge <alik.widge at gmail.com>:
>>>>>
>>>>>> Eric, I don't think I understand why you would say "I do not see how
>>>>>> these models could be combined with permutation-based inference; they are
>>>>>> just different statistical frameworks". As you somewhat hint, the (G)LMM is
>>>>>> a regression, and the beta coefficient for the independent variable of
>>>>>> interest at each voxel/vertex/sensor x timepoint can be interpreted as "how
>>>>>> much does the independent variable explain the brain activity?" In that
>>>>>> framework, it seems to me that one could do the following:
>>>>>>
>>>>>> for n=1:1000
>>>>>>    1) Permute the condition labels (within subjects) of the
>>>>>> individual trials
>>>>>>    2) Re-fit the LMM at each (voxel,timepoint), creating a beta map
>>>>>> and corresponding t-map
>>>>>>    3) Threshold and construct cluster mass statistic as usual
>>>>>> end
>>>>>> 4) Identify cluster in the original (unpermuted) analysis and report
>>>>>> cluster p-value
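>>>>>>
>>>>>> (As a hypothetical MATLAB sketch of step 2 at a single voxel and
>>>>>> timepoint, using fitlme from the Statistics Toolbox for illustration
>>>>>> rather than lme4, and assuming a table tbl with columns y, cond and
>>>>>> subject; in practice step 1 would permute cond within subjects only:)
>>>>>>
>>>>>> tbl.cond = tbl.cond(randperm(height(tbl)));   % crude permutation of the condition labels
>>>>>> lme = fitlme(tbl, 'y ~ cond + (1|subject)');  % re-fit the mixed model
>>>>>> tPerm = lme.Coefficients.tStat(2);            % t-stat of the condition effect, used for thresholding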
>>>>>>
>>>>>>
>>>>>> Now, the main thing that has come up when we've tried to do this is
>>>>>> that re-fitting a (voxel x time) GLM 1000 times by the standard iterative
>>>>>> maximum-likelihood engines is remarkably slow. In fieldtrip, I can imagine
>>>>>> it would require rewriting at least a statfun, maybe other pieces of the
>>>>>> code. (We had an idea that, since the betas should likely vary smoothly
>>>>>> over time and space, one could use the output of one GLM as the seed for
>>>>>> the next, which would speed up convergence.) So it may still not be a
>>>>>> good idea in practice, but based on the above, is there actually a
>>>>>> *theoretical* reason it wouldn't work?
>>>>>>
>>>>>>
>>>>>> Alik Widge, MD, PhD
>>>>>> Director, Translational NeuroEngineering Laboratory
>>>>>> Division of Neurotherapeutics, Massachusetts General Hospital
>>>>>> Assistant Professor of Psychiatry, Harvard Medical School
>>>>>> Clinical Fellow, Picower Institute for Learning & Memory (MIT)
>>>>>> awidge at partners.org
>>>>>> http://scholar.harvard.edu/awidge/
>>>>>> 617-643-2580
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Oct 20, 2016 at 6:08 AM, Maris, E.G.G. (Eric) <
>>>>>> e.maris at donders.ru.nl> wrote:
>>>>>>
>>>>>>> Note: this is the second time I post this reply, and the reason is
>>>>>>> that I forgot to add an appropriate Subject (for findability) to my email
>>>>>>> (shame on me…(-;)
>>>>>>>
>>>>>>> *From: *Elisabeth May <elisabethsusanne.may at gmail.com>
>>>>>>> *Subject: **[FieldTrip] Question about cluster-based permutation
>>>>>>> tests on linear mixed models*
>>>>>>> *Date: *27 September 2016 at 14:46:55 GMT+2
>>>>>>> *To: *<fieldtrip at science.ru.nl>
>>>>>>> *Reply-To: *FieldTrip discussion list <fieldtrip at science.ru.nl>
>>>>>>>
>>>>>>>
>>>>>>> Dear FieldTripers,
>>>>>>>
>>>>>>> I have a question about the potential use of cluster-based
>>>>>>> permutation tests for results obtained using linear mixed models.
>>>>>>>
>>>>>>> We are working with data from a 10 min EEG experiment at the source
>>>>>>> level, with the aim of quantifying the relationship of brain activity in
>>>>>>> different frequency bands with continuous perceptual ratings across 20
>>>>>>> subjects in different experimental conditions. Thus, we have 10 min time
>>>>>>> courses of brain activity and ratings for each voxel for different
>>>>>>> conditions and want to test a) if there are significant relationships in
>>>>>>> the single conditions and b) if these relationships differ between two
>>>>>>> conditions. To this end, I have fitted linear mixed models in R using
>>>>>>> the lme4 package. For both the single-condition relationships and the
>>>>>>> condition contrasts, they result in a single t-value (and a corresponding
>>>>>>> p-value), which is based on information from both the single-subject and
>>>>>>> the group level (i.e. we perform a multilevel analysis). However, with
>>>>>>> more than 2000 voxels, we have a lot of t-values and are wondering if
>>>>>>> there is a way to apply cluster-based tests to correct for multiple
>>>>>>> comparisons.
>>>>>>>
>>>>>>> The main problem I see is that I only have one multilevel t-value
>>>>>>> for the effect across all subjects, i.e. I don't have single-subject
>>>>>>> values, which I could then e.g. randomize between conditions as normally
>>>>>>> done in cluster-based permutation tests. (Or rather, I would be able to
>>>>>>> extract single-subject values but would then lose the advantage of the
>>>>>>> multilevel analysis.)
>>>>>>>
>>>>>>> I found an old thread in the mailing list archive where it was
>>>>>>> suggested to flip the signs of the t-statistic for cluster-level correction
>>>>>>> (https://mailman.science.ru.nl/pipermail/fieldtrip/2012-July/005375.html).
>>>>>>> I understand that, in our case, I would do this
>>>>>>> randomly for all voxels in each randomization and then build spatial
>>>>>>> clusters on the resulting (partly flipped) t-values. However, I am not sure
>>>>>>> if that is a valid approach based on the null hypothesis that there are no
>>>>>>> significant relations in my single conditions (a) or no significant
>>>>>>> relationship differences in my condition contrasts (b).
>>>>>>>
>>>>>>> For the condition contrasts, I would be able to permute the
>>>>>>> condition labels as normally done in cluster-based permutation tests, I
>>>>>>> think, but would then have to recalculate the linear mixed models for all
>>>>>>> voxels in every permutation. This would result in a very high computational
>>>>>>> load.
>>>>>>>
>>>>>>> Does anyone have any experience with this kind of analysis? Would
>>>>>>> the flipping of t-values be a valid approach (and if yes, is there anything
>>>>>>> to keep in mind in particular)? Can you think of other ways to combine
>>>>>>> linear mixed models with a multiple comparison correction on the cluster
>>>>>>> level?
>>>>>>>
>>>>>>>
>>>>>>> Hi Elisabeth,
>>>>>>>
>>>>>>> I'm not an expert on linear mixed models, at least not with respect
>>>>>>> to the different ways in which they can be used to deal with correlated
>>>>>>> observations (typically, time series). However, from a theoretical point
>>>>>>> of view, I do not see how these models could be combined with
>>>>>>> permutation-based inference; they are simply different statistical
>>>>>>> frameworks. That said, it IS possible to answer your questions ("we
>>>>>>> have 10 min time courses of brain activity and ratings for each voxel for
>>>>>>> different conditions and want to test a) if there are significant
>>>>>>> relationships in the single conditions and b) if these relationships differ
>>>>>>> between two conditions.”) within the framework of cluster-based permutation
>>>>>>> tests. Question b) is the most straightforward because it amounts to a
>>>>>>> cluster-based permutation test using the depsamplesT statfun applied to the
>>>>>>> regression coefficients in each of the two conditions. Answering question
>>>>>>> a) requires that you bin your ratings into a number of categories, calculate
>>>>>>> the trial-averaged EEG data for each of the categories, and test the
>>>>>>> difference between them using a cluster-based permutation test using the
>>>>>>> depsamplesregrT statfun. Both of these approaches have been described
>>>>>>> previously on this discussion list, and for the depsamplesregrT statfun
>>>>>>> (your question a), it was Vladimir Litvak who used it first (actually, I
>>>>>>> implemented it for him). The approach for question b) is actually a variant
>>>>>>> on the general approach for testing interactions using cluster-based
>>>>>>> permutation tests.
>>>>>>>
>>>>>>> Have a look here:
>>>>>>> http://www.fieldtriptoolbox.org/faq/how_can_i_test_for_correlations_between_neuronal_data_and_quantitative_stimulus_and_behavioural_variables
>>>>>>> and
>>>>>>> http://www.fieldtriptoolbox.org/faq/how_can_i_test_an_interaction_effect_using_cluster-based_permutation_tests
>>>>>>>
>>>>>>> These tutorials provide all the necessary concepts, although they do
>>>>>>> not answer your question in a recipe-like fashion.
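>>>>>>>
>>>>>>> (Purely as an illustration of the kind of call involved, and written
>>>>>>> from memory rather than taken from the FAQ, a configuration for
>>>>>>> question a) might look roughly like this; allBinAverages is assumed to
>>>>>>> be a cell array with one trial-averaged source structure per subject
>>>>>>> and rating bin, ordered to match the columns of cfg.design. Please
>>>>>>> check the exact fields against the FAQ above:)
>>>>>>>
>>>>>>> cfg = [];
>>>>>>> cfg.method           = 'montecarlo';
>>>>>>> cfg.statistic        = 'ft_statfun_depsamplesregrT';
>>>>>>> cfg.correctm         = 'cluster';
>>>>>>> cfg.numrandomization = 1000;
>>>>>>> nSubj = 20; nBins = 4;   % assumed numbers, for illustration only
>>>>>>> cfg.design = [repmat(1:nSubj, 1, nBins); kron(1:nBins, ones(1, nSubj))];
>>>>>>> cfg.uvar = 1;            % row of cfg.design holding the subject numbers
>>>>>>> cfg.ivar = 2;            % row holding the rating-bin values
>>>>>>> stat = ft_sourcestatistics(cfg, allBinAverages{:});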
>>>>>>>
>>>>>>> best,
>>>>>>> Eric Maris
>>>>>>>
>>>>>>>
>
>
> _______________________________________________
> fieldtrip mailing list
> fieldtrip at donders.ru.nl
> https://mailman.science.ru.nl/mailman/listinfo/fieldtrip
>