[FieldTrip] Why does the output from cluster-based permutation tests show a p-value for each cluster, when the test really only has one p-value?

Mon Feb 23 10:20:08 CET 2026

Hi Steve,

Welcome back! I am not aware of any hidden purpose, but it could be there was one when initially deciding to also broadcast those p-values. You raise some very valid points in your message, and I totally agree. Hopefully either Eric or Robert will pick up on this and join the discussion with their ideas.

In my opinion it may have been a suboptimal decision to compute p-values for each suprathreshold cluster, and to return these p-values with the data (i.e. in the spatio-(spectro-)temporal matrices). I always have a hard time explaining to our students (and also - or perhaps even a harder time - to our PIs) that one should not talk about 'significant clusters' etc., and the fact that each data point in the stats output (with cluster-based multiple comparison correction) gets a p-value does not help much in the convincing process.

Best wishes,
Jan-Mathijs

On 22 Feb 2026, at 15:54, Stephen Politzer-Ahles via fieldtrip <fieldtrip at science.ru.nl> wrote:

Dear fieldtrip community,

(It's been quite a while since I was active on this list, so I apologize if this has already been discussed and I just missed it!)

When you do a cluster-based permutation test using ft_timelockstatistics (and I assume also with ft_freqstatistics), one of the fields of the output, .prob, shows multiple p-values: one for each cluster that was identified in the initial dataset.

But a cluster-based permutation test actually has only one test statistic. As described in Maris & Oostenveld (2007)<https://www.sciencedirect.com/science/article/pii/S0165027007001707>, the procedure is to find clusters and then just take the largest cluster-level test statistic; if the function follows this procedure, then presumably all the other cluster-level test statistics might as well just disappear into the aether, because they serve no additional purpose (as far as I am aware), they're just an intermediate step in the analysis. The cluster-based permutation test tutorial<https://www.fieldtriptoolbox.org/tutorial/stats/cluster_permutation_timelock/> on the fieldtrip wiki is also explicit (in two separate places!) in saying that "for the statistical inferential decision (i.e. reject or not the null hypothesis), only the p-value of the largest cluster is relevant" (emphasis added).
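
To make the procedure above concrete, here is a minimal Python sketch of a max-cluster-mass permutation test. It is not FieldTrip's implementation, and every name in it is hypothetical. It assumes a paired design with sign-flip permutations, 1-D data (subjects x time points), a fixed cluster-forming threshold of |t| > 2, clusters defined as runs of adjacent same-sign suprathreshold samples, a single two-sided statistic (the largest absolute cluster mass), and a Monte Carlo p-value estimated with a +1 correction. FieldTrip's own options and defaults may differ on each of these points.

```python
import numpy as np

def paired_t(diff):
    """One-sample t-value per time point for a (subjects x time) difference array."""
    n = diff.shape[0]
    return diff.mean(axis=0) / (diff.std(axis=0, ddof=1) / np.sqrt(n))

def cluster_masses(tvals, threshold):
    """Sum of |t| over each run of adjacent, same-sign samples with |t| > threshold."""
    masses = []
    current = 0.0
    prev_sign = 0
    for t in tvals:
        sign = int(np.sign(t)) if abs(t) > threshold else 0
        if sign != 0 and sign == prev_sign:
            current += abs(t)            # extend the running cluster
        else:
            if prev_sign != 0:
                masses.append(current)   # close the previous cluster
            current = abs(t) if sign != 0 else 0.0
        prev_sign = sign
    if prev_sign != 0:
        masses.append(current)
    return masses

def cluster_permutation_test(diff, threshold=2.0, n_perm=1000, seed=0):
    """Return (test p-value, per-cluster p-values, observed cluster masses)."""
    rng = np.random.default_rng(seed)
    observed = cluster_masses(paired_t(diff), threshold)
    max_observed = max(observed, default=0.0)

    # Null distribution of the LARGEST cluster mass under random sign flips.
    null_max = np.empty(n_perm)
    for i in range(n_perm):
        signs = rng.choice([-1.0, 1.0], size=(diff.shape[0], 1))
        null_max[i] = max(cluster_masses(paired_t(diff * signs), threshold),
                          default=0.0)

    # The test itself: one statistic, one p-value.
    p_value = (np.sum(null_max >= max_observed) + 1) / (n_perm + 1)
    # The "extra" numbers: every observed cluster compared to the same null.
    per_cluster = [(np.sum(null_max >= m) + 1) / (n_perm + 1) for m in observed]
    return p_value, per_cluster, observed

# Illustrative simulated data: 20 subjects, 100 time points, effect in samples 40-59.
rng = np.random.default_rng(1)
diff = rng.normal(size=(20, 100))
diff[:, 40:60] += 1.0
p_value, per_cluster, observed = cluster_permutation_test(diff)
```

In this sketch the first return value is the p-value of the test. The per-cluster list is produced by comparing each observed cluster against the same null distribution of maxima, so its smallest entry always coincides with the test's p-value, and the remaining entries are by-products of that comparison.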

I suspect this may have led to some confusion, at least in my area of research (neurolinguistics). Over the past 15ish years there have been many papers reporting findings along the lines of "this contrast yielded two significant clusters, one here and one there!", and I suspect much of this came from people seeing multiple p<.05 entries in their stats.prob array (I did some of this myself back in the day when I was just beginning to use these tests). Granted, there has been information out for over a decade saying not to do that (e.g., How not to interpret results from a cluster-based permutation test<https://www.fieldtriptoolbox.org/faq/stats/clusterstats_interpretation/> on the fieldtrip wiki, and Sassenhagen et al. [2019]<https://onlinelibrary.wiley.com/doi/10.1111/psyp.13335>; these are more focused on a slightly different framing of the issue [they're more about how we shouldn't use these tests to make claims about where the significance of the effect begins and ends] but they also entail this same point [the test doesn't license conclusions about there being multiple clusters, for the same reasons it doesn't license conclusions about where a given cluster begins and ends]). But not all users read all of this before just running the code, seeing p-values, and going with them. And, sure, I get that it's a user's responsibility to RTFM, but the code seems to be really tempting people to look at those other p-values, whereas all the information telling people they shouldn't look at those p-values is much harder to find. (Also I recognize that fieldtrip is not the only software out there that does these tests, but as far as I know it was the first, and I think a lot of people using these first learned from fieldtrip.)

So, with all that in mind, I'm just wondering why fieldtrip outputs these extra p-values at all? Do they serve some other purpose that I'm just missing? (The tutorial's wording "for the statistical inferential decision ... only the p-value of the largest cluster is relevant" seems to imply that there may be some other purpose for which the other p-values are relevant, but I'm not sure that reading was intended, or what that other purpose may be.)

Thank you!
Steve

---
e-mail signature<https://politzerahles.github.io/e-mail-signature.html>
_______________________________________________
fieldtrip mailing list
https://mailman.science.ru.nl/mailman/listinfo/fieldtrip
https://doi.org/10.1371/journal.pcbi.1002202