[FieldTrip] question on cluster-based statistics and source localization

Wed Jan 10 15:55:57 CET 2018

Hi Vincent,

Let me reply through the email list, where other people might learn something and/or want to chime in.

> On 10 Jan 2018, at 13:22, Vincent Wens <vwens at ulb.ac.be> wrote:
> 
> Dear Pr. Oostenveld,
> 
> I am Vincent Wens, a physicist working in the MEG unit at Erasme Hospital, Brussels. We've been recently trying to play with the cluster-based statistics that you developed and included in Fieldtrip, but hit a difficulty in our analysis pipeline and I wondered if you would be so kind to take a few minutes and provide advice on this? 

Cluster-based statistics is a method for statistical inference, i.e. statistical decision making based on a hypothesis and an estimated probability distribution. The null hypothesis H0 states that the data can be exchanged between conditions. If the observed data would be very unlikely under H0 (i.e. if the data were exchangeable), we reject H0 and decide that the data in the conditions must be different somehow. The clusters provide evidence for the data being different, but the clusters are not the difference itself. The tip of an iceberg above sea level provides evidence for there being an iceberg, but the tip is not the iceberg itself.
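To make the logic concrete, here is a minimal sketch of such a test in plain Python. It is not FieldTrip's implementation: it assumes an independent-samples design, a single dimension (e.g. time) along which neighbouring samples can cluster, and an arbitrary cluster-forming threshold of 2.0. All function names and the simulated data are hypothetical.

```python
import random
from statistics import mean, variance


def t_values(a, b):
    """Independent-samples t-value per sample; a and b are lists of trials."""
    out = []
    for xs, ys in zip(zip(*a), zip(*b)):
        se = (variance(xs) / len(xs) + variance(ys) / len(ys)) ** 0.5
        out.append((mean(xs) - mean(ys)) / se if se > 0 else 0.0)
    return out


def max_cluster_stat(t, threshold):
    """Largest absolute sum of t over a run of adjacent, same-sign,
    supra-threshold samples (0.0 if no sample exceeds the threshold)."""
    best = current = 0.0
    for v in t:
        if abs(v) > threshold and (current == 0.0 or (v > 0) == (current > 0)):
            current += v      # extend the running cluster
        elif abs(v) > threshold:
            current = v       # sign changed: start a new cluster
        else:
            current = 0.0     # below threshold: close the cluster
        best = max(best, abs(current))
    return best


def cluster_permutation_test(a, b, threshold=2.0, n_perm=1000, seed=0):
    """Return the observed maximum cluster statistic and its Monte Carlo p-value."""
    rng = random.Random(seed)
    observed = max_cluster_stat(t_values(a, b), threshold)
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_perm):
        # Under H0 the trials are exchangeable, so condition labels can be shuffled.
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:len(a)], pooled[len(a):]
        if max_cluster_stat(t_values(perm_a, perm_b), threshold) >= observed:
            count += 1
    # The +1 counts the observed labelling as one of the permutations.
    return observed, (count + 1) / (n_perm + 1)


# Simulated data (an assumption for illustration): 20 trials per condition,
# 30 samples, with an offset in condition A at samples 10..19.
rng = random.Random(1)
n_trials, n_samples = 20, 30
cond_a = [[rng.gauss(0, 1) + (2.0 if 10 <= s < 20 else 0.0) for s in range(n_samples)]
          for _ in range(n_trials)]
cond_b = [[rng.gauss(0, 1) for _ in range(n_samples)] for _ in range(n_trials)]
observed, p_value = cluster_permutation_test(cond_a, cond_b, n_perm=200)
print(f"max cluster statistic = {observed:.1f}, p = {p_value:.3f}")
```

Note that the test returns a single p-value for the maximum cluster statistic. It supports the decision "the conditions differ", and says nothing about the exact extent of the underlying effect.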

An important FAQ on this is: http://www.fieldtriptoolbox.org/faq/how_not_to_interpret_results_from_a_cluster-based_permutation_test

Let me give the other comments in-line in your email below.

> Globally, my question is how to go from sensor-level cluster statistics results to the source space.
> 
> More precisely: Assume we run the cluster statistics analysis on, say, N-channels time-frequency plots and obtain significance for the maximum cluster statistic.

So you obtain evidence that the channel-level data are different. That logically implies that the cortical activity is different. Note that the converse does not hold per se: there can be different activity in the brain without it showing up as a difference in the scalp data.

> We thus find one (and possibly more) supra-threshold cluster(s) whose "spatio-spectral-temporal localization" can be assessed.

You could look at the visible tip (i.e. the cluster), or you could take a broader approach and look at the phenomenon under the tip (the iceberg).

> See for example the attached picture depicting the plot of those T-values within the significant cluster associated with an ERD. The next step would then be to source localize this,

You would not localize the cluster (you already have it, it is at the channel level). You localize the cortical activity that causes the data to appear different at the channel level. It’s the part of the iceberg under the sea level that causes the tip to appear above sea level.

> and our initial idea was to use the time and frequency region from this cluster as a prior on the time and frequency used for source projection. However the very complex shape of the cluster does not make this step so obvious. There are multiple possibilities that would come to mind, most of them absolutely ad-hoc, so I wondered your opinion on what would be the most rigorous, or at least least unacceptable way to go (or even just the most standard way, if there's one).

Based on the results (the multiplot), you should ask yourself whether there is only a single feature in the brain that is different or whether there are multiple. The hypothesis you started with was “is there any difference in this massive multiple-comparison space?” and the (only) answer you got to that question was “yes”.

You now have the question “what is the difference”, which pertains to interpreting the data. That question has no binary answer, and a statistical test based on a p-value being small enough (which gives you a "yes/no" answer) does not help.

I cannot offer specific advice on how to interpret your data, but I recommend that you consider whether your true quest is for “the (one and only) effect” or for “the effects” that cause the data in the two conditions to be different. Of course you can argue that the effect(s) show at certain frequency ranges and/or latencies and/or locations, and therefore you may decide to look for the interpretation of the effect(s) at or around those parts of the cluster.

In general (not any more for this dataset) it is worthwhile to consider that narrow a-priori hypotheses provide more valuable and specific information. This is something that we often rely on in sequential studies, where in the 2nd study we don’t repeat the full hypothesis of the 1st study (e.g. “is there any difference”), but test a more specific sub-hypothesis that we generated on the basis of the first study (e.g. “is there a difference around this specific time-frequency range”).
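As a rough illustration of such a narrow hypothesis, the sketch below averages each subject's condition difference over a time-frequency window that was fixed in advance and then runs a single test on those averages. It assumes a paired (within-subject) design and uses a sign-flip permutation test. The function names, window indices and simulated data are hypothetical and do not come from FieldTrip or from the dataset discussed here.

```python
import random
from statistics import mean


def window_mean(tfr, freq_range, time_range):
    """Average a [freq][time] grid over half-open index ranges."""
    f0, f1 = freq_range
    t0, t1 = time_range
    return mean(v for row in tfr[f0:f1] for v in row[t0:t1])


def sign_flip_test(diffs, n_perm=2000, seed=0):
    """Two-sided paired permutation test on one difference value per subject."""
    rng = random.Random(seed)
    observed = abs(mean(diffs))
    count = 0
    for _ in range(n_perm):
        # Under H0 the two conditions are exchangeable within a subject,
        # which is equivalent to randomly flipping the sign of the difference.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)


# Simulated condition differences for 12 subjects on a 5 x 8 (freq x time) grid,
# with a power decrease in the window chosen before looking at these data.
rng = random.Random(2)
freq_range, time_range = (1, 3), (2, 6)   # a-priori window (index ranges)
subjects = [
    [[rng.gauss(0, 0.5) - (1.0 if 1 <= f < 3 and 2 <= t < 6 else 0.0)
      for t in range(8)] for f in range(5)]
    for _ in range(12)
]
diffs = [window_mean(tfr, freq_range, time_range) for tfr in subjects]
p_window = sign_flip_test(diffs)
print(f"mean difference in window = {mean(diffs):.2f}, p = {p_window:.4f}")
```

Because only one value per subject is tested, there is no multiple-comparison problem to correct for, and a significant result speaks directly to that specific time-frequency range. This only holds if the window was chosen independently of the data being tested.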

> Thanks in advance for your invaluable help, and still my best wishes for the New Year.

You’re welcome. Please post any follow-up questions to the email discussion list.

best regards,
Robert

-------------- next part --------------
A non-text attachment was scrubbed...
Name: cluster_1_grad.png
Type: image/png
Size: 191076 bytes
Desc: not available
URL: <http://mailman.science.ru.nl/pipermail/fieldtrip/attachments/20180110/dedffb0b/attachment-0001.png>