[FieldTrip] Calculating the variance of a group with given variances of each subject
Schoffelen, J.M. (Jan Mathijs)
janmathijs.schoffelen at donders.ru.nl
Fri Sep 30 13:46:40 CEST 2022
I would say that for visualization purposes (and for traditional parametric second-level statistics) you are interested in the variance across your units of observation, in this case across subjects. This is indeed what ft_timelockgrandaverage gives you. If you are into more sophisticated estimates, it smells a bit as if you want to move in the direction of mixed-effects models, which are gaining quite a bit of momentum these days. Since I am a bit old-school, I have not eaten any cheese from that (a Dutch idiom meaning "I know little about it", see http://www.dwotd.nl/2008/02/360-daar-heb-ik-geen-kaas-van-gegeten.html), so perhaps somebody else wants to chime in here.
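As a rough illustration of "variance across units of observation": each subject's averaged ERF counts as one observation, and the mean, variance and standard error are taken across subjects at every time point. This is a Python/NumPy sketch, not FieldTrip code; the array layout (subjects x time points, a single channel) and the function name `across_subject_summary` are hypothetical choices made for the example.

```python
import numpy as np


def across_subject_summary(subject_erfs):
    """Summarize per-subject ERFs (shape: n_subjects x n_timepoints).

    Each subject contributes one averaged ERF, i.e. one observation per
    time point. Returns the grand mean, the across-subject variance
    (n - 1 normalization) and the standard error of the grand mean.
    """
    erfs = np.asarray(subject_erfs, dtype=float)
    n_subjects = erfs.shape[0]
    grand_mean = erfs.mean(axis=0)
    # Unbiased variance across subjects, computed separately per time point
    across_var = erfs.var(axis=0, ddof=1)
    # Standard error of the grand mean, e.g. for a shaded band in an ERF plot
    sem = np.sqrt(across_var / n_subjects)
    return grand_mean, across_var, sem


# Usage: three hypothetical subjects, four time points
erfs = np.array([
    [0.0, 1.0, 2.0, 1.0],
    [0.0, 2.0, 3.0, 1.0],
    [0.0, 3.0, 4.0, 1.0],
])
mean, var, sem = across_subject_summary(erfs)
lower, upper = mean - sem, mean + sem  # band to draw around the grand-average ERF
```

The standard deviation (`np.sqrt(var)`) or a confidence interval can be derived from the same across-subject variance; with only a handful of subjects, a t-based interval is usually preferred over the normal approximation.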
On 26 Sep 2022, at 21:36, Sebastian Neudek via fieldtrip <fieldtrip at science.ru.nl> wrote:
I am currently writing my master's thesis using FieldTrip and might need a bit of your expertise for a little hurdle I encountered.
First a few words about what I want to do:
I have data from a MEG go/nogo study: there are multiple subjects and multiple trials for each condition. I now want to plot the ERF for each condition, averaged across all subjects. Together with the ERF I also want to plot the standard error, variance, confidence interval or standard deviation. Something like this (Figure 3a):
Regardless of which of these measures I finally use, I need to calculate the variance of the ERF at each time point.
To do this, I first used ft_timelockanalysis to calculate the average of each subject. ft_timelockanalysis returns not only the average but also the variance of the ERF. Afterwards I used ft_timelockgrandaverage to calculate the group average of the ERF.
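The per-subject step can be pictured like this: for one subject, the single trials are averaged at each time point, and the across-trial variance is computed alongside. This is a NumPy sketch, not FieldTrip code; the trials x time points layout, the function name `subject_average`, and the n - 1 normalization of the variance are assumptions (the ft_timelockanalysis documentation is the place to confirm which normalization it actually uses).

```python
import numpy as np


def subject_average(trials):
    """Average one subject's trials (shape: n_trials x n_timepoints).

    Returns the ERF (mean over trials), the across-trial variance
    (n - 1 normalization, assumed here) and the number of trials.
    """
    trials = np.asarray(trials, dtype=float)
    n_trials = trials.shape[0]
    erf = trials.mean(axis=0)
    # Scatter of single trials around this subject's own ERF
    trial_var = trials.var(axis=0, ddof=1)
    return erf, trial_var, n_trials


# Usage: four hypothetical trials, three time points
trials = np.array([
    [1.0, 2.0, 0.0],
    [3.0, 2.0, 0.0],
    [1.0, 4.0, 0.0],
    [3.0, 4.0, 0.0],
])
erf, trial_var, n = subject_average(trials)
```

Note that `trial_var` describes trial-to-trial variability within one subject; it says nothing about how much subjects differ from each other.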
And this is where my trouble begins. I understand how ft_timelockgrandaverage calculates the new average: it averages over the mean values of the subjects (it also accounts for the degrees of freedom). But for the variance it just calculates the variance of the means. I thought it would also take the variances of the individual subjects into account.
In the extreme case this leads to the following:
Imagine you have two subjects and you measure very close averages, avg(sub1) = 1 and avg(sub2) = 1.0001, but their variances are extremely high: var(sub1) = var(sub2) = 100. Then the variance calculated by ft_timelockgrandaverage is almost zero, whereas in reality it should be something like 100.
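The two-subject example can be checked numerically. The NumPy sketch below uses only the numbers from the example and contrasts the variance of the two subject means (the quantity described above for ft_timelockgrandaverage) with the mean of the within-subject variances. The two numbers describe different things: how much the subjects' averages differ from each other, versus how much single trials scatter within each subject.

```python
import numpy as np

# Numbers from the example: two subjects with nearly identical means
# but a large across-trial variance within each subject.
subject_means = np.array([1.0, 1.0001])
subject_vars = np.array([100.0, 100.0])

# Variance of the subject means (n - 1 normalization):
# how much the subjects' averages differ from each other.
between_var = subject_means.var(ddof=1)

# Mean of the within-subject variances:
# how much single trials scatter around their own subject's average.
within_var = subject_vars.mean()

print(f"variance of subject means:    {between_var:.2e}")
print(f"mean within-subject variance: {within_var:.1f}")
```

The first value comes out at about 5e-09, the second at 100.0, which matches the "almost zero" versus "something like 100" contrast in the example.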
Maybe this should be noted in the documentation of ft_timelockgrandaverage, because it can lead to wrong statistics if the user isn't aware of this behavior.
Back to my task:
This is surely not the variance I want to use for my ERF plots, so I searched for a different calculation.
On Wikipedia I found a formula for the so-called "pooled variance" (at the bottom of the page, under sample-based statistics):
But I am not 100% happy with this pooled variance.
First of all, the degrees of freedom are not accounted for. Second, the mean is calculated differently than in ft_timelockgrandaverage: it takes the number of trials of each subject into account. I don't know whether this different mean is a pro or a con (on the one hand, means with higher precision, because they are based on more trials, are weighted more heavily than means based on fewer trials; on the other hand, each subject in the group should be weighted the same).
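For comparison, here is one standard way to combine per-subject summaries into the variance of all trials pooled together: the within-subject sum of squares plus the between-subject sum of squares (the law of total variance), using the trial-weighted mean mentioned above. The Wikipedia formula may be written in a different but equivalent form, or may differ in its normalization; this NumPy sketch uses a hypothetical function name (`combined_variance`) and simply checks itself against concatenating all trials.

```python
import numpy as np


def combined_variance(counts, means, variances):
    """Variance of all trials pooled across subjects, from per-subject summaries.

    counts:    number of trials per subject, n_i
    means:     per-subject means, m_i
    variances: per-subject variances with n_i - 1 normalization, s_i^2
    Returns the trial-weighted grand mean and the pooled-sample variance
    (N - 1 normalization), i.e. the values obtained by concatenating
    every trial of every subject.
    """
    n = np.asarray(counts, dtype=float)
    m = np.asarray(means, dtype=float)
    s2 = np.asarray(variances, dtype=float)
    total = n.sum()
    weighted_mean = (n * m).sum() / total
    # Scatter of trials around their own subject mean
    within_ss = ((n - 1) * s2).sum()
    # Scatter of subject means around the trial-weighted grand mean
    between_ss = (n * (m - weighted_mean) ** 2).sum()
    return weighted_mean, (within_ss + between_ss) / (total - 1)


# Usage: check against concatenating hypothetical trials of two subjects
rng = np.random.default_rng(0)
sub1 = rng.normal(1.0, 10.0, size=40)
sub2 = rng.normal(1.5, 10.0, size=25)
mean, var = combined_variance(
    [sub1.size, sub2.size],
    [sub1.mean(), sub2.mean()],
    [sub1.var(ddof=1), sub2.var(ddof=1)],
)
pooled = np.concatenate([sub1, sub2])
```

`weighted_mean` equals the equal-weight average of the subject means only when every subject has the same number of trials. Also note that this combined variance describes single-trial variability across the whole sample, which is a different quantity from the across-subject variance of the subject averages.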
I also found a second formula on Stack Exchange, which also calculates a variance from the individual variances:
Since the two formulas seem to differ, I am uncertain which one to use.
I hope one of you is a little more experienced in statistics than I am and can help me find the correct or best way to calculate this variance. Ideally it is already implemented in FieldTrip and I simply missed it, but I can also implement it myself.