[FieldTrip] impact of skewed power distributions on data analysis

Mon Dec 19 22:08:40 CET 2016

I appreciate everyone's feedback, but I still wonder if something is being
missed.  I understand that non-normally distributed power values may be
less of an issue for non-parametric statistics, or even for a
paired-samples t-test, which looks at difference values that may be normal
even when the raw data aren't.  However, my concern arises even before
these statistical comparisons are made: whenever freq-type data are
averaged across times, frequencies, trials, electrodes, subjects, etc.
That includes any use of the following configuration options in these
functions, and probably more:

ft_freqanalysis:      cfg.keeptrials = 'no' or cfg.keeptapers = 'no'
ft_freqgrandaverage:  cfg.keepindividual = 'no'
ft_freqstatistics:    cfg.avgoverchan, cfg.avgovertime, or cfg.avgoverfreq = 'yes'
ft_freqbaseline:      cfg.baseline = anything but 'no'

In each case, if raw power values are averaged, the result is pulled upward
by the long right tail of the positively skewed distribution.  Maybe that's
not a huge problem if all of the data are treated identically, but the
specific case that triggered my concern was in ft_freqbaseline, where the
individual time-frequency bins are compared to the mean over time for the
baseline period.  When using cfg.baselinetype = 'db', as Giuseppe Pellizzer
suggested, the output freq data do indeed have a more normal distribution
over time, but the mean over the baseline period is computed *before* the
log transform, while the distribution is still highly skewed:

  % NB: the baseline mean is taken over raw (skewed) power, before the log
  meanVals = repmat(nanmean(data(:,:,baselineTimes), 3), [1 1 size(data, 3)]);
  data = 10*log10(data ./ meanVals);
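
To see why the order of operations matters, here is a minimal Python sketch (standard library only, not FieldTrip code) that mimics the 'db' step above on synthetic power values.  It assumes, for illustration only, that baseline power in one frequency bin is exponentially distributed, and it uses evenly spaced exponential quantiles instead of random draws so the result is reproducible.  The helper names are hypothetical.  Even when the "trial" bins come from exactly the same distribution as the baseline, normalizing by the mean of raw power gives an average change well below 0 dB, whereas averaging log power for the baseline (i.e. using the geometric mean) gives 0 dB by construction.

```python
import math


def exponential_quantiles(n, scale=1.0):
    """Evenly spaced quantiles of an exponential distribution.

    A deterministic stand-in for skewed raw power in one frequency bin.
    """
    return [-scale * math.log(1.0 - (i + 0.5) / n) for i in range(n)]


def mean(values):
    return sum(values) / len(values)


def mean_db_change(trial_power, baseline_power, log_first):
    """Average of 10*log10(trial / baseline reference) over all trial bins.

    log_first=False mimics the 'db' step: average raw baseline power, then log.
    log_first=True averages log10 baseline power instead (geometric mean).
    """
    if log_first:
        log_ref = mean([math.log10(p) for p in baseline_power])
    else:
        log_ref = math.log10(mean(baseline_power))
    return mean([10.0 * (math.log10(p) - log_ref) for p in trial_power])


# "Trial" and baseline power from the same distribution: ideally a 0 dB change.
power = exponential_quantiles(10000)
print(round(mean_db_change(power, power, log_first=False), 2))  # roughly -2.5
print(round(mean_db_change(power, power, log_first=True), 2))   # essentially 0
```

Nothing changed between "trial" and baseline here; the negative value comes purely from taking the mean before the log.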

That's what I had originally done when analyzing data for my SfN poster,
and it's how I noticed that background noise, which shouldn't have changed
much from baseline, mostly showed a decrease from baseline of about 3 dB.
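
As a rough, idealized check on the size of this effect (an assumption for illustration only, not a claim about any particular data set): suppose power P in a time-frequency bin of unchanged background noise has mean μ = E[P].  If P is exponentially distributed, as the squared magnitude of a complex Gaussian coefficient is, then E[ln P] = ln μ - γ, where γ ≈ 0.5772 is the Euler-Mascheroni constant.  If instead ln P is normally distributed with standard deviation σ (log-normal power), then E[ln P] = ln μ - σ²/2.  Either way, the expected dB change relative to the baseline *mean* is negative even when nothing has changed:

```latex
\begin{aligned}
\text{exponential:}\quad & \mathbb{E}\!\left[10\log_{10}\tfrac{P}{\mu}\right]
  = \tfrac{10}{\ln 10}\bigl(\mathbb{E}[\ln P]-\ln\mu\bigr)
  = -\tfrac{10\,\gamma}{\ln 10} \approx -2.5\ \text{dB}\\
\text{log-normal:}\quad & \mathbb{E}\!\left[10\log_{10}\tfrac{P}{\mu}\right]
  = \tfrac{10}{\ln 10}\bigl(\mathbb{E}[\ln P]-\ln\mu\bigr)
  = -\tfrac{10}{\ln 10}\cdot\tfrac{\sigma^{2}}{2}
\end{aligned}
```

Both offsets vanish if the baseline reference is exp(E[ln P]), i.e. if the baseline is averaged after the log transform rather than before it.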

Now, I realize I may be seeing this as more of a problem than others do
because of another change I made: I normalize my trial data to a long,
separate baseline recording, rather than to a short pre-trial period as
ft_freqbaseline is designed to do.  Averaging a few hundred milliseconds
for a baseline power estimate might be okay, because those power values
are calculated from overlapping time points in the original data, which
probably makes them less skewed; but such a short baseline also seems to
me more arbitrary and prone to error.  I already offered my custom
function BLnorm.m to one person who was asking about normalizing to a
separate baseline recording, and I would be happy to contribute it to
FieldTrip if others would find it useful.
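
BLnorm.m itself isn't included here, so the following is only a hypothetical Python sketch of the general idea: log-transform first, then z-score trial power against the mean and standard deviation of log power over a long, separate baseline recording, separately for each frequency.  The function name `zscore_to_baseline` and the list-of-rows (frequency by time) layout are assumptions for illustration, not FieldTrip's API or the author's code.

```python
import math
import statistics


def zscore_to_baseline(trial_power, baseline_power):
    """Hypothetical helper (not BLnorm.m): z-score trial power against a separate baseline.

    trial_power and baseline_power are lists of rows, one row per frequency,
    each row holding raw (linear) power values over time. The baseline row can
    be much longer than the trial row. Mean and SD are computed on natural-log
    power, separately for each frequency.
    """
    z = []
    for trial_row, base_row in zip(trial_power, baseline_power):
        log_base = [math.log(p) for p in base_row]
        mu = statistics.mean(log_base)
        sigma = statistics.stdev(log_base)  # sample SD of baseline log power
        z.append([(math.log(p) - mu) / sigma for p in trial_row])
    return z


# One frequency, a toy four-sample "baseline" spanning three orders of magnitude.
baseline = [[1.0, 10.0, 100.0, 1000.0]]
# Trial values: the baseline's geometric mean and its arithmetic mean (277.75).
trial = [[math.sqrt(1000.0), 277.75]]
print(zscore_to_baseline(trial, baseline))
```

In this toy example, a trial value equal to the baseline's geometric mean scores essentially 0, while the baseline's arithmetic mean scores about 0.73.  That is, the raw-power mean sits well above the center of the log-power distribution.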

Since a few people suggested using the median, and it is also
suggested in Cohen's
textbook <https://mitpress.mit.edu/books/analyzing-neural-time-series-data>
as an alternative measure of the central tendency for skewed raw power
values, I wonder if the simplest fix might be to add an option to select
mean or median in each of the functions listed above.  Another possibility
would be adding an option to transform the power values upon output from
ft_freqanalysis.
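
A mean/median option could look something like this hypothetical Python sketch.  The baseline reference for each frequency is computed with a selectable estimator: 'mean', 'median', or 'geomean', where 'geomean' averages log power and corresponds to transforming the values before averaging.  The result is then expressed in dB, as with cfg.baselinetype = 'db'.  The parameter name `center` and its values are illustrative assumptions, not existing FieldTrip options.  The demo uses evenly spaced log-normal quantiles (log-SD of 1, median of 1) so that the output is deterministic.

```python
import math
import statistics


def baseline_reference(values, center="mean"):
    """Central tendency of raw baseline power, using a selectable estimator."""
    if center == "mean":
        return statistics.mean(values)
    if center == "median":
        return statistics.median(values)
    if center == "geomean":
        # exp of the mean natural-log power, i.e. average after a log transform
        return math.exp(statistics.mean([math.log(v) for v in values]))
    raise ValueError(f"unknown center: {center!r}")


def db_change(trial_power, baseline_power, center="mean"):
    """10*log10(trial / reference), with the reference computed per frequency row."""
    out = []
    for trial_row, base_row in zip(trial_power, baseline_power):
        ref = baseline_reference(base_row, center)
        out.append([10.0 * math.log10(p / ref) for p in trial_row])
    return out


# Deterministic log-normal "power": evenly spaced quantiles, log-SD 1, median 1.
n = 1001
norm = statistics.NormalDist()
power = [math.exp(norm.inv_cdf((i + 0.5) / n)) for i in range(n)]

# The mean lands well above the median and geometric mean (about 1.6 vs ~1),
# so only the 'mean' reference shows a spurious negative average dB change.
for center in ("mean", "median", "geomean"):
    ref = baseline_reference(power, center)
    avg_db = statistics.mean(db_change([power], [power], center)[0])
    print(center, round(ref, 3), round(avg_db, 2))
```

For log-normal data the median and the geometric mean coincide.  For more strongly skewed shapes, such as an exponential, they differ, which is one reason to expose the choice as an option rather than hard-code a single estimator.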

Would anyone else find such changes useful?

Thanks,
Teresa

On Wed, Dec 14, 2016 at 4:22 AM, Herring, J.D. (Jim) <J.Herring at donders.ru.nl> wrote:

> In terms of statistics, what matters is the distribution of the values
> that you run the statistics on. In the case of a paired-samples t-test
> comparing two conditions, it is the distribution of difference values that
> has to be normal. The difference between two similarly non-normal
> distributions is often normally distributed, posing no complications for
> a regular parametric test.
>
> The non-parametric tests offered in fieldtrip indeed do not assume
> normality, so you should have no problem there either.
>
> *From:* fieldtrip-bounces at science.ru.nl [mailto:fieldtrip-bounces@science.ru.nl] *On Behalf Of* Alik Widge
> *Sent:* Tuesday, December 13, 2016 3:10 PM
> *To:* FieldTrip discussion list <fieldtrip at science.ru.nl>
> *Subject:* Re: [FieldTrip] impact of skewed power distributions on data analysis
>
> In this, Teresa is right and we have observed this in our own EEG data --
> depending on one's level of noise and number of trials/patients, the mean
> can be a very poor estimator of central tendency. My students are still
> arguing about what we really want to do with it, but at least one of them
> has shifted to using the median as a matter of course for baseline
> normalization.
>
> Alik Widge
> alik.widge at gmail.com
> (206) 866-5435
>
> On Mon, Dec 12, 2016 at 6:45 PM, Teresa Madsen <tmadsen at emory.edu> wrote:
>
> That may very well be true; to be honest, I haven't looked that deeply
> into the stats offerings yet. However, my plan is to express each
> electrode's experimental data in terms of change from their respective
> baseline recordings before attempting any group averaging or statistical
> testing, and this problem shows up first in the baseline correction step,
> where FieldTrip averages raw power over time.
>
> ~Teresa
>
> On Mon, Dec 12, 2016 at 4:56 PM Nicholas A. Peatfield <nick.peatfield at gmail.com> wrote:
>
> Correct me if I'm wrong, but if you are using the non-parametric
> statistics implemented in FieldTrip, the data do not need to be normally
> distributed.
>
> On 12 December 2016 at 13:39, Teresa Madsen <tmadsen at emory.edu> wrote:
>
> No, sorry, that's not what I meant, but thanks for giving me the
> opportunity to clarify. Of course everyone is familiar with the 1/f pattern
> across frequencies, but the distribution across time (and, according to the
> poster, across space) is also extremely skewed, following a negative
> exponential distribution. I probably confused everyone by trying to show
> too much data in my figure, but each color represents the distribution of
> power values for a single frequency over time, shown as a histogram with a
> line above it whose circles mark the mean and +/- one standard deviation.
>
> My main point was that the mean is not representative of the central
> tendency of such an asymmetrical distribution of power values over time.
> It's even more obvious which is more representative of their actual
> distributions when I plot e^mean(logpower) on the raw plot and
> log(mean(rawpower)) on the log plot, but that made the figure even more
> busy and confusing.
>
> I hope that helps,
> Teresa
>
> On Mon, Dec 12, 2016 at 3:47 PM Nicholas A. Peatfield <nick.peatfield at gmail.com> wrote:
>
> Hi Teresa,
>
> I think what you are discussing is the 1/f scaling of the power
> spectrum. This is one of the reasons that comparisons are made within
> a band (e.g. alpha to alpha) and not between bands (e.g. alpha to gamma);
> the assumption is that within a band there should be a relative change
> against baseline, and this is what the statistics are performed on.
> That is, the baseline correction is assumed to use the mean for a specific
> frequency, not a mean across frequencies.
>
> This leads to another point: when you select a frequency range for the
> non-parametric statistics, you should not use 1-64 Hz as a whole but
> break it up by band.
>
> I hope my interpretation of your point is correct. I sent this to you
> individually, as I wanted to make sure I had followed your point.
>
> Cheers,
> Nick
>
> On 12 December 2016 at 08:23, Teresa Madsen <tmadsen at emory.edu> wrote:
>
> FieldTrippers,
>
> While analyzing my data for the annual Society for Neuroscience meeting, I
> developed a concern that was quickly validated by another poster (full
> abstract copied and linked below) focusing on the root of the problem:
> neural oscillatory power is not normally distributed across time,
> frequency, or space.  The specific problem I had encountered was in
> baseline-correcting my experimental data, where, regardless of
> cfg.baselinetype, ft_freqbaseline depends on the mean power over time.
> I found that the distribution of raw power over time is so skewed that
> the mean was not a reasonable approximation of the central tendency of
> the baseline power, which made most of my experimental data look as if
> they had decreased power compared to baseline.  The more I think about it, the
> more I realize that averaging is everywhere in the way we analyze neural
> oscillations (across time points, frequency bins, electrodes, trials,
> subjects, etc.), and many of the standard statistics people use also rely
> on assumptions of normality.
>
> The most obvious solution for me was to log-transform the data first, as
> they appear to be fairly log-normal, and I always use log-scale
> visualizations anyway.  Erik Peterson, middle author on the poster, agreed
> that this would at least "restore (some) symmetry to the error
> distribution."  I used a natural log transform, somewhat arbitrarily, to
> differentiate it from the standard decibel transform included in FieldTrip
> as cfg.baselinetype = 'db'.  The following figures compare the two
> distributions (raw and natural-log power) across several frequency bands,
> using power values from a wavelet spectrogram of a baseline LFP recorded
> in rat prelimbic cortex.  The lines at the top represent the mean +/- one
> standard deviation for each frequency band, and you can see how much more
> representative those descriptive stats are of the actual distributions on
> the log scale.
>
> For my analysis, I also calculated a z-score on the log-transformed power
> to assess how my experimental data compared to the variability of the noise
> in a long baseline recording from before conditioning, rather than a short
> pre-trial baseline period, since I find that more informative than any of
> FieldTrip's built-in baseline types.  I'm happy to share the custom
> functions I wrote for this if people think it would be a useful addition to
> FieldTrip.  I can also share more about my analysis and/or a copy of the
> poster, if anyone wants more detail - I just didn't want to make this email
> too big.
>
> Mostly, I'm just hoping to start some discussion here as to how to address
> this.  I searched the wiki
> <http://www.fieldtriptoolbox.org/development/zscores>, the listserv archives
> (<https://mailman.science.ru.nl/pipermail/fieldtrip/2006-December/000773.html>,
> <https://mailman.science.ru.nl/pipermail/fieldtrip/2010-March/002718.html>),
> and bugzilla <http://bugzilla.fieldtriptoolbox.org/show_bug.cgi?id=1574> for
> anything related and came up with a few topics surrounding normalization
> and baseline correction, but only skirting this issue.  It seems important,
> so I want to find out whether others agree with my approach or already have
> other ways of avoiding the problem, and whether FieldTrip's code needs to
> be changed or just documentation added, or what?
>
> Thanks for any insights,
>
> Teresa
>
> 271.03 / LLL17 - Neural oscillatory power is not Gaussian distributed across time
> <http://www.abstractsonline.com/pp8/#!/4071/presentation/24150>
>
> *Authors*
>
> **L. IZHIKEVICH*, E. PETERSON, B. VOYTEK;
> Cognitive Sci., UCSD, San Diego, CA
>
> *Disclosures*
>
>  *L. Izhikevich:* None. *E. Peterson:* None. *B. Voytek:* None.
>
> *Abstract*
>
> Neural oscillations are important in organizing activity across the human
> brain in healthy cognition, while oscillatory disruptions are linked to
> numerous disease states. Oscillations are known to vary by frequency and
> amplitude across time and between different brain regions; however, this
> variability has never been well characterized. We examined human and animal
> EEG, LFP, MEG, and ECoG data from over 100 subjects to analyze the
> distribution of power and frequency across time, space and species. We
> report that between data types, subjects, frequencies, electrodes, and
> time, an inverse power law, or negative exponential distribution, is
> present in all recordings. This is contrary to, and not compatible with,
> the Gaussian noise assumption made in many digital signal processing
> techniques. The statistical assumptions underlying common algorithms for
> power spectral estimation, such as Welch's method, are being violated
> resulting in non-trivial misestimates of oscillatory power. Different
> statistical approaches are warranted.
>
> --
>
> Teresa E. Madsen, PhD
> Research Technical Specialist: *in vivo* electrophysiology & data analysis
>
> Division of Behavioral Neuroscience and Psychiatric Disorders
> Yerkes National Primate Research Center
>
> Emory University
>
> Rainnie Lab, NSB 5233
> 954 Gatewood Rd. NE
> Atlanta, GA 30329
>
> (770) 296-9119
>
> braingirl at gmail.com
>
> https://www.linkedin.com/in/temadsen
>
> _______________________________________________
> fieldtrip mailing list
> fieldtrip at donders.ru.nl
> https://mailman.science.ru.nl/mailman/listinfo/fieldtrip
>
> --
>
> Nicholas Peatfield, PhD
>

-- 
Teresa E. Madsen, PhD
Division of Behavioral Neuroscience and Psychiatric Disorders
Yerkes National Primate Research Center
Emory University
Rainnie Lab, NSB 5233
954 Gatewood Rd. NE
Atlanta, GA 30329
(770) 296-9119
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.png
Type: image/png
Size: 38279 bytes
Desc: not available
URL: <http://mailman.science.ru.nl/pipermail/fieldtrip/attachments/20161219/2e90955c/attachment-0002.png>