[FieldTrip] Cluster-Statistics on wPLI data
nick.ketz at gmail.com
Tue Jun 2 23:55:07 CEST 2015
Hi Tom and Jake, thanks very much for adding momentum to this. I
haven't yet looked into spectral graph clustering, and really haven't
been thinking of it terms of graph theory but it now seems silly that I
If I'm understanding the suggestions correctly, the idea is to use the
eigenvectors of the connectivity measure (WPLI in this case) to
determine which nodes/voxels to cluster together. So nodes with similar
graphs will be clustered together. This would be analogous to the graph
bisection problem where k=2. The problem with mixture models is the
arbitrary nature of how many clusters to use, and without a way to
determine the statistical likelihood of a given cluster this is a real
problem for determining the validity of a given cluster.
Jake thanks for the resources, I'm beginning to dig into them and they
seem promising. My understanding is that graph bisection divides the
graph into two groups that have the minimum number of connections
between them. So this could be used to cluster nodes that have similar
topographies together, but it's not clear the best division would occur
in the first bisection, instead possibly in subsequent bisections. Is
this where the recursive algorithm comes in? I found this paper of
yours ' Spectral Techniques for Graph Bisection in Genetic Algorithms',
but it doesn't seem to get past the initial bisection.
As I said, I haven't really been thinking of the problem in graph theory
terms, but if my understanding is correct the current data set I'm
working with could have an adjacency matrix as large as 128x128x70x50
(chan x chan x time x frequency), and as small as 128x128 for a single
time x frequency bin.
On 6/2/15 12:38 PM, Jacob Martin wrote:
> Hi Tom and Nick,
> Here's a paper i wrote which may help give some background on spectral clustering:
> "Ranks and representations for spectral graph bisection"
> I also have a genetic algorithm which uses recursive spectral graph bisection to cluster graphs using the ideas from the above paper. It is written in Java and is quite competitive and fast compared to other algorithms. Nick, perhaps i could lend a hand.
> I suggest looking up Chris Walshaw's partition archive for lots of examples of algorithms. I had a few minimum bisection records in there a while back with an older algorithm, but haven't looked at it in years or tried with my new algorithms yet.
> How big are your adjacency matrices? Note that the minimum bisection problem is NP complete.
> Jacob Martin, PhD
> Center for Brain and Cognition
> CerCo CNRS
> Toulouse, France
> On 02 Jun 2015, at 13:33, Tom Holroyd <tomh at kurage.nimh.nih.gov> wrote:
> Several people have been doing things with spectral clustering,
> involving the graph Laplacian, wherein we say that each voxel of the
> brain is a node in a graph, and various connectivity measures connect
> the nodes. The Laplacian of the graph has many interesting properties,
> and the first K eigenvalues can be used to create clusters.
> I'm afraid I don't know how to do permutation stats on that, but
> k-means clustering of the graph eigenvectors can have random starting
> points ...
> On Tue, 02 Jun 2015 09:53:42 -0600
> Nick Ketz <nick.ketz at gmail.com> wrote:
>> Mathis, you have hit on a much larger problem. Tzvetan is right to
>> suggest calculating each grid-point's connectivity with all other
>> grid-points, which gives you a connectivity topography map for each
>> grid-point. The next logical step, which I've been struggling with,
>> is to somehow cluster these topographies together, giving you
>> clusters of neighboring grid-point topographies that are similar. Is
>> there any work being done to solve this problem?
>> I have been struggling with this problem for some time, that is: how
>> to do cluster based permutation statistics in chan x chan bins of
>> connectivity measures. Ideally I want to do this in a full chan x
>> chan x freq x time cluster analysis. I've searched on the list for
>> related topics; posted, probably incoherently, my own issues; and
>> generally scanned the list and the literature for any sort of
>> relevant topics trying to broach this problem, but am still stuck.
>> The standard 3d chan x freq x time clustering uses a neighborhood
>> structure to define what channels are considered neighbors, and uses
>> numeric adjacency to cluster time and frequency. With chan x chan
>> data however, there unfortunately isn't an intuitive way to think
>> about neighborhood for the pairwise connectivity. There are
>> neighbors surrounding the reference grid-point, and there are
>> neighbors in each grid-point's connectivity topography, i.e.
>> neighboring grid-points that have similar/significant connectivity
>> values. How do you combine the two in a sensible way? And even more
>> challenging how do you do statistics on them?
>> I have been considering several approaches, including
>> multi-dimensional scaling of these grid-point connectivity
>> topographies, or just a correlation between topographies, and then
>> using those values to select which reference grid-points to consider
>> as similar enough to average together. This then gives you a
>> reference cluster of grid-points and allows you to do standard
>> cluster permutation statistics on the average of the reference
>> grid-points. How exactly to determine which reference grid-points
>> are 'similar enough' to be consider clusters is the tricky part, i.e.
>> how can you statistically test if they should be considered a cluster
>> or not?
>> I've been working in isolation on this problem for some time, and
>> would love a discussion on the topic. I'm hoping this message spurs
>> other list lurkers to at least commiserate with me, but if possible
>> to also post whatever approaches they have taken to solve the problem
>> of clustering pairwise connectivity data.
>>> On 6/2/15 4:00 AM, fieldtrip-request at science.ru.nl wrote:
>>> Hi Mathis, alternatively you could use ft_sourcestatistics. For
>>> this you should reduce your 600 x 600 x freq matrix to freq of
>>> interest first. Next, the resulting 600 x 600 matrix you?d reduce
>>> to 600 x 1 where each grid point has a mean connectivity value to
>>> all possible grid points. >From then on you could use
>>> ft_sourcestatistics and ft_sourceinterpolate to visualize. Good
>>> luck, Tzvetan
>>>>> Dear Fieldtrip Users,
>>>>> I am a first-year PhD student and have been lurking here for a
>>>>> long time (finding lots of useful answers), it's about time I ask
>>>>> my first question:
>>>>> I am trying to compare connectivity in source space (~25
>>>>> subjects, between two conditions, ~600 virtual channels on a
>>>>> 1.5cm grid) using ft_freqstatistics with a cluster-based
>>>>> permutation test (see code below).
>>>>> The original input datasets have the dimensions: chan x chan x
>>>>> freq. I tried restructuring them to chancmb x freq
>>>>> before doing ft_freqstatistics, but the function then takes
>>>>> forever appending the (admittedly big) datasets.
>>>>> I then tried feeding them to ft_freqstatistics without
>>>>> restructuring, which didn't throw any errors. However, I noticed
>>>>> that the resulting clusters consist of adjacent cells in the chan
>>>>> x chan matrix, which doesn't make much sense since channels in
>>>>> adjacent cells are not necessarily spatially adjacent. I suspect
>>>>> that ft_freqstatistic assumes the 3-D input to contain a temporal
>>>>> dimension and therefore tries to build temporally adjacent
>>>>> clusters. Could any of the fieldtrip developers comment on this?
>>>>> Does anyone in the FT-community have any advice on how to
>>>>> statistically evaluate connectivity data using the permutation
>>>>> Thanks for any input, best wishes,
>>>>> Mathis Kaiser
>> fieldtrip mailing list
>> fieldtrip at donders.ru.nl
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the fieldtrip