Home » Uncategorized

Detecting and Visualising Clusterings Interaction Networks (And a few other cool things like Facebook)

This article focuses on cases such as Facebook and protein interaction networks. The article was written by By Paul Scherer (paulmorio) and submitted as a research paper to HackCambridge. What makes this article interesting is the fact that it compares five clustering techniques for this type of problems:

  • K Clique Percolation – A clique merging algorithm. Given a set kk, the algorithm goes on to produce kk clique clusters and merge them (percolate) as necessary.
  • MCode – seed growth approach to finding dense subgraphs
  • DP Clustering – seed growth approach to finding dense subgraphs similar to MCODE but has an internal representation of weights in the edges, and the stopiing condition is different.
  • IPCA – Modified DPClus Algorithm which focuses on maintaining the diameter of a cluster (defined as the maximum shortest distance between all pairs of vertices, rather than its density.
  • CoAch – Combined Approach with finding a small number of cliques as complexes first and then growing them.

The articles also provides great visualizations such as the one below:

1994964

In the original article, these visualizations are interactive, and you will find out which software was used to produce them.

Below is the summary (written by the original author):

Summary

For my submission to HackCambridge I wanted to spend my 24 hours learning something new in accordance with my interests. I was recently introduced to protein interaction networks in my Bioinfomartics class, and during my review of machine learning techniques for an exam noticed that we study many supervised methods, but no unsupervised methods other than the k means clustering. Thus I decided to combine the two interests by clustering the Protein interaction networks with unsupervised clustering techniques and communicate my learning, results, and visualisations using the Beaker notebook.

The study of protein-protein interactions (PPIs) determined by high-throughput experimental techniques has created karge sets of interaction data and a new need for methods allowing us to discover new information about biological function. These interactions can be thought of as a large-scale network, with nodes representing proteins and edges signifying an interaction between two proteins. In a PPI network, we can potentially find protein complexes or functional modules as densely connected subgraphs. A protein complex is a group of proteins that interact with each other at the same time and place creating a quaternary structure. Functional modules are composed of proteins that bind each other at different times and places and are involved in the same cellular process. Various graph clustering algorithms have been applied to PPI networks to detect protein complexes or functional modules, including several designed specifically for PPI network analysis. A select few of the most famous and recent topographical clustering algorithms were implemented based on descriptions from papers, and applied to PPI networks. Upon completion it was recognized that it is possible to apply these to other interaction networks like friend groups on social networks, site maps, or transportation networks to name a few.

I decided to Graphistry’s GPU cluster to visualize the large networks with the kind permission of Dr. Meyerovich. (Otherwise I would have likely not finished on time given the specs of my machine) and communicate my results and learning process

The full version with mathematical formulas, detailed descriptions, and source code, can be found here. For more articles about clustering, click here. This link will give you access to the following articles:

1994995

DSC Resources

Additional Reading

Follow us on Twitter: @DataScienceCtrl | @AnalyticBridge