JERARCA HOWTO
NAME
JERARCA - Iterative hierarchical clustering utility
SYNOPSIS
Linux users, from the console:
jerarca <Graph file> <Iterative algorithm> <Tree algorithm> <Iterations>
Windows users, from a command window opened in the directory where Jerarca.exe is located:
Jerarca.exe <Graph file> <Iterative algorithm> <Tree algorithm> <Iterations>
DESCRIPTION
This page documents Jerarca, a suite of algorithms designed to efficiently convert
unweighted, undirected graphs into hierarchical trees by means of iterative
hierarchical clustering. First, an iterative algorithm computes a matrix of
distances between every pair of nodes of the graph. Then, a phylogenetic algorithm
builds a hierarchical tree from those distances. Finally, the program reads the
resulting dendrogram and extracts the partition of nodes that best represents the
community structure of the graph.
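The tree-building step can be illustrated with UPGMA, one of the two phylogenetic
algorithms Jerarca offers. The following is a minimal textbook sketch in Python, not
Jerarca's own implementation: it assumes a hypothetical input format in which the
distance matrix is a dictionary keyed by pairs of node indices (i, j) with i < j. It
repeatedly merges the closest pair of clusters, averaging distances weighted by
cluster size, and writes the result in Newick format.

```python
from itertools import combinations


def upgma(names, dist):
    """Return a UPGMA tree in Newick format (illustrative sketch).

    names: list of leaf labels.
    dist:  dict mapping (i, j), i < j, indices into names, to a distance
           (hypothetical input format, not Jerarca's).
    """
    # Active clusters: id -> (Newick subtree, number of leaves, height)
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    d = dict(dist)  # cluster-to-cluster distances keyed by sorted id pairs
    next_id = len(names)
    while len(clusters) > 1:
        # Merge the closest pair of active clusters.
        i, j = min(combinations(sorted(clusters), 2), key=lambda p: d[p])
        height = d[(i, j)] / 2
        (ti, ni, hi), (tj, nj, hj) = clusters.pop(i), clusters.pop(j)
        subtree = f"({ti}:{height - hi:g},{tj}:{height - hj:g})"
        # Average linkage: size-weighted mean of the two clusters' distances.
        for k in clusters:
            dik = d[(min(i, k), max(i, k))]
            djk = d[(min(j, k), max(j, k))]
            d[(k, next_id)] = (ni * dik + nj * djk) / (ni + nj)
        clusters[next_id] = (subtree, ni + nj, height)
        next_id += 1
    [(tree, _, _)] = clusters.values()
    return tree + ";"
```

For example, upgma(["A", "B", "C"], {(0, 1): 2.0, (0, 2): 6.0, (1, 2): 6.0})
returns "(C:3,(A:1,B:1):2);". Neighbor-Joining, the other option, does not assume
equal distances from the root to every leaf and produces an unrooted tree.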
OPTIONS
Four input parameters are required, in the order shown in the SYNOPSIS:
Graph file:
File containing a list of edges. Each edge is represented by a pair of nodes
separated by a tab or space.
Iterative algorithm:
Iterative algorithm used to compute the matrix of distances among the nodes
of the graph: RCluster, UVCluster, SCluster, or all three.
Four options are valid: r (RCluster), uv (UVCluster), s (SCluster) and all.
Tree algorithm:
Phylogenetic algorithm used to build the tree from the matrix of distances:
UPGMA, Neighbor-Joining, or both.
Three options are valid: u (UPGMA), nj (Neighbor-Joining) and all.
Iterations:
Number of iterations that the iterative algorithm will perform.
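Because a run with many iterations can take a long time, it may help to check the
inputs first. The following Python sketch uses hypothetical helpers that are not
part of Jerarca: parse_edges assumes one edge per line and skips blank lines, and
build_jerarca_command validates the option values listed above and returns the
argument list for the Linux executable (pass program="Jerarca.exe" on Windows).

```python
import subprocess

ITERATIVE = {"r", "uv", "s", "all"}  # RCluster, UVCluster, SCluster, all three
TREE = {"u", "nj", "all"}            # UPGMA, Neighbor-Joining, both


def parse_edges(lines):
    """Parse an edge list (assumed: one pair of node names per line)."""
    edges = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()  # splits on any run of tabs or spaces
        if not fields:
            continue  # blank lines are skipped (assumption)
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected 2 node names, got {len(fields)}")
        edges.append((fields[0], fields[1]))
    return edges


def build_jerarca_command(graph_file, iterative, tree, iterations, program="jerarca"):
    """Validate the four positional parameters and return an argv list."""
    if iterative not in ITERATIVE:
        raise ValueError(f"iterative algorithm must be one of {sorted(ITERATIVE)}")
    if tree not in TREE:
        raise ValueError(f"tree algorithm must be one of {sorted(TREE)}")
    if not isinstance(iterations, int) or iterations < 1:
        raise ValueError("iterations must be a positive integer")
    return [program, graph_file, iterative, tree, str(iterations)]


def run_jerarca(*args, **kwargs):
    """Run Jerarca; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(build_jerarca_command(*args, **kwargs), check=True)
```

For example, passing an open graph file to parse_edges reports malformed lines
before Jerarca is started.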
OUTPUT FILES
filename_tree_IterativeAlg_TreeAlg.nwk
Computed tree structure of the graph in Newick format.
filename_partitionH_IterativeAlg_TreeAlg.txt, filename_partitionQ_IterativeAlg_TreeAlg.txt:
Files containing the most modular partition of the graph, selected according
to either the cumulative hypergeometric distribution of the links (H) or the
modularity (Q).
filename_partitionH_IterativeAlg_TreeAlg.meg, filename_partitionQ_IterativeAlg_TreeAlg.meg:
Files that can be imported directly into the phylogenetic package MEGA 4.
Each file includes the matrix of distances among nodes and the most modular
distribution of nodes into clusters, selected according to either the
cumulative hypergeometric distribution of the links (H) or the modularity (Q).
filename_partitionH_IterativeAlg_TreeAlg.att, filename_partitionQ_IterativeAlg_TreeAlg.att:
Files that can be imported into Cytoscape as node attributes. Each node is
assigned to its cluster in the most modular partition of the tree, selected
according to either the cumulative hypergeometric distribution of the links
(H) or the modularity (Q).
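The modularity criterion (Q) can be illustrated with the standard Newman-Girvan
definition. The following Python sketch computes Q for a partition of an undirected,
unweighted simple graph and keeps the best of several candidate partitions. It
assumes the candidates have already been obtained from the tree, for example by
cutting the dendrogram at different heights; how Jerarca enumerates them, and its
hypergeometric (H) criterion, are not reproduced here. The function names are
hypothetical.

```python
from collections import defaultdict


def modularity(edges, partition):
    """Newman-Girvan modularity Q of a partition (illustrative sketch).

    edges: list of (u, v) pairs of a simple undirected graph.
    partition: dict mapping node -> cluster label.
    Q = sum over clusters c of [L_c / m - (D_c / (2m))^2], where L_c is the
    number of edges inside c, D_c the total degree of c, m the number of edges.
    """
    m = len(edges)
    internal = defaultdict(int)
    degree = defaultdict(int)
    for u, v in edges:
        degree[partition[u]] += 1
        degree[partition[v]] += 1
        if partition[u] == partition[v]:
            internal[partition[u]] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


def best_partition(edges, candidates):
    """Return the candidate partition with the highest modularity."""
    return max(candidates, key=lambda p: modularity(edges, p))
```

For two triangles joined by a single edge, splitting the graph into the two
triangles gives Q = 5/14 (about 0.357), while placing all nodes in one cluster
gives Q = 0.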
USAGE EXAMPLES
(Linux users)
jerarca saccharomyces_interactome.tab s u 60000
Jerarca performs 60000 iterations of the SCluster algorithm to compute the
matrix of distances between pairs of nodes, and then uses the UPGMA algorithm
to build a tree from those distances.
(Windows users)
Jerarca.exe mitochondrial_ribosome.tab all nj 1000
Jerarca performs 1000 iterations of each iterative algorithm (RCluster, UVCluster
and SCluster) and computes their respective matrices of distances between pairs
of nodes. Then, the Neighbor-Joining algorithm builds a tree from each of those
matrices.