QtClustering API qtclustering-163 (20111029234107)

edu.cornell.med.icb.clustering
Class QTClusterer

java.lang.Object
  extended by edu.cornell.med.icb.clustering.QTClusterer
All Implemented Interfaces:
Clusterer

public final class QTClusterer
extends Object
implements Clusterer

Implements the QT (Quality Threshold) clustering algorithm, in which the quality threshold is the maximum allowed diameter of a cluster. See Wikipedia and Heyer LJ et al. 1999.

Note that this implementation does not reference the actual instances of the data being clustered directly. It assumes the data is held in an external array or indexed collection. It relies on the specific implementation of SimilarityDistanceCalculator to map the indices back to the original data source.
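
The index-based design described above can be illustrated with a small sketch. The interface IndexedDistance and the helper euclidean below are hypothetical names invented for this illustration and are not part of this API; they only mimic the role that SimilarityDistanceCalculator plays, assuming the data lives in an external array of points.

```java
final class IndexedDistanceSketch {
    /** Hypothetical index-based distance; NOT the library's SimilarityDistanceCalculator. */
    interface IndexedDistance {
        double distance(int i, int j);
    }

    /** Builds a distance over an external array of points; callers only ever pass indices. */
    static IndexedDistance euclidean(final double[][] points) {
        return (i, j) -> {
            double sum = 0;
            for (int k = 0; k < points[i].length; k++) {
                final double diff = points[i][k] - points[j][k];
                sum += diff * diff;
            }
            return Math.sqrt(sum);
        };
    }

    public static void main(final String[] args) {
        // the data stays in this array; a clusterer would only ever see indices 0..2
        final double[][] points = {{0, 0}, {3, 4}, {6, 8}};
        final IndexedDistance d = euclidean(points);
        System.out.println(d.distance(0, 1)); // prints 5.0
        System.out.println(d.distance(0, 2)); // prints 10.0
    }
}
```

Because only indices cross the boundary, the same clustering code could in principle work on words, points, or any other indexed collection by swapping the distance function.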

For example, if you wanted to group the text of Abraham Lincoln's Gettysburg Address into clusters of words of the same size, the following code snippet would achieve this.

    final String text = "Four score and seven years ago our fathers brought forth on this"
               + " continent a new nation conceived in liberty and dedicated to the proposition"
               + " that all men are created equal";
    // break the text up into an array of individual words
    final String[] words = text.split(" ");
    // create a distance calculator that returns the absolute difference in length between two words
    final SimilarityDistanceCalculator distanceCalculator =
               new MaxLinkageDistanceCalculator() {
                   public double distance(final int i, final int j) {
                       return Math.abs(words[i].length() - words[j].length());
                   }
               };

    // and cluster the words into groups according to their size
    final Clusterer clusterer = new QTClusterer(words.length);
    final List<int[]> clusters = clusterer.cluster(distanceCalculator, 0);
 
The cluster arrays that result are:
   { 2, 27, 26, 25, 22, 19, 14, 6, 5 },
   { 1, 29, 9, 4, 3 },
   { 7, 28, 18, 8 },
   { 0, 24, 11 },
   { 10, 21, 17 },
   { 12, 20, 16 },
   { 13 },
   { 15 },
   { 23 }
 
The cluster arrays hold indices into the original data source and map back to the words of the text as follows:
   { "and", "are", "men", "all", "the", "and", "new", "our", "ago" },
   { "score", "equal", "forth", "years", "seven" },
   { "fathers", "created", "liberty", "brought" },
   { "Four", "that", "this" },
   { "on", "to", "in" },
   { "continent", "dedicated", "conceived" },
   { "a" },
   { "nation" },
   { "proposition" }
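
The mapping from index arrays back to words can be done with a few lines of plain Java. The following is a minimal sketch; the class ClusterIndexMapper and its method toWords are hypothetical helpers written for this illustration, not part of this API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ClusterIndexMapper {
    /** Replaces every index in every cluster with the word stored at that index. */
    static List<String[]> toWords(final String[] words, final List<int[]> clusters) {
        final List<String[]> result = new ArrayList<>();
        for (final int[] cluster : clusters) {
            final String[] members = new String[cluster.length];
            for (int k = 0; k < cluster.length; k++) {
                members[k] = words[cluster[k]];
            }
            result.add(members);
        }
        return result;
    }

    public static void main(final String[] args) {
        final String text = "Four score and seven years ago our fathers brought forth on this"
                + " continent a new nation conceived in liberty and dedicated to the proposition"
                + " that all men are created equal";
        final String[] words = text.split(" ");
        // two of the index arrays listed above, hard-coded for the illustration
        final List<int[]> clusters = Arrays.asList(new int[] {0, 24, 11}, new int[] {10, 21, 17});
        for (final String[] cluster : toWords(words, clusters)) {
            System.out.println(Arrays.toString(cluster));
        }
        // prints [Four, that, this] and then [on, to, in]
    }
}
```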
 
Changing the threshold given to the cluster(SimilarityDistanceCalculator, double) method will result in different groupings of words.
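
To see how the threshold affects the grouping without depending on this library, the sketch below is a simplified, single-threaded rendering of the quality threshold idea: grow a candidate cluster from every unassigned instance, keep the largest, and repeat. It is an illustration under stated assumptions, not the code of QTClusterer; the class QtSketch and the interface IndexDistance are hypothetical, ties are broken by lowest index, and the order of clusters and of members within a cluster may differ from what this class returns.

```java
import java.util.ArrayList;
import java.util.List;

final class QtSketch {
    /** Hypothetical index-based distance; NOT the library's SimilarityDistanceCalculator. */
    interface IndexDistance {
        double distance(int i, int j);
    }

    /** Repeatedly extracts the largest candidate cluster whose diameter stays within the threshold. */
    static List<int[]> cluster(final int n, final IndexDistance d, final double threshold) {
        final List<int[]> clusters = new ArrayList<>();
        final boolean[] assigned = new boolean[n];
        int remaining = n;
        while (remaining > 0) {
            List<Integer> best = new ArrayList<>();
            for (int seed = 0; seed < n; seed++) {
                if (assigned[seed]) {
                    continue;
                }
                final List<Integer> candidate = grow(seed, n, assigned, d, threshold);
                if (candidate.size() > best.size()) {
                    best = candidate;
                }
            }
            final int[] members = new int[best.size()];
            for (int k = 0; k < members.length; k++) {
                members[k] = best.get(k);
                assigned[members[k]] = true;
            }
            remaining -= members.length;
            clusters.add(members);
        }
        return clusters;
    }

    /** Grows a candidate from one seed, always adding the instance that keeps the diameter smallest. */
    private static List<Integer> grow(final int seed, final int n, final boolean[] assigned,
                                      final IndexDistance d, final double threshold) {
        final List<Integer> candidate = new ArrayList<>();
        final boolean[] used = new boolean[n];
        // linkage[j] is the largest distance from j to any current member (maximum linkage)
        final double[] linkage = new double[n];
        candidate.add(seed);
        used[seed] = true;
        for (int j = 0; j < n; j++) {
            linkage[j] = d.distance(seed, j);
        }
        while (true) {
            int next = -1;
            for (int j = 0; j < n; j++) {
                if (!assigned[j] && !used[j] && (next < 0 || linkage[j] < linkage[next])) {
                    next = j;
                }
            }
            if (next < 0 || linkage[next] > threshold) {
                return candidate; // nothing left, or adding anything would exceed the threshold
            }
            candidate.add(next);
            used[next] = true;
            for (int j = 0; j < n; j++) {
                linkage[j] = Math.max(linkage[j], d.distance(next, j));
            }
        }
    }

    public static void main(final String[] args) {
        final String text = "Four score and seven years ago our fathers brought forth on this"
                + " continent a new nation conceived in liberty and dedicated to the proposition"
                + " that all men are created equal";
        final String[] words = text.split(" ");
        final IndexDistance byLength = (i, j) -> Math.abs(words[i].length() - words[j].length());
        // threshold 0 only groups words of identical length; threshold 1 also merges adjacent lengths
        System.out.println("threshold 0: " + cluster(words.length, byLength, 0).size() + " clusters");
        System.out.println("threshold 1: " + cluster(words.length, byLength, 1).size() + " clusters");
    }
}
```

With a threshold of 0 this sketch yields nine clusters, one per distinct word length, which matches the number of groups listed above; a threshold of 1 allows words whose lengths differ by one to share a cluster, so fewer and larger clusters result.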


Constructor Summary
QTClusterer(int numberOfInstances)
          Construct a new quality threshold clusterer.
QTClusterer(int numberOfInstances, int numberOfThreads)
          Construct a new quality threshold clusterer.
QTClusterer(int numberOfInstances, ParallelTeam team)
          Construct a new quality threshold clusterer.
 
Method Summary
 List<int[]> cluster(SimilarityDistanceCalculator calculator, double qualityThreshold)
          Groups instances into clusters.
 List<int[]> getClusters()
          Returns the list of clusters produced by clustering.
 long getLogInterval()
          Get the progress logging interval.
 boolean isLogClusterProgress()
          Is progress on clusters being logged?
 boolean isLogInnerLoopProgress()
          Is progress on the inner loop being logged?
 boolean isLogOuterLoopProgress()
          Is progress on the outer loop being logged?
 void setLogClusterProgress(boolean value)
          Should progress on the clusters be logged?
 void setLogInnerLoopProgress(boolean value)
          Should progress on the inner loop be logged?
 void setLogInterval(long interval)
          Set the progress logging interval.
 void setLogOuterLoopProgress(boolean value)
          Should progress on the outer loop be logged?
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

QTClusterer

public QTClusterer(int numberOfInstances)
Construct a new quality threshold clusterer.

Parameters:
numberOfInstances - The number of instances to cluster, i.e., |G|, where G is the set of instances.

QTClusterer

public QTClusterer(int numberOfInstances,
                   int numberOfThreads)
Construct a new quality threshold clusterer.

Parameters:
numberOfInstances - The number of instances to cluster, i.e., |G|, where G is the set of instances.
numberOfThreads - Number of threads to use when clustering.

QTClusterer

public QTClusterer(int numberOfInstances,
                   ParallelTeam team)
Construct a new quality threshold clusterer.

Parameters:
numberOfInstances - The number of instances to cluster, i.e., |G|, where G is the set of instances.
team - The parallel team to use when clustering.

Method Detail

cluster

public List<int[]> cluster(SimilarityDistanceCalculator calculator,
                           double qualityThreshold)
Groups instances into clusters. Each element of the returned list is an int array holding the indices of the instances that belong to one cluster.

Specified by:
cluster in interface Clusterer
Parameters:
calculator - The SimilarityDistanceCalculator that should be used when clustering
qualityThreshold - The QT clustering algorithm quality threshold (d)
Returns:
The list of clusters.

getClusters

public List<int[]> getClusters()
Returns the list of clusters produced by clustering.

Specified by:
getClusters in interface Clusterer
Returns:
A list of integer arrays, where each array represents one cluster and contains the indices of the instances that belong to that cluster.
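
When a per-instance lookup is more convenient than a list of index arrays, the result can be inverted. The following is a minimal sketch; ClusterAssignments and its method assignments are hypothetical helpers for illustration and are not part of this API.

```java
import java.util.Arrays;
import java.util.List;

final class ClusterAssignments {
    /** Returns an array whose element i is the position of the cluster that contains instance i. */
    static int[] assignments(final int numberOfInstances, final List<int[]> clusters) {
        final int[] clusterOf = new int[numberOfInstances];
        Arrays.fill(clusterOf, -1); // -1 marks an instance that appears in no cluster
        for (int c = 0; c < clusters.size(); c++) {
            for (final int instance : clusters.get(c)) {
                clusterOf[instance] = c;
            }
        }
        return clusterOf;
    }

    public static void main(final String[] args) {
        // a hand-written result in the same shape as the list described above
        final List<int[]> clusters = Arrays.asList(new int[] {2, 0}, new int[] {1}, new int[] {3});
        System.out.println(Arrays.toString(assignments(4, clusters))); // prints [0, 1, 0, 2]
    }
}
```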

isLogClusterProgress

public boolean isLogClusterProgress()
Is progress on clusters being logged?

Returns:
true if logging is enabled for clusters being built.

setLogClusterProgress

public void setLogClusterProgress(boolean value)
Should progress on the clusters be logged?

Parameters:
value - indicates whether or not logging should be enabled for clusters being built.

isLogInnerLoopProgress

public boolean isLogInnerLoopProgress()
Is progress on the inner loop being logged?

Returns:
true if logging is enabled for the inner loop

setLogInnerLoopProgress

public void setLogInnerLoopProgress(boolean value)
Should progress on the inner loop be logged?

Parameters:
value - indicates whether or not logging should be enabled for the inner loop

isLogOuterLoopProgress

public boolean isLogOuterLoopProgress()
Is progress on the outer loop being logged?

Returns:
true if logging is enabled for the outer loop

setLogOuterLoopProgress

public void setLogOuterLoopProgress(boolean value)
Should progress on the outer loop be logged?

Parameters:
value - indicates whether or not logging should be enabled for the outer loop

getLogInterval

public long getLogInterval()
Get the progress logging interval.

Returns:
the logging interval in milliseconds.

setLogInterval

public void setLogInterval(long interval)
Set the progress logging interval.

Parameters:
interval - the logging interval in milliseconds.

Copyright © 2005-2011 Institute for Computational Biomedicine, All Rights Reserved.