Class AverageLinkage
java.lang.Object
ch.usi.inf.sape.hac.agglomeration.AverageLinkage
- All Implemented Interfaces:
AgglomerationMethod
The "average" method, also called "group average", "unweighted average", or
"Unweighted Pair Group Method using Arithmetic averages" (UPGMA),
is a graph-based approach.
The distance between two clusters is calculated as the average
of the distances between all pairs of objects in opposite clusters.
This method tends to produce small clusters of outliers,
but does not deform the cluster space.
[The Data Analysis Handbook, by Ildiko E. Frank and Roberto Todeschini]
The general form of the Lance-Williams matrix-update formula:
d[(i,j),k] = ai*d[i,k] + aj*d[j,k] + b*d[i,j] + g*|d[i,k]-d[j,k]|
For the "group average" method, where ci and cj are the cardinalities of clusters i and j:
ai = ci/(ci+cj)
aj = cj/(ci+cj)
b = 0
g = 0
Thus:
d[(i,j),k] = ci/(ci+cj)*d[i,k] + cj/(ci+cj)*d[j,k]
           = ( ci*d[i,k] + cj*d[j,k] ) / (ci+cj)
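The update formula above can be sketched in standalone Java. This is a minimal illustration, not the library's source: the class name LanceWilliamsSketch and its two methods are hypothetical and exist only for this example. It evaluates the general Lance-Williams form and then plugs in the "group average" coefficients.

```java
/** Standalone sketch of the Lance-Williams update (hypothetical class, not part of the library). */
final class LanceWilliamsSketch {

    /** General form: ai*d[i,k] + aj*d[j,k] + b*d[i,j] + g*|d[i,k]-d[j,k]|. */
    static double update(double ai, double aj, double b, double g,
                         double dik, double djk, double dij) {
        return ai * dik + aj * djk + b * dij + g * Math.abs(dik - djk);
    }

    /** "Group average" case: ai = ci/(ci+cj), aj = cj/(ci+cj), b = 0, g = 0. */
    static double groupAverage(double dik, double djk, double dij, int ci, int cj) {
        double total = ci + cj; // promote to double to avoid integer division
        return update(ci / total, cj / total, 0.0, 0.0, dik, djk, dij);
    }

    public static void main(String[] args) {
        // Cluster i has 3 members, cluster j has 1 member.
        // With d[i,k] = 4.0 and d[j,k] = 8.0 the result is (3*4 + 1*8) / 4 = 5.0.
        // d[i,j] = 2.0 is passed but has no effect because b = 0.
        System.out.println(groupAverage(4.0, 8.0, 2.0, 3, 1));
    }
}
```

Because b and g are both zero, the result depends only on d[i,k], d[j,k], and the two cardinalities.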
-
Constructor Summary
Constructors
AverageLinkage()
-
Method Summary
Modifier and Type, Method, Description
double computeDissimilarity(double dik, double djk, double dij, int ci, int cj, int ck)
    Compute the dissimilarity between the newly formed cluster (i,j) and the existing cluster k.
String toString()
-
Constructor Details
-
AverageLinkage
public AverageLinkage()
-
-
Method Details
-
computeDissimilarity
public double computeDissimilarity(double dik, double djk, double dij, int ci, int cj, int ck)
Description copied from interface: AgglomerationMethod
Compute the dissimilarity between the newly formed cluster (i,j) and the existing cluster k.
- Specified by:
computeDissimilarity in interface AgglomerationMethod
- Parameters:
dik - dissimilarity between clusters i and k
djk - dissimilarity between clusters j and k
dij - dissimilarity between clusters i and j
ci - cardinality of cluster i
cj - cardinality of cluster j
ck - cardinality of cluster k
- Returns:
dissimilarity between cluster (i,j) and cluster k.
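As a hedged illustration of what the returned value means, the sketch below checks on a small one-dimensional data set that the "group average" update agrees with the direct definition, the average distance over all pairs of objects in opposite clusters. The class AverageLinkageCheck and its helpers are hypothetical; its computeDissimilarity is a stand-in that mirrors the documented parameter order and the formula in the class description, not the library's actual code.

```java
/** Hypothetical standalone check; not part of the library. */
final class AverageLinkageCheck {

    /** Average of |a - b| over all pairs with a in x and b in y. */
    static double averagePairwise(double[] x, double[] y) {
        double sum = 0.0;
        for (double a : x) {
            for (double b : y) {
                sum += Math.abs(a - b);
            }
        }
        return sum / (x.length * y.length);
    }

    /** Stand-in with the documented parameter order; dij and ck are unused for group average. */
    static double computeDissimilarity(double dik, double djk, double dij, int ci, int cj, int ck) {
        return (ci * dik + cj * djk) / (ci + cj);
    }

    /** Concatenate two clusters to model the merged cluster (i,j). */
    static double[] concat(double[] a, double[] b) {
        double[] out = new double[a.length + b.length];
        System.arraycopy(a, 0, out, 0, a.length);
        System.arraycopy(b, 0, out, a.length, b.length);
        return out;
    }

    public static void main(String[] args) {
        double[] i = {0.0, 1.0};
        double[] j = {4.0};
        double[] k = {10.0, 12.0};

        double dik = averagePairwise(i, k);
        double djk = averagePairwise(j, k);
        double dij = averagePairwise(i, j);

        // Value from the update formula versus value from the definition.
        double updated = computeDissimilarity(dik, djk, dij, i.length, j.length, k.length);
        double direct = averagePairwise(concat(i, j), k);

        // The two values should agree up to floating-point rounding.
        System.out.println(updated + " vs " + direct);
    }
}
```

The agreement holds because each cluster dissimilarity is itself an average over pairs, so weighting d[i,k] and d[j,k] by ci and cj recovers the sum over all pairs between the merged cluster and k.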
-
toString
public String toString()
- Overrides:
toString in class Object
-