Hierarchical Agglomerative Clustering Algorithm in C#


In some places this website uses clustering algorithms to create different kinds of groups of similar objects. One such algorithm in use is a hierarchical agglomerative cluster algorithm. This algorithm is a bottom-up approach:
  • Initially all objects represent a separate cluster
  • Gradually a pair of similar clusters is merged into a new cluster
  • The algorithm stops if the clusters are too far apart to be merged (distance criterion) or if a specified count of clusters is reached (number criterion)
It can be used with different types of data: numeric, binary, sets - whatever as long as there is a distance measure that is able to calculate the similarity distance of two objects. To aggregate clusters to a bigger one a fusion function is needed. In the field of cluster analysis there are several ones out there such as single-linkage, complete-linkage, average-linkage or centroid-linkage.

Since we didn't find a library supporting hierarchical agglomerative clustering in .NET for a general purpose we wrote our own one that can be downloaded here as open source (you can use or modify it without any restrictions). As default it uses the Jaccard index as similarity distance and the single-linkage algorithm as fusion method. Except complete-linkage no further distance and fusion functions are implemented but you can easily write your own ones by implementing IDistanceMetric as the interface for concrete distance function classes and by inheriting Fusion as abstract class for concrete fusion function classes. The algorithm can be run by using either the number criterion or the distance criterion to stop the clustering. Due to the complexity of O(n3) the code is not intended to use for large datasets.

Download HAC C# as Visual Studio solution (75 kB, last update 02/03/2013)
Links:
Hierarchical Agglomerative Clustering (in German)
Single-linkage clustering
Jaccard index