Hierarchical clustering
Posted: 13/09/2025
Hierarchical clustering techniques are an important category of clustering methods. They can be divided into two groups:
- Agglomerative hierarchical clustering: start with the points as individual clusters, then repeatedly merge the closest pair of clusters.
- Divisive hierarchical clustering: start with one big, all-inclusive cluster, then repeatedly split a cluster until only individual points remain.
The first approach is the more commonly used. A hierarchical clustering can be displayed graphically as a dendrogram, a tree-like diagram that records the order in which the clusters were merged (agglomerative) or split (divisive).
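As a quick illustration, here is a minimal sketch of building and plotting a dendrogram with SciPy. The synthetic sample is just an assumption, and the group-average proximity used here is one of the measures discussed further below.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

# Small synthetic sample (assumption: any small 2-D point set works here).
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 2))

# Z encodes the merge order: each row is (cluster a, cluster b, distance, size).
Z = linkage(X, method="average")  # group-average proximity

dendrogram(Z)  # the height of each junction is the merge distance
plt.xlabel("point index")
plt.ylabel("merge distance")
plt.show()
```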
Algorithm 1: Agglomerative hierarchical clustering algorithm.
1. Compute the proximity matrix.
2. repeat
3. Merge the two closest clusters.
4. Update the proximity matrix to reflect the proximity between the new cluster and the original clusters.
5. until Only one cluster remains.
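A direct, unoptimized translation of Algorithm 1 might look like the sketch below, using MIN/single link as the proximity measure (step 4 then reduces to a row-wise minimum). The function name and the naive O(n³) structure are illustration choices, not a reference implementation.

```python
import numpy as np
from scipy.spatial.distance import cdist

def single_link(X):
    """Naive sketch of Algorithm 1 with MIN/single-link proximity."""
    D = cdist(X, X)                  # step 1: proximity matrix
    np.fill_diagonal(D, np.inf)      # a cluster never merges with itself
    active = list(range(len(X)))     # indices of the current clusters
    merges = []
    while len(active) > 1:           # steps 2-5: repeat until one cluster remains
        # step 3: locate the two closest clusters among the active ones
        sub = D[np.ix_(active, active)]
        i, j = np.unravel_index(np.argmin(sub), sub.shape)
        a, b = active[i], active[j]
        merges.append((a, b, sub[i, j]))
        # step 4: for single link, the proximity of the merged cluster to any
        # other cluster is the minimum of the two merged rows/columns
        D[a, :] = np.minimum(D[a, :], D[b, :])
        D[:, a] = D[a, :]
        D[a, a] = np.inf
        active.remove(b)             # cluster b is absorbed into cluster a
    return merges                    # merge order: (cluster, cluster, distance)
```

Each returned triple records one merge; for an efficient and general implementation covering all the measures listed below, `scipy.cluster.hierarchy.linkage` is the usual choice.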
The key issue is defining proximity between clusters. Common measures include:
- MIN or single link: defines the proximity between two clusters as the distance between the two closest points that are in different clusters. It can handle non-elliptical cluster shapes, but it is sensitive to noise and outliers;
- MAX or complete link or CLIQUE: defines the proximity between two clusters as the maximum distance between two points that are in different clusters. It is resistant to noise and outliers, but performs poorly on non-globular clusters and tends to break large clusters apart;
- Group average: defines the proximity as the average distance over all possible pairs of points from different clusters. It is a trade-off between MIN and MAX: not too affected by noise and outliers, but still biased towards globular clusters;
- Distance between centroids: defines the proximity between two clusters as the distance between their centroids;
- Measure guided by an objective function: an example is Ward's method, which measures the proximity between two clusters as the increase in SSE (sum of squared errors) that results from merging them. It too is influenced very little by noise and outliers, but is biased towards globular clusters.
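To see how much the choice of proximity measure matters, the following sketch (again with SciPy, on an assumed synthetic dataset) runs the same points through all the methods just listed and prints the height of the final merge, which already differs across measures.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

rng = np.random.default_rng(1)
X = rng.normal(size=(10, 2))  # synthetic points, for illustration only

# single = MIN, complete = MAX, average = group average,
# centroid = distance between centroids, ward = Ward's method.
for method in ("single", "complete", "average", "centroid", "ward"):
    Z = linkage(X, method=method)
    print(f"{method:8s} final merge height: {Z[-1, 2]:.3f}")
```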
This family of algorithms does not assume a particular number of clusters, since the "desired" number can be obtained by simply cutting the dendrogram at the appropriate height.
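In SciPy this cut corresponds to `fcluster`: you can either request a number of clusters directly or cut at a given merge height. The two-blob dataset below is, again, just an assumed example.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Two well-separated synthetic blobs.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 0.3, size=(20, 2)),
               rng.normal(3.0, 0.3, size=(20, 2))])

Z = linkage(X, method="ward")

# Cut the dendrogram by asking for exactly k clusters...
labels_k = fcluster(Z, t=2, criterion="maxclust")
# ...or by cutting at a fixed height (merge distance).
labels_h = fcluster(Z, t=5.0, criterion="distance")
print(labels_k)  # cluster label (1 or 2) for each point
```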