Cluster Analysis
Cluster analysis is a statistical technique for grouping data points into categories, called clusters, so that points in the same cluster are more similar to each other than to points in other clusters. These clusters help researchers understand the structure of, and relationships within, the data. The technique is also referred to as taxonomy analysis, clustering, or segmentation analysis. In business, it is widely applied to analyzing customer activity and market testing.
Our focus in this post will be on the four main types of cluster analysis as applied in statistics, which are:
- Centroid clustering
- Density clustering
- Distribution clustering
- Connectivity clustering
Centroid clustering
Centroid clustering is perhaps the most basic clustering method, yet it is one of the most powerful tools for grouping data points. The idea behind it is simple: we represent each cluster with a central vector (its centroid), then assign each data point to the cluster whose centroid is closest.
We measure the distance between a data point and a centroid using a metric such as Manhattan distance, Euclidean distance, or Minkowski distance.
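As a rough illustration of the metrics named above, the sketch below implements Minkowski distance and derives the other two from it (order 1 gives Manhattan, order 2 gives Euclidean). The function names and the sample point and centroid are made up for this example.

```python
def minkowski_distance(a, b, p):
    """Minkowski distance of order p between two equal-length points."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of dimensions")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def manhattan_distance(a, b):
    # Minkowski distance with p = 1: sum of absolute differences.
    return minkowski_distance(a, b, 1)


def euclidean_distance(a, b):
    # Minkowski distance with p = 2: straight-line distance.
    return minkowski_distance(a, b, 2)


# Hypothetical data point and centroid, chosen only for illustration.
point = (1.0, 2.0)
centroid = (4.0, 6.0)

print(manhattan_distance(point, centroid))     # 7.0
print(euclidean_distance(point, centroid))     # 5.0
print(minkowski_distance(point, centroid, 3))  # order-3 Minkowski distance
```

Which metric suits a dataset depends on the data. Manhattan distance is less sensitive to a single large coordinate difference than Euclidean distance.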
Note that centroid-based algorithms such as k-means require you to fix the number of clusters in advance as the starting point for iteration, which is a drawback when that number is unknown. Even so, centroid clustering remains popular due to its simplicity and ease of interpretation.
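To make the iteration concrete, here is a minimal sketch of one well-known centroid method, k-means (Lloyd's algorithm). It assumes Euclidean distance and, for reproducibility, seeds the centroids with the first k points. Real implementations usually pick the initial centroids more carefully. The function name and sample data are illustrative only.

```python
import math


def k_means(points, k, max_iter=100):
    """K-means sketch; the first k points seed the centroids (an assumption)."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                mean = tuple(sum(dim) / len(members) for dim in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


# Hypothetical 2-D data with two visible groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 9.5), (1.0, 0.5), (8.5, 9.0)]
labels, centroids = k_means(points, k=2)
print(labels)
print(centroids)
```

The value of k is the predefined number of clusters. Changing it changes the result, which is why choosing it well matters.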
Centroid clustering is widely applied in data science tasks such as image segmentation, market and customer segmentation, and document clustering.
Density clustering
As the name suggests, we group data points based on their distribution density. That means clusters are the regions with the highest number of data points within the data space, and they are separated from each other by areas with fewer data points (lower density).
Note that density reflects how closely related the data points within a region are. The higher the density of points, the stronger the relationship between them, and vice versa.
Unlike centroid clustering, density clustering does not rely on two assumptions:
- The data is free of noise
- Clusters have a regular geometric shape (circular/elliptical)
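One common density-based algorithm is DBSCAN, which the text above does not name. The sketch below is a minimal version of it, given only as an illustration. The parameters `eps` (neighbourhood radius) and `min_pts` (minimum neighbours for a dense region) and the sample data are assumptions for this example. Points that sit in sparse regions are labelled -1.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN sketch: returns one label per point, -1 meaning noise."""
    labels = [None] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels


# Hypothetical data: two dense groups and one isolated point.
points = [(1, 1), (1.2, 1.1), (0.9, 1.3), (5, 5), (5.1, 5.2), (4.9, 5.1), (9, 1)]
print(dbscan(points, eps=0.5, min_pts=3))
```

Here the number of clusters is not supplied up front. It emerges from the choice of `eps` and `min_pts`.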
Distribution clustering
Thus far, we have discussed techniques that cluster data based on proximity/distance and density. Distribution clustering brings probability into the picture: what is the probability that a given data point belongs to a given cluster? Each cluster is modeled by a probability distribution, such as a binomial or Gaussian distribution, and points are grouped with the distribution they most likely come from.
Unlike centroid and density-based clustering techniques, distribution clustering is highly flexible: it can be more accurate and can form more complex cluster shapes. The main challenge is that it works best when the data really follows the assumed distributions, as simulated/synthetic data often does. Otherwise, the model tends to overfit.
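The probability question can be sketched for the Gaussian case. Assuming each cluster is a one-dimensional Gaussian with a known weight, mean, and standard deviation (all made up here), Bayes' rule gives the probability that a point belongs to each cluster. In practice these parameters are estimated from the data, which this sketch does not do.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def membership_probabilities(x, components):
    """Posterior probability that x belongs to each (weight, mean, std) component."""
    weighted = [w * gaussian_pdf(x, m, s) for w, m, s in components]
    total = sum(weighted)
    return [v / total for v in weighted]


# Hypothetical mixture: two equally weighted clusters centred at 0 and 5.
components = [(0.5, 0.0, 1.0), (0.5, 5.0, 1.0)]
for x in (0.0, 2.5, 5.0):
    print(x, membership_probabilities(x, components))
```

A point halfway between the two means is split evenly between the clusters, while a point at either mean belongs almost entirely to that cluster.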
Connectivity/Hierarchical clustering
Connectivity clustering is a machine learning method that groups data into a hierarchy of clusters. That is achieved in two steps. First, it builds the hierarchy using a top-down or bottom-up approach; then it cuts the hierarchy at a chosen level to obtain the final clusters.
Because the hierarchy can be built top-down or bottom-up, there are two approaches. The top-down method is called the divisive approach: it starts with all data points in one cluster and splits it repeatedly. The bottom-up method is called the agglomerative approach: it starts with each data point as its own cluster and merges the closest clusters repeatedly.
One key difference between the agglomerative approach and the divisive approach is that the former considers only local, neighboring data points when making decisions, without necessarily considering the global distribution of the data. In contrast, the latter considers the global data distribution as it iterates from top to bottom.
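The bottom-up approach can be sketched as follows. This is a minimal illustration that assumes single linkage (the distance between two clusters is the distance between their closest points) and stops once a requested number of clusters remains. Both choices, along with the function names and data, are assumptions made for this example.

```python
import math


def single_linkage(cluster_a, cluster_b):
    """Distance between the closest pair of points across two clusters."""
    return min(math.dist(p, q) for p in cluster_a for q in cluster_b)


def agglomerative(points, n_clusters):
    """Bottom-up sketch: start with singletons, merge the closest pair each round."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        # Local decision: pick the two clusters that are currently closest.
        i, j = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda pair: single_linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because j > i
    return clusters


# Hypothetical data: two tight pairs and one isolated point.
points = [(1.0, 1.0), (1.2, 1.0), (5.0, 5.0), (5.1, 5.2), (9.0, 1.0)]
for cluster in agglomerative(points, n_clusters=3):
    print(cluster)
```

Each merge looks only at the currently closest pair, which is the local decision-making described above. Stopping at `n_clusters` plays the role of cutting the hierarchy at a chosen level.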