
Clustering, or unsupervised learning, is the process of organizing a set of objects based on their characteristics, grouping them according to their similarities. It is essentially a way of collecting objects on the basis of the similarity and dissimilarity between them. The goal is to organize the objects into classes so that similar objects fall in the same class.

Types of Clustering //not required


Broadly speaking, clustering can be divided into two subgroups:

 Hard Clustering: Each data point either belongs to a cluster completely or not at all.
 Soft Clustering: Instead of assigning each data point to exactly one cluster, each point is given a probability or likelihood of belonging to each cluster.
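As an illustration of the difference (the centres and point below are toy values, not from the text), a hard assignment picks the single nearest centre, while one common soft assignment spreads membership across centres:

```python
import numpy as np

# Two fixed cluster centres and one point (illustrative toy values).
centres = np.array([[0.0, 0.0], [5.0, 5.0]])
point = np.array([1.0, 1.0])

# Hard clustering: the point belongs entirely to its nearest centre.
dists = np.linalg.norm(centres - point, axis=1)
hard_label = int(np.argmin(dists))

# Soft clustering: assign a probability to each cluster instead; here a
# softmax over negative distances, one of several common choices.
weights = np.exp(-dists)
soft_membership = weights / weights.sum()

print(hard_label)       # index of the single nearest centre
print(soft_membership)  # per-cluster probabilities summing to 1
```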
Types of clustering algorithms //important
K-means Clustering Algorithm

K-means is one of the most popular clustering algorithms and is based on a partitioning procedure. The main idea is to define k centers, one for each cluster. It is an iterative algorithm in which clusters are formed by assigning data points to the closest cluster centroid. The cluster center, i.e. the centroid, is chosen so that the distance from its data points to the center is minimized. Finding the optimal partition is an NP-hard problem, so solutions are commonly approximated over a number of trials.

The biggest problem with this algorithm is that we need to specify k in advance. It also has trouble clustering density-based distributions.
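The iterative assign-then-update procedure described above can be sketched in a few lines. This is a minimal illustrative implementation of Lloyd's algorithm on made-up two-blob data, not a production one:

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Minimal sketch of Lloyd's algorithm for k-means."""
    rng = np.random.default_rng(seed)
    # Initialise centroids as k distinct random data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: each point joins its nearest centroid's cluster.
        labels = np.argmin(np.linalg.norm(X[:, None] - centroids, axis=2), axis=1)
        # Update step: move each centroid to the mean of its cluster
        # (keep the old centroid if a cluster ends up empty).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged: assignments will no longer change
        centroids = new_centroids
    return labels, centroids

# Toy data: two well-separated blobs.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 0.5, (20, 2)), rng.normal(5.0, 0.5, (20, 2))])
labels, centroids = kmeans(X, k=2)
```

Note that k is passed in by the caller, which is exactly the limitation mentioned above: the algorithm has no way to choose it by itself.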
Fuzzy C-means (FCM) Algorithm

This algorithm works by assigning each data point a membership in every cluster center, based on the distance between the cluster center and the data point. The nearer a data point is to a cluster center, the higher its membership in that cluster. Consequently, a data point does not have absolute membership in any single cluster, which is why the algorithm is named ‘fuzzy’.
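A minimal sketch of this distance-based membership update follows; the toy data and parameter choices (e.g. the standard fuzzifier m = 2) are assumptions for illustration:

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iters=100, seed=0):
    """Sketch of fuzzy c-means; U[i, j] is point i's membership in cluster j."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)      # memberships sum to 1 per point
    for _ in range(n_iters):
        Um = U ** m                        # fuzzified memberships
        # Centres are membership-weighted means of all points.
        centres = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # Distance of every point to every centre (epsilon avoids div by 0).
        d = np.linalg.norm(X[:, None] - centres[None, :], axis=2) + 1e-12
        # Standard membership update: closer centres get larger membership.
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
    return U, centres

# Toy data: two separated blobs.
rng = np.random.default_rng(7)
X = np.vstack([rng.normal(0.0, 0.5, (15, 2)), rng.normal(4.0, 0.5, (15, 2))])
U, centres = fuzzy_c_means(X, c=2)
```

Unlike k-means, every row of U holds graded memberships in both clusters rather than a single hard label.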
Expectation-Maximisation (EM) Algorithm
This is a model-based clustering approach in which the data are fitted by estimating the probability that each point belongs to a given distribution, usually a normal (Gaussian) distribution. With a fixed number of Gaussian components, the parameters are adjusted so that the likelihood of the data under the mixture is maximized. The resulting grouping is shown in the figure:

This model works well on synthetic data and on diversely sized clusters. However, it can run into problems if constraints are not used to limit the model's complexity.
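The fit-and-maximise loop can be sketched for a two-component one-dimensional Gaussian mixture; the data and the crude initialisation below are illustrative assumptions:

```python
import numpy as np

def em_gmm_1d(x, n_iters=60):
    """Sketch of EM for a two-component 1-D Gaussian mixture."""
    # Crude initialisation: means at the data extremes, shared spread.
    mu = np.array([x.min(), x.max()])
    sigma = np.full(2, x.std())
    w = np.array([0.5, 0.5])               # mixing weights
    for _ in range(n_iters):
        # E-step: responsibility of each component for each point.
        dens = w * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) \
               / (sigma * np.sqrt(2.0 * np.pi))
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and spreads from responsibilities.
        nk = resp.sum(axis=0)
        w = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk) + 1e-9
    return w, mu, sigma

# Toy data drawn from two Gaussians with means 0 and 6.
rng = np.random.default_rng(3)
x = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(6.0, 1.0, 200)])
w, mu, sigma = em_gmm_1d(x)
```

The responsibilities in the E-step are exactly the "probability that a point belongs to each distribution" mentioned above, and the M-step maximises the likelihood given those responsibilities.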
Hierarchical Clustering Algorithms
Last but not least are the hierarchical clustering algorithms. These algorithms arrange clusters in an order based on a hierarchy of similarity between observations. Hierarchical clustering is categorised into two types: divisive (top-down) clustering and agglomerative (bottom-up) clustering. The former starts with all data points/observations in a single cluster and recursively splits it where the similarity between the resulting groups is lowest, while the latter starts with every data point as its own cluster and repeatedly merges the most similar clusters. Either way, similar data ends up grouped together.

Hierarchical clustering depiction (Image credits: Dr Saed Sayad)


Most hierarchical algorithms, such as single linkage, complete linkage, median linkage, and Ward's method, follow the agglomerative approach.
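The agglomerative (bottom-up) procedure can be sketched with single linkage, where cluster distance is the distance between the two closest members; the toy blobs below are assumptions for illustration:

```python
import numpy as np

def agglomerative(X, n_clusters):
    """Sketch of bottom-up (agglomerative) single-linkage clustering."""
    clusters = [[i] for i in range(len(X))]    # every point starts alone
    D = np.linalg.norm(X[:, None] - X[None, :], axis=2)  # pairwise distances
    while len(clusters) > n_clusters:
        # Find the pair of clusters whose closest members are nearest
        # (single linkage); complete linkage would use .max() instead.
        best = (np.inf, 0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = D[np.ix_(clusters[a], clusters[b])].min()
                if d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters.pop(b))    # merge the most similar pair
    return clusters

# Toy data: two tight, well-separated blobs.
rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0.0, 0.3, (10, 2)), rng.normal(3.0, 0.3, (10, 2))])
clusters = agglomerative(X, n_clusters=2)
```

Recording the order and distance of each merge, rather than stopping at a fixed cluster count, would yield the full hierarchy shown in the figure above.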
