The goal of a clustering algorithm is to automatically find structure in the data that allows us to infer groups of data points that are in some way similar. Note, however, that these groups need not always correspond to real-life categories.
Examples of clustering applications are:
Please watch the following video to get an insight into clustering analyses.
As discussed in the above video, clustering is an example of Unsupervised Learning, as we do not provide the algorithm with explicit labels. Instead, the algorithm automatically looks for underlying features in the dataset and uses them to group similar data points into clusters.
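To make the idea of learning without labels concrete, here is a minimal sketch of K-Means (Lloyd's algorithm) in plain Python. It is an illustration only, not the implementation used in the Codecademy module: the function name `k_means`, the toy points, and the choice of starting centroids are all made up for this example. Note that the algorithm only ever sees coordinates, never a label.

```python
from math import dist


def k_means(points, centroids, max_iterations=100):
    """Alternate between assigning points to centroids and moving the centroids."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its assigned points.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # nothing moved, so the clustering has converged
        centroids = new_centroids
    return labels, centroids


# Toy data (hypothetical): two visibly separate groups, with no labels provided.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

# Starting centroids are picked by hand here to keep the example deterministic.
labels, centroids = k_means(points, centroids=[points[0], points[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The cluster numbers 0 and 1 are arbitrary names the algorithm assigns. Whether they correspond to a meaningful real-life category is something we have to judge ourselves.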
Now that we have been introduced to cluster analysis, it is time to ground these fundamentals by doing a workshop on the K-Means
algorithm. Open the Basics of Machine Learning course on Codecademy and complete the module: Clustering: K-Means, in particular:
You will have time to work on the Handwriting Recognition using K-Means project tomorrow in the data lab.
In the coming datalab, we will reflect on K-means clustering analysis again and give you an opportunity to ask any questions you might have.
In datalab, we will apply K-means clustering to our problem statement for the Oosterhout dataset!