In the above tree, each node represents a feature (e.g., Outlook) in the dataset, and the different possible outcomes for each feature (e.g., Sunny, Overcast, Rain) lead to new branches in the tree. The final nodes are leaf nodes, which represent the class (or target) labels.
Similar to the KNN classification algorithm, the decision tree relies on creating decision boundaries to decide whether a given point belongs to a particular class. However, unlike KNN, decision trees create explicit decision rules that can be stored and retrieved later to predict the class of a new data point.
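To make the idea of stored decision rules concrete, here is a minimal sketch in Python of what the rules from the weather tree above might look like once written out. The exact splits on Humidity and Wind are illustrative assumptions for this example, not read off the tree itself.

```python
# Illustrative decision rules mirroring the weather tree above.
# The Humidity and Wind splits are assumed for the example.
def predict_play(outlook: str, humidity: str, wind: str) -> str:
    if outlook == "Overcast":
        return "Yes"  # the Overcast branch ends directly in a leaf
    if outlook == "Sunny":
        return "No" if humidity == "High" else "Yes"
    if outlook == "Rain":
        return "No" if wind == "Strong" else "Yes"
    raise ValueError(f"Unknown outlook: {outlook}")

print(predict_play("Sunny", "Normal", "Weak"))  # -> Yes
```

Because the rules are explicit, they can be saved and reused on any new data point, which is exactly what distinguishes a decision tree from KNN.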
In the following workshop on Codecademy, you will learn how to explicitly create such decision rules and, further, how to apply such algorithms using scikit-learn.
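As a preview of what the workshop builds toward, the sketch below, assuming scikit-learn is installed, fits a DecisionTreeClassifier on a tiny, made-up numeric encoding of the weather data and prints the learned rules with export_text. The feature encoding is an assumption for illustration only.

```python
# A short sketch: fit a decision tree on a tiny, made-up encoding of
# the weather data and print its learned decision rules.
from sklearn.tree import DecisionTreeClassifier, export_text

# Encoded features [outlook, humidity]:
# outlook: 0=Sunny, 1=Overcast, 2=Rain; humidity: 0=Normal, 1=High
X = [[0, 1], [0, 0], [1, 1], [1, 0], [2, 0], [2, 1]]
y = ["No", "Yes", "Yes", "Yes", "Yes", "No"]

clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# export_text renders the fitted tree as human-readable if/else rules
print(export_text(clf, feature_names=["outlook", "humidity"]))

# The stored rules can then classify a new, unseen day
print(clf.predict([[0, 0]]))  # Sunny + Normal humidity -> ['Yes']
```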
Now that we've been introduced to tree-based analyses, it's time to ground these fundamentals by doing a workshop. Open the Basics of Machine Learning course on Codecademy and complete half of the Decision Trees modules; specifically, complete:
If you find yourself stuck in the project, please click here.
In the coming Datalab, we will reflect on tree-based analysis again and give you an opportunity to ask any questions you might have.
Tomorrow we will cover Random Forests!