Suppose that you’re playing the game “20 Questions” with your friend. In case you need a quick refresher, your goal is to guess an unknown person, place, or thing, using 20 or fewer questions that can only have a “yes” or “no” answer.
It’s not hard to work out the optimal strategy for 20 Questions. With so many objects that your friend could be thinking of, you need to narrow down the set of possible answers as quickly as possible.
Each question that you ask should be the one that gives you the most information. Asking about specific answers (such as “Is it Harry Potter?” or “Is it my dog?”) early in the game is a poor strategy, because a “no” eliminates only a single possibility. Instead, the best players try to cut the remaining possibilities roughly in half with every question, whether the answer is “yes” or “no.”
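To see why halving is so powerful, here is a small arithmetic sketch in Python. The function name questions_needed is made up for this illustration, and it assumes the idealized case where every question splits the remaining candidates exactly in half.

```python
def questions_needed(num_candidates: int) -> int:
    """Smallest number of yes/no questions that can single out one of
    num_candidates items, assuming every question halves the remaining set."""
    if num_candidates < 1:
        raise ValueError("need at least one candidate")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, with no float rounding.
    return (num_candidates - 1).bit_length()


print(2 ** 20)                      # 1048576 items can be told apart with 20 questions
print(questions_needed(1_000_000))  # 20
print(questions_needed(1_000))      # 10
```

In other words, 20 well-chosen questions are enough to distinguish more than a million candidates, while 20 guesses at specific answers rule out at most 20.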
If you understand the strategy behind 20 Questions, then you can also understand the basic idea behind the decision tree algorithm for machine learning. In this article, we’ll discuss everything you need to know to get started working with decision trees: how they work, the pros and cons of using them, and which situations they’re best suited for.
How Are Decision Trees Used?
Decision trees can be used for both classification and regression problems.
In classification, the model is trying to predict which class a given input belongs to, based on a certain set of features that describe it. For example, you could build a decision tree that can classify dog breeds, based on features such as the dog’s height, weight, fur color, and the curliness of its coat.
In regression, on the other hand, the model is trying to predict a numeric value. A regression tree produces a stepwise (piecewise-constant) function: each leaf predicts the average target value of the training examples that fall into it. For decision tree regression to approximate a smooth curve well, the tree needs enough leaves that the steps between neighboring leaves are fairly small.
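The stepwise behavior is easiest to see with the smallest possible regression tree: a single split, sometimes called a stump. The sketch below is a simplified illustration rather than a production implementation. The names fit_stump and predict and the data are invented for the example, and it assumes a single numeric feature and a squared-error criterion for choosing the split.

```python
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression tree on a single numeric feature.

    Returns (threshold, left_value, right_value). Inputs with x <= threshold
    are predicted as left_value, all others as right_value. Each value is the
    mean target of the training points on that side of the split.
    """
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between two identical x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        left_mean, right_mean = mean(left), mean(right)
        # Sum of squared errors if each side predicts its own mean.
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1:]


def predict(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


# Made-up data that forms two plateaus.
xs = [1, 2, 3, 10, 11, 12]
ys = [5.0, 6.0, 7.0, 20.0, 21.0, 22.0]

stump = fit_stump(xs, ys)
print(stump)                # (6.5, 6.0, 21.0)
print(predict(stump, 2.5))  # 6.0
print(predict(stump, 100))  # 21.0
```

Every input on the same side of the threshold gets the same prediction, which is why a deeper tree with more leaves is needed to follow a curve closely.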
How Do Decision Trees Work?
Imagine that you have a set of two-dimensional data points plotted on a graph, and you want to separate them into classes. At each step, you’re allowed to draw a line on the graph; some data points will fall on one side of the line, and the rest will fall on the other side. To accurately classify all of the data points, you will need to place each one in a box that only contains points belonging to the same class.
The decision tree algorithm works the same way: by recursively partitioning the set of input data. At each “branch” of the tree, the algorithm looks for the feature, and the threshold on that feature, that best splits the set into two subsets. In its simplest form, the algorithm terminates for a given branch when all of the data points within that branch are in the same class; in practice, implementations usually also stop on limits such as a maximum depth or a minimum number of points per leaf.
The two most popular criteria for calculating the “best” split are Gini impurity and information gain (the reduction in entropy that a split produces). The choice of splitting criterion affects both the tree’s structure and its performance.
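The recursive partitioning described above can be sketched in a few dozen lines of plain Python. This is a teaching sketch under several assumptions, not how any particular library implements it: all features are numeric, class labels are strings, splits are scored by weighted Gini impurity, and there are no depth limits or pruning. The function names and the data points are invented for the illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(points, labels):
    """Return the (feature_index, threshold) with the lowest weighted Gini
    impurity, or None if no split is possible (all points identical)."""
    best, best_score = None, None
    n = len(points)
    for f in range(len(points[0])):
        values = sorted({p[f] for p in points})
        # Candidate thresholds lie halfway between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for p, y in zip(points, labels) if p[f] <= t]
            right = [y for p, y in zip(points, labels) if p[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best_score is None or score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(points, labels):
    """Recursively partition the data. A leaf is a class label (a string);
    an internal node is a tuple (feature_index, threshold, left, right)."""
    split = None if len(set(labels)) == 1 else best_split(points, labels)
    if split is None:
        # Pure branch, or nothing left to split on: predict the majority class.
        return Counter(labels).most_common(1)[0][0]
    f, t = split
    left = [(p, y) for p, y in zip(points, labels) if p[f] <= t]
    right = [(p, y) for p, y in zip(points, labels) if p[f] > t]
    return (f, t,
            build_tree([p for p, _ in left], [y for _, y in left]),
            build_tree([p for p, _ in right], [y for _, y in right]))


def classify(tree, point):
    """Walk from the root to a leaf and return the leaf's class label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if point[f] <= t else right
    return tree


# Made-up two-dimensional points in three classes.
points = [(1, 1), (2, 1), (1, 5), (2, 6), (6, 5), (7, 6)]
labels = ["a", "a", "b", "b", "c", "c"]

tree = build_tree(points, labels)
print(classify(tree, (6.5, 5.5)))  # c
```

Each internal node corresponds to one line drawn on the graph, and each leaf corresponds to one box containing points of a single class.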
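For concreteness, here is a minimal sketch of both metrics using their standard textbook definitions. The function names and the example labels are made up for illustration.

```python
from collections import Counter
from math import log2


def gini_impurity(labels):
    """1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy of the class proportions, in bits."""
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted


# A hypothetical split of 10 points into two mostly pure halves.
parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"] * 1
right = ["yes"] * 1 + ["no"] * 4

print(gini_impurity(parent))                            # 0.5
print(entropy(parent))                                  # 1.0
print(round(information_gain(parent, left, right), 3))  # 0.278
```

Both metrics are zero for a branch that contains a single class and largest when the classes are evenly mixed, so a good split is one that lowers them the most.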
Problems with Decision Trees
As with many machine learning algorithms, overfitting is a significant concern with decision trees. The more levels the tree has, the more likely it is that you’ll overfit.
In addition, decision trees can be biased in cases where the input dataset is dominated by a particular class. Without balancing your data beforehand, it’s very easy for the decision tree algorithm to fall victim to the dataset’s inherent biases.
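One simple way to balance a dataset beforehand is random oversampling, which duplicates minority-class rows until every class is equally common. The sketch below is only one of several possible approaches (undersampling and class weighting are others). The function name oversample, its seed parameter, and the data are invented for this example.

```python
import random
from collections import Counter


def oversample(rows, labels, seed=0):
    """Randomly duplicate minority-class rows until every class is as
    common as the largest one. Returns new (rows, labels) lists."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())

    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Draw the missing rows with replacement from the class's own rows.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Made-up data: five "common" rows and a single "rare" row.
rows = [[1], [2], [3], [4], [5], [6]]
labels = ["common"] * 5 + ["rare"]

balanced_rows, balanced_labels = oversample(rows, labels)
print(Counter(balanced_labels))  # Counter({'common': 5, 'rare': 5})
```

If you use this approach, apply it only to the training split, so that duplicated rows cannot leak into the data you evaluate on.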
Another problem is that decision trees have high variance, which means that they are very sensitive to small fluctuations in the input data. Minor variations can cause the decision tree algorithm to generate completely different trees, which makes results hard to reproduce from one training run to the next. Fortunately, ensemble methods help: bagging reduces variance by combining many trees trained on resampled versions of the data, and boosting builds a sequence of small trees that each correct the errors of the ones before.
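To make bagging concrete, the sketch below trains many one-split trees on bootstrap samples (drawn with replacement) and combines them by majority vote. It is a deliberately tiny illustration under stated assumptions: a single numeric feature, a split chosen by counting training errors, and invented names (fit_stump, bagged_fit, bagged_predict) and data.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """One-split classifier on a single feature. Returns
    (threshold, left_label, right_label) with the fewest training errors."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = Counter(y for x, y in zip(xs, ys) if x <= t)
        right = Counter(y for x, y in zip(xs, ys) if x > t)
        left_label = left.most_common(1)[0][0]
        right_label = right.most_common(1)[0][0]
        errors = len(xs) - left[left_label] - right[right_label]
        if best is None or errors < best[0]:
            best = (errors, t, left_label, right_label)
    if best is None:
        # All x values identical: always predict the majority class.
        label = Counter(ys).most_common(1)[0][0]
        return (float("inf"), label, label)
    return best[1:]


def bagged_fit(xs, ys, n_trees=25, seed=0):
    """Train n_trees stumps, each on its own bootstrap sample."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def bagged_predict(models, x):
    """Majority vote over the individual stumps."""
    votes = Counter(left if x <= t else right for t, left, right in models)
    return votes.most_common(1)[0][0]


# Made-up, cleanly separable data.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = ["low"] * 4 + ["high"] * 4

models = bagged_fit(xs, ys)
print(bagged_predict(models, 1.5))  # low
print(bagged_predict(models, 7.5))  # high
```

Individual stumps place their thresholds differently depending on which points their bootstrap sample happens to contain; the vote smooths out those differences, which is the variance reduction that bagging provides.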
When to Use Decision Trees
One of the greatest advantages of decision trees is that they can be drawn on a whiteboard and easily understood even by non-technical audiences. This simply isn’t the case for many other machine learning algorithms. For example, neural networks have so many parameters that even people who use them consider them to be “black boxes.”
In addition to being easy to visualize and easy to understand, decision trees are inexpensive to train and run, work for both classification and regression, and typically need less training time than more complex algorithms such as neural networks. Finally, decision trees are good at non-linear problems, where the relationship between the features and the target doesn’t follow a simple pattern such as a straight line.
However, issues such as overfitting often make an individual decision tree a poor choice for more complicated projects. Rather than using a single decision tree, machine learning professionals often use ensemble models such as random forests and gradient boosted trees.