How Do Decision Trees Work?
Imagine that you have a set of two-dimensional data points plotted on a graph, and you want to separate them into classes. At each step, you're allowed to draw a horizontal or vertical line on the graph; some data points will fall on one side of the line, and the rest will fall on the other. To classify every point correctly, you keep drawing lines until each one sits in a region that contains only points of the same class.
The decision tree algorithm works the same way: by recursively partitioning the input data. At each node of the tree, the algorithm looks for the feature (and threshold) that best splits the data reaching that node into two subsets. A branch terminates when all of the data points within it belong to the same class, or when some other stopping criterion, such as a maximum depth, is reached.
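To make this concrete, here is a minimal sketch using scikit-learn's DecisionTreeClassifier on a toy two-dimensional dataset; the data and settings are illustrative:

```python
from sklearn.datasets import make_blobs
from sklearn.tree import DecisionTreeClassifier

# A toy two-dimensional dataset with three classes, mirroring the
# graph analogy above.
X, y = make_blobs(n_samples=150, centers=3, n_features=2, random_state=42)

# Each internal node of the fitted tree is one axis-aligned "line":
# a split on a single feature at a single threshold.
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X, y)

print(clf.predict([[0.0, 0.0]]))   # class label for a new point
print(clf.get_depth(), clf.get_n_leaves())
```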
The two most popular criteria for scoring the "best" split are Gini impurity and information gain (the reduction in entropy from a parent node to its children). The choice of splitting criterion affects both the tree's structure and its performance.
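Both measures are short enough to compute by hand. As an illustration, here is one way to write them in Python (the function names and toy labels are our own):

```python
import numpy as np

def gini_impurity(labels):
    """Probability that two points drawn at random (with replacement)
    from `labels` belong to different classes: 1 - sum(p_k^2)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    """Shannon entropy in bits: -sum(p_k * log2(p_k)). Information gain
    is the drop in entropy caused by a split."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

print(gini_impurity([0, 0, 1, 1]))  # 0.5: a maximally mixed two-class node
print(entropy([0, 0, 1, 1]))        # 1.0 bit
```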
Problems with Decision Trees
As with many machine learning algorithms, overfitting is a significant concern with decision trees. The more levels the tree has, the more likely it is to overfit: grown to full depth, a tree can effectively memorize its training set. Common remedies include capping the tree's depth, requiring a minimum number of samples per leaf, and pruning the tree after training.
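In scikit-learn these remedies map directly onto hyperparameters. The values below are illustrative and would normally be chosen by cross-validation:

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

# An unconstrained tree keeps splitting until every leaf is pure...
full = DecisionTreeClassifier(random_state=0).fit(X, y)

# ...while depth and leaf-size limits (illustrative values) trade a
# little training accuracy for better generalization.
limited = DecisionTreeClassifier(
    max_depth=4, min_samples_leaf=20, random_state=0
).fit(X, y)

print(full.get_depth(), limited.get_depth())  # the full tree is much deeper
```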
In addition, decision trees can be biased when the input dataset is dominated by a particular class. Unless you balance or reweight the data beforehand, the algorithm will optimize for the majority class and largely ignore the minority one.
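One common counter-measure, sketched below with scikit-learn, is to reweight the classes inversely to their frequency; resampling the data before training is another option:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# A deliberately imbalanced toy set: 95 points of class 0, 5 of class 1.
rng = np.random.RandomState(0)
X = rng.randn(100, 2)
y = np.array([0] * 95 + [1] * 5)

# class_weight="balanced" weights each class inversely to its frequency,
# so splits that separate the minority class are no longer drowned out.
clf = DecisionTreeClassifier(class_weight="balanced", random_state=0)
clf.fit(X, y)
```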
Another problem is that decision trees have high variance, meaning they are very sensitive to small fluctuations in the input data. Minor variations can cause the algorithm to generate completely different trees, which makes results hard to reproduce. Fortunately, this inherent variance can be reduced using ensemble methods such as bagging, which averages many trees trained on bootstrap samples of the data, and boosting.
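As a sketch of the bagging approach in scikit-learn, whose BaggingClassifier happens to use a decision tree as its default base learner:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

X, y = make_classification(n_samples=500, random_state=0)

# Bagging trains many trees on bootstrap resamples of the data and
# averages their votes; individual trees vary a lot, the average doesn't.
# BaggingClassifier's default base learner is a decision tree.
bag = BaggingClassifier(n_estimators=100, random_state=0).fit(X, y)
print(bag.predict(X[:5]))
```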
When to Use Decision Trees
One of the greatest advantages of decision trees is that they can be drawn on a whiteboard and easily understood, even by non-technical stakeholders. That simply isn't true of many other machine learning algorithms: neural networks, for example, have so many parameters that even the people who use them consider them "black boxes."
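In fact, a fitted tree can be dumped straight to text as nested if/else rules. A quick illustration with scikit-learn's export_text and the classic iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(iris.data, iris.target)

# Prints the tree as indented if/else rules -- essentially the
# whiteboard drawing in text form.
print(export_text(clf, feature_names=iris.feature_names))
```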
In addition to being easy to visualize and easy to understand, decision trees are computationally cheap to train and query, work for both classification and regression, and require little data preparation compared with many other algorithms (no feature scaling, for instance). Finally, decision trees are good at non-linear problems, because they make no assumption that the data follows a nice, simple distribution.
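For instance, switching from classification to regression is a one-line change. This sketch fits a deliberately shallow tree to a non-linear target with no preprocessing at all:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# A non-linear target; note there is no scaling or other preprocessing.
X = np.linspace(0, 6, 200).reshape(-1, 1)
y = np.sin(X).ravel()

# Regression trees predict the mean target value in each leaf, giving a
# piecewise-constant approximation of the curve.
reg = DecisionTreeRegressor(max_depth=3).fit(X, y)
print(reg.predict([[1.5]]))
```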
However, issues such as overfitting often make individual decision trees a poor choice for more complicated projects. Rather than using a single decision tree, machine learning practitioners typically turn to ensemble models such as random forests and gradient-boosted trees.
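Both are available off the shelf. A minimal sketch in scikit-learn, with illustrative hyperparameters:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

X, y = make_classification(n_samples=500, random_state=0)

# Random forest: bagged trees plus a random subset of features at each split.
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Gradient boosting: shallow trees fitted sequentially, each one
# correcting the residual errors of the ensemble so far.
gb = GradientBoostingClassifier(n_estimators=100, random_state=0).fit(X, y)

print(rf.score(X, y), gb.score(X, y))
```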
The future is all about data: having it, and being able to draw conclusions from it. That means companies that successfully use predictive analytics and machine learning to inform business decisions have a competitive edge. If you want that edge, we can help. Get in touch.