What are Gradient Boosted Trees?
Gradient boosted trees are an ensemble learning method that combines decision trees with boosting to improve a model's results on a dataset. The individual decision trees are deliberately weak: each performs only slightly better than chance.
A single decision tree whose results are "too good" may be overfitting the data, and an ensemble built from such trees will generalize worse to new data. To avoid this in practice, machine learning experts simplify each decision tree, limiting its depth so that it cannot memorize overly specific patterns in the training data.
With gradient boosting, training is treated explicitly as a numerical optimization problem to be solved with gradient descent.
Imagine that you’re somewhere in a mountainous landscape and you want to reach the lowest point as soon as possible. With so many bumps, hills, and depressions, it’s hard to know exactly which path will be the quickest. However, one reasonable way to start would be to look around you and move downward in the steepest direction.
This is the underlying concept of gradient descent: repeatedly taking a step in the steepest downhill direction, stopping, and reassessing your position, until the value you are minimizing is as small as it can get.
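The mountain analogy can be sketched in a few lines of code. This toy example (the function, learning rate, and step count are all illustrative choices, not tuned values) minimizes f(x) = (x - 3)², whose gradient is 2(x - 3):

```python
# A minimal sketch of gradient descent on f(x) = (x - 3)^2,
# whose minimum sits at x = 3.

def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    x = start
    for _ in range(steps):
        x -= learning_rate * grad(x)  # step in the steepest downhill direction
    return x

minimum = gradient_descent(grad=lambda x: 2 * (x - 3), start=0.0)
print(round(minimum, 4))  # converges close to 3.0
```

Each iteration moves x a small distance against the gradient, so the sequence of positions "walks downhill" toward the minimum at x = 3.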
To begin with, a first tree is trained to fit the dataset. A loss function then measures the errors this weak model has made, and the learner uses gradient descent to reduce them: each new tree is fit in the direction where the loss decreases the fastest (for squared-error loss, this is simply the residuals of the current model), correcting for the previous model's mistakes.
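To make this concrete, here is a from-scratch sketch of gradient boosting for one-dimensional regression under squared-error loss, where the negative gradient is just the residual. The depth-one "stumps" and all names here are illustrative, not any library's API:

```python
# A minimal sketch of gradient boosting for regression with squared-error
# loss, using depth-1 "stumps" as weak learners.

def fit_stump(xs, residuals):
    """Find the single threshold split that best reduces squared error."""
    best = None
    for t in xs:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def gradient_boost(xs, ys, n_trees=50, learning_rate=0.1):
    base = sum(ys) / len(ys)            # start from the mean prediction
    pred = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        # For squared error, the negative gradient is just the residual.
        residuals = [y - p for y, p in zip(ys, pred)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Take a small "downhill" step: add a fraction of the new stump.
        pred = [p + learning_rate * stump(x) for p, x in zip(pred, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)

# Fit a noisy-free step function: y = 1 below x = 1, y = 3 above.
xs = [i / 10 for i in range(20)]
ys = [1.0 if x < 1.0 else 3.0 for x in xs]
model = gradient_boost(xs, ys)
```

After 50 rounds the accumulated stumps recover the step: `model(0.5)` lands near 1.0 and `model(1.5)` near 3.0. Each stump alone is a weak learner, but because every round targets what the ensemble still gets wrong, the errors shrink geometrically.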
In practice, there are several implementations of gradient boosted trees used for machine learning problems. XGBoost has gained popularity in recent years for its strong performance in machine learning competitions. The open-source CatBoost library is fairly similar to XGBoost, but has a few notable structural differences. Microsoft has also released its own framework, LightGBM.
When to Use Gradient Boosted Trees
As we discussed in our article on neural networks, datasets with no obvious feature representation, such as images and audio files, are best suited to deep learning. Gradient boosted trees, on the other hand, are good for problems where the most significant features of the dataset are already known in advance.
For example, data in spreadsheet format might have columns that represent features such as yearly profits, quantities, and locations. These are categories of information that humans have already decided are important. This process of choosing the most relevant features is known as feature engineering.
Suppose that a manufacturing company wants to optimize the price of a new item that it will release. The company can make use of historical data about its other products: information such as price, location, quantity sold, and so on. This is a strong use case for gradient boosted trees because the data is already in an amenable format.
Gradient boosted trees often hit the “sweet spot” for machine learning problems: they require a little more tweaking from experts than random forests, but need less data than a neural network. In particular, they have built-in mechanisms to limit overfitting, such as shrinkage (a small learning rate) and caps on tree depth. If you’ve already extracted the most important features from your data and you know what kind of answers you’re looking for from the learner, then gradient boosted trees may be just the right choice.
The future is all about the data. Having it, and being able to draw conclusions from it. That means companies that successfully use predictive analytics and machine learning to inform business decisions have a competitive edge. If you want that edge, we can help. Get in touch.