GOSS and LightGBM

Published on 2026-05-28

Introduction

LightGBM is a free and open-source gradient boosting framework for machine learning, developed by Microsoft. One of its core techniques is Gradient-Based One-Side Sampling (GOSS).

GOSS

GOSS is a sampling method that first sorts the training data by the absolute value of the gradients of the loss function with respect to the current model's predictions, and then selects a subset of the data based on the magnitude of these gradients.

In regular gradient boosting, a model is trained by iteratively adding weak learners, with each new learner trained on the residual errors of the previous learners, using the full training set each time. This process continues until the model reaches a predefined stopping criterion. GOSS, by contrast, trains each new learner on a subset of the training data chosen according to those gradients. It has 2 steps:

1. For each data instance, the algorithm computes its gradient and adds it to a list sorted by absolute value, largest first. The list is then divided into 2 parts: the top k gradients and the remaining n-k gradients.
2. For the large gradients, the algorithm keeps all of the corresponding data instances in the subset used to evaluate split points. For the small gradients, it randomly samples a fixed number of data instances to include in the subset. The number of data instances to sample is determined by a fixed sampling rate (other_rate in LightGBM).

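The idea is that instances with large gradients are the ones the current model fits poorly, so they contribute the most when evaluating split points and are always kept. Instances with small gradients are already well fit and can be subsampled at little cost.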
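
The two steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not LightGBM's implementation: the function name goss_sample is hypothetical, the parameter names top_rate and other_rate are borrowed from LightGBM's parameter names, and both rates are assumed to be fractions of the full data set. The sampled small-gradient instances are given a weight of (1 - top_rate) / other_rate, the up-weighting that compensates for sampling them.

```python
import random


def goss_sample(gradients, top_rate=0.2, other_rate=0.1, seed=0):
    """Return (indices, weights) for one round of GOSS-style sampling.

    Assumption: top_rate and other_rate are fractions of the full data set.
    """
    n = len(gradients)
    top_k = int(n * top_rate)
    rand_k = int(n * other_rate)

    # Step 1: order instance indices by absolute gradient, largest first.
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    top, rest = order[:top_k], order[top_k:]

    # Step 2: keep every large-gradient instance, sample from the rest.
    rng = random.Random(seed)
    sampled = rng.sample(rest, min(rand_k, len(rest)))

    # Up-weight the sampled small-gradient instances to offset the sampling.
    factor = (1.0 - top_rate) / other_rate
    weights = {i: 1.0 for i in top}
    weights.update({i: factor for i in sampled})
    return top + sampled, weights


# Example: 100 instances whose gradient grows with the index.
indices, weights = goss_sample([0.01 * i for i in range(100)])
print(len(indices))  # 20 kept outright plus 10 sampled
```

With top_rate=0.2 and other_rate=0.1, only 30% of the instances are used to evaluate split points, and each sampled small-gradient instance counts roughly 8 times.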

Why LightGBM?

While many boosting libraries grow trees level by level, LightGBM grows them leaf-wise: at each step, it splits the single leaf whose split reduces the loss the most. This produces deeper, more asymmetric trees that can reach the same loss with fewer splits.
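
To make leaf-wise growth concrete, here is a toy sketch for one-dimensional regression. It assumes squared error as the loss and a single feature x; the helper names (sse, best_split, grow_leaf_wise) are made up for this illustration and are not part of LightGBM. A priority queue always hands back the leaf with the largest loss reduction, which is the "best-first" behaviour described above.

```python
import heapq
import itertools


def sse(ys):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def best_split(points):
    """Return (gain, left, right) for the best threshold on x, or None."""
    points = sorted(points)
    ys = [y for _, y in points]
    parent = sse(ys)
    best = None
    for i in range(1, len(points)):
        if points[i - 1][0] == points[i][0]:
            continue  # no threshold fits between equal x values
        gain = parent - sse(ys[:i]) - sse(ys[i:])
        if best is None or gain > best[0]:
            best = (gain, points[:i], points[i:])
    return best


def grow_leaf_wise(points, max_leaves):
    """Repeatedly split the leaf with the largest gain until max_leaves."""
    finished = []            # leaves with no useful split left
    heap = []                # max-heap on gain (stored negated)
    tie = itertools.count()  # tie-breaker so heap never compares lists

    def push(pts):
        split = best_split(pts)
        if split is None or split[0] <= 0:
            finished.append(pts)
        else:
            heapq.heappush(heap, (-split[0], next(tie), pts, split))

    push(points)
    while heap and len(heap) + len(finished) < max_leaves:
        _, _, _, (_, left, right) = heapq.heappop(heap)
        push(left)
        push(right)
    return finished + [item[2] for item in heap]


# Flat on the left, two steps on the right: growth concentrates on the right.
data = list(enumerate([0, 0, 0, 0, 10, 10, 20, 20]))
for leaf in grow_leaf_wise(data, max_leaves=3):
    print([y for _, y in leaf])
```

In this example the left half is never split again because doing so would not reduce the loss, so all remaining splits go to the right half. A level-wise strategy would have spent a split on each side.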

In sparse data, many features are rarely non-zero at the same time. LightGBM bundles such mutually exclusive features into a single feature, a technique called Exclusive Feature Bundling (EFB), reducing dimensionality with little or no loss of information.
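
The bundling idea can be illustrated with a simplified sketch. It assumes feature values are non-negative integers (for example, bin indices) and only bundles features that never conflict; LightGBM's actual procedure works on histogram bins and can tolerate a small rate of conflicts. The function names are hypothetical.

```python
def bundle_exclusive_features(columns):
    """Greedily group columns that are never non-zero on the same row."""
    bundles = []  # each entry: (list of column names, set of occupied rows)
    for name, values in columns.items():
        rows = {i for i, v in enumerate(values) if v != 0}
        for names, occupied in bundles:
            if occupied.isdisjoint(rows):
                names.append(name)
                occupied |= rows  # in-place update of the bundle's row set
                break
        else:
            bundles.append(([name], set(rows)))
    return [names for names, _ in bundles]


def merge_bundle(columns, names):
    """Merge a bundle into one column, offsetting each feature's values
    so that every original feature keeps its own value range."""
    merged = [0] * len(columns[names[0]])
    offset = 0
    for name in names:
        values = columns[name]
        for i, v in enumerate(values):
            if v != 0:
                merged[i] = v + offset
        offset += max(values)
    return merged


columns = {
    "a": [1, 0, 0, 2],
    "b": [0, 3, 0, 0],
    "c": [1, 1, 0, 0],
}
print(bundle_exclusive_features(columns))   # [['a', 'b'], ['c']]
print(merge_bundle(columns, ["a", "b"]))    # [1, 5, 0, 2]
```

Features a and b never share a non-zero row, so they collapse into one column. Values 1 to 2 still belong to a, and values above 2 belong to b, so no information is lost in this conflict-free case.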

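GOSS allows it to sort all samples by gradient magnitude, keep every large-gradient sample, randomly sample the small-gradient ones, and upweight the sampled small-gradient examples to correct for the sampling bias. This cuts the amount of data each iteration has to process, and the added randomness may also help against overfitting.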
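
As a usage-level sketch, GOSS can be requested through LightGBM's parameters. The dictionary below is illustrative only: the values are untuned, the objective is an arbitrary choice, and the parameter names assume LightGBM 4.x, where the sampling method is selected with data_sample_strategy (earlier releases selected it with boosting="goss").

```python
# Illustrative, untuned parameter set (assumes LightGBM 4.x parameter names).
params = {
    "objective": "binary",           # arbitrary choice for this example
    "data_sample_strategy": "goss",  # older releases: "boosting": "goss"
    "top_rate": 0.2,    # fraction of large-gradient instances always kept
    "other_rate": 0.1,  # fraction of the data sampled from the remainder
    "num_leaves": 31,   # cap on leaves per tree under leaf-wise growth
}

# The two rates together must not exceed the whole data set.
assert params["top_rate"] + params["other_rate"] <= 1.0
```

A dictionary like this would be passed as the params argument of lightgbm.train together with a training data set.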

LightGBM is particularly useful in ML hackathons: it often trains much faster than XGBoost or Random Forest, does not require feature scaling or normalization, and handles missing values natively.