Tunny Icon
TunnyDocs

The next-gen Grasshopper optimization tool.

GP

Gaussian Process

Gaussian Process (GP) is a Bayesian optimization method, like TPE. As introduced in the TPE section, both use Bayesian optimization but differ in how they model the relationship between parameters and objective values.

  • GP models p(y|x) directly — the distribution of objective values given parameter values.
  • TPE models p(x|y) and p(y) separately.

What is a Gaussian Process?

A Gaussian Process defines a probability distribution over functions. Given a set of observed trials, GP fits a surrogate model that predicts both the mean (expected value) and uncertainty (variance) at any unobserved point in the search space.

This is the key advantage over TPE: GP knows not only what value it expects at an unobserved point, but also how confident it is in that prediction.

How GP-based Bayesian Optimization Works

Optimization by GP is performed by repeatedly executing STEP2 to STEP3.

STEP1

Random sampling is first performed to gather initial observations, just like TPE. The number of initial trials is controlled by the "Number of startup trials" setting.

STEP2

Given the observed data, GP fits a surrogate model. For each candidate point x in the search space, the model provides:

  • A predicted mean μ(x): the expected objective value at x
  • A predicted variance σ²(x): the uncertainty of the prediction at x

STEP3

An acquisition function selects the next point to evaluate. Tunny uses Expected Improvement (EI), which balances:

  • Exploitation: searching near known good points (low μ(x))
  • Exploration: searching in uncertain regions (high σ(x))

The next trial is run at the point that maximizes EI.

GP Optuna vs GP BoTorch

Tunny provides two GP implementations:

GP Optuna GP BoTorch
Backend Optuna native BoTorch library
Multi-objective Not supported Supported
Constraints Supported Supported
Speed Faster Slower

If you are not doing multi-objective optimization, GP Optuna is recommended. GP BoTorch is required for multi-objective Bayesian optimization.

Time Complexity

GP requires computing the inverse of the kernel matrix at each step, which has time complexity O(n³) where n is the number of observations. This makes GP slower than TPE (O(dn log n)) as the number of trials grows. For a large number of trials, TPE is often more practical.

When to Use GP

  • Each trial is expensive and you want maximum sample efficiency
  • The objective function is smooth and continuous
  • The number of variables is relatively small (GP scales poorly with dimension)

Reference