Gaussian Process

Gaussian Process (GP) is a Bayesian optimization method, like TPE. As introduced in the TPE section, both use Bayesian optimization but differ in how they model the relationship between parameters and objective values.

GP models p(y|x) directly — the distribution of objective values given parameter values.
TPE models p(x|y) and p(y) separately.

What is a Gaussian Process?

A Gaussian Process defines a probability distribution over functions. Given a set of observed trials, GP fits a surrogate model that predicts both the mean (expected value) and uncertainty (variance) at any unobserved point in the search space.

This is the key advantage over TPE: GP knows not only what value it expects at an unobserved point, but also how confident it is in that prediction.

How GP-based Bayesian Optimization Works

Optimization by GP is performed by repeatedly executing STEP2 to STEP3.

STEP1

Random sampling is first performed to gather initial observations, just like TPE. The number of initial trials is controlled by the "Number of startup trials" setting.

STEP2

Given the observed data, GP fits a surrogate model. For each candidate point x in the search space, the model provides:

A predicted mean μ(x): the expected objective value at x
A predicted variance σ²(x): the uncertainty of the prediction at x

STEP3

An acquisition function selects the next point to evaluate. Tunny uses Expected Improvement (EI), which balances:

Exploitation: searching near known good points (low μ(x))
Exploration: searching in uncertain regions (high σ(x))

The next trial is run at the point that maximizes EI.

GP Optuna vs GP BoTorch

Tunny provides two GP implementations:

	GP Optuna	GP BoTorch
Backend	Optuna native	BoTorch library
Multi-objective	Not supported	Supported
Constraints	Supported	Supported
Speed	Faster	Slower

If you are not doing multi-objective optimization, GP Optuna is recommended. GP BoTorch is required for multi-objective Bayesian optimization.

Time Complexity

GP requires computing the inverse of the kernel matrix at each step, which has time complexity O(n³) where n is the number of observations. This makes GP slower than TPE (O(dn log n)) as the number of trials grows. For a large number of trials, TPE is often more practical.

When to Use GP

Each trial is expensive and you want maximum sample efficiency
The objective function is smooth and continuous
The number of variables is relatively small (GP scales poorly with dimension)

GP