# Exercise 05: Gaussian Process MPC

## 1. Introduction

Model mismatch is a frequent source of control inaccuracy in real-world applications of MPC. Since MPC is an optimization-based control method, it will exploit the given model dynamics as much as possible, which may cause the closed loop to become unstable when the real dynamics differ from the model.

One research direction is to find a good approximation of the difference $d(x,u)$ between the nominal model $f_{\text{nominal}}(x,u)$ and the real model, i.e.,

$$ \dot{x} = f_{\text{nominal}}(x,u) + d(x,u). $$

This approximation can be achieved via learning-based methods, and Gaussian Processes (GPs) are one such method. 
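
As a minimal sketch of what "learning the difference" means in terms of data (not the implementation used in this exercise), the snippet below builds GP training pairs for a hypothetical 1-D point mass. The functions `f_nominal` and `f_true` and the drag coefficient are assumptions introduced purely for illustration. The GP inputs are the $(x, u)$ pairs and the GP targets are the residuals between the observed and the nominal state derivative.

```python
import numpy as np


def f_nominal(x, u):
    """Hypothetical nominal model: unit point mass, x = [position, velocity]."""
    return np.array([x[1], u])


def f_true(x, u):
    """Hypothetical 'real' system: the same point mass plus unmodeled drag."""
    return np.array([x[1], u - 0.3 * x[1]])


rng = np.random.default_rng(0)
states = rng.uniform(-1.0, 1.0, size=(50, 2))
inputs = rng.uniform(-1.0, 1.0, size=50)

# GP inputs are the (state, input) pairs, GP targets are the residuals d(x, u).
gp_inputs = np.column_stack([states, inputs])
gp_targets = np.array(
    [f_true(x, u) - f_nominal(x, u) for x, u in zip(states, inputs)]
)

# Only the velocity equation is mismatched, so only that residual is nonzero.
print(np.abs(gp_targets[:, 0]).max(), np.abs(gp_targets[:, 1]).max())
```

On real hardware, `f_true` is not available; the observed derivative would instead come from measured state differences, which adds noise to the targets.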

GPs have the inherent advantage that they also provide uncertainty estimates, which can be used to ensure stochastic constraint satisfaction and to improve the robustness of the MPC controller.

In this exercise, we focus on GP-MPC, which models the residual dynamics with GPs and uses the uncertainty estimates to propagate the nominal constraints into the future.
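
For a single scalar state constraint, the idea can be written out as follows. This is the standard Gaussian chance-constraint reformulation, given here as a sketch; the symbols $\mu_k$, $\sigma_k$, $x_{\max}$, and $p$ are introduced for illustration only, and the predicted state is assumed to be Gaussian.

$$
\Pr\left(x_k \le x_{\max}\right) \ge p
\quad\Longleftrightarrow\quad
\mu_k \le x_{\max} - \Phi^{-1}(p)\,\sigma_k,
\qquad x_k \sim \mathcal{N}(\mu_k, \sigma_k^2),
$$

where $\Phi^{-1}$ is the inverse cumulative distribution function of the standard normal distribution. The nominal bound is thus tightened by a margin that grows with the predicted standard deviation $\sigma_k$ and with the required probability $p$.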

We then compare the performance between GP-MPC and nominal MPC when tracking a trajectory with a simulated drone under model mismatch.

### Imports

In [None]:
# Commented out because it can cause issues when changing the gpmpc parameters
# As a workaround we (re-)import the required modules at the beginning of each cell
# %load_ext autoreload
# %autoreload 2

from pathlib import Path

import crazyflow  # noqa: F401, required for registering environments
import gymnasium
import matplotlib.pyplot as plt
import numpy as np
import torch
from crazyflow.sim.symbolic import symbolic_attitude
from gpmpc import GPMPC
from gymnasium.wrappers.vector.jax_to_numpy import JaxToNumpy
from plotting import make_quad_plots
from run_gp_mpc import learn

global_seed = 1
torch.manual_seed(global_seed)

### Create the Environment and the Trajectory to Follow
First, we create an environment with a figure-eight trajectory in the x-y plane as the reference trajectory.

In [None]:
env = JaxToNumpy(gymnasium.make_vec("DroneFigureEightXY-v0", num_envs=1))
traj = env.unwrapped.trajectory.T
dt = 1 / env.unwrapped.freq
env.unwrapped.render_trajectory = False
# Extract the action space limits (the limits are hard, thus we add a small margin to prevent issues due to rounding and soft constraints in the MPC)
action_space = {
    "low": env.unwrapped.single_action_space.low + 0.01,
    "high": env.unwrapped.single_action_space.high - 0.01,
}

plt.figure(figsize=(5, 3))
plt.plot(traj[0, :], traj[2, :])
plt.xlabel("x [m]")
plt.ylabel("y [m]")
plt.title("Trajectory")


## 2. Implement the GP-MPC Controller
To approximate the model mismatch, we use GPs. In this section, we set up the nominal model and integrate it into the GP-MPC framework. For more details about GP Regression, we refer to exercise04.

<div class="alert alert-info">
    <h3>Task 1: Review the Nominal Dynamics Implementation and Complete the GP Implementation</h3>
    <p>
    Review the <code> symbolic_attitude </code> function and the <code> SymbolicModel </code> in the <a href="https://github.com/utiasDSL/crazyflow/blob/572fde19107dac07704a97d1fc37f17addf08201/crazyflow/sim/symbolic.py"><code>Crazyflow</code></a> package. (Task 1.1)
    </p>
    <p>
    Review the <code> gp.py </code> file and complete the <code> fit_gp </code> function. (Task 1.2)
    </p>
</div>

In [None]:
## Define the model parameters
# The nominal model is based on a number of identified parameters (which do not match the actual parameters of the drone perfectly).
# Since we want to show the effect of model inaccuracies, we additionally perturb the model parameters.
nominal_model_params = {
    "a": 10.0,
    "b": 1.7,
    "ra": -75.0,
    "rb": -10.0,
    "rc": 45.0,
    "pa": -75.0,
    "pb": -5.0,
    "pc": 35.0,
}
# Create the symbolic model
nominal_model = symbolic_attitude(dt=dt, params=nominal_model_params)

### Comment on Model Parameterization

Often, model identification does not recover individual physical parameters but combinations of them. For instance, we may not identify the `mass` or `length` directly, but identify `mass * length` instead. In a pendulum system, for example, these two always appear coupled in the ODE, so it is more efficient to identify the coupled coefficients than to identify the inertial parameters explicitly.
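
To illustrate the point about coupled coefficients, here is a minimal sketch for a torque-driven pendulum, $\ddot{\theta} = -(g/l)\sin\theta + u/(m l^2)$. The pendulum model, the parameter values, and the names `c1` and `c2` are assumptions chosen for this example and are unrelated to the drone parameters above. Because the ODE is linear in the lumped coefficients, ordinary least squares is enough to identify them.

```python
import numpy as np

# Hypothetical "true" pendulum parameters, used only to generate data.
g, mass, length = 9.81, 0.5, 0.8

rng = np.random.default_rng(0)
theta = rng.uniform(-np.pi, np.pi, size=200)
torque = rng.uniform(-1.0, 1.0, size=200)
theta_ddot = -(g / length) * np.sin(theta) + torque / (mass * length**2)
theta_ddot += 0.01 * rng.standard_normal(200)  # measurement noise

# The ODE is linear in the lumped coefficients c1 = g / length and
# c2 = 1 / (mass * length**2), so ordinary least squares identifies them.
regressors = np.column_stack([-np.sin(theta), torque])
(c1, c2), *_ = np.linalg.lstsq(regressors, theta_ddot, rcond=None)
print(f"c1 = {c1:.2f}, c2 = {c2:.2f}")
```

The fit never needs `mass` or `length` separately; the two lumped coefficients fully determine the dynamics used by the controller.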

Next, we create the GP-MPC configuration, which includes the nominal model parameters as the prior.

### Setup the GP-MPC

<div class="alert alert-success">
    <h3>Exam Preparation</h3>
    <p>
    Review the role of Gaussian Processes in MPC. Discuss the advantages and limitations of using GPs for model learning.
    </p>
</div>

<div class="alert alert-info">
    <h3>Task 2: Complete the GP-MPC Implementation</h3>
    <p>
    Complete the <code> setup_prior_dynamics </code> function in <code>exercise05/gpmpc.py</code> (Task 2.1).
    </p>
    <p>
    Review the <code> GPMPC </code> class in <code>exercise05/gpmpc.py </code>. Then complete the following (Tasks 2.2 - 2.5):
    </p>
    <ul>
        <li> The <code> train_gp </code> function. </li> 
        <li> The <code> precompute_posterior_mean </code> function. </li> 
        <li> The <code> precompute_sparse_posterior_mean </code> function. </li>
        <li> The covariance propagation part in the <code> propagate_constraint_limits </code> function. Hint: Review the lecture notes on GP constraint propagation! </li>    
    </ul>
</div>
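
The general idea behind covariance propagation and constraint tightening can be sketched as below. This is a simplified illustration and not the signature or body of `propagate_constraint_limits` in `gpmpc.py`: the function name, its arguments, the constant GP variance, and the double-integrator example are all assumptions. The state covariance is propagated through linearized dynamics, and each upper bound is reduced by the normal quantile times the predicted standard deviation.

```python
from statistics import NormalDist

import numpy as np


def propagate_state_bounds(A, B_d, gp_var, upper, prob, horizon):
    """Tighten upper state bounds along the horizon (illustrative sketch).

    A: (nx, nx) linearized dynamics, B_d: (nx, nd) maps the GP output into
    the state, gp_var: (nd,) GP predictive variances (held constant here),
    upper: (nx,) nominal upper bounds.
    """
    quantile = NormalDist().inv_cdf(prob)  # Phi^{-1}(p)
    cov = np.zeros_like(A, dtype=float)  # the initial state is known exactly
    tightened = []
    for _ in range(horizon):
        tightened.append(upper - quantile * np.sqrt(np.diag(cov)))
        # Linear covariance propagation with additive GP uncertainty.
        cov = A @ cov @ A.T + B_d @ np.diag(gp_var) @ B_d.T
    return np.array(tightened)


ts = 0.02
A = np.array([[1.0, ts], [0.0, 1.0]])  # double integrator (assumed)
B_d = np.array([[0.0], [ts]])  # the residual acts on the velocity
bounds = propagate_state_bounds(
    A, B_d, gp_var=np.array([0.5]), upper=np.array([1.0, 2.0]), prob=0.9, horizon=5
)
print(bounds)
```

Each row holds the tightened bounds for one step of the horizon. The bounds shrink as the uncertainty accumulates, which is why a long horizon or a high `prob` can make the problem infeasible.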

**Comment on sparse GP-MPC**: Full GP inference has a computational complexity of $\mathcal{O}(N^3)$, where $N$ is the number of training points, which becomes infeasible for large datasets. The Fully Independent Training Conditional (FITC) approximation makes GP regression scalable by introducing a small set of $M$ inducing points, pseudo-inputs that summarize the information in the full dataset. Instead of modeling the full covariance matrix of the training data, FITC assumes that the training points are conditionally independent given the inducing points. This allows the covariance matrix to be approximated efficiently and reduces the cost of GP inference from $\mathcal{O}(N^3)$ to $\mathcal{O}(M^2N)$, with $M \ll N$.

Due to its computational benefits, we use a sparse GP-MPC implementation in this exercise.
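
The FITC idea described above can be sketched in plain NumPy for a 1-D regression problem. This is an illustrative sketch and not the implementation in `gpmpc.py`: the helper names `rbf`, `fitc_mean`, and `full_gp_mean`, the fixed unit-variance RBF kernel, and the toy data are assumptions. Note that every linear solve in `fitc_mean` involves only $M \times M$ matrices.

```python
import numpy as np


def rbf(a, b, lengthscale=1.0):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / lengthscale**2)


def fitc_mean(x_train, y_train, x_induce, x_test, noise_var=1e-2, jitter=1e-6):
    """FITC posterior mean; all solves involve only M x M matrices."""
    K_uu = rbf(x_induce, x_induce) + jitter * np.eye(len(x_induce))
    K_uf = rbf(x_induce, x_train)
    K_su = rbf(x_test, x_induce)
    # diag(Q_ff) with Q_ff = K_fu K_uu^{-1} K_uf, without forming Q_ff.
    q_diag = np.sum(K_uf * np.linalg.solve(K_uu, K_uf), axis=0)
    # Lambda = diag(K_ff - Q_ff) + noise; the RBF prior variance is 1.
    lam = np.maximum(1.0 - q_diag, 0.0) + noise_var
    A = K_uu + (K_uf / lam) @ K_uf.T  # M x M, costs O(M^2 N)
    return K_su @ np.linalg.solve(A, K_uf @ (y_train / lam))


def full_gp_mean(x_train, y_train, x_test, noise_var=1e-2):
    """Exact GP posterior mean, requiring an N x N solve (O(N^3))."""
    K_ff = rbf(x_train, x_train) + noise_var * np.eye(len(x_train))
    return rbf(x_test, x_train) @ np.linalg.solve(K_ff, y_train)


rng = np.random.default_rng(0)
x_train = rng.uniform(-3.0, 3.0, size=200)
y_train = np.sin(x_train) + 0.05 * rng.standard_normal(200)
x_induce = np.linspace(-3.0, 3.0, 15)
x_test = np.linspace(-2.5, 2.5, 50)

fitc_pred = fitc_mean(x_train, y_train, x_induce, x_test)
gap = np.abs(fitc_pred - full_gp_mean(x_train, y_train, x_test)).max()
print(f"max |FITC - full GP| = {gap:.4f}")
```

When the inducing points coincide with the training inputs, the FITC mean reduces to the exact GP mean (up to the jitter term), which is a useful sanity check for any implementation.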

In [None]:
# Directory to save the generated code and result figures.
save_dir = Path.cwd() / "saves"
# TODO Adjust the horizon, q_mpc, r_mpc, and prob parameters to see how they affect the performance of the controller.
gpmpc_config = {
    "horizon": 25,  # prediction horizon in steps (select 40 at most)
    "q_mpc": [8, 0.1, 8, 0.1, 8, 0.1, 0.5, 0.5, 0.5, 0.001, 0.001, 0.001],
    "r_mpc": [1, 1, 1, 0.1],
    "prior_params": nominal_model_params,
    "prob": 0.9,  # probability that the GP prediction stays within the bounds, used for stochastic constraint satisfaction
    "device": "cpu",
    "sparse_gp": True,  # use FITC approximation
    "max_gp_samples": 30,  # max number of inducing points for the sparse GP
    "output_dir": save_dir,
    "seed": global_seed,
    "action_space": action_space,
}

ctrl = GPMPC(nominal_model, traj=traj, **gpmpc_config)

### Train the GP-MPC
Next, we train the GP-MPC using the previously defined environment. Note that this can take a couple of minutes.

<div class="alert alert-info">
    <h3>Task 3: Train the GP-MPC</h3>
    <p>
    Complete the GP-MPC training loop in the <code> learn() </code> function in <code> run_gp_mpc.py </code>. 
    </p>
    <p> 
    Experiment with different training hyperparameters to optimize the performance.
    </p>
</div>

In [None]:

# Training configuration (optimization is recommended as we deliberately chose non-optimal parameters)
train_config = {
    "n_epochs": 3,
    "gp_iterations": 500,  # max number of GP iterations per epoch
    "lr": 5e-3,  # learning rate for the GP optimizer
    "max_samples": 100,  # the maximum number of samples to use for training the GP
    "seed": global_seed,
    "use_validation": False,  # use a validation set for early stopping when training the GPs
}
# NOTE If acados reports the error "reported status: 4", the QP solver failed, typically because the optimization problem became infeasible at some point. In this case, adjust the training and gpmpc parameters. (The distorted model parameters work for suitable training/gpmpc parameters.)
train_runs, test_runs = learn(ctrl=ctrl, env=env, **train_config)
# NOTE Training/Evaluation can take up to 30 minutes, depending on the training configuration and your hardware. Most of the time is spent on testing the GPMPC performance and collecting data. Reduce n_epochs, gp_iterations, and max_samples to speed up the training.

### Compare MPC and GP-MPC in Simulation

Next, we compare the performance of MPC and GP-MPC (after each training epoch) by computing the MSE between the reference trajectory and the observed trajectories and by plotting the trajectories for each dimension.

<div class="alert alert-info">
    <h3>Task 4: Compare the performance of MPC and GP-MPC</h3>
    <p>
    Plot the trajectories and compare the performance of the nominal MPC and GP-MPC. 
    Analyze the results and discuss the impact of the GP on the control performance.
    </p>
    <p>
    You can save pdf versions of the plots in "exercise05/saves/figs" when you disable the "show" flag in make_quad_plots.
    </p>
</div>

In [None]:
# NOTE The detailed pdf plots are saved under 'exercise05/saves/figs' or are shown when the argument "show" in the "make_quad_plots" function is True.
make_quad_plots(
    test_runs=test_runs,
    train_runs=train_runs,
    trajectory=ctrl.traj.T,
    save_dir=save_dir,
    show=False,
)

Compute the MSE between the reference trajectory and the MPC/GP-MPC trajectories

In [None]:
def compute_mse(ref, traj):
    max_len = min(len(ref), len(traj))
    return np.mean((ref[:max_len] - traj[:max_len]) ** 2, axis=(0, 1))


# Compute the MSE for each test run (index 0 is the nominal MPC, the others are GP-MPC epochs)
mse_dict = {}
for i, run in test_runs.items():
    mse_dict[i] = compute_mse(ctrl.traj.T, run["obs"])

# Print the MSE for the MPC and GP-MPC

for i, mse in mse_dict.items():
    if i == 0:
        print(f"MPC MSE: {mse}")
    else:
        # Print the MSE for GP-MPC runs
        print(f"GP-MPC MSE {i}: {mse}")
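
The MSE above averages over all state dimensions, so positions, velocities, and angles with different units are mixed into one number. As a complementary metric, a position-only RMSE could be computed as in the sketch below. The helper `position_rmse` is hypothetical, and the position indices `(0, 2, 4)` are an assumption based on the state ordering suggested by the `idx = [0, 2]` x-y plot; observations are assumed to have shape `(n_steps, n_states)`.

```python
import numpy as np


def position_rmse(ref, obs, pos_idx=(0, 2, 4)):
    """RMSE over the position states only; pos_idx is an assumed state layout."""
    n = min(len(ref), len(obs))
    err = ref[:n, list(pos_idx)] - obs[:n, list(pos_idx)]
    return float(np.sqrt(np.mean(np.sum(err**2, axis=1))))


# Synthetic check: a constant 0.1 m offset in x gives an RMSE of 0.1 m.
ref = np.zeros((100, 12))
obs = ref.copy()
obs[:, 0] += 0.1
print(position_rmse(ref, obs))
```

In the notebook, it would be called in the same way as `compute_mse`, for example `position_rmse(ctrl.traj.T, run["obs"])`, and the result is in meters.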

Plot the trajectories for the specified dimensions (standard: x-y plane)

In [None]:
num_steps = test_runs[0]["obs"].shape[0]
# Trim the reference to match the number of evaluation steps (traj has shape (n_states, n_steps))
traj = traj[:, :num_steps]
num_epochs = len(test_runs)
fig = plt.figure(figsize=(10, 6))

# x-y plane
idx = [0, 2]
plt.plot(traj[idx[0], :], traj[idx[1], :], label="Reference", color="gray", linestyle="-")
plt.plot(test_runs[0]["obs"][:, idx[0]], test_runs[0]["obs"][:, idx[1]], label="Prior MPC")
for epoch in range(1, num_epochs):
    plt.plot(
        test_runs[epoch]["obs"][:, idx[0]],
        test_runs[epoch]["obs"][:, idx[1]],
        label=f"GP-MPC epoch {epoch}",
    )
plt.title("X-Y plane path")
plt.xlabel("X [m]")
plt.ylabel("Y [m]")
plt.legend()


<div class="alert alert-success">
    <h3>Exam Preparation</h3>
    <p>
    Review and understand the following notes on MPC and GP-MPC performance.
    </p>
</div>

### Discussion: Performance of MPC and GP-MPC Controllers

#### Why Doesn't MPC Track the Reference Well, Even with the Correct Model?

- **Control Constraints & Cost Function Tuning:**  
  The MPC’s tracking performance is fundamentally limited by actuator constraints (e.g., thrust, rate limits) and the weights in the cost function. If the cost function is not carefully tuned to prioritize tracking the most important states (such as x and y positions), or if input penalties are too high, the controller may avoid aggressive maneuvers and tolerate tracking errors. Hard constraints can also prevent the controller from following aggressive or sharp trajectory segments.

- **Reference Trajectory Feasibility:**  
  The reference trajectory itself may be physically infeasible or require actions that exceed the system’s capabilities. Even with a perfect model, the controller cannot track a trajectory that is not dynamically feasible given the system’s physical and actuation limits.

- **Model Structure, Underactuation, and Real-World Effects:**  
  Real-world factors that the model does not capture, such as sensor noise, external disturbances, and discretization errors, can degrade tracking even when the identified dynamics are accurate. Underactuated systems (where not all degrees of freedom are directly controlled) further limit tracking performance.

- **Prediction Horizon Limitations:**  
  The finite prediction horizon of MPC means the controller can only optimize over a limited future window. A short horizon can limit the ability to anticipate and plan for future trajectory changes, especially for aggressive or rapidly changing references.

#### Why Doesn't GP-MPC Improve Dramatically Over MPC?

- **Marginal Gains After Initial Learning:**  
  GP-MPC often shows some improvement over nominal MPC after initial training, as the GP learns to correct systematic model errors. However, further improvements typically plateau. The GP can only correct for model errors it has seen in the training data, and its generalization is limited by the diversity and representativeness of that data.

- **Data Efficiency & Generalization:**  
  GPs require sufficient and representative data to accurately model the residual dynamics. If the training data does not cover the full range of operating conditions, or is too limited, the GP’s corrections will be incomplete or inaccurate.

- **Model Structure & Expressiveness:**  
  The GP’s input selection, kernel choice, and output structure may not be expressive enough to capture all relevant model errors. Over-regularization or poor hyperparameter choices can lead to underfitting.

- **Uncertainty Propagation & Constraint Tightening:**  
  GP-MPC propagates uncertainty through the dynamics and tightens constraints to ensure probabilistic safety. If the GP’s uncertainty estimates are inaccurate, or if the constraint tightening is too conservative, the controller may become overly cautious, reducing its ability to exploit the learned model and track the reference closely.

- **Computational and Practical Limitations:**  
  Sparse GP approximations (e.g., FITC) are used for tractability, but can reduce GP accuracy, especially for highly nonlinear or high-dimensional systems. Limited computational resources may restrict the number of inducing points or training iterations.

#### Key Takeaways

- **MPC performance is fundamentally limited by system constraints, cost function design, and model accuracy.** Even with a perfect model, perfect tracking is generally not possible due to these limitations.
- **GP-MPC can improve performance by learning and compensating for model errors,** but its effectiveness is limited by the quality and quantity of training data, the expressiveness of the GP model, and the conservatism introduced by uncertainty propagation.
- **Perfect tracking is rarely achievable in practice.** Both MPC and GP-MPC are subject to fundamental trade-offs between safety, robustness, performance, and computational complexity.
- **Further improvements may require richer data, better model structures, or hybrid learning/model-based approaches.**
