# Bayesian Optimization Framework for Muscle Parameters

## Definitions

Let:

- *$\theta$*: the parameters of the simulator (e.g., damping, force scaling)  
- *$q(\theta)$*: the variational distribution (e.g., $\mathcal{N}(\mu, \sigma^2)$) over parameters — what we learn  
- *$\mathcal{D}$*: observed data — in this case, a target force distribution $p_{\text{target}}(f)$  
- *$p(f \mid \theta)$*: distribution of simulator outputs (e.g., muscle forces) given parameters $\theta$  
- *$p(\theta)$*: prior distribution over parameters  
- *$p(f)$*: marginal distribution of forces (not needed for optimization)

---

## Bayes' Rule

We are interested in computing the **posterior** over muscle model parameters $\theta$ given observed data $\mathcal{D}$ — in this case, a target output force distribution:

$$
\underbrace{p(\theta \mid \mathcal{D})}_{\text{Posterior}} = 
\frac{
\underbrace{p(\mathcal{D} \mid \theta)}_{\text{Likelihood}} \cdot 
\underbrace{p(\theta)}_{\text{Prior}}
}{
\underbrace{p(\mathcal{D})}_{\text{Evidence}}
}
$$

---

### Terms in Bayes' Rule

- **Posterior** ($p(\theta \mid \mathcal{D})$):  
  The probability distribution over parameters $\theta$ *after* observing the data. This is what we want to infer.

- **Likelihood** ($p(\mathcal{D} \mid \theta)$):  
  How likely the observed data $\mathcal{D}$ would be if the simulator had parameters $\theta$. In our context, how well the simulated force outputs match the target force distribution.

- **Prior** ($p(\theta)$):  
  Our initial belief or uncertainty about the parameters *before* seeing the data. Often chosen to be weakly informative (e.g., a standard Gaussian), or based on known physiology.

- **Evidence** ($p(\mathcal{D})$):  
  The marginal probability of the data — often treated as a normalization constant. It ensures that the posterior is a valid probability distribution.
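
To make the four terms concrete, here is a minimal grid-based sketch for a single scalar parameter. It rests on several assumptions that are not part of the framework above: a hypothetical linear simulator whose forces follow $\mathcal{N}(0.1\,\theta, 0.05^2)$, a single observed force `f_obs = 0.2` standing in for the data, and a standard Gaussian prior.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


# Grid over theta in [-4, 4] with spacing 0.01 (illustrative choice).
d_theta = 0.01
thetas = [i * d_theta for i in range(-400, 401)]

f_obs = 0.2  # hypothetical single observed force (N)

prior = [normal_pdf(t, 0.0, 1.0) for t in thetas]                # p(theta)
likelihood = [normal_pdf(f_obs, 0.1 * t, 0.05) for t in thetas]  # p(D | theta), toy simulator
unnormalized = [l * p for l, p in zip(likelihood, prior)]

# Evidence p(D): the integral of likelihood * prior over theta (Riemann sum).
evidence = sum(unnormalized) * d_theta
posterior = [u / evidence for u in unnormalized]                 # p(theta | D)

# Grid point with the highest posterior density.
theta_map = thetas[max(range(len(thetas)), key=posterior.__getitem__)]
print(f"evidence = {evidence:.4f}, posterior mode = {theta_map:.2f}")
```

The evidence only rescales the curve; the location of the posterior mode is determined by the product of likelihood and prior alone.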

---

### In Our Case

We define:

$$
\mathcal{D} = p_{\text{target}}(f)
$$

This means: our observed data is a *target force distribution* — for instance, a Normal distribution centered at 0.2 N with some known variance, based on empirical muscle measurements.

Then:

- $p(\mathcal{D} \mid \theta)$ means:  
  *How likely is it that $\theta$ would generate a simulated force distribution matching the observed one?*

This term forms the core of the **distributional loss** we use during optimization (e.g., via KL divergence between simulated and target force distributions).


---

## Optimization Objective via Variational Inference

We approximate $p(\theta \mid \mathcal{D})$ with a variational distribution $q(\theta)$, and minimize the KL divergence:

$$
\min_{q(\theta)} \; \text{KL}(q(\theta) \, \| \, p(\theta \mid \mathcal{D}))
$$

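Because $\log p(\mathcal{D})$ does not depend on $q(\theta)$, this is equivalent to maximizing the **Evidence Lower Bound (ELBO)**, that is, minimizing the negative ELBO: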

$$
\mathcal{L}_{\text{VI}} = \mathbb{E}_{\theta \sim q(\theta)} \left[ -\log p(\mathcal{D} \mid \theta) \right] + \text{KL}(q(\theta) \, \| \, p(\theta))
$$

> **Interpretation:** The first term rewards distributions $q(\theta)$ whose sampled parameters explain the target force distribution $\mathcal{D} = p_{\text{target}}(f)$. The second term penalizes distributions that stray far from the prior $p(\theta)$. Minimizing $\mathcal{L}_{\text{VI}}$ trades off these two pressures.
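
For reference, the link between the KL objective and $\mathcal{L}_{\text{VI}}$ follows from substituting Bayes' rule into the KL divergence. The steps below use only the quantities already defined:

$$
\begin{aligned}
\text{KL}(q(\theta) \, \| \, p(\theta \mid \mathcal{D}))
&= \mathbb{E}_{\theta \sim q(\theta)} \left[ \log q(\theta) - \log p(\theta \mid \mathcal{D}) \right] \\
&= \mathbb{E}_{\theta \sim q(\theta)} \left[ \log q(\theta) - \log p(\mathcal{D} \mid \theta) - \log p(\theta) \right] + \log p(\mathcal{D}) \\
&= \mathbb{E}_{\theta \sim q(\theta)} \left[ -\log p(\mathcal{D} \mid \theta) \right] + \text{KL}(q(\theta) \, \| \, p(\theta)) + \log p(\mathcal{D}) \\
&= \mathcal{L}_{\text{VI}} + \log p(\mathcal{D})
\end{aligned}
$$

Since the KL divergence is non-negative, $-\mathcal{L}_{\text{VI}} \le \log p(\mathcal{D})$, so $-\mathcal{L}_{\text{VI}}$ is a lower bound on the log evidence. The term $\log p(\mathcal{D})$ is constant in $q(\theta)$, so minimizing $\mathcal{L}_{\text{VI}}$ and minimizing the KL divergence to the posterior select the same $q(\theta)$.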


---

## Expanding the Terms

### 1. Likelihood Term (Distributional Loss)

This measures how well a sampled $\theta$ explains the observed force distribution.

Assume:

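- $f_\theta^{(1)}, \dots, f_\theta^{(T)}$: $T$ forces sampled from the simulator run with parameters $\theta$  
- Empirical output distribution, a Gaussian fitted to those samples with sample mean $\mu_\theta$ and sample variance $\sigma_\theta^2$:  
  $$
  p_\theta(f) = \mathcal{N}(\mu_\theta, \sigma_\theta^2)
  $$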
- Target distribution:  
  $$
  p_{\text{target}}(f) = \mathcal{N}(\mu_t, \sigma_t^2)
  $$

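Treating the KL divergence between the simulated and target force distributions as a surrogate for the negative log-likelihood (a modeling choice, not an exact identity), we write: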

$$
-\log p(\mathcal{D} \mid \theta) \approx \text{KL}(\mathcal{N}(\mu_\theta, \sigma_\theta^2) \, \| \, \mathcal{N}(\mu_t, \sigma_t^2))
$$

So the **expected distributional loss** is:

$$
\mathcal{L}_{\text{likelihood}} = \mathbb{E}_{\theta \sim q(\theta)} \left[ \text{KL}(p_\theta(f) \, \| \, p_{\text{target}}(f)) \right]
$$
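
The expectation over $q(\theta)$ usually has no closed form, so one option is to estimate it by Monte Carlo: draw several $\theta$, simulate forces for each, fit a Gaussian to the forces, and average the closed-form Gaussian KL. The sketch below illustrates this under stated assumptions. `toy_simulator` is a hypothetical stand-in for the muscle simulator (forces drawn from $\mathcal{N}(0.1\,\theta, 0.05^2)$), and the sample counts are arbitrary.

```python
import math
import random
import statistics


def gaussian_kl(mu_a, sigma_a, mu_b, sigma_b):
    """KL( N(mu_a, sigma_a^2) || N(mu_b, sigma_b^2) ) for univariate Gaussians."""
    return (
        math.log(sigma_b / sigma_a)
        + (sigma_a ** 2 + (mu_a - mu_b) ** 2) / (2.0 * sigma_b ** 2)
        - 0.5
    )


def toy_simulator(theta, n_forces, rng):
    """Hypothetical stand-in for the muscle simulator (an assumption, not the real model)."""
    return [rng.gauss(0.1 * theta, 0.05) for _ in range(n_forces)]


def distributional_loss(theta, mu_t, sigma_t, rng, n_forces=200):
    """Fit a Gaussian to the simulated forces and compare it with the target."""
    forces = toy_simulator(theta, n_forces, rng)
    mu_theta = statistics.fmean(forces)      # sample mean
    sigma_theta = statistics.stdev(forces)   # sample standard deviation
    return gaussian_kl(mu_theta, sigma_theta, mu_t, sigma_t)


def expected_loss(mu_q, sigma_q, mu_t, sigma_t, rng, n_theta=64):
    """Monte Carlo estimate of E_{theta ~ q}[ KL(p_theta || p_target) ]."""
    losses = [
        distributional_loss(rng.gauss(mu_q, sigma_q), mu_t, sigma_t, rng)
        for _ in range(n_theta)
    ]
    return statistics.fmean(losses)


rng = random.Random(0)
# Target N(0.2, 0.05^2); with the toy simulator, theta = 2 reproduces its mean.
near = expected_loss(mu_q=2.0, sigma_q=0.1, mu_t=0.2, sigma_t=0.05, rng=rng)
far = expected_loss(mu_q=4.0, sigma_q=0.1, mu_t=0.2, sigma_t=0.05, rng=rng)
print(f"loss near target: {near:.3f}, loss far from target: {far:.3f}")
```

A $q(\theta)$ centered on parameters that reproduce the target mean should yield a much smaller estimate than one centered elsewhere. The estimate is noisy, so in practice the number of $\theta$ draws and forces per draw would need tuning.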

---

### 2. Prior Term (Regularization)

If:

$$
q(\theta) = \mathcal{N}(\mu_q, \sigma_q^2), \quad p(\theta) = \mathcal{N}(\mu_0, \sigma_0^2)
$$

Then the KL divergence is analytic:

$$
\text{KL}(q(\theta) \, \| \, p(\theta)) =
\log \left( \frac{\sigma_0}{\sigma_q} \right) +
\frac{\sigma_q^2 + (\mu_q - \mu_0)^2}{2\sigma_0^2} - \frac{1}{2}
$$
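
The formula above covers one scalar parameter. For several parameters (e.g., damping and force scaling), one common simplification is to assume that both $q(\theta)$ and $p(\theta)$ factorize across parameters, in which case the KL is the sum of the per-parameter terms. That diagonal assumption is ours, and the function name `kl_to_prior` is illustrative:

```python
import math


def kl_to_prior(mu_q, sigma_q, mu_0, sigma_0):
    """KL(q || p) for factorized Gaussians: sum of univariate KLs, one per parameter."""
    total = 0.0
    for mq, sq, m0, s0 in zip(mu_q, sigma_q, mu_0, sigma_0):
        total += (
            math.log(s0 / sq)
            + (sq ** 2 + (mq - m0) ** 2) / (2.0 * s0 ** 2)
            - 0.5
        )
    return total


# q identical to the prior: no penalty.
print(kl_to_prior([0.0, 0.0], [1.0, 1.0], [0.0, 0.0], [1.0, 1.0]))  # 0.0
# One parameter whose mean is shifted by one prior standard deviation.
print(kl_to_prior([1.0], [1.0], [0.0], [1.0]))  # 0.5
```

The penalty grows both when the mean of $q(\theta)$ drifts from the prior mean and when its width differs from the prior width.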

---

## Final Optimization Objective

$$
\min_{\mu_q, \sigma_q} \; 
\underbrace{
\mathbb{E}_{\theta \sim q(\theta)} \left[ \text{KL}(p_\theta(f) \, \| \, p_{\text{target}}(f)) \right]
}_{\text{distributional output loss}} 
+
\underbrace{
\text{KL}(q(\theta) \, \| \, p(\theta))
}_{\text{parameter regularization}}
$$
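
As a rough end-to-end sketch, the objective can be minimized by writing $\theta = \mu_q + \sigma_q \varepsilon$ with $\varepsilon \sim \mathcal{N}(0, 1)$ (the reparameterization trick) and running gradient descent on $\mu_q$ and $\log \sigma_q$. Everything specific here is an assumption made for illustration: `toy_force_moments` is a hypothetical simulator summary returning output mean $0.1\,\theta$ and standard deviation $0.05$, the target is $\mathcal{N}(0.2, 0.05^2)$, the prior is $\mathcal{N}(0, 1)$, and central finite differences stand in for automatic differentiation.

```python
import math
import random


def gaussian_kl(mu_a, sigma_a, mu_b, sigma_b):
    """KL( N(mu_a, sigma_a^2) || N(mu_b, sigma_b^2) ) for univariate Gaussians."""
    return (
        math.log(sigma_b / sigma_a)
        + (sigma_a ** 2 + (mu_a - mu_b) ** 2) / (2.0 * sigma_b ** 2)
        - 0.5
    )


def toy_force_moments(theta):
    """Hypothetical simulator summary: (mean, std) of output forces for a given theta."""
    return 0.1 * theta, 0.05


def objective(mu_q, log_sigma_q, eps, target, prior):
    """Monte Carlo estimate of the final objective for fixed noise draws eps."""
    sigma_q = math.exp(log_sigma_q)
    likelihood_term = 0.0
    for e in eps:
        theta = mu_q + sigma_q * e  # reparameterized sample theta ~ q
        mu_f, sigma_f = toy_force_moments(theta)
        likelihood_term += gaussian_kl(mu_f, sigma_f, *target)
    likelihood_term /= len(eps)
    prior_term = gaussian_kl(mu_q, sigma_q, *prior)
    return likelihood_term + prior_term


def fit(target=(0.2, 0.05), prior=(0.0, 1.0), n_eps=256, steps=500,
        lr=0.05, h=1e-4, seed=0):
    rng = random.Random(seed)
    # Fixed noise draws (common random numbers) make the estimate deterministic.
    eps = [rng.gauss(0.0, 1.0) for _ in range(n_eps)]
    mu_q, log_sigma_q = prior[0], math.log(prior[1])  # start at the prior
    for _ in range(steps):
        g_mu = (objective(mu_q + h, log_sigma_q, eps, target, prior)
                - objective(mu_q - h, log_sigma_q, eps, target, prior)) / (2.0 * h)
        g_ls = (objective(mu_q, log_sigma_q + h, eps, target, prior)
                - objective(mu_q, log_sigma_q - h, eps, target, prior)) / (2.0 * h)
        mu_q -= lr * g_mu
        log_sigma_q -= lr * g_ls
    return mu_q, math.exp(log_sigma_q)


mu_q, sigma_q = fit()
print(f"mu_q = {mu_q:.3f}, sigma_q = {sigma_q:.3f}")
```

For this toy setup the exact objective reduces to $2\left((\mu_q - 2)^2 + \sigma_q^2\right) + \text{KL}(q(\theta) \, \| \, p(\theta))$, whose minimum is at $\mu_q = 1.6$ and $\sigma_q = 1/\sqrt{5} \approx 0.447$, so the fitted values should land near those up to Monte Carlo error. Optimizing $\log \sigma_q$ rather than $\sigma_q$ keeps the standard deviation positive without a constraint. With a real simulator, gradients would typically come from an autodiff framework or a gradient-free optimizer instead of finite differences.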

---

## What This Means

You're no longer trying to find the **single best** set of parameters $\theta^*$.  
Instead, you're finding a **distribution over parameters** $q(\theta)$ such that:

- Samples $\theta \sim q(\theta)$ generate force distributions  
- Those force distributions match your **target** biomechanics data on average

The optimizer adjusts $\mu_q$ and $\sigma_q$ to minimize this mismatch, while the KL term keeps $q(\theta)$ close to the prior $p(\theta)$.
