<h1><a href="https://arxiv.org/abs/1802.08678">
Verifying Controllers Against Adversarial Examples with Bayesian Optimization</a></h1>
Shromona Ghosh et al.

<h2>Summary</h2>

* Use Boolean combinations of smooth functions of the trajectories as safety specifications

* Use Bayesian Optimization to actively test the controller and find adversarial counterexamples

<h2>Motivation</h2>

* Conventional techniques in designing robust controllers rely on simple linear models of the underlying system.
    * Either overly conservative
    
    * Or inaccurate due to overapproximation, e.g. cannot capture nonlinear effects
    
* Reinforcement learning can generate high fidelity controllers
    * No formal guarantees for safety

* Formal safety certificates by using formal methods
    * Curse of dimensionality
    * Only use simple system dynamics
    
* Falsification tests the system in various environments, searching for adversarial examples
    * Perturbations must be meaningful for dynamic systems
* Test black-box systems by using smart search techniques
    * Sequential search algorithms based on heuristics, e.g. CMA-ES and Simulated Annealing, do not utilize information from previous simulations


<h2>Problem Formulation</h2>

* The closed-loop system is simulated with a model whose environment uncertainty is parameterized by $w\in\mathbb{W}$

* If the system remains safe under all uncertain scenarios, then it satisfies the safety specification $\forall w\in\mathbb{W}, \varphi(w)>0$

* Test whether there is an adversarial example $w\in\mathbb{W}$ s.t. $\varphi(w)<0$ while minimizing the test cost, e.g. the number of simulations
$$\arg\min_{w\in\mathbb{W}} \varphi(w)$$

* The key difficulty is that the functional dependence of $\varphi(w)$ on $w$ is unknown.
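To make the falsification problem concrete, here is a minimal sketch of a naive baseline: sample $w$ uniformly from $\mathbb{W}$, evaluate $\varphi(w)$, and stop at the first negative value. This is not the method of the paper (which replaces uniform sampling with Bayesian Optimization); the function name `falsify_by_random_search` and the toy robustness function are hypothetical, and a real `phi` would run one simulation per call.

```python
import numpy as np

def falsify_by_random_search(phi, bounds, budget, seed=0):
    """Return (counterexample or None, best robustness seen, simulations used)."""
    rng = np.random.default_rng(seed)
    low = np.array([b[0] for b in bounds], dtype=float)
    high = np.array([b[1] for b in bounds], dtype=float)
    best_val = np.inf
    for n in range(1, budget + 1):
        w = rng.uniform(low, high)      # one candidate uncertainty parameter
        val = phi(w)                    # one (expensive) simulation in practice
        best_val = min(best_val, val)
        if val < 0:                     # phi(w) < 0 means the specification is violated
            return w, val, n
    return None, best_val, budget

# Hypothetical toy robustness: unsafe inside a small disc around (0.8, 0.2)
toy_phi = lambda w: np.sum((w - np.array([0.8, 0.2])) ** 2) - 0.1 ** 2
counterexample, value, n_sims = falsify_by_random_search(toy_phi, [(0, 1), (0, 1)], budget=500)
```

The number of simulations returned is the test cost that the summary above asks to minimize; uniform sampling ignores everything learned from earlier simulations.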

<h2>Background</h2>

* Safety Specification
$$\varphi:= \mu\mid\neg\mu\mid\varphi_1\vee\varphi_2\mid\varphi_1\wedge\varphi_2$$
where $\mu:\Xi\rightarrow\mathbb{R}$ returns the 'robustness' of a trajectory $\xi\in\Xi$. Disjunction takes the maximum and conjunction the minimum of the sub-formula robustness values:

$$\mu(\xi):=\mu(\xi), (\varphi_1\vee\varphi_2)(\xi):=\max(\varphi_1(\xi), \varphi_2(\xi)),\\
  (\neg\mu)(\xi):=-\mu(\xi), (\varphi_1\wedge\varphi_2)(\xi):=\min(\varphi_1(\xi), \varphi_2(\xi))$$
  
* Gaussian Process
    * $\hat{\mu}(w)$ is the observation of the target function $\mu(w)$ corrupted by Gaussian noise, $\hat{\mu}(w)=\mu(w)+\omega$ where $\omega\sim\mathcal{N}(0,\sigma^2)$.
    * Given $n$ observations $y_n=(\hat{\mu}(w_1), \hat{\mu}(w_2),\ldots, \hat{\mu}(w_n))$ and their corresponding inputs $W_n=\{w_1, w_2,\ldots, w_n\}$, the posterior over the function $\mu(w)$ has mean $m_n(w)$, covariance $k_n(w,w')$ and variance $\sigma^2_n(w)$, where

\begin{eqnarray}
m_n(w)&=&k_n(w)(K_n+I_n\sigma^2)^{-1}y_n\\
k_n(w,w')&=&k(w,w')-k_n(w)(K_n+I_n\sigma^2)^{-1}k^T_n(w')\\
\sigma^2_n(w)&=&k_n(w,w)\\
\\
k_n(w)&=&[k(w,w_1),\ldots, k(w,w_n)]\\
[K_n](i,j)&=&k(w_i,w_j)
\end{eqnarray}


* Bayesian Optimization
    * The next sample is chosen by minimizing the GP lower confidence bound: $w_n=\arg\min_{w\in\mathbb{W}} m_{n-1}(w)-\beta^{1/2}_n\sigma_{n-1}(w)$
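The posterior equations and the lower-confidence-bound rule above can be sketched in plain NumPy. This is only an illustration under stated assumptions: the squared-exponential kernel, its hyperparameters, the noise variance, the value of `beta`, and the grid search over candidate points are all choices made for this example, not taken from the paper.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel k(a, b); the kernel choice is an assumption."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def gp_posterior(W_n, y_n, W_query, noise_var=1e-6):
    """Posterior mean m_n(w) and variance sigma_n^2(w) at each query point."""
    K = rbf_kernel(W_n, W_n) + noise_var * np.eye(len(W_n))   # K_n + I_n * sigma^2
    k_q = rbf_kernel(W_query, W_n)                            # each row is k_n(w)
    mean = k_q @ np.linalg.solve(K, y_n)
    v = np.linalg.solve(K, k_q.T)
    var = rbf_kernel(W_query, W_query).diagonal() - np.sum(k_q * v.T, axis=1)
    return mean, np.maximum(var, 0.0)                         # clip round-off negatives

def lower_confidence_bound(mean, var, beta=4.0):
    """m(w) - beta^(1/2) * sigma(w)."""
    return mean - np.sqrt(beta) * np.sqrt(var)

# Three noisy observations of a 1-D function, then pick the next sample on a grid
W_n = np.array([[0.0], [0.5], [1.0]])
y_n = np.array([1.0, -0.2, 0.7])
W_query = np.linspace(0.0, 1.0, 11).reshape(-1, 1)
mean, var = gp_posterior(W_n, y_n, W_query)
lcb = lower_confidence_bound(mean, var)
w_next = W_query[np.argmin(lcb)]
```

Minimizing the bound rather than the mean favors points that are either predicted to be unsafe or still highly uncertain.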


<h2>Approach</h2>


* `Parse Tree` $\mathcal{T}$: given a specification formula $\varphi$, the corresponding parse tree $\mathcal{T}$ has leaf nodes that correspond to the function predicates (atomics), while the other nodes are max (disjunctions) and min (conjunctions). Since the grammar only allows negation on atomic predicates, a negated predicate is absorbed into its leaf as $-\mu$. Use the quantitative specification to represent the tree

\begin{eqnarray}
\varphi(w)&=&(\mu_1\wedge\mu_2)\vee\ldots\\
&\Rightarrow& max(min\ldots)\\
\mathcal{T}(\mu_1(w),\ldots, \mu_q(w))&=&\varphi(w) 
\end{eqnarray}

* With high probability, the value of the tree is bounded below by the lower confidence bound of one of the predicates. Consider the **lower bound** of the confidence interval of each $\mu_i$ in the tree
$$l_i(w)=m^i_{n-1}(w)-\beta_n^{1/2}\sigma^i_{n-1}(w)$$

* A heuristic solution to the BO problem is then to find the $w$ that minimizes the specification evaluated on the lower confidence bounds of all the predicates.
$$w=\arg\min_{w\in\mathbb{W}} \varphi(l_1(w),l_2(w),\ldots,l_q(w))$$
Once $w$ is found, add $w$ to $W$ and start the next iteration
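As a rough illustration of how the parse tree combines the per-predicate lower bounds, the sketch below evaluates a tree given as nested tuples, with min at conjunctions and max at disjunctions. The tuple encoding, the function name `evaluate_tree`, and the numbers are hypothetical; negation is left out to keep the example short.

```python
def evaluate_tree(node, leaf_values):
    """Evaluate a parse tree on a list of leaf values.

    Nodes are ('leaf', i), ('and', left, right) or ('or', left, right).
    """
    kind = node[0]
    if kind == 'leaf':
        return leaf_values[node[1]]
    left = evaluate_tree(node[1], leaf_values)
    right = evaluate_tree(node[2], leaf_values)
    if kind == 'and':
        return min(left, right)   # conjunction: the weakest sub-formula decides
    if kind == 'or':
        return max(left, right)   # disjunction: the strongest sub-formula decides
    raise ValueError(f"unknown node type: {kind}")

# phi = (mu_0 AND mu_1) OR mu_2, evaluated on lower confidence bounds
tree = ('or', ('and', ('leaf', 0), ('leaf', 1)), ('leaf', 2))
means = [0.5, 0.3, -0.1]          # posterior means of each predicate at some w
stds = [0.1, 0.2, 0.05]           # posterior standard deviations at the same w
beta = 4.0
lower = [m - beta ** 0.5 * s for m, s in zip(means, stds)]
value = evaluate_tree(tree, lower)
```

Minimizing `value` over $w$ (with `lower` recomputed from the GPs at each candidate) gives the next point to simulate.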




<h2>Algorithm</h2>

The following example shows how to solve the optimization problem $w=\arg\min_{w\in\mathbb{W}} \varphi(l_1(w), l_2(w), \ldots,l_q(w))$

* The experiment uses the CartPole environment from OpenAI Gym. The observations are continuous values while the action space is discrete.

* The robustness of a trajectory is evaluated with respect to the following 3 conditions:
    * The cart position always stays within the region (-2.4, 2.4)
    * The momentum stays between -2.0 and 2.0
    * The pole angle stays within 0.2 of the rest (vertical) position

* The safety specification is that **the cartpole should always satisfy at least one of the listed conditions**. Therefore, there are 3 sub-predicates, each evaluating the robustness of one of the 3 conditions, and the final predicate is the maximum of the 3 robustness values.

In [1]:
import numpy as np

### A trajectory `traj` is a pair: traj[0] holds the sequence of states, traj[1] holds the simulation parameters

### Sub-predicate 1 evaluates the robustness of staying within the position range (-2.4, 2.4)
def pred1(traj):
    traj = traj[0]
    x_s = np.array(traj).T[0]
    return np.min(2.4 - np.abs(x_s))

### Sub-predicate 2 evaluates the robustness of keeping the momentum within the range (-2.0, 2.0)
def pred2(traj):
    traj_ = traj[0]
    mass = traj[1]['mass']
    v_s = np.array(traj_).T[1]
    return np.min(2. - np.abs(mass*v_s))

### Sub-predicate 3 evaluates the robustness of keeping the pole angle within the range (-0.2, 0.2) from the vertical position
def pred3(traj):
    traj = traj[0]
    theta = np.array(traj).T[2]
    return np.min(0.2 - np.abs(theta))

### The final predicate is the maximum of the robustness values of the 3 sub-predicates (disjunction)
pred = lambda traj : np.amax([pred1(traj), pred2(traj), pred3(traj)])

* Given a policy, the algorithm randomly chooses different sets of uncertainty parameters `X_ns` within the pre-defined ranges.


In [None]:
bounds = [(-0.05, 0.05)] * 4 # Bounds on the state
bounds.append((0.05, 0.15)) # Bounds on the mass of the pole
bounds.append((0.4, 0.6)) # Bounds on the length of the pole
bounds.append((8.00, 12.00)) # Bounds on the force magnitude
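A minimal sketch of drawing the initial parameter sets `X_ns` uniformly within these bounds is given here. The helper name `sample_uniform`, the seed, and the number of samples are assumptions for illustration; the original code may initialize its samples differently.

```python
import numpy as np

# Same bounds as above: 4 state components, pole mass, pole length, force magnitude
bounds = [(-0.05, 0.05)] * 4 + [(0.05, 0.15), (0.4, 0.6), (8.00, 12.00)]

def sample_uniform(bounds, n_samples, seed=0):
    """Draw n_samples points uniformly inside the box defined by bounds."""
    rng = np.random.default_rng(seed)
    low, high = np.array(bounds, dtype=float).T   # split into lower and upper limits
    return rng.uniform(low, high, size=(n_samples, len(bounds)))

X_ns = sample_uniform(bounds, n_samples=5)        # one row per uncertainty parameter set
```

Each row of `X_ns` is one candidate $w$ that is simulated to produce a trajectory.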

* Given a controller, the agent generates one trajectory for each set of uncertainty parameters.

* The robustness values of each sub-predicate and of the final predicate are evaluated on all trajectories and collected, together with the corresponding uncertainty parameters, in `Y`.

* Then the algorithm builds a Gaussian Process regression model from the uncertainty parameters `X_ns` and their corresponding robustness values `Y`.

In [None]:
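import copy
import GPy

## Excerpt from a class method: fit a GP to the sampled parameters and their robustness values
self.ns_GP = GPy.models.GPRegression(X_ns, Y,
                                     kernel=copy.deepcopy(self.kernel),
                                     normalizer=self.normalizer)
## Fit the kernel hyperparameters, restarting the optimization several times
self.ns_GP.optimize_restarts(self.optimize_restarts)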

* Then the Bayesian Optimization iterations begin.
    * In each iteration, the GP regression model is used to evaluate the mean `m` and variance `v` of the robustness value at the current `X_ns`.
    * The algorithm builds a function that uses the GP regression model to calculate the lower bound of the confidence interval
$$l_i(w)=m^i_{n-1}(w)-\beta_n^{1/2}\sigma^i_{n-1}(w)$$

* There are multiple optimization methods to choose from.
    * If the optimization method is `LBFGS`, then the gradients of the mean and variance are calculated. 
    * If the optimization method is `sample optimization`, then the gradients are not calculated.

In [None]:
### Build a function returning the lower confidence bound m - self.k*np.sqrt(v) and, for LBFGS, its gradient
def f(X_ns):
    ## Predict the mean and variance
    m, v = self.ns_GP.predict(X_ns)

    ## Gradients are only needed by the LBFGS optimizer
    df = None
    if isinstance(self.optimizer, lbfgs_opt):
        dm, dv = self.ns_GP.predictive_gradients(X_ns)
        dm = dm[:, :, 0]
        ## d/dx [m - k*sqrt(v)] = dm - (k/2) * dv / sqrt(v)
        df = dm - (self.k/2)*(dv/np.sqrt(v))

    return m - self.k*np.sqrt(v), df
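The gradient expression used for LBFGS follows from the chain rule, $\nabla\,(m - k\sqrt{v}) = \nabla m - \frac{k}{2}\,\nabla v/\sqrt{v}$. The sketch below checks that formula against a central finite difference; the toy mean and variance functions stand in for the GP predictions and are purely hypothetical.

```python
import numpy as np

def lcb_and_grad(m, v, dm, dv, k):
    """Lower confidence bound m - k*sqrt(v) and its gradient via the chain rule."""
    lcb = m - k * np.sqrt(v)
    grad = dm - (k / 2.0) * dv / np.sqrt(v)
    return lcb, grad

# Hypothetical 1-D stand-ins for the GP mean and variance and their derivatives
toy_mean = lambda x: np.sin(x)
toy_var = lambda x: 1.0 + x ** 2
toy_dmean = lambda x: np.cos(x)
toy_dvar = lambda x: 2.0 * x

x0, k, h = 0.7, 2.0, 1e-6
_, analytic = lcb_and_grad(toy_mean(x0), toy_var(x0), toy_dmean(x0), toy_dvar(x0), k)

# Central finite difference of the bound itself
lcb_at = lambda x: toy_mean(x) - k * np.sqrt(toy_var(x))
numeric = (lcb_at(x0 + h) - lcb_at(x0 - h)) / (2.0 * h)
```

If `analytic` and `numeric` disagree, the gradient handed to the optimizer is inconsistent with the objective and LBFGS may converge to the wrong point.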

* The optimizer starts from a randomly chosen initial point and returns the uncertainty parameters that minimize the lower confidence bound of the robustness value predicted by the Gaussian Process.

In [None]:
## The returned f is the value of the lower confidence bound at the minimizer x
x,f = self.optimizer.optimize(f=lambda x: f(x)[0], df = lambda x:f(x)[1])
## For lbfgs, the minimizer is found by using the gradient `df`.
## For sample optimization, the minimizer is chosen among sampled candidates.
## The code provides some other algorithms besides those two.
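For intuition, a gradient-free "sample optimization" step could look like the sketch below: draw candidates uniformly within the bounds and keep the one with the smallest acquisition value. This is an assumption about the general idea, not the repository's optimizer; `sample_minimize`, the candidate count, and the toy acquisition are hypothetical.

```python
import numpy as np

def sample_minimize(acquisition, bounds, n_candidates=1000, seed=0):
    """Return the sampled point with the lowest acquisition value, and that value."""
    rng = np.random.default_rng(seed)
    low, high = np.array(bounds, dtype=float).T
    candidates = rng.uniform(low, high, size=(n_candidates, len(bounds)))
    values = np.asarray(acquisition(candidates)).reshape(-1)   # one value per candidate
    best = int(np.argmin(values))
    return candidates[best], values[best]

# Toy acquisition with its minimum at (0.3, 0.3); a real one would be the GP lower bound
toy_acquisition = lambda X: np.sum((X - 0.3) ** 2, axis=1)
x_best, f_best = sample_minimize(toy_acquisition, [(0.0, 1.0), (0.0, 1.0)])
```

No gradients are required, which is why the `df` branch can be skipped for this kind of optimizer.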

* Given the uncertainty parameters `x`, the agent generates the corresponding trajectory. Its robustness values are evaluated and `x` is appended to `X_ns`. The Gaussian Process regression model and $f$ are then updated.

In [None]:
self.ns_GP.set_XY(X_ns,np.vstack((self.ns_GP.Y, np.atleast_2d(f_x))))
self.ns_GP.optimize_restarts(self.optimize_restarts)