%%javascript
MathJax.Hub.Config({
    TeX: { equationNumbers: { autoNumber: "AMS" } }
});


# Definition of subgaussianity

For a Gaussian random variable $X \sim \mathcal{N}(0, \sigma^2)$, its moment-generating function (MGF) is 

\begin{align*}
M_X(t) = \mathbb{E}\left[e^{t X} \right] = \exp \left( \frac{t^2 \sigma^2}{2} \right)
\end{align*}

See the Appendix for a derivation.

We call a random variable $Y$ $\sigma$-subgaussian if, for all $t \in \mathbb{R}$,

\begin{align*}
M_Y(t) = \mathbb{E}\left[e^{t Y} \right] \le \exp \left( \frac{t^2 \sigma^2}{2} \right)
\end{align*}

In other words, $Y$ is subgaussian in the sense that, at every $t$, its MGF is no bigger than that of $\mathcal{N}(0, \sigma^2)$. In particular, $\mathcal{N}(0, \sigma^2)$ is itself $\sigma$-subgaussian.
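As a numerical illustration of the definition (the example is ours, not from the original), a Rademacher variable $Y$ taking values $\pm 1$ with probability $1/2$ each has MGF $\mathbb{E}[e^{tY}] = \cosh(t)$. The sketch below compares it with $\exp(t^2\sigma^2/2)$ on a finite grid of $t$. The helper names and the grid are hypothetical choices, and a grid check only illustrates the inequality rather than proving it.

```python
import math


def rademacher_mgf(t: float) -> float:
    """MGF of Y uniform on {-1, +1}: E[e^{tY}] = (e^t + e^{-t}) / 2 = cosh(t)."""
    return math.cosh(t)


def gaussian_mgf(t: float, sigma: float) -> float:
    """MGF of N(0, sigma^2): exp(t^2 sigma^2 / 2)."""
    return math.exp(t * t * sigma * sigma / 2.0)


def satisfies_subgaussian_bound(mgf, sigma: float, ts) -> bool:
    """Check mgf(t) <= exp(t^2 sigma^2 / 2) at every t of the finite grid `ts`.

    A tiny relative tolerance absorbs floating-point rounding at t = 0, where both sides equal 1.
    """
    return all(mgf(t) <= gaussian_mgf(t, sigma) * (1 + 1e-12) for t in ts)


# Hypothetical grid of t values in [-10, 10]; it can only illustrate, not prove, the bound.
T_GRID = [k / 100.0 for k in range(-1000, 1001)]

print(satisfies_subgaussian_bound(rademacher_mgf, 1.0, T_GRID))  # True
print(satisfies_subgaussian_bound(rademacher_mgf, 0.9, T_GRID))  # False
```

With $\sigma = 1$ the bound holds on the whole grid (in fact $\cosh(t) \le e^{t^2/2}$ for all $t$, by comparing the Taylor series term by term), while $\sigma = 0.9$ already fails near $t = 0$.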

# Concentration analysis for a single $\sigma$-subgaussian r.v.

First, let's bound the tails of a $\sigma$-subgaussian r.v. $X$ using the Cramér-Chernoff method. For any $\epsilon > 0$ and $\lambda > 0$,

\begin{align}
\mathbb{P}(X \ge \epsilon)
&= \mathbb{P}(\lambda X \ge \lambda \epsilon) \\
&= \mathbb{P}(e^{\lambda X} \ge e^{\lambda \epsilon}) \\
&\le \frac{\mathbb{E}[e^{\lambda X}]}{e^{\lambda \epsilon}} \\
&\le \exp\left(\frac{\lambda^2 \sigma^2}{2} - \lambda \epsilon \right)
\end{align}

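Note:

* $\lambda > 0$ is arbitrary; the first two steps hold because multiplying by $\lambda > 0$ and applying the increasing function $e^{x}$ do not change the event.
* The third step follows from Markov's inequality.
* The fourth step holds because $X$ is $\sigma$-subgaussian, so $ \mathbb{E}\left[e^{\lambda X} \right] \le \exp \left( \frac{\lambda^2 \sigma^2}{2} \right)$.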

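Since the bound holds for every $\lambda > 0$, we can choose $\lambda$ to make it as tight as possible. The exponent on the right-hand side, $\frac{\lambda^2 \sigma^2}{2} - \lambda \epsilon$, is a convex quadratic in $\lambda$ whose derivative $\lambda \sigma^2 - \epsilon$ vanishes at $\lambda = \epsilon / \sigma^2 > 0$, so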

\begin{align*} 
\mathbb{P}(X \ge \epsilon) 
&\le \exp\left(\frac{\epsilon^2}{\sigma^4} \frac{\sigma^2}{2} - \frac{\epsilon}{\sigma^2} \epsilon \right) \\
&= \exp \left( -\frac{\epsilon^2}{2 \sigma^2 } \right )
\end{align*}
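A quick numerical check of the choice $\lambda = \epsilon/\sigma^2$: the sketch below scans a grid of $\lambda > 0$, evaluates the Chernoff exponent $\frac{\lambda^2 \sigma^2}{2} - \lambda \epsilon$, and compares the best grid point with the closed-form minimizer and minimum $-\epsilon^2/(2\sigma^2)$. The values $\sigma = 2$, $\epsilon = 3$, the grid, and the function names are arbitrary assumptions for illustration.

```python
def chernoff_exponent(lam: float, sigma: float, eps: float) -> float:
    """Exponent of the bound P(X >= eps) <= exp(lam^2 sigma^2 / 2 - lam * eps)."""
    return lam * lam * sigma * sigma / 2.0 - lam * eps


def best_lambda_on_grid(sigma: float, eps: float, lam_max: float = 10.0, n: int = 100_000) -> float:
    """Return the point of the uniform grid on (0, lam_max] that minimizes the exponent."""
    grid = [lam_max * (i + 1) / n for i in range(n)]
    return min(grid, key=lambda lam: chernoff_exponent(lam, sigma, eps))


sigma, eps = 2.0, 3.0  # hypothetical example values
lam_star = best_lambda_on_grid(sigma, eps)
print(lam_star, eps / sigma**2)  # both close to 0.75
print(chernoff_exponent(lam_star, sigma, eps), -eps**2 / (2 * sigma**2))  # both close to -1.125
```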

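A bound for the left tail can be obtained in a similar way. Since the definition of subgaussianity holds for all $t \in \mathbb{R}$, in particular for $t = -\lambda$, we have $\mathbb{E}[e^{-\lambda X}] \le \exp \left( \frac{\lambda^2 \sigma^2}{2} \right)$, which gives the last step below.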

\begin{align}
\mathbb{P}(X \le - \epsilon)
&= \mathbb{P}(- X \ge \epsilon) \\
&= \mathbb{P}(- \lambda X \ge  \lambda \epsilon) \\
&= \mathbb{P}(e^{- \lambda X} \ge e^{\lambda \epsilon}) \\
&\le \frac{\mathbb{E}[e^{- \lambda X}]}{e^{\lambda \epsilon}} \\
&\le \exp \left(\frac{\lambda^2 \sigma^2}{2} - \lambda \epsilon \right)
\end{align}

so, choosing $\lambda = \epsilon / \sigma^2$ again, $\mathbb{P}(X \le - \epsilon)$ has the same bound as $\mathbb{P}(X \ge \epsilon)$.

Therefore, 

\begin{align*}
\mathbb{P}(X \ge \epsilon) &\le \exp \left( -\frac{\epsilon^2}{2 \sigma^2 } \right ) \\
\mathbb{P}(X \le - \epsilon) &\le \exp \left( -\frac{\epsilon^2}{2 \sigma^2 } \right )
\end{align*}

If we set $\delta = \exp \left( -\frac{\epsilon^2}{2 \sigma^2 } \right )$, then $\epsilon = \sqrt{2 \sigma^2 \ln (1/\delta)}$, so

\begin{align*}
\mathbb{P}(X \ge \epsilon) &\le \delta \\
\mathbb{P}(X \le - \epsilon) &\le \delta
\end{align*}

Using the union bound, we have

\begin{align*}
\mathbb{P}(|X| \ge \epsilon) 
&= \mathbb{P} \left(|X| \ge \sqrt{2 \sigma^2 \ln (1/\delta)} \right) \\
&\le \mathbb{P}(X \ge \epsilon) + \mathbb{P}(X \le - \epsilon) \\
&\le 2 \delta
\end{align*}
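For a concrete feel of the two-sided bound, the sketch below computes $\epsilon = \sqrt{2\sigma^2 \ln(1/\delta)}$ and, for the special case $X \sim \mathcal{N}(0, \sigma^2)$, compares the exact probability $\mathbb{P}(|X| \ge \epsilon) = \operatorname{erfc}\left(\epsilon / (\sigma\sqrt{2})\right)$ with $2\delta$. The value $\sigma = 1.5$, the list of $\delta$ values, and the function names are arbitrary illustrative assumptions.

```python
import math


def tail_radius(sigma: float, delta: float) -> float:
    """epsilon such that exp(-epsilon^2 / (2 sigma^2)) = delta."""
    return math.sqrt(2.0 * sigma**2 * math.log(1.0 / delta))


def gaussian_two_sided_tail(eps: float, sigma: float) -> float:
    """Exact P(|X| >= eps) for X ~ N(0, sigma^2), via the complementary error function."""
    return math.erfc(eps / (sigma * math.sqrt(2.0)))


for delta in (0.1, 0.05, 0.01):  # hypothetical confidence levels
    eps = tail_radius(sigma=1.5, delta=delta)
    exact = gaussian_two_sided_tail(eps, sigma=1.5)
    print(f"delta={delta}: eps={eps:.3f}, exact P(|X|>=eps)={exact:.4f}, bound 2*delta={2 * delta}")
```

For these values the exact Gaussian probability is well below $2\delta$: the Chernoff bound only uses the MGF inequality, so it is not tight even for the Gaussian itself.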

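# Concentration analysis for the mean of multiple $\sigma$-subgaussian r.v.s

Let $X_1, \ldots, X_T$ be independent random variables with common mean $\mu$ such that each $X_t - \mu$ is $\sigma$-subgaussian, and let $\hat{\mu} = \frac{1}{T} \sum_{t=1}^T X_t$. Using the Cramér-Chernoff method in the same way as in `Prove-regret-bounds-for-UCB-for-stochastic-Gaussian-bandits-with-variance-1.ipynb` and `Prove-regret-bounds-UCB-with-stochastic-Bernoulli-bandits.ipynb`, we obtain, for any $\lambda > 0$,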

\begin{align*}
\mathbb{P}(\hat{\mu} \ge \mu + \epsilon)
&= \mathbb{P}\left( \sum_{t=1}^T \left( X_t - \mu \right ) \ge T \epsilon \right) \\
&\le \frac{\mathbb{E}\left[ e^{\lambda \sum_{t=1}^T \left( X_t - \mu \right )} \right ]}{e^{\lambda T \epsilon}} \\
&= \frac{\prod_{t=1}^T \mathbb{E}\left[ e^{\lambda \left( X_t - \mu \right )} \right ]}{e^{\lambda T \epsilon}} \\
&\le \frac{\prod_{t=1}^T \exp \left( \frac{\lambda^2 \sigma^2 }{2} \right )}{e^{\lambda T \epsilon}} \\
&=\exp \left( \frac{T \lambda^2\sigma^2}{2} - \lambda T \epsilon \right )
\end{align*}

Denote the RHS as $f(\lambda) = \exp \left( \frac{T \lambda^2\sigma^2}{2} - \lambda T \epsilon \right )$, then

$$
\min_{\lambda > 0} f = \exp \left( - \frac{T \epsilon^2}{2 \sigma^2} \right)
$$

where $\arg \min_{\lambda > 0} f = \epsilon / \sigma^2$.

Therefore,

\begin{align*}
\mathbb{P}(\hat{\mu} \ge \mu + \epsilon)
\le \exp \left( - \frac{T \epsilon^2}{2 \sigma^2} \right)
\end{align*}
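The bound above only uses the subgaussian property, so it also applies to non-Gaussian data. As an illustration (the distribution is our choice, not from the original), take $X_t$ i.i.d. Rademacher ($\pm 1$ with probability $1/2$), which has $\mu = 0$ and is $1$-subgaussian. Then $\sum_t X_t = 2K - T$ with $K \sim \mathrm{Binomial}(T, 1/2)$, so the exact tail $\mathbb{P}(\hat{\mu} \ge \epsilon)$ can be computed and compared with $\exp(-T\epsilon^2/2)$. The function names are hypothetical.

```python
import math


def rademacher_mean_tail(T: int, eps: float) -> float:
    """Exact P(mean of T i.i.d. +/-1 variables >= eps).

    The sum equals 2K - T with K ~ Binomial(T, 1/2), so the event is K >= T(1 + eps)/2.
    The small tolerance guards against rounding when T(1 + eps)/2 is an integer.
    """
    k_min = max(0, math.ceil(T * (1.0 + eps) / 2.0 - 1e-9))
    return sum(math.comb(T, k) for k in range(k_min, T + 1)) / 2**T


def subgaussian_mean_bound(T: int, eps: float, sigma: float = 1.0) -> float:
    """Chernoff bound exp(-T eps^2 / (2 sigma^2)) for the mean of T sigma-subgaussian variables."""
    return math.exp(-T * eps**2 / (2.0 * sigma**2))


for T in (10, 20, 50):  # hypothetical sample sizes
    for eps in (0.1, 0.3, 0.5):
        exact = rademacher_mean_tail(T, eps)
        bound = subgaussian_mean_bound(T, eps)
        print(T, eps, round(exact, 4), round(bound, 4))
```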


Setting the RHS to $\delta = \exp \left( - \frac{T \epsilon^2}{2 \sigma^2} \right)$ gives $\epsilon = \sqrt{\frac{2 \sigma^2 \ln 1 /\delta}{T}}$.

So 

\begin{align*}
\mathbb{P} \left(\hat{\mu} \ge \mu + \sqrt{\frac{2 \sigma^2 \ln 1 /\delta}{T}} \right)
\le \delta
\end{align*}


Following the same steps for $\mathbb{P}(\hat{\mu} \le \mu - \epsilon) = \mathbb{P}(\mu -\hat{\mu} \ge \epsilon)$, using that each $\mu - X_t$ is also $\sigma$-subgaussian, we get the bound for the left tail,

\begin{align*}
\mathbb{P} \left(\hat{\mu} \le \mu - \sqrt{\frac{2 \sigma^2 \ln 1 /\delta}{T}} \right)
\le \delta
\end{align*}
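Combining the two one-sided bounds with a union bound gives $\mathbb{P}(|\hat{\mu} - \mu| \ge r) \le 2\delta$ for the radius $r = \sqrt{2\sigma^2\ln(1/\delta)/T}$, which shrinks like $1/\sqrt{T}$. Inverting $r \le \epsilon$ gives the number of samples needed for a target accuracy, $T \ge 2\sigma^2\ln(1/\delta)/\epsilon^2$. The sketch below computes both; the function names and example values are hypothetical.

```python
import math


def confidence_radius(sigma: float, delta: float, T: int) -> float:
    """sqrt(2 sigma^2 ln(1/delta) / T): each one-sided deviation exceeds this w.p. <= delta."""
    return math.sqrt(2.0 * sigma**2 * math.log(1.0 / delta) / T)


def samples_needed(sigma: float, delta: float, eps: float) -> int:
    """Smallest integer T with confidence_radius(sigma, delta, T) <= eps (up to float rounding)."""
    return math.ceil(2.0 * sigma**2 * math.log(1.0 / delta) / eps**2)


sigma, delta = 1.0, 0.05  # hypothetical example values
for T in (10, 100, 1000):
    print(T, round(confidence_radius(sigma, delta, T), 4))  # shrinks like 1/sqrt(T)

T_min = samples_needed(sigma, delta, eps=0.1)
print(T_min, confidence_radius(sigma, delta, T_min) <= 0.1)
```

For $\sigma = 1$, $\delta = 0.05$ and $\epsilon = 0.1$ this gives $T = 600$.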


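Note that the two bounds have the same form as those for unit-variance Gaussian rewards ($\sigma = 1$), with an extra factor $\sigma^2$ inside the square root. They are identical to the bounds for $\mathcal{N}(\mu, \sigma^2)$ rewards, since $X_t - \mu$ is then $\sigma$-subgaussian.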

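# Properties of $\sigma$-subgaussian variables

* If $X$ is $\sigma$-subgaussian, then $\mathbb{E}[X] = 0$ and $\mathbb{V}[X] \le \sigma^2$.
* If $X$ is $\sigma$-subgaussian, then $cX$ is $|c|\sigma$-subgaussian for all $c \in \mathbb{R}$.
* If $X_1$ and $X_2$ are independent and $\sigma_1$- and $\sigma_2$-subgaussian, respectively, then $X_1 + X_2$ is $\sqrt{\sigma_1^2 + \sigma_2^2}$-subgaussian.

These are also in Exercise 5.7 of the banditalgs book. See the Appendix for proofs.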

# Appendix

### Derive MGF for $\mathcal{N}(0, \sigma^2)$

\begin{align} 
\mathbb{E}[e^{\lambda X}]
&= \int \exp(\lambda x) \frac{1}{\sqrt{2\pi \sigma^2}} \exp \left( -\frac{x^2}{2\sigma^2}\right ) dx \\
&= \frac{1}{\sqrt{2\pi \sigma^2}} \int \exp\left[ -\frac{1}{2\sigma^2} \left(x^2 - 2\sigma^2 \lambda x \right ) \right ] dx \\
&= \frac{1}{\sqrt{2\pi \sigma^2}} \int \exp\left[ -\frac{1}{2\sigma^2} \left( \left(x - \sigma^2 \lambda \right )^2 - \sigma^4 \lambda^2 \right ) \right ] dx \\
&= \frac{1}{\sqrt{2\pi \sigma^2}} \exp\left(\frac{\lambda^2 \sigma^2}{2} \right) \int \exp\left[ -\frac{1}{2\sigma^2} \left(x - \sigma^2 \lambda \right )^2  \right ] dx \\
&= \exp\left(\frac{\lambda^2 \sigma^2}{2} \right)
\end{align}

Note, in the 5th equality, $\int \exp\left[ -\frac{1}{2\sigma^2} \left(x - \sigma^2 \lambda \right )^2  \right ] dx = \sqrt{2\pi\sigma^2}$ because the integrand is the unnormalized density of $\mathcal{N}(\sigma^2 \lambda, \sigma^2)$.

A derivation of MGF for the more general $\mathcal{N}(\mu, \sigma^2)$ is available [here](https://zyxue.github.io/2021/08/29/gaussian-distributions.html).
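To double-check the derivation numerically, the sketch below approximates $\mathbb{E}[e^{\lambda X}]$ for $X \sim \mathcal{N}(0, \sigma^2)$ with the trapezoidal rule and compares it with $\exp(\lambda^2\sigma^2/2)$. The integration window (the mean $\sigma^2\lambda$ of the completed square, $\pm 12\sigma$), the number of points, the test values of $(\lambda, \sigma)$, and the function names are assumptions chosen for illustration.

```python
import math


def integrand(x: float, lam: float, sigma: float) -> float:
    """e^{lam x} times the N(0, sigma^2) density."""
    density = math.exp(-x * x / (2.0 * sigma**2)) / math.sqrt(2.0 * math.pi * sigma**2)
    return math.exp(lam * x) * density


def mgf_by_trapezoid(lam: float, sigma: float, n: int = 20_000) -> float:
    """Trapezoidal-rule approximation of E[e^{lam X}] for X ~ N(0, sigma^2).

    After completing the square the integrand is proportional to the N(sigma^2 lam, sigma^2)
    density, so integrating over its mean +/- 12 sigma leaves negligible tail mass.
    """
    centre = sigma**2 * lam
    a, b = centre - 12.0 * sigma, centre + 12.0 * sigma
    h = (b - a) / n
    inner = sum(integrand(a + i * h, lam, sigma) for i in range(1, n))
    return h * (0.5 * integrand(a, lam, sigma) + inner + 0.5 * integrand(b, lam, sigma))


for lam, sigma in [(0.5, 1.0), (1.0, 2.0), (-1.5, 0.7)]:  # hypothetical test points
    print(lam, sigma, mgf_by_trapezoid(lam, sigma), math.exp(lam**2 * sigma**2 / 2.0))
```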

### Properties of $\sigma$-subgaussian r.v.

The proofs are based on the following references:

* http://lear.inrialpes.fr/people/harchaoui/teaching/2013-2014/ensl/m2/lecture6.pdf (main one)
* https://www.stat.cmu.edu/~arinaldo/Teaching/36710/F18/Scribed_Lectures/Sep5.pdf
* https://ocw.mit.edu/courses/18-s997-high-dimensional-statistics-spring-2015/a69e2f53bb2eeb9464520f3027fc61e6_MIT18_S997S15_Chapter1.pdf

#### Show $\mathbb{E}[X] = 0$, $\mathbb{V}[X] \le \sigma^2$

Expand both sides of $\mathbb{E}\left[e^{t X} \right] \le \exp \left( \frac{t^2 \sigma^2}{2} \right)$ as Taylor series in $t$,

\begin{align}
\mathbb{E}\left[1 + tX + \frac{(tX)^2}{2} + \cdots \right ] 
\le 1 + \frac{t^2 \sigma^2}{2} + \frac{(t^2 \sigma^2 / 2)^2}{2} + \cdots
\\
t\mathbb{E}[X] + \frac{t^2}{2}\mathbb{E}[X^2] \le \frac{t^2 \sigma^2}{2} + g(t) \\
\end{align}

where the constant $1$ cancels on both sides and $g(t)$ collects all remaining terms (moved to the right-hand side), each of which has a factor $t^k$ with $k \ge 3$.

To determine $\mathbb{E}[X]$, rearrange the second inequality,

\begin{align*}
t\mathbb{E}[X] \le \frac{t^2 \sigma^2}{2} - \frac{t^2}{2}\mathbb{E}[X^2]  + g(t) \\
\end{align*}


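When $t > 0$, dividing both sides by $t$ gives

\begin{align*}
\mathbb{E}[X] 
&\le \frac{t \sigma^2}{2} - \frac{t}{2}\mathbb{E}[X^2]  + \frac{g(t)}{t} \\
\mathbb{E}[X]
&\le \lim_{t \rightarrow 0_+} \left( \frac{t \sigma^2}{2} - \frac{t}{2}\mathbb{E}[X^2]  + \frac{g(t)}{t} \right) = 0
\end{align*}

where the second line holds because the left-hand side does not depend on $t$.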

Note $\lim_{t \rightarrow 0} \frac{g(t)}{t} = 0$ because every term of $\frac{g(t)}{t}$ contains a factor $t^k$ with $k \ge 2$.

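When $t < 0$, dividing both sides by $t$ flips the inequality, and similarly we have

\begin{align*}
\mathbb{E}[X] 
&\ge \frac{t \sigma^2}{2} - \frac{t}{2}\mathbb{E}[X^2]  + \frac{g(t)}{t} \\
\mathbb{E}[X]
&\ge \lim_{t \rightarrow 0_-} \left( \frac{t \sigma^2}{2} - \frac{t}{2}\mathbb{E}[X^2]  + \frac{g(t)}{t} \right) = 0
\end{align*}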

When $t = 0$, both sides of $\mathbb{E}\left[e^{t X} \right] \le \exp \left( \frac{t^2 \sigma^2}{2} \right)$ equal $1$, so the inequality trivially holds.

Therefore, for $\mathbb{E}\left[e^{t X} \right] \le \exp \left( \frac{t^2 \sigma^2}{2} \right)$ to hold for all $t \in \mathbb{R}$, we must have $\mathbb{E}[X] = 0$.

To bound $\mathbb{E}[X^2]$, plug $\mathbb{E}[X] = 0$ into the Taylor-expanded inequality and rearrange (for $t \ne 0$),

\begin{align*}
\frac{t^2}{2}\mathbb{E}[X^2] 
&\le \frac{t^2 \sigma^2}{2} + g(t) \\
\mathbb{E}[X^2] 
&\le \sigma^2 + \frac{2 g(t)}{t^2} \\
\end{align*}

Since every term of $g(t)$ has order at least 3 in $t$, $\lim_{t \rightarrow 0} \left( \sigma^2 + \frac{2 g(t)}{t^2} \right) = \sigma^2$. The left-hand side does not depend on $t$, so $\mathbb{E}[X^2] \le \sigma^2$, and since $\mathbb{E}[X] = 0$, $\mathbb{V}[X] = \mathbb{E}[X^2] \le \sigma^2$.

QED.
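Both conclusions can be illustrated numerically with the same small-$t$ reasoning. In the hypothetical examples below (ours, not from the original), (i) a variable with nonzero mean, $X = 1 + R$ for a Rademacher $R$ (so $X \in \{0, 2\}$ and $\mathbb{E}[e^{tX}] = e^t\cosh(t)$), violates $\mathbb{E}[e^{tX}] \le \exp(t^2\sigma^2/2)$ at some small $t > 0$ even for the large value $\sigma = 10$, and (ii) a Rademacher variable, whose variance is $1$, violates the bound at small $t$ when $\sigma = 0.9 < 1$. The helper name and the grid of $t$ values are assumptions.

```python
import math


def first_violation(mgf, sigma: float, ts):
    """Return the first t in `ts` where mgf(t) > exp(t^2 sigma^2 / 2), or None if there is none."""
    for t in ts:
        if mgf(t) > math.exp(t * t * sigma * sigma / 2.0):
            return t
    return None


def shifted_mgf(t: float) -> float:
    """MGF of X = 1 + R with R Rademacher: E[e^{tX}] = e^t cosh(t), so E[X] = 1."""
    return math.exp(t) * math.cosh(t)


small_ts = [10.0 ** (-k) for k in range(1, 6)]  # t = 0.1, 0.01, ..., 1e-5

# (i) Nonzero mean: the bound fails near t = 0 even with a large sigma.
print(first_violation(shifted_mgf, sigma=10.0, ts=small_ts))

# (ii) Variance 1 but sigma < 1: cosh(t) exceeds exp(t^2 sigma^2 / 2) near t = 0.
print(first_violation(math.cosh, sigma=0.9, ts=small_ts))
```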

#### Show $cX$ is $|c|\sigma$-subgaussian for all $c \in \mathbb{R}$ if $X$ is $\sigma$-subgaussian

Since $X$ is $\sigma$-subgaussian, by definition, for all $t \in \mathbb{R}$,

\begin{align*}
\mathbb{E}\left[e^{t X} \right] 
&\le \exp \left( \frac{t^2 \sigma^2}{2} \right)
\end{align*}

so it also holds for $h = ct \in \mathbb{R}$,

\begin{align*}
\mathbb{E}\left[e^{h X} \right] 
&\le \exp \left( \frac{h^2 \sigma^2}{2} \right) \\
\mathbb{E}\left[e^{ct X} \right] 
&\le \exp \left( \frac{(ct)^2 \sigma^2}{2} \right) \\
\mathbb{E}\left[e^{t (cX)} \right] 
&\le \exp \left( \frac{t^2 (c\sigma)^2}{2} \right) \\
\end{align*}

Since $(c\sigma)^2 = (|c|\sigma)^2$, $cX$ is $|c|\sigma$-subgaussian. QED.
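As a worked example (ours, not from the original) showing that the factor $|c|$ cannot be improved in general, take $X \sim \mathcal{N}(0, \sigma^2)$ and $c \ne 0$. Using the Gaussian MGF $M_X(t) = \exp\left(\frac{t^2\sigma^2}{2}\right)$,

\begin{align*}
\mathbb{E}\left[e^{t (cX)} \right] = \mathbb{E}\left[e^{(ct) X} \right] = \exp \left( \frac{(ct)^2 \sigma^2}{2} \right) = \exp \left( \frac{t^2 (|c|\sigma)^2}{2} \right)
\end{align*}

so the bound holds with equality for every $t$, and $cX$ is not $\sigma'$-subgaussian for any $\sigma' < |c|\sigma$.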

#### Show $X_1 + X_2$ is $\sqrt{\sigma_1^2 + \sigma_2^2}$-subgaussian for independent $X_1$ and $X_2$

Let $X_1$ be $\sigma_1$-subgaussian and $X_2$ be $\sigma_2$-subgaussian, with $X_1$ and $X_2$ independent. By definition, for all $t \in \mathbb{R}$,

\begin{align*}
\mathbb{E}\left[e^{t X_1} \right] 
&\le \exp \left( \frac{t^2 \sigma_1^2}{2} \right) \\
\mathbb{E}\left[e^{t X_2} \right] 
&\le \exp \left( \frac{t^2 \sigma_2^2}{2} \right)
\end{align*}

Since $X_1$ and $X_2$ are independent, so are $e^{t X_1}$ and $e^{t X_2}$, and the MGF of the sum factorizes. Both factors are nonnegative, so we can multiply the two bounds:

\begin{align*}
\mathbb{E}\left[e^{t (X_1 + X_2) } \right]
&= \mathbb{E}\left[e^{t X_1} \right] \mathbb{E}\left[e^{t X_2} \right] \\
&\le \exp \left( \frac{t^2 \sigma_1^2}{2} \right) \exp \left( \frac{t^2 \sigma_2^2}{2} \right) \\
&= \exp \left( \frac{t^2 \left( \sigma_1^2 + \sigma_2^2 \right )}{2} \right)
\end{align*}

Therefore, $X_1 + X_2$ is $\sqrt{\sigma_1^2 + \sigma_2^2}$-subgaussian. QED.
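The $\sqrt{\sigma_1^2 + \sigma_2^2}$ result needs $X_1$ and $X_2$ to be independent, so that the MGF of the sum factorizes. The sketch below (an illustrative example with Rademacher variables, each $1$-subgaussian, not from the original) checks the bound with parameter $\sqrt{1^2 + 1^2} = \sqrt{2}$ on a hypothetical finite grid of $t$, both for independent $X_1, X_2$ and for the fully dependent case $X_2 = X_1$.

```python
import math


def gaussian_bound(t: float, sigma: float) -> float:
    """exp(t^2 sigma^2 / 2), the MGF bound defining sigma-subgaussianity."""
    return math.exp(t * t * sigma * sigma / 2.0)


ts = [k / 100.0 for k in range(-500, 501)]  # hypothetical grid of t values in [-5, 5]
sigma_sum = math.sqrt(1.0**2 + 1.0**2)  # X1, X2 Rademacher, each 1-subgaussian

# Independent X1, X2: E[e^{t(X1+X2)}] = E[e^{tX1}] E[e^{tX2}] = cosh(t)^2.
independent_ok = all(math.cosh(t) ** 2 <= gaussian_bound(t, sigma_sum) * (1 + 1e-12) for t in ts)

# Fully dependent X2 = X1: X1 + X2 = 2 X1, whose MGF is cosh(2t).
dependent_ok = all(math.cosh(2 * t) <= gaussian_bound(t, sigma_sum) * (1 + 1e-12) for t in ts)

print(independent_ok, dependent_ok)  # True False
```

In the dependent case $X_1 + X_2 = 2X_1$, which by the scaling property is $2$-subgaussian, i.e. $(\sigma_1 + \sigma_2)$-subgaussian, but not $\sqrt{2}$-subgaussian (for example, $\cosh(1) > e^{1/4}$ at $t = 1/2$).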