# Understanding Itô's Lemma

First, let's set some basics (from [Wikipedia](https://en.wikipedia.org/wiki/It%C3%B4_calculus)).

An Itô process is an adapted stochastic process. The term "adapted" means that, if we have a sequence of realizations of the process $(X_1, X_2, ..., X_N)$, then at every time $k$ the value $X_k$ is known using only the information available up to time $k$. The process never peeks into the future (basically, think of price data, which we can always observe in real time but never ahead of time).

In more math-y terms (not going full measure theory here, just a basic intro), this means the following. Take a probability space $(\Omega, \mathcal{F}, \mathbb{P})$, where $\Omega$ is the universe of possible outcomes, $\mathcal{F}$ is a $\sigma$-algebra of subsets of $\Omega$ (the events we can assign probabilities to), and $\mathbb{P}$ is a probability measure, together with a filtration $\left\{\mathcal{F}_i\right\}$. Then $(X_i)$ is <u>adapted</u> to $(\mathcal{F}_i)$ if $X_i$ is $\mathcal{F}_i$-measurable for every $i$.

Now that we know what adapted refers to, we can write an Itô process as the sum of an integral with respect to a Brownian motion (Wiener process) and an integral with respect to time:
\begin{equation*}
X_t = X_0 + \int_0^t\sigma_s dB_s + \int_0^t\mu_sds,
\end{equation*}
where $B$ is a Brownian motion, $\sigma$ is a predictable $B$-integrable process, and $\mu$ is predictable and Lebesgue integrable (Lebesgue integrals are, roughly, the measure-theoretic generalization of the standard Riemann integral).
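To make the integral form concrete, here is a minimal sketch (plain Python, standard library only) that approximates both integrals with left-endpoint sums on a uniform time grid, which is the Euler–Maruyama scheme. Evaluating the integrands at the left endpoint of each step means we only use information known at the start of that step, which is the discrete analogue of "predictable". The function name `simulate_ito_process` and the restriction to integrands of the form $\mu_s = \mu(s, X_s)$ and $\sigma_s = \sigma(s, X_s)$ are my own assumptions for illustration; general predictable integrands may depend on the whole past of the path.

```python
import math
import random


def simulate_ito_process(x0, mu, sigma, T=1.0, n=1_000, seed=0):
    """Approximate X_t = X_0 + int_0^t sigma_s dB_s + int_0^t mu_s ds on [0, T].

    Assumption (for illustration only): drift and volatility are given as
    functions mu(t, x) and sigma(t, x) of the current time and state.
    Returns the list [X_0, X_dt, X_2dt, ..., X_T] of length n + 1.
    """
    rng = random.Random(seed)
    dt = T / n
    x = x0
    path = [x0]
    for k in range(n):
        t = k * dt
        # Brownian increment B_{t+dt} - B_t ~ N(0, dt)
        dB = rng.gauss(0.0, math.sqrt(dt))
        # Integrands evaluated at the LEFT endpoint t, i.e. using only what
        # is known at time t: this is what makes these Ito sums.
        x = x + mu(t, x) * dt + sigma(t, x) * dB
        path.append(x)
    return path


# Usage example: geometric Brownian motion, mu_t = 0.05 X_t and sigma_t = 0.2 X_t
gbm_path = simulate_ito_process(100.0, lambda t, x: 0.05 * x, lambda t, x: 0.2 * x, T=1.0, n=252)
print(len(gbm_path), gbm_path[-1])
```

With $\sigma \equiv 0$ the stochastic integral disappears and the sketch reduces to an ordinary left-endpoint Riemann sum, so a constant drift $\mu$ simply yields $X_T = X_0 + \mu T$.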

Note 1: a $\sigma$-algebra is a collection $\mathcal{G}$ of subsets of $\Omega$ such that
- $\emptyset \in \mathcal{G}$
- $A \in \mathcal{G} \implies A^C \in \mathcal{G}$, where $A^C$ is the complement of $A$
- $A_1, A_2, \ldots \in \mathcal{G} \implies \bigcup_{k=1}^{\infty}A_k \in \mathcal{G}$

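Note 2: a filtration is a family of sub-$\sigma$-algebras of $\mathcal{F}$ that only grows over time, i.e. each $\sigma$-algebra contains all the sets contained in the previous one: $\mathcal{F}_i \subseteq \mathcal{F}_{i+1} \subseteq \mathcal{F}$. Intuitively, $\mathcal{F}_i$ is the information available at time $i$, and information is never forgotten.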

Note 3: a random variable $X$ is basically just a (measurable) function $X: \Omega \rightarrow \mathbb{R}$. The $\sigma$-algebra $\sigma(X)$ generated by $X$ is the collection of all sets of the form $\{\omega \in \Omega \mid X(\omega)\in A\}$, where $A$ ranges over the Borel subsets of $\mathbb{R}$. If $\mathcal{G}$ is a sub-$\sigma$-algebra of $\mathcal{F}$, we say $X$ is $\mathcal{G}$-measurable if every set in $\sigma(X)$ is also in $\mathcal{G}$.
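On a finite sample space, Notes 1 to 3 and the definition of "adapted" can all be checked by brute force. The sketch below is plain Python; the two-coin-toss setup and all function names are hypothetical, chosen only for illustration. It builds a filtration $\mathcal{F}_0 \subseteq \mathcal{F}_1 \subseteq \mathcal{F}_2$, where $\mathcal{F}_k$ holds the information revealed by the first $k$ tosses, and checks which random variables are measurable with respect to which $\sigma$-algebra. On a finite $\Omega$, closure under countable unions reduces to closure under pairwise unions, which is what `is_sigma_algebra` tests.

```python
from itertools import combinations, product

# Sample space for two coin tosses: the outcomes "HH", "HT", "TH", "TT".
OMEGA = frozenset("".join(p) for p in product("HT", repeat=2))


def generated_sigma_algebra(partition):
    """All unions of blocks of a finite partition of OMEGA: the sigma-algebra it generates."""
    blocks = [frozenset(b) for b in partition]
    return {
        frozenset().union(*combo)
        for r in range(len(blocks) + 1)
        for combo in combinations(blocks, r)
    }


def is_sigma_algebra(G):
    """Check Note 1. On a finite Omega, countable unions reduce to pairwise unions."""
    return (
        frozenset() in G
        and all(OMEGA - A in G for A in G)
        and all(A | B in G for A in G for B in G)
    )


def atoms(X):
    """The sets {w : X(w) = v}; every set in sigma(X) is a union of these."""
    groups = {}
    for w in OMEGA:
        groups.setdefault(X(w), set()).add(w)
    return list(groups.values())


def is_measurable(X, G):
    """X is G-measurable iff every set in sigma(X) is also in G (Note 3)."""
    return generated_sigma_algebra(atoms(X)) <= G


def first_toss(w):
    return w[0]


def X1(w):
    """Number of heads after the first toss."""
    return w[:1].count("H")


def X2(w):
    """Number of heads after both tosses."""
    return w.count("H")


# Filtration: F_k holds the information revealed by the first k tosses.
F0 = generated_sigma_algebra([OMEGA])                # trivial: {empty set, Omega}
F1 = generated_sigma_algebra(atoms(first_toss))     # knows the first toss
F2 = generated_sigma_algebra([[w] for w in OMEGA])  # knows everything

print(F0 <= F1 <= F2)                                # True: the sigma-algebras are nested
print(is_measurable(X1, F1), is_measurable(X2, F2))  # True True: (X1, X2) is adapted
print(is_measurable(X2, F1))                         # False: one toss can't reveal the total
```

$X_2$ fails to be $\mathcal{F}_1$-measurable because $\sigma(X_2)$ contains the event $\{HT, TH\}$ ("exactly one head"), which cannot be decided after seeing only the first toss.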

Alternatively, and perhaps more intuitively, we can write the Itô process in its differential form:
\begin{equation}
dX_t = \mu_t dt + \sigma_t dB_t
\end{equation}

To me, this form makes the whole point of Itô's lemma easier to see, as I explain below.

Consider a twice continuously differentiable function $f(t,x)$. We can use a Taylor-series expansion up to order 2 in each variable to write:
\begin{align*}
f(t+dt, x) - f(t,x) & = \frac{\partial f}{\partial t}dt + \frac{1}{2}\frac{\partial^2 f}{\partial t^2}(dt)^2 + \cdots \\
f(t, x+dx) - f(t,x) &= \frac{\partial f}{\partial x}dx + \frac{1}{2}\frac{\partial^2 f}{\partial x^2}(dx)^2 + \cdots
\end{align*}
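In ordinary calculus, the total differential keeps only the first-order terms, $df = f_t dt + f_x dx$. Here we keep the second-order terms as well and add the two expansions (a full second-order expansion also contains a mixed term $\frac{\partial^2 f}{\partial t\,\partial x}dt\,dx$, but since $dt\,dx = \mu_t (dt)^2 + \sigma_t\,dt\,dB_t$, it consists only of terms that vanish below, so we leave it out). Considering $x$ to be the Itô process from above, we can plug in equation (1) to get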
\begin{align*}
df &= \frac{\partial f}{\partial t}dt + \frac{1}{2}\frac{\partial^2 f}{\partial t^2}(dt)^2 + \frac{\partial f}{\partial x}\left[\mu_t dt + \sigma_t dB_t\right] + \frac{1}{2}\frac{\partial^2 f}{\partial x^2}\left[\mu_t^2 (dt)^2 + 2 \mu_t \sigma_t dt dB_t + \sigma_t^2 (dB_t)^2\right]\\
&= \left[\frac{\partial f}{\partial t} + \mu_t \frac{\partial f}{\partial x}\right]dt + \frac{\partial f}{\partial x}\sigma_t dB_t + \frac{1}{2}\left[\frac{\partial^2 f}{\partial t^2} + \frac{\partial^2 f}{\partial x^2}\mu_t^2\right](dt)^2 + \frac{1}{2}\frac{\partial^2 f}{\partial x^2}\left[2 \mu_t \sigma_t dt dB_t + \sigma_t^2 (dB_t)^2\right]
\end{align*}
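Note, however, that as $dt\rightarrow 0$ the terms $(dt)^2$ and $dt\,dB_t$ approach zero much faster than $dt$: the increment $dB_t$ has variance $dt$, so it is of order $\sqrt{dt}$, which makes $dt\,dB_t$ of order $(dt)^{3/2}$. We can therefore ignore them (honestly, in my mind, I question: why did we take the second-order Taylor expansion with respect to time then? We didn't really need it; keeping it just makes it explicit that this term drops out). Additionally, and most importantly, since $B_t$ is a Brownian motion, that same $\sqrt{dt}$ scaling makes $(dB_t)^2$ of order $dt$, so it does not vanish: in Itô calculus we use the rule $(dB_t)^2 = dt$, because the sum of squared increments $\sum (\Delta B)^2$ over $[0, t]$ converges to $t$ (the quadratic variation of $B$). Therefore, we can rewrite this to obtain the famous lemma: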
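A quick numerical sanity check of these orders of magnitude, as a plain-Python sketch (the helper name `increment_sums` is mine, not a standard function): simulate Brownian increments $\Delta B \sim \mathcal{N}(0, \Delta t)$ on a grid of $n$ steps over $[0, T]$ and add up $(\Delta B)^2$, $(\Delta t)^2$ and $|\Delta t\,\Delta B|$ along one path. Absolute values are used for the last sum so that the $\Delta t\,\Delta B$ terms cannot hide behind cancellation. As the grid is refined, only the first sum survives, and it settles near $T$, which is exactly why $(dB_t)^2$ behaves like $dt$ while the other two terms drop out.

```python
import math
import random


def increment_sums(T=1.0, n=100_000, seed=0):
    """Sums over one simulated Brownian path on [0, T] with n steps.

    Returns (sum of (dB)^2, sum of (dt)^2, sum of |dt * dB|).
    """
    rng = random.Random(seed)
    dt = T / n
    sum_dB2 = sum_dt2 = sum_abs_dtdB = 0.0
    for _ in range(n):
        dB = rng.gauss(0.0, math.sqrt(dt))  # B_{t+dt} - B_t ~ N(0, dt)
        sum_dB2 += dB * dB                  # stays near T as n grows
        sum_dt2 += dt * dt                  # equals T^2 / n, goes to 0
        sum_abs_dtdB += abs(dt * dB)        # shrinks like 1 / sqrt(n)
    return sum_dB2, sum_dt2, sum_abs_dtdB


for n in (10, 1_000, 100_000):
    dB2, dt2, dtdB = increment_sums(T=1.0, n=n)
    print(f"n={n:>7}: sum (dB)^2={dB2:.4f}  sum (dt)^2={dt2:.2e}  sum |dt dB|={dtdB:.2e}")
```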
\begin{align}
df &= \left[\frac{\partial f}{\partial t} + \mu_t \frac{\partial f}{\partial x} + \frac{\sigma_t^2}{2}\frac{\partial^2 f}{\partial x^2}\right]dt + \frac{\partial f}{\partial x}\sigma_t dB_t\\
&=\left[\frac{\partial f}{\partial t} + \frac{\sigma_t^2}{2}\frac{\partial^2 f}{\partial x^2}\right]dt + \frac{\partial f}{\partial x}dX_t
\end{align}
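As a sanity check of the lemma, take $X_t = B_t$ (so $\mu_t = 0$ and $\sigma_t = 1$) and $f(t,x) = x^2$, so that $\frac{\partial f}{\partial t} = 0$, $\frac{\partial f}{\partial x} = 2x$ and $\frac{\partial^2 f}{\partial x^2} = 2$. The lemma then gives $d(B_t^2) = dt + 2B_t\,dB_t$, i.e. $B_T^2 = T + \int_0^T 2B_s\,dB_s$, whereas the ordinary chain rule would predict $B_T^2 = \int_0^T 2B_s\,dB_s$. The plain-Python sketch below (the function name and the choice of example are mine) approximates the stochastic integral with left-endpoint sums along one simulated path and compares both predictions with $B_T^2$.

```python
import math
import random


def check_ito_square(T=1.0, n=100_000, seed=1):
    """Compare B_T^2 with the Ito and the naive chain-rule predictions on one path."""
    rng = random.Random(seed)
    dt = T / n
    B = 0.0
    stoch_integral = 0.0  # left-endpoint sum approximating int_0^T 2 B_s dB_s
    for _ in range(n):
        dB = rng.gauss(0.0, math.sqrt(dt))
        stoch_integral += 2.0 * B * dB  # integrand uses B at the left endpoint
        B += dB
    exact = B * B                # f(B_T) - f(B_0), since B_0 = 0
    ito = T + stoch_integral     # Ito's lemma keeps the (sigma^2 / 2) f_xx dt term
    naive = stoch_integral       # ordinary calculus drops it
    return exact, ito, naive


exact, ito, naive = check_ito_square()
print(f"B_T^2 = {exact:.4f}, Ito prediction = {ito:.4f}, naive prediction = {naive:.4f}")
```

The gap between $B_T^2$ and the naive prediction is exactly $\sum (\Delta B)^2 \approx T$ along the simulated path, i.e. precisely the $\frac{\sigma_t^2}{2}\frac{\partial^2 f}{\partial x^2}dt$ term that Itô's lemma adds.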