# Definitions

## Markov Process

A *Markov Process* consists of:

1. A countable set of states $\mathcal{S}$ (known as the State Space) and a set $\mathcal{T} \subset \mathcal{S}$ (known as the set of Terminal States).
2. A time-indexed sequence of random states $S_t \in \mathcal{S}$ for time steps $t=0,1,2,\dots$, each satisfying the Markov Property: $\mathbb{P}[S_{t+1}|S_t,S_{t-1},\dots,S_0] = \mathbb{P}[S_{t+1}|S_t]$.
3. *Termination*: If an outcome for $S_T$ (for some time step $T$) is a state in the set $\mathcal{T}$, then this sequence outcome terminates at time step $T$.

We refer to $\mathbb{P}[S_{t+1}|S_t]$ as the transition probabilities for time $t$.

## Stationary Markov Process
A *Stationary Markov Process* is a MP with the additional property that $\mathbb{P}[S_{t+1}|S_t]$ is independent of $t$.

This means the dynamics of a Stationary Markov Process can be fully specified by the function
\begin{equation}
\mathcal{P}:(\mathcal{S}-\mathcal{T}) \times \mathcal{S} \rightarrow [0,1]
\end{equation}
such that $\mathcal{P}(s,s')=\mathbb{P}[S_{t+1}=s'|S_t=s]$ for all time steps $t$.
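
As a rough illustration of specifying the dynamics purely through the transition function, the sketch below samples a trace from a hypothetical random walk on the non-negative integers, with $0$ as the only terminal state. The names `transition`, `TERMINAL`, and `sample_trace`, and the walk itself, are assumptions made for this example and are not part of the definitions above.

```python
import random
from typing import Dict, List

# Hypothetical example: random walk on the non-negative integers.
# State 0 is the only terminal state, so P is defined on S - T = {1, 2, ...}.
TERMINAL = {0}


def transition(s: int, p_up: float = 0.5) -> Dict[int, float]:
    """Return P(s, .) as a dict; it depends on s only, never on the time step."""
    return {s + 1: p_up, s - 1: 1.0 - p_up}


def sample_trace(start: int, rng: random.Random, max_steps: int = 100) -> List[int]:
    """Sample states until a terminal state is hit or max_steps transitions are made."""
    trace = [start]
    s = start
    while s not in TERMINAL and len(trace) <= max_steps:
        dist = transition(s)
        candidates = list(dist)
        s = rng.choices(candidates, weights=[dist[c] for c in candidates])[0]
        trace.append(s)
    return trace


trace = sample_trace(3, random.Random(1))
```

Because `transition` takes only the current state, the same function drives every step, which is exactly what stationarity permits.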

## Finite Markov Process
A *Finite Markov Process* is a MP with a finite state space. The transition probabilities can then be expressed as a fixed-size matrix (fixed number of rows and columns).
Because the matrix is often sparse, we can instead represent it as a dictionary of dictionaries:
\begin{equation}
\mathcal{P} :\mathcal{N}\times \mathcal{S}\rightarrow [0,1] \\
\mathcal{N} \rightarrow (\mathcal{S} \rightarrow [0,1])
\end{equation}
Equivalently, in code: `Transition = Mapping[S, Optional[FiniteDistribution[S]]]`
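
A minimal sketch of the dictionary-of-dictionaries representation, assuming plain `dict`s stand in for `FiniteDistribution` and that a terminal state maps to `None`. The three-state chain and the helper names `to_dense` and `rows_sum_to_one` are hypothetical and chosen only for illustration.

```python
from typing import List, Mapping, Optional

# Assumption: a plain Mapping[str, float] plays the role of FiniteDistribution[S].
Transition = Mapping[str, Optional[Mapping[str, float]]]

# Hypothetical chain: two non-terminal states and one terminal state "end".
transition: Transition = {
    "sunny": {"sunny": 0.8, "rainy": 0.15, "end": 0.05},
    "rainy": {"sunny": 0.4, "rainy": 0.55, "end": 0.05},
    "end": None,  # terminal states have no outgoing distribution
}


def to_dense(tr: Transition) -> List[List[float]]:
    """Expand to an |N| x |S| matrix: rows are non-terminal states, columns all states."""
    columns = list(tr)
    return [
        [dist.get(s_next, 0.0) for s_next in columns]
        for dist in tr.values()
        if dist is not None
    ]


def rows_sum_to_one(tr: Transition, tol: float = 1e-9) -> bool:
    """Check that every non-terminal state maps to a probability distribution."""
    return all(
        abs(sum(dist.values()) - 1.0) <= tol
        for dist in tr.values()
        if dist is not None
    )


dense = to_dense(transition)
```

Only the non-zero entries are stored in the nested dictionaries, while `to_dense` recovers the fixed-size matrix view when it is needed.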

## Stationary Distribution of MP
The *Stationary Distribution* of a (Stationary) Markov Process with state space $\mathcal{S}=\mathcal{N}$ and transition probability function $\mathcal{P}:\mathcal{N}\times \mathcal{N} \rightarrow [0,1]$ is a probability distribution function $\pi: \mathcal{N}\rightarrow [0,1]$ s.t.
\begin{equation}
\pi(s) = \sum_{s'\in \mathcal{N}} \pi(s') \cdot \mathcal{P}(s',s) \quad \text{for all $s\in\mathcal{N}$}
\end{equation}
Intuitively, if there is no terminal state, $\pi(s)$ is the long-run probability of the process occupying state $s$. In matrix notation, treating $\pi$ as a column vector:
\begin{align}
& \pi^T = \pi^T \cdot \mathcal{P} \\
\text{equivalently, } & \mathcal{P}^T\cdot \pi = \pi
\end{align}
We notice that $\pi$ is an eigenvector of $\mathcal{P}^T$ with eigenvalue 1.
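
One simple way to find such a $\pi$ numerically is power iteration: repeatedly apply $\pi^T \leftarrow \pi^T \cdot \mathcal{P}$ until it stops changing. The sketch below assumes a chain with no terminal states for which this iteration converges (for example, an irreducible and aperiodic chain). The two-state chain and the function name `stationary_distribution` are hypothetical.

```python
from typing import Dict

TransitionMap = Dict[str, Dict[str, float]]


def stationary_distribution(
    P: TransitionMap, tol: float = 1e-12, max_iter: int = 10_000
) -> Dict[str, float]:
    """Power iteration on pi^T <- pi^T . P, starting from the uniform distribution."""
    states = list(P)
    pi = {s: 1.0 / len(states) for s in states}
    for _ in range(max_iter):
        new_pi = {s: 0.0 for s in states}
        for s_prev, row in P.items():
            for s_next, prob in row.items():
                # pi(s_next) accumulates pi(s_prev) * P(s_prev, s_next)
                new_pi[s_next] += pi[s_prev] * prob
        if max(abs(new_pi[s] - pi[s]) for s in states) < tol:
            return new_pi
        pi = new_pi
    return pi  # best estimate if the tolerance was not reached


# Hypothetical two-state chain with no terminal states.
P = {
    "a": {"a": 0.9, "b": 0.1},
    "b": {"a": 0.5, "b": 0.5},
}
pi = stationary_distribution(P)
```

For this chain the balance condition $0.1 \cdot \pi(a) = 0.5 \cdot \pi(b)$ gives $\pi(a) = 5/6$ and $\pi(b) = 1/6$, which the iteration should approach.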

## Markov Reward Process
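A *Markov Reward Process* is a MP along with a time-indexed sequence of *Reward* random variables $R_t \in\mathbb{R}$ for time steps $t=1,2,\dots$, satisfying the Markov Property (including Rewards), i.e. $\mathbb{P}[(R_{t+1},S_{t+1})|S_t,S_{t-1},\dots,S_0] = \mathbb{P}[(R_{t+1},S_{t+1})|S_t]$. In the stationary case, the dynamics are specified by $\mathcal{P}_R(s,r,s') = \mathbb{P}[(R_{t+1}=r,S_{t+1}=s')|S_t=s]$: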
\begin{equation}
\mathcal{P}_R :\mathcal{N}\times \mathbb{R} \times \mathcal{S}\rightarrow [0,1] \\
\sum_{s'\in\mathcal{S}}\sum_{r\in\mathbb{R}}\mathcal{P}_R(s,r,s')=1 \quad \text{for all $s\in\mathcal{N}$}
\end{equation}

Given $\mathcal{P}_R$, we can derive the following three functions:
* The transition probability function $\mathcal{P} :\mathcal{N}\times \mathcal{S}\rightarrow [0,1]$
\begin{equation}
\mathcal{P}(s,s') = \sum_{r\in\mathbb{R}} \mathcal{P}_R(s,r,s')
\end{equation}
* The reward transition function
\begin{equation}
\mathcal{R}_T: \mathcal{N} \times \mathcal{S} \rightarrow \mathbb{R}\\
\mathcal{R}_T(s,s') = \mathbb{E}[R_{t+1}|S_{t+1}=s',S_t=s] = \sum_{r\in\mathbb{R}} \frac{\mathcal{P}_R(s,r,s')}{\mathcal{P}(s,s')} r
\end{equation}

* The reward function
\begin{equation}
\mathcal{R}:\mathcal{N}\rightarrow \mathbb{R}\\
\mathcal{R}(s)=\mathbb{E}[R_{t+1}|S_t=s]=\sum_{s'\in\mathcal{S}} \mathcal{P}(s,s')\cdot \mathcal{R}_T(s,s') = \sum_{s'\in\mathcal{S}} \sum_{r\in\mathbb{R}} \mathcal{P}_R(s,r,s') \cdot r
\end{equation}
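
The three derived functions can be sketched directly from a tabulated $\mathcal{P}_R$. The example below assumes finitely many distinct rewards and stores $\mathcal{P}_R$ as a dictionary keyed by `(reward, next_state)` pairs. The data and the function names `transition_probs`, `reward_transition`, and `reward` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Tuple

# P_R[s][(r, s')] = probability of reward r and next state s' from state s.
RewardTransition = Dict[str, Dict[Tuple[float, str], float]]

# Hypothetical two-state MRP.
P_R: RewardTransition = {
    "a": {(1.0, "a"): 0.5, (3.0, "a"): 0.2, (0.0, "b"): 0.3},
    "b": {(2.0, "a"): 1.0},
}


def transition_probs(p_r: RewardTransition) -> Dict[str, Dict[str, float]]:
    """P(s, s') = sum over r of P_R(s, r, s')."""
    result = {}
    for s, outcomes in p_r.items():
        row = defaultdict(float)
        for (_, s_next), prob in outcomes.items():
            row[s_next] += prob
        result[s] = dict(row)
    return result


def reward_transition(p_r: RewardTransition) -> Dict[str, Dict[str, float]]:
    """R_T(s, s') = sum over r of P_R(s, r, s') * r / P(s, s')."""
    probs = transition_probs(p_r)
    result = {}
    for s, outcomes in p_r.items():
        weighted = defaultdict(float)
        for (r, s_next), prob in outcomes.items():
            weighted[s_next] += prob * r
        # Skip unreachable next states to avoid dividing by zero.
        result[s] = {
            s_next: total / probs[s][s_next]
            for s_next, total in weighted.items()
            if probs[s][s_next] > 0
        }
    return result


def reward(p_r: RewardTransition) -> Dict[str, float]:
    """R(s) = sum over s' and r of P_R(s, r, s') * r."""
    return {
        s: sum(prob * r for (r, _), prob in outcomes.items())
        for s, outcomes in p_r.items()
    }
```

As a consistency check, `reward(P_R)[s]` should equal the sum over `s'` of `transition_probs(P_R)[s][s'] * reward_transition(P_R)[s][s']`, mirroring the two expressions for $\mathcal{R}(s)$.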

## Value function of MRP
* Return 
\begin{equation}
G_t = \sum_{i=t+1}^\infty \gamma^{i-t-1} \cdot R_i = R_{t+1} + \gamma \cdot R_{t+2} + \gamma^2 R_{t+3} + \dots
\end{equation}
* Value function
\begin{equation}
V : \mathcal{N}\rightarrow \mathbb{R}\\
V(s) = \mathbb{E}[G_t|S_t=s] \quad \text{for all $s\in\mathcal{N}$, for all $t=0,1,2,\dots$} \\
= \mathcal{R}(s) + \gamma \cdot \sum_{s'\in\mathcal{N}}\mathcal{P}(s,s')\cdot V(s') \quad \text{for all $s \in \mathcal{N}$}
\end{equation}

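* Matrix formulation, with $m = |\mathcal{N}|$: $\mathbf{V}$ is $m \times 1$, $\mathbf{\mathcal{P}}$ is $m \times m$, and $\mathbf{\mathcal{R}}$ is $m \times 1$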
\begin{equation}
\mathbf{V} = \mathbf{\mathcal{R}} + \gamma\mathbf{\mathcal{P}}\cdot\mathbf{V} \\
\mathbf{V} = (\mathbf{I}_m - \gamma \mathbf{\mathcal{P}})^{-1}\cdot \mathbf{\mathcal{R}}
\end{equation}
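
A small sketch of the matrix formulation, assuming $\mathbf{I}_m - \gamma \mathbf{\mathcal{P}}$ is invertible (which holds, for instance, when $\gamma < 1$). Rather than forming the inverse explicitly, it solves the linear system $(\mathbf{I}_m - \gamma \mathbf{\mathcal{P}}) \cdot \mathbf{V} = \mathbf{\mathcal{R}}$ with a plain Gaussian elimination. A library routine such as `numpy.linalg.solve` would typically be used in practice. The two-state numbers and the names `solve_linear` and `mrp_value` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def solve_linear(A: Matrix, b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [A | b]; copies rows so the inputs are left untouched.
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def mrp_value(P: Matrix, R: List[float], gamma: float) -> List[float]:
    """Return V satisfying V = R + gamma * P . V over the non-terminal states."""
    m = len(R)
    A = [
        [(1.0 if i == j else 0.0) - gamma * P[i][j] for j in range(m)]
        for i in range(m)
    ]
    return solve_linear(A, R)


# Hypothetical two-state MRP with no terminal states.
P = [[0.9, 0.1], [0.5, 0.5]]
R = [1.0, 2.0]
gamma = 0.9
V = mrp_value(P, R, gamma)
```

The result can be checked by substituting `V` back into $\mathbf{V} = \mathbf{\mathcal{R}} + \gamma\mathbf{\mathcal{P}}\cdot\mathbf{V}$ and confirming that both sides agree up to rounding.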

# Glossary

1. $\mathcal{S}$: State Space (the set of all states)
2. $\mathcal{T}$: Set of Terminal States
3. $\mathcal{N}$: Set of Non-Terminal States. $\mathcal{N} = \mathcal{S}-\mathcal{T}$
4. $\mathcal{P}$: Transition probability function of a Stationary Markov Process (independent of $t$)

# G2
* Episodic/continuing: an episodic MP is one that ends (e.g. SnL). A continuing MP does not terminate on its own (e.g. stock prices), so a fixed cutoff is used.