# <center>CS 207 Final Project: Milestone 2</center>
<center>November 2019</center>

## Introduction


Efficiently and accurately evaluating derivatives of functions is one of the most important operations in science and engineering. Automatic differentiation (AD) is a technique which, given a function $f$ and a point, automatically evaluates the derivative of $f$ at that point. AD is less costly than symbolic differentiation and, unlike finite differences, evaluates derivatives to machine precision. This library implements the forward mode of AD, along with some additional features.


## Background

We present below some of the key concepts and formulae upon which we build the veritorch library:

### Chain rule

The chain rule is fundamental to AD, which decomposes functions into simpler pieces.

Suppose we have $h(u(t), v(t))$. Its derivative with respect to $t$ is:

$$\frac{\partial h}{\partial t} = \frac{\partial h}{\partial u}\frac{\partial u}{\partial t} + \frac{\partial h}{\partial v}\frac{\partial v}{\partial t}.$$

For the general function $h(y_1(\mathbf{x}), \dotsc,y_n(\mathbf{x}))$, where we replace $t$ with a vector $\mathbf{x} \in \mathbb{R}^m$ and $h$ a function of $n$ other functions $y_i$, the derivative is:

$$\nabla_x h = \sum_{i=1}^n \frac{\partial h}{\partial y_i} \nabla y_i(\mathbf{x})$$
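As a quick sanity check of the two-variable chain rule above, the following minimal sketch compares the chain-rule value with a central finite difference. The choices $u(t)=\sin(t)$, $v(t)=t^2$, $h(u,v)=uv$ and all function names are our own illustrative assumptions, not part of veritorch.

```python
import math

def u(t):
    return math.sin(t)

def v(t):
    return t ** 2

def h(u_val, v_val):
    return u_val * v_val

def dh_dt_chain(t):
    # For h = u * v: dh/du = v and dh/dv = u; du/dt = cos(t) and dv/dt = 2t.
    dh_du = v(t)
    dh_dv = u(t)
    return dh_du * math.cos(t) + dh_dv * 2 * t

def dh_dt_numeric(t, eps=1e-6):
    # Central difference, used only as an independent check.
    return (h(u(t + eps), v(t + eps)) - h(u(t - eps), v(t - eps))) / (2 * eps)

t0 = 1.3
chain_value = dh_dt_chain(t0)
numeric_value = dh_dt_numeric(t0)
print(chain_value, numeric_value)
```

The two printed numbers should agree to several digits; the finite difference is only approximate, while the chain-rule value is exact up to floating-point rounding.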

### Jacobians and vectors

If we have a function $\mathbf{y}(\mathbf{x})$: $\mathbb{R}^n \rightarrow \mathbb{R}^m$, its Jacobian matrix is the $m \times n$ matrix of all first-order partial derivatives:
$$
\mathbf{J} = \begin{bmatrix}
\frac{\partial \mathbf{y}}{\partial x_1} & \dots  & \frac{\partial \mathbf{y}}{\partial x_n}
\end{bmatrix} = \begin{bmatrix} 
\frac{\partial {y_1}}{\partial x_1} & \dots & \frac{\partial {y_1}}{\partial x_n} \\
\vdots & \ddots & \vdots\\
\frac{\partial {y_m}}{\partial x_1} & \dots & \frac{\partial {y_m}}{\partial x_n} \\
\end{bmatrix}
$$
Now suppose we have a function $g(\mathbf{y}(\mathbf{x}))$, and let the vector $\mathbf{v}$ be the gradient of $g$ with respect to the vector $\mathbf{y}$:
$$
\mathbf{v} = \begin{bmatrix}
\frac{\partial g}{\partial y_1} & \dots & \frac{\partial g}{\partial y_m}
\end{bmatrix} ^T
$$
To get the gradient of $g$ with respect to $\mathbf{x}$, we multiply the transpose of the Jacobian matrix, $\mathbf{J}^T$ (an $n \times m$ matrix), by the vector $\mathbf{v}$: 

$$\mathbf{J}^T \cdot \mathbf{v} = \begin{bmatrix} 
\frac{\partial {y_1}}{\partial x_1} & \dots & \frac{\partial {y_m}}{\partial x_1} \\
\vdots & \ddots & \vdots\\
\frac{\partial {y_1}}{\partial x_n} & \dots & \frac{\partial {y_m}}{\partial x_n} \\
\end{bmatrix} 
\begin{bmatrix}
\frac{\partial g}{\partial y_1} \\
\vdots \\ 
\frac{\partial g}{\partial y_m}
\end{bmatrix} = \begin{bmatrix}
\frac{\partial g}{\partial x_1} \\
\vdots \\ 
\frac{\partial g}{\partial x_n}
\end{bmatrix} $$
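The product above can be checked on a small example. The sketch below assumes $\mathbf{y}(\mathbf{x}) = (x_1x_2,\; x_1+x_2,\; x_2^2)$ and $g(\mathbf{y}) = y_1 + 2y_2 + 3y_3$, both chosen by us purely for illustration, and uses plain Python lists so that it runs without any dependencies.

```python
def jacobian_y(x1, x2):
    # y1 = x1*x2, y2 = x1 + x2, y3 = x2**2, so J has 3 rows (outputs) and 2 columns (inputs).
    return [[x2, x1],
            [1.0, 1.0],
            [0.0, 2.0 * x2]]

def transpose_times_vector(J, v):
    # Entry j of the result is sum_i (dy_i/dx_j) * (dg/dy_i).
    n_inputs = len(J[0])
    return [sum(J[i][j] * v[i] for i in range(len(J))) for j in range(n_inputs)]

x1, x2 = 2.0, 3.0
J = jacobian_y(x1, x2)
v = [1.0, 2.0, 3.0]          # gradient of g with respect to y (constant for this g)
grad_x = transpose_times_vector(J, v)
print(grad_x)                # [5.0, 22.0]
```

Expanding $g$ directly gives $g = x_1x_2 + 2(x_1+x_2) + 3x_2^2$, whose partial derivatives at $(2, 3)$ are $x_2 + 2 = 5$ and $x_1 + 2 + 6x_2 = 22$, matching the product.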

### Computational graphs 

AD exploits the idea that complicated equations can be converted into a sequence of elementary operations, each of which has a known routine for computing its derivative. This sequence, also called the evaluation trace, can be visualized by a computational graph where each step is an elementary operation. For example, suppose we want to evaluate the derivative of the function: $$f(x) = 5\exp(x^2)+\sin(3x)$$
In this example, the right-most $x_7$ represents the value of $f(x)$, while the left-most $x$ represents our input variable. We construct a computational graph where we take the input $x$ as a node, and we take the constants as nodes as well when applicable. These nodes are connected by lines (edges) to represent the flow of information. 

![](https://i.imgur.com/uBUpnfc.jpg=300x)
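The evaluation trace for this function can also be written out by hand. In the minimal sketch below, each step stores a value together with its derivative with respect to $x$; the intermediate names `v1` to `v6` are our own labels and need not match the node numbering in the figure.

```python
import math

def trace_f(x):
    # Each pair is (value, derivative with respect to x).
    v1, d1 = x * x, 2 * x                        # x^2
    v2, d2 = math.exp(v1), math.exp(v1) * d1     # exp(x^2)
    v3, d3 = 5 * v2, 5 * d2                      # 5 * exp(x^2)
    v4, d4 = 3 * x, 3.0                          # 3x
    v5, d5 = math.sin(v4), math.cos(v4) * d4     # sin(3x)
    v6, d6 = v3 + v5, d3 + d5                    # f(x)
    return v6, d6

value, derivative = trace_f(0.5)
print(value, derivative)
```

The result can be compared against the closed form $f'(x) = 10x\exp(x^2) + 3\cos(3x)$.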



### Elementary functions

Elementary Functions | Example | Derivative
:-:|:-:|:-:
exponentials| $e^x$ | $e^x$   
logarithms|$\log(x)$| $\frac{1}{x}$ 
powers| $x^2$| $2x$ 
trigonometrics| $\sin(x)$ | $\cos(x)$ 
inverse trigonometrics|  $\arcsin(x)$ | $\frac{1}{\sqrt{1-x^2}}$ 


### Dual number
A dual number has the form $z=x+y\epsilon$, where $x, y\in \mathbb{R}$ and $\epsilon\neq 0$ satisfies $\epsilon^2=0$. $\forall z_1=x_1+y_1\epsilon, z_2=x_2+y_2\epsilon,$ where $x_1, y_1, x_2, y_2\in \mathbb{R}$, we have the following properties:
1. $z_1+z_2=(x_1+x_2)+(y_1+y_2)\epsilon$
2. $z_1z_2=(x_1x_2)+(x_1y_2+x_2y_1)\epsilon$
3. $z_1/z_2=\frac{x_1}{x_2}+\frac{x_2y_1-x_1y_2}{x_2^2}\epsilon$ (for $x_2\neq 0$)

As can be seen from the equations above, there is a close connection between the multiplication/division of dual numbers and the product/quotient rules for derivatives:
$$(f(x)g(x))'=f'(x)g(x)+f(x)g'(x)$$
$$\left(\frac{f(x)}{g(x)}\right)'=\frac{f'(x)g(x)-f(x)g'(x)}{g^{2}(x)}.$$
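The three dual number properties translate directly into operator overloads. The class below is a minimal sketch for illustration only; the name `Dual` and its fields `real` and `dual` are our own and are not part of veritorch.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Dual:
    real: float
    dual: float = 0.0   # coefficient of epsilon

    def __add__(self, other):
        return Dual(self.real + other.real, self.dual + other.dual)

    def __mul__(self, other):
        # (x1 + y1 e)(x2 + y2 e) = x1 x2 + (x1 y2 + x2 y1) e, because e^2 = 0
        return Dual(self.real * other.real,
                    self.real * other.dual + other.real * self.dual)

    def __truediv__(self, other):
        return Dual(self.real / other.real,
                    (other.real * self.dual - self.real * other.dual) / other.real ** 2)

# Seed x = 3 with dual part 1, so the dual part of any result is its derivative at x = 3.
x = Dual(3.0, 1.0)
one = Dual(1.0)
product = (x * x) * (x + one)    # x^2 (x + 1), derivative 3x^2 + 2x = 33
quotient = (x * x) / (x + one)   # x^2 / (x + 1), derivative (x^2 + 2x)/(x + 1)^2 = 15/16
print(product, quotient)
```

The dual parts reproduce the product and quotient rules without any symbolic manipulation.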

### Forward mode
At the moment, our veritorch package supports forward mode only, which uses the chain rule to propagate the gradient with respect to the input/independent variables along the computational graph. During gradient propagation, we use a class named Variable, updated in a fashion similar to the dual number formulas listed above, to store the intermediate values and derivatives. In the end, our package returns the Jacobian-vector product $Jp$. The cost of forward mode therefore grows with the number of independent parameters involved in the function $f$. If a function involves many independent parameters, forward mode might not be efficient enough and backward mode should be considered instead. However, due to time constraints, we might not be able to add support for backward mode to our veritorch package.
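To illustrate why the cost of forward mode scales with the number of inputs, the sketch below propagates a value and a single directional derivative along a seed vector $p$ for $f(x_1,x_2,x_3)=x_1x_2x_3$. The function `forward_product3` is a hand-written illustration, not veritorch code; recovering the full gradient this way takes one pass per unit seed.

```python
def forward_product3(x, p):
    """Return f(x) and the directional derivative of f along p for f = x1*x2*x3."""
    (x1, x2, x3), (p1, p2, p3) = x, p
    # Intermediate w = x1 * x2, differentiated with the product rule.
    w, dw = x1 * x2, p1 * x2 + x1 * p2
    # Output f = w * x3.
    f, df = w * x3, dw * x3 + w * p3
    return f, df

point = (4.0, 5.0, 6.0)
unit_seeds = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
gradient = [forward_product3(point, p)[1] for p in unit_seeds]
print(gradient)   # [30.0, 24.0, 20.0]
```

Storing a whole derivative vector per intermediate value, as the Variable class described in this document does, amounts to carrying all unit seeds through a single pass.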

## How to use veritorch

To use the veritorch package, users should first run the commands provided below to install our package via pip and import it. We have already uploaded our veritorch package to PyPI.

```
pip install veritorch
python
>>>import veritorch.veritorch as vt
```

After successfully installing and importing the veritorch package, users can take the following steps to evaluate the derivative of $f$ at a point $x$. Here we take $f(x)=f(x_1,x_2,x_3)=x_1x_2x_3,~x=(x_1,x_2,x_3)=(4,5,6)$ as an example:

```
# First, create an instance of the Solver class in the veritorch package that
# tracks how many independent variables f takes as input.

>>>sol=vt.Solver(3)

# Next, use the method create_variable(initial_value) of solver class
# to create variable x1, x2, x3 with their values initialized to 4, 5, 6 
# (and partial derivatives, with respect to x1, x2, x3 respectively 
# initialized to 1 by default)

>>>x1=sol.create_variable(4)
>>>x2=sol.create_variable(5)
>>>x3=sol.create_variable(6)
>>>f1=x1*x2*x3
>>>print(f1)
Variable(120, [30,24,20])
```


The veritorch package will also support composite functions that involve elementary functions, including but not limited to $\sin(x), \cos(x), \exp(x), \arcsin(x)$:

```
# create variable x1 with its value initialized to pi 
# (and derivative initialized to 1 by default)

>>>import math
>>>import numpy as np
>>>sol=vt.Solver(1)
>>>x1=sol.create_variable(math.pi)
>>>f1=np.sin(x1)
>>>f2=np.cos(f1)
# negate the exponent so that f3 = exp(-1) lies in the domain [-1, 1] of arcsin
>>>f3=np.exp(-f2)
>>>f4=np.arcsin(f3)

# users can also enter a composite function all at once; demo omitted.
```

Typing "print(f4)" then shows the final value and derivative of the composite function.

We also plan to support the following demo, though it is not supported in this milestone.

```
# for multi-dimensional function, user can use solver.merge method 
# to get the jacobian matrix

>>>sol=vt.Solver(2)
>>>x1=sol.create_variable(1)
>>>x2=sol.create_variable(2)
>>>f1=x1*x2
>>>f2=x1**x2
>>>f3=x1+2*x2
>>>print(sol.merge(f1, f2, f3))
Variable([2,1,5], [[2,1],[2,0],[1,2]])
```

## Software organization

### Directory structure

We plan to make the final repo follow the directory structure below:
```
cs207-FinalProject/
    README.md
    requirements.txt
    LICENSE
    setup.py
    veritorch/
        __init__.py
        veritorch.py
        ...
    test/
        test.py
        ...
    docs/
        milestone1.ipynb
        milestone2.ipynb
        ...
```

### Modules
We choose to publish only one module: the veritorch module within our veritorch package. The veritorch module includes a Solver class and a Variable class. The Solver class takes as input the number of independent variables that the function $f$ has, and tracks how many independent variables we have created so far. The Variable class takes as input the initial value $x$, and includes methods that overload the basic arithmetic operators for the veritorch package. While we have a test module, it will not be included in the final published package. 

### Testing
All files related to testing will be put in the directory cs207-FinalProject/test/. We will use TravisCI to check whether each pull request can actually build and Codecov to check how many lines of code have been tested. We have activated both of them for this repo and included the badges in the README.md file at the root directory.

### Distribution of the package
We use twine to upload the veritorch package to PyPI (the Python Package Index), which lets users find packages by keyword or filter and install them with pip or pipenv. The latest version can be viewed here: https://pypi.org/project/veritorch/0.0.3/

## Implementation

### Core data structures
#### List
We use a list in the Solver class to track how many independent variables have been created so far and to store copies of these independent variables. 

#### Numpy array
In the Variable class, a numpy array with length equal to the number of independent variables (provided by the user when the Solver class object is first created) is used to store the derivative information. 

### Core classes and important attributes
There are two core classes in the veritorch package: the Solver class and the Variable class.

#### Solver class

This class takes as an input the number of independent variables that the function $f$ has and tracks how many independent variables we have already created so far. 

It has the following attributes:
* n: the number of independent variables the function $f$ has
* independent_variable_list: a list of independent variables that we have created so far using this solver class

We plan to implement the following methods for this solver class:
* create_variable(x): return an independent variable with value initialized to x and derivative dx initialized to a numpy array of zeros of shape n. If this is the $i^{th}$ call to create_variable(x) on this solver object, the $i^{th}$ entry of dx is set to 1 before the created variable is returned. A copy of the created independent variable is also added to independent_variable_list.
* get_variable(idx): return a copy of the independent variable stored at position idx of independent_variable_list. If the user accidentally overwrites an independent variable created earlier, this method retrieves a copy of it so that the user does not need to start the whole process again.
* merge(\*args): \*args should be the variables $f_1, f_2, ..., f_m$. This method returns the m by n jacobian matrix of $f=[f_1, f_2, ..., f_m]$. 
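The bookkeeping described by these three methods can be sketched as follows. This is a simplified illustration under our own assumptions, not the veritorch source: plain lists stand in for numpy arrays, `idx` is taken to be zero-based, the `ValueError` for creating too many variables is our addition, and `merge` here returns only the Jacobian rows.

```python
class Variable:
    """Minimal stand-in holding a value x and a derivative vector dx (no operators)."""
    def __init__(self, x, dx):
        self.x = x
        self.dx = list(dx)

class Solver:
    def __init__(self, n):
        self.n = n
        self.independent_variable_list = []

    def create_variable(self, x):
        i = len(self.independent_variable_list)   # position of the new variable
        if i >= self.n:
            raise ValueError("all independent variables have already been created")
        dx = [0.0] * self.n
        dx[i] = 1.0                                # derivative with respect to itself
        self.independent_variable_list.append(Variable(x, dx))  # stored copy
        return Variable(x, dx)

    def get_variable(self, idx):
        stored = self.independent_variable_list[idx]
        return Variable(stored.x, stored.dx)       # hand back a fresh copy

    def merge(self, *args):
        # One row per output f_i, one column per independent variable.
        return [list(f.dx) for f in args]

sol = Solver(2)
x1 = sol.create_variable(1.0)
x2 = sol.create_variable(2.0)
jac = sol.merge(x1, x2)
print(jac)   # [[1.0, 0.0], [0.0, 1.0]]
```

Merging the independent variables themselves gives the identity matrix, which is a convenient check that the seeding in create_variable is consistent.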

#### Variable class
This class takes as inputs the initial value $x$ and derivative $dx$ (optional, set to None by default) and includes methods that overload the basic arithmetic operators for the veritorch package.

It has the following attributes:
* x: the current scalar value of this variable
* dx: the current derivative of this variable, a vector of length n, where n is the number of independent variables of the function $f$ whose derivative is being evaluated. Because every variable created by the same Solver object, and every variable obtained from them through arithmetic operations, has a derivative vector of the same length/shape, we do not need a separate mechanism to determine whether two variable objects are independent.

We have implemented in this milestone the following methods for this variable class:
* \_\_str\_\_
* \_\_neg\_\_
* \_\_add\_\_
* \_\_radd\_\_
* \_\_sub\_\_
* \_\_rsub\_\_
* \_\_mul\_\_
* \_\_rmul\_\_
* \_\_truediv\_\_
* \_\_rtruediv\_\_
* \_\_pow\_\_

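The arithmetic methods above (\_\_add\_\_ through \_\_pow\_\_) take a variable class object and a second operand, and return a new variable class object whose value and derivative are updated according to the corresponding differentiation rule (sum, product, quotient, or power rule). \_\_neg\_\_ takes a single variable class object, and \_\_str\_\_ returns a printable representation such as Variable(120, [30,24,20]).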
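A minimal sketch of how a few of these overloads could look is given below. It is an illustration under our own assumptions rather than the veritorch implementation: lists replace numpy arrays, plain numbers are treated as constants with zero derivative through a hypothetical helper `_lift`, and only addition, multiplication, and negation are shown.

```python
class Variable:
    def __init__(self, x, dx):
        self.x = x
        self.dx = list(dx)

    @staticmethod
    def _lift(other, n):
        # Hypothetical helper: wrap a plain number as a constant (zero derivative).
        return other if isinstance(other, Variable) else Variable(other, [0.0] * n)

    def __add__(self, other):
        other = self._lift(other, len(self.dx))
        return Variable(self.x + other.x, [a + b for a, b in zip(self.dx, other.dx)])
    __radd__ = __add__   # addition is commutative

    def __mul__(self, other):
        other = self._lift(other, len(self.dx))
        # Product rule applied to every component of the derivative vector.
        return Variable(self.x * other.x,
                        [self.x * b + other.x * a for a, b in zip(self.dx, other.dx)])
    __rmul__ = __mul__   # multiplication is commutative

    def __neg__(self):
        return Variable(-self.x, [-a for a in self.dx])

    def __str__(self):
        return f"Variable({self.x}, {self.dx})"

x1 = Variable(1.0, [1.0, 0.0])
x2 = Variable(2.0, [0.0, 1.0])
f1 = x1 * x2
f3 = x1 + 2 * x2
print(f1, f3)
```

Python first tries `int.__mul__` for `2 * x2`, which returns `NotImplemented`, and then falls back to `x2.__rmul__(2)`; this is why the reflected methods are needed for expressions that start with a plain number.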

### Elementary functions
To support the usage of elementary functions in our library, we have implemented the following methods for our variable class:
* exp(x)
* log(x)
* sin(x)
* cos(x)
* tan(x)
* arcsin(x)
* arccos(x)
* arctan(x)

The methods above take as input a variable class object and return a new variable class object with its value and derivative updated according to the elementary function it corresponds to using the chain rule. 
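The pattern shared by all of these methods is a single chain-rule step: evaluate the function and scale the incoming derivative vector by the function's derivative. The sketch below shows this for four of the functions; it is illustrative only, uses the standard `math` module instead of numpy, and the helper `_apply` is a hypothetical name.

```python
import math

class Variable:
    def __init__(self, x, dx):
        self.x = x
        self.dx = list(dx)

    def _apply(self, value, slope):
        # Chain rule: d g(u) / d x_i = g'(u) * du / d x_i for every component i.
        return Variable(value, [slope * d for d in self.dx])

    def exp(self):
        return self._apply(math.exp(self.x), math.exp(self.x))

    def log(self):
        return self._apply(math.log(self.x), 1.0 / self.x)

    def sin(self):
        return self._apply(math.sin(self.x), math.cos(self.x))

    def cos(self):
        return self._apply(math.cos(self.x), -math.sin(self.x))

x = Variable(0.5, [1.0])
y = x.sin().exp()   # exp(sin(x)), whose derivative is exp(sin(x)) * cos(x)
print(y.x, y.dx)
```

As far as we understand, NumPy functions such as `np.sin`, when given an object that cannot be converted to a numeric array, fall back to calling a method of the same name on that object, which would explain why the usage examples can write `np.sin(x1)`.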

### Aspects not implemented yet
#### Vector-valued functions
The implementation we have now only handles scalar-output functions, $\mathbb{R}^n \rightarrow \mathbb{R}^1$. We will make it more general by finishing the implementation of the merge method outlined in the Solver class above. This will allow us to handle functions $f: \mathbb{R}^n \rightarrow \mathbb{R}^m$. 

### External dependencies
1. NumPy: it provides us an efficient way to compute the intermediate multidimensional results and organize value and derivative vectors.

2. pytest: it provides us a systematic way to test every line of code written in the veritorch library.

3. setuptools: it is used to package our repo before we distribute it.

4. TravisCI and Codecov: they provide continuous integration (building and testing each pull request) and test coverage reporting, respectively. 

## Future Features
### Optimization methods
There are plenty of unconstrained optimization methods that require the derivative information of a function. We plan to implement some of them: Newton's Method, Broyden–Fletcher–Goldfarb–Shanno (BFGS) method, conjugate gradient (CG) method, etc.

As we implement these methods, we will see whether our implementation can give us the required derivative information fast enough and whether the returned results are user friendly or not, and change our implementation accordingly. 
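To show how derivative information feeds such methods, here is a minimal sketch of the Newton iteration in its simplest, one-dimensional root-finding form. It is not one of the planned optimizers: `f_with_derivative` is a hypothetical stand-in for a forward-mode pass that returns a value together with its derivative, and the optimization variants apply the same idea to the gradient of the objective.

```python
def f_with_derivative(x):
    # Value and derivative of f(x) = x^2 - 2, as a forward-mode pass would return them.
    return x * x - 2.0, 2.0 * x

def newton_root(func, x0, tol=1e-12, max_iter=50):
    x = x0
    for _ in range(max_iter):
        value, slope = func(x)
        if abs(value) < tol:
            break
        x -= value / slope   # Newton update: x <- x - f(x) / f'(x)
    return x

root = newton_root(f_with_derivative, 1.0)
print(root)
```

Starting from 1.0, the iteration converges to the positive square root of 2 within a handful of steps.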

### Vectorized processing
If our user wants to compute a vector-valued function $f: \mathbb{R}^n \rightarrow \mathbb{R}^m$, our current design of veritorch requires the user to express $f_i$ in terms of the created variables for all $1\leq i\leq m$. However, there are cases where the vector-valued function $f$ consists of only elementwise operations, for example $f(x)=2\sin(x), x\in \mathbb{R}^n$. In such a case, the user has to enter the same expression once for each $1\leq i\leq n$. Therefore, it might be beneficial to support a special "vectorized processing" mode to reduce the workload of users who are evaluating derivatives of functions that involve elementwise operations only.
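One observation that such a mode could exploit is that the Jacobian of an elementwise function is diagonal, so only $n$ derivative values need to be stored. The sketch below illustrates this for $f(x)=2\sin(x)$; the helper `elementwise` and its arguments are hypothetical and only indicate one possible shape for the feature, with the scalar derivative supplied by hand.

```python
import math

def elementwise(value_fn, slope_fn, xs):
    # Apply the same scalar function to every component.
    values = [value_fn(x) for x in xs]
    # Output i depends only on input i, so the Jacobian is diagonal.
    diagonal = [slope_fn(x) for x in xs]
    return values, diagonal

xs = [0.0, math.pi / 2, math.pi]
values, diagonal = elementwise(lambda x: 2 * math.sin(x),
                               lambda x: 2 * math.cos(x),
                               xs)
print(values, diagonal)
```

The user writes the expression once, and the full $n \times n$ Jacobian, if needed, can be rebuilt by placing `diagonal` on the main diagonal of a zero matrix.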
