## Introduction

Every science and engineering discipline relies on differentiation in some capacity, whether seeking to optimize system operations, deriving rates of change, or evaluating complex expressions. In this era of abundant computationally intensive tasks, evaluating gradients of any function (regardless of form) is both practical and valuable. The FADiff package addresses this task by automatically differentiating functions using forward mode. By implementing automatic differentiation (AD), which sequentially evaluates elementary functions, FADiff avoids the complexity of symbolic differentiation and the precision issues of numerical differentiation. Additional information on implementation is below.

## Background

Automatic Differentiation (AD) is a set of techniques for evaluating derivatives precisely, based on computational graphs, the chain rule, and other symbolic rules. Compared with manual or symbolic differentiation, it is convenient and fast because it frees users from tedious calculation and proof. Compared with finite-difference approximation (a.k.a. numerical differentiation), it is more accurate because it avoids the truncation error that arises in numerical differentiation when the step size $h$ is too large and the round-off error that arises when $h$ is too small (we analyzed this point in HW4). Because of these advantages, it is widely used in scientific computing, machine learning, deep learning, etc.
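
To make the step-size trade-off concrete, here is a minimal sketch (not part of FADiff; the helper `forward_difference` and the choice of test function are assumptions made for illustration) that compares a one-sided finite difference of $\sin$ at $x = 1$ against the exact derivative $\cos(1)$ for a large, a moderate, and a tiny step.

```python
import math

def forward_difference(f, x, h):
    """Approximate f'(x) with a one-sided finite difference of step h."""
    return (f(x + h) - f(x)) / h

x0 = 1.0
exact = math.cos(x0)  # analytic derivative of sin at x0

# Absolute error for a large, a moderate, and a tiny step size
errors = {h: abs(forward_difference(math.sin, x0, h) - exact)
          for h in (1e-1, 1e-8, 1e-15)}

for h, err in errors.items():
    print(f"h = {h:.0e}  absolute error = {err:.3e}")
```

The moderate step gives the smallest error: the large step suffers from truncation error and the tiny step from round-off error, which is the behavior AD is designed to avoid.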

The mathematical background mainly includes matrix-vector products, the Jacobian matrix, the algebra of dual numbers, Taylor series expansion, higher-order derivatives, etc. We will discuss them in more detail later. There are two evaluation modes in AD, **forward mode** and **reverse mode**.

1. Forward mode performs the operation of evaluating the numerical derivative concurrently with evaluating the function itself on a computational graph.

2. Reverse mode is an alternative to forward mode. It first evaluates the function on the computational graph to obtain the final output, and then traverses the graph in reverse to evaluate the derivatives. This mode is commonly used in deep learning and neural networks, where it is also called backpropagation.




### 1. Matrix-vector Products
##### 1.1 Definition 
   Given an $m\times n$ matrix $A_{m\times n}$ and a vector $x\in R^{n}$, the matrix-vector product $Ax$ is the linear combination
   $$
   x_1a_1 + x_2a_2 + ... + x_na_n \in R^m 
   $$
   using the columns $a_1, . . . , a_n$ of $A$, where $x=\left[x_1,x_2,...,x_n \right]^{T}$.

##### 1.2 Notes
1. Matrix-vector products are only valid when the sizes of the matrix and vector are compatible: the number of elements of vector $x$ must equal the number of columns of matrix $A$. The number of elements in the output vector equals the number of rows of matrix $A$.
2. We can interpret the matrix-vector product as a linear transformation, or a **map**, from $R^n$ to $R^m$.
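
The column view of the product can be sketched in plain Python. The function name `matvec_by_columns` and the sample matrix are illustrative assumptions, not part of the package.

```python
def matvec_by_columns(A, x):
    """Compute A x as the linear combination x_1*a_1 + ... + x_n*a_n of the columns of A."""
    m, n = len(A), len(A[0])
    if len(x) != n:
        raise ValueError("len(x) must equal the number of columns of A")
    result = [0.0] * m
    for j in range(n):          # loop over the columns a_j
        for i in range(m):      # add x_j * a_j to the running sum
            result[i] += x[j] * A[i][j]
    return result

A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]           # 2 x 3, so it maps R^3 to R^2
x = [1.0, 0.0, -1.0]
y = matvec_by_columns(A, x)
print(y)  # [-2.0, -2.0]
```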

### 2. Two Evaluation Modes: Forward & Reverse, Jacobian Matrix
Automatic Differentiation (AD) can be applied both to scalar functions of one variable and to functions of multiple variables. The derivative of a single-variable function is straightforward to compute, while for functions of multiple
variables we introduce the Jacobian matrix ($J$).


Let's start from the general case, where $x$ is a vector.

##### 2.1 Jacobian Matrix
If $f$ is a vector of multiple functions with multiple input variables, then denote $f$ as
$$
f=\begin{bmatrix} f_1(x,y) \\ f_2(x,y) \end{bmatrix}
$$
Then the derivative of $f$ is called the Jacobian matrix $J$:
$$
\begin{aligned}
  J = 
  \begin{bmatrix}
    \partial f_{1} / \partial x & \partial f_{1} / \partial y \\
    \partial f_{2} / \partial x & \partial f_{2} / \partial y
  \end{bmatrix}
\end{aligned}
$$
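
As a small worked example (the functions below are chosen purely for illustration), take $f_1(x,y) = x^2 y$ and $f_2(x,y) = x + \sin(y)$. Then
$$
J =
\begin{bmatrix}
  2xy & x^2 \\
  1 & \cos(y)
\end{bmatrix},
\qquad
J\big|_{(x,y)=(1,0)} =
\begin{bmatrix}
  0 & 1 \\
  1 & 1
\end{bmatrix}.
$$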

##### 2.2 Forward Mode
A program can be written as a composition of several functions, $f = f_n \circ \cdots \circ f_1$. Let $x_0$ be the input vector in $R^n$ and $x_n$ the output vector, and let each $f_i$ be a transition function (a generalized "matrix" in the sense of the matrix-vector products defined above). Then
$$
x_1 = f_1x_0
$$
$$
x_2 = f_2x_1
$$
$$
...
$$
$$
x_n=f_nx_{n-1}.
$$
From the chain rule, where $(J f_i x_{i-1})$ denotes the Jacobian of $f_i$ evaluated at $x_{i-1}$, we have:
$$
\dot{x_1} =  (J f_1 x_0)
$$
$$
\dot{x_2} =  (J f_2 x_1) \times \dot{x_1}
$$
$$
 ... 
 $$
$$ 
\dot{x_n} = (J f_n x_{n-1})\times \dot{x_{n-1}}. 
$$

The above process of evaluating derivatives is called **forward mode Automatic Differentiation**.
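
The recursion above can be sketched in plain Python for a two-stage composition. The stage functions `f1` and `f2`, their hand-written Jacobians, and the helper names are assumptions made for this illustration; they are not FADiff's implementation. Here $f(x, y) = xy(x + y)$, so $\partial f/\partial x = 2xy + y^2$.

```python
def matvec(J, v):
    """Multiply a Jacobian (list of rows) by a tangent vector."""
    return [sum(J_ij * v_j for J_ij, v_j in zip(row, v)) for row in J]

# Stage 1: (x, y) -> (x*y, x + y), with its Jacobian
def f1(x):
    return [x[0] * x[1], x[0] + x[1]]

def jac_f1(x):
    return [[x[1], x[0]], [1.0, 1.0]]

# Stage 2: (u, v) -> (u*v,), with its Jacobian
def f2(u):
    return [u[0] * u[1]]

def jac_f2(u):
    return [[u[1], u[0]]]

def forward_mode(x0, seed):
    """Propagate the value and the tangent through the stages together."""
    x, x_dot = x0, seed
    for f, jac in ((f1, jac_f1), (f2, jac_f2)):
        x_dot = matvec(jac(x), x_dot)  # Jacobian at the current point times tangent
        x = f(x)                       # then advance the value
    return x, x_dot

# Seed [1, 0] selects the derivative with respect to x
value, df_dx = forward_mode([2.0, 3.0], [1.0, 0.0])
print(value, df_dx)  # [30.0] [21.0]
```

Each seed vector yields one directional derivative, so a second sweep with seed `[0.0, 1.0]` would be needed for $\partial f/\partial y$.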




##### 2.3 Reverse Mode
If we take the transpose of both sides of each equation above, then
$${x_1}^\prime = (f_1x_0)^T$$
$${x_2}^\prime = (f_2x_1)^T$$
$$...$$
$${x_n}^\prime = (f_nx_{n-1})^T.$$
From the chain rule, we have:
$$ {x_{n-1}}^\prime =  (J f_n x_{n-1})^T$$
$$ {x_{n-2}}^\prime =  (J f_{n-1} x_{n-2})^T \times {x_{n-1}}^\prime $$
$$ ... $$
$$ {x_0}^\prime = (J f_1 x_0)^T\times {x_1}^\prime. $$

The above process of evaluating derivatives is called **reverse mode Automatic Differentiation**.
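
As a rough sketch of the same idea in plain Python (the stage functions, Jacobians, and helper names are illustrative assumptions, not the package's code), a forward pass stores the intermediate points and a backward pass multiplies by the transposed Jacobians. The example again uses $f(x, y) = xy(x + y)$.

```python
def transpose_matvec(J, v):
    """Return J^T v without forming the transpose explicitly."""
    return [sum(J[i][j] * v[i] for i in range(len(J)))
            for j in range(len(J[0]))]

# Stage 1: (x, y) -> (x*y, x + y), with its Jacobian
def f1(x):
    return [x[0] * x[1], x[0] + x[1]]

def jac_f1(x):
    return [[x[1], x[0]], [1.0, 1.0]]

# Stage 2: (u, v) -> (u*v,), with its Jacobian
def f2(u):
    return [u[0] * u[1]]

def jac_f2(u):
    return [[u[1], u[0]]]

def reverse_mode(x0, output_seed):
    stages = ((f1, jac_f1), (f2, jac_f2))
    # Forward pass: evaluate and remember every intermediate point
    points = [x0]
    for f, _ in stages:
        points.append(f(points[-1]))
    # Backward pass: pull the adjoint back through the stages in reverse order
    x_bar = output_seed
    for (_, jac), x in zip(reversed(stages), reversed(points[:-1])):
        x_bar = transpose_matvec(jac(x), x_bar)
    return points[-1], x_bar

value, gradient = reverse_mode([2.0, 3.0], [1.0])
print(value, gradient)  # [30.0] [21.0, 16.0]
```

A single backward sweep returns both partial derivatives of the scalar output, which is why reverse mode is attractive when there are many inputs and few outputs.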




##### 2.4 Example of computational graph, forward and reverse mode.

![image.png](attachment:image.png)


##### 2.5 When to use reverse or forward mode?

The difference between forward and reverse mode lies in the starting point of the chain of matrix multiplications: forward mode starts from the input side, while reverse mode starts from the output side.
In terms of the number of multiplications, forward mode needs fewer operations than reverse mode when the input dimension is smaller than the output dimension; conversely, reverse mode needs fewer operations when the input dimension is larger than the output dimension.
Therefore, forward mode is more efficient when there are fewer inputs than outputs, and reverse mode is more efficient when there are more inputs than outputs.
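
A back-of-the-envelope count illustrates this. The sketch below assumes dense Jacobians and counts only the multiplications in the matrix-vector sweeps: building the full Jacobian takes one forward sweep per input or one reverse sweep per output. The function names and the layer sizes are hypothetical.

```python
def sweep_cost(dims):
    """Multiplications in one Jacobian-vector sweep through a chain of dense Jacobians.

    dims lists the dimensions along the chain: [input, hidden..., output].
    """
    return sum(d_in * d_out for d_in, d_out in zip(dims, dims[1:]))

def full_jacobian_cost(dims):
    """Cost of the full Jacobian: one forward sweep per input, or one reverse sweep per output."""
    sweep = sweep_cost(dims)
    return {"forward": dims[0] * sweep, "reverse": dims[-1] * sweep}

many_inputs = full_jacobian_cost([1000, 50, 1])   # e.g. a scalar loss of many parameters
many_outputs = full_jacobian_cost([1, 50, 1000])  # e.g. a curve parametrized by one input

print(many_inputs)   # {'forward': 50050000, 'reverse': 50050}
print(many_outputs)  # {'forward': 50050, 'reverse': 50050000}
```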

### 3. The algebra of dual numbers
##### 3.1 Definition
A dual number $z$ is composed of a real part $a$ and a dual part $b$. We denote it as $z = a + \epsilon b$, where $\epsilon \neq 0$ and $\epsilon^2 = 0$.

##### 3.2 What's the effect of dual numbers on derivatives?
Dual numbers extend real arithmetic so that the user obtains derivatives without working them out by hand. A function $f(x)$ evaluated at a dual number $x$ can be rewritten in dual-number form, where the real component is the function value and the dual component contains the derivative (as we discussed in lecture 10).

Generally, let $\hat f$ denote the extension of a real-valued function $f$ to the dual number space; then
$$ \hat f(x_1+x_1^\prime\epsilon, ..., x_n + x_n^\prime\epsilon):=f(x_1,...,x_n)+\dot f(x_1,...x_n) \left(\begin{array}{c}
    x_1^\prime\\ 
    .\\
    .\\
    .\\
    x_n^\prime\\
  \end{array}\right)\epsilon $$

If $f$ is a vector of multiple differentiable functions, then we can extend the above framework by replacing $\dot f(x_1,...x_n)$ with $J f (x_1,...,x_n)$.
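
The dual-number rule can be sketched in a few lines of Python. The class `Dual` and the helper `dual_sin` are hypothetical names used only for this illustration; they are not the package's API. The product rule falls out of $(a + \epsilon b)(c + \epsilon d) = ac + \epsilon(ad + bc)$, since $\epsilon^2 = 0$.

```python
import math

class Dual:
    """A dual number real + eps * dual, with eps**2 == 0."""

    def __init__(self, real, dual=0.0):
        self.real = real
        self.dual = dual

    @staticmethod
    def _lift(other):
        # Treat plain numbers as constants, i.e. dual part 0
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        other = self._lift(other)
        return Dual(self.real + other.real, self.dual + other.dual)

    __radd__ = __add__

    def __mul__(self, other):
        other = self._lift(other)
        # (a + eps*b)(c + eps*d) = ac + eps*(ad + bc)
        return Dual(self.real * other.real,
                    self.real * other.dual + self.dual * other.real)

    __rmul__ = __mul__

def dual_sin(z):
    """Extend sin to dual numbers: sin(a + eps*b) = sin(a) + eps*b*cos(a)."""
    return Dual(math.sin(z.real), math.cos(z.real) * z.dual)

x = Dual(2.0, 1.0)                 # dual part 1 seeds d/dx
y = x * x + 3 * x + dual_sin(x)    # f(x) = x^2 + 3x + sin(x)
print(y.real, y.dual)              # f(2) and f'(2) = 2*2 + 3 + cos(2)
```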


### 4. Elemental functions

Automatic Differentiation relies on the fact that we already know the derivative at each step, so we need a set of elemental functions whose derivatives are known in closed form. An elemental function must also always return the same result for the same argument values and have no side effects, such as modifying a global variable. The result of calling an elemental function is exactly its return value.
Some examples are pow(), mean(), and sqrt(), while printf(), rand(), and time() are not elemental functions.
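
One simple way to picture this is a lookup table that pairs each elemental function with its known derivative. The table `ELEMENTAL` and the helper `value_and_derivative` below are assumptions for illustration only and do not describe how FADiff stores its functions.

```python
import math

# Each entry pairs a pure function with its closed-form derivative
ELEMENTAL = {
    "sin":  (math.sin,  math.cos),
    "cos":  (math.cos,  lambda x: -math.sin(x)),
    "exp":  (math.exp,  math.exp),
    "log":  (math.log,  lambda x: 1.0 / x),
    "sqrt": (math.sqrt, lambda x: 0.5 / math.sqrt(x)),
}

def value_and_derivative(name, x):
    """Look up an elemental function and evaluate it and its derivative at x."""
    f, df = ELEMENTAL[name]
    return f(x), df(x)

print(value_and_derivative("sqrt", 4.0))  # (2.0, 0.25)
```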


## How to Use FADiff


We expect our package, `FADiff`, to be used largely through its API. Where
necessary or practical, our API may permit the use of objects and functions
from NumPy or other widely used external libraries. However, for certain areas
of our implementation, we expect our package to require the exclusive use of
internally defined objects and functions. For example, we might prohibit users
from applying external libraries' elementary functions (e.g., sine and cosine)
to our variables and only allow them to use our package’s implementations of
such functions. This may help to reduce the potential for issues later in the
development process, such as disuse or misuse of our package’s
operator-overloaded functions. We will be clear in our documentation on how
the user should use our package’s API, including the proper use of variables
and methods.


### Run
Our package can be downloaded from
[https://github.com/teamxvii/cs107-FinalProject](https://github.com/teamxvii/cs107-FinalProject)
(make sure you are signed in to your GitHub account where you are a
collaborator for our project). There you will see a green
button called 'Code' and a button further to the left of that which should say
'master' (if not click on it to switch it to the 'master' branch). Click on the
'Code' button and then click on 'Download ZIP' in the window that pops up. This
will download the package to your computer. Unzip its contents to
the location of your code that will be using the package. For importability,
rename the unzipped folder to something that does not contain hyphens. From
here on out, it will be referred to as `cs107_FinalProject`. In your code file,
only one header should be needed to import and use our entire package such as
`from cs107_FinalProject.code.FADiff import *`.
In addition, we implemented our package with
Python 3.8.2 on Linux but other versions of Python may still be compatible.
To retrieve the dependencies used in our package, navigate to the
`cs107_FinalProject` folder in a terminal window and run the following:

For Python 2 --
```
pip install -r requirements.txt
```
For Python 3 --
```
pip3 install -r requirements.txt
```

An example
demonstrating the use of our package is shown below (the following code would
be in your code file):

```
from cs107_FinalProject.code.FADiff import *      # Imports our package

x = FADiff(5, 1)      # Creates a variable x with value of 5 and derivative of 1
f = x + 2             # Creates a function f using variable x above
print(f.val)          # Prints value of f
print(f.der)          # Prints derivative of f
```

To run the above code, in a terminal window, navigate to the folder that contains
the `cs107_FinalProject` folder and your code file and run the following:

For Python 2 --
```
python your_code_file.py
```
For Python 3 --
```
python3 your_code_file.py
```

where 'your_code_file' is the name of your code file. The following output should
then be rendered:

```
7
1
```


### Test
Navigate to the `cs107_FinalProject/code` folder in a terminal window and run the
following:

```
pytest test_main.py
```


## Software Organization

### 1. What will the directory structure look like?
Currently, our directory structure looks like the following:
```
cs107_FinalProject/
    code/
        FADiff.py
        test_main.py
    docs/
        milestone1.ipynb
        milestone2.ipynb
        milestone2_progress.ipynb
    requirements.txt
    README.md
```
However, we anticipate the directory structure to eventually look something like:
```
cs107_FinalProject/
    src/
        FADiff/
            FADiff.py
    includes/
    docs/
        milestone1.ipynb
        milestone2.ipynb
        milestone2_progress.ipynb
    tests/
        test_main.py
    examples/
    requirements.txt
    README.md
    LICENSE.md
    .travis.yml
    setup.py
```
### 2. What modules do you plan on including? What is their basic functionality?
Our FADiff package contains a module named `FADiff.py`. `FADiff.py` will
contain our main automatic differentiation class `FADiff()` as well as
functions that calculate the derivatives of all the elementary functions our
package supports, such as sine and cosine. Our package also contains a module
named `test_main.py`, which contains the test class used in our testing.
We also use NumPy as an external dependency for our calculations. As explained in the
“How to Use FADiff” section earlier in this document, our implementation may need to
limit the use of external packages or only use them internally
(i.e., hidden from the API) in certain areas moving forward.

### 3. Where will your test suite live? Will you use TravisCI? CodeCov?
Currently, our tests are in the `cs107_FinalProject/code` folder in `test_main.py`.
As mentioned earlier, we eventually plan to move them to the `/tests` directory of
the project. We use pytest for testing. Please see the
“How to Use FADiff” section earlier in this document for instructions on running tests.
### 4. How will you distribute your package (e.g. PyPI)?
We are aiming to distribute through PyPI if time permits. 
### 5. How will you package your software? Will you use a framework? If so, which one and why? If not, why not?
We won't be packaging the software using any sort of framework. The code will be clonable and installable via GitHub and via PyPI if time permits.
### 6. Other considerations?
After finishing Homework 4 and some upcoming lectures on various software topics such as containers, we may consider revising our software organization later on.

## Implementation
### 1. What are the core data structures?
Our code currently does not take advantage of any particular data structure, but we will certainly improve our implementation in the next round of updates to do so. We plan on using a combination of tuples, numpy arrays, linked lists and/or dictionaries to build a directed graph. The graph will track the propagation of each variable at every intermediate step in a calculation. To build the graph, we plan on using a dictionary where each key contains the name of a node (intermediate step) and each value is a tuple with three elements: the value, the partial derivative with respect to the parent variable, and the key to the parent variable’s node. The value and partial derivative may be contained in a NumPy array if needed.
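
The planned dictionary layout could look roughly like the following sketch for $f(x) = \sin(x)^2$. The node names (`"x"`, `"v1"`, `"v2"`) and the helper `derivative_wrt_input` are hypothetical, and the sketch only covers nodes with a single parent; binary operations would need a tuple entry per parent.

```python
import math

x = 0.5
trace = {}

# Each entry: node name -> (value, partial derivative w.r.t. parent, parent key)
trace["x"] = (x, 1.0, None)                        # input node, seeded with 1
trace["v1"] = (math.sin(trace["x"][0]),            # v1 = sin(x)
               math.cos(trace["x"][0]), "x")
trace["v2"] = (trace["v1"][0] ** 2,                # v2 = v1 ** 2
               2 * trace["v1"][0], "v1")

def derivative_wrt_input(trace, key):
    """Follow parent links back to the input, multiplying the local partials."""
    derivative = 1.0
    while key is not None:
        _, partial, parent = trace[key]
        derivative *= partial
        key = parent
    return derivative

value = trace["v2"][0]
derivative = derivative_wrt_input(trace, "v2")
print(value, derivative)  # sin(0.5)**2 and 2*sin(0.5)*cos(0.5)
```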

### 2. What classes will you implement?
We have implemented an automatic differentiation class which is instantiated with two attributes, a value and a derivative. We may need to add a dictionary element to maintain the graph described above. Currently, the user is required to specify the seed vector, but we are discussing variations on this design choice.

### 3. What method and name attributes will your classes have?
Class methods include dunder methods for addition, subtraction, multiplication, division, power, and negation and the corresponding right-hand versions of these, as well as specific functions for sine, cosine, tangent, and exponentiation. In the next round of updates, we will add methods for sqrt, log, `__str__`, and `__repr__`.
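
To show why the right-hand versions matter, here is a minimal sketch with a hypothetical class `SketchVar` (it is not the package's `FADiff` class, although it mirrors the `val` and `der` attributes used in the usage example). Subtraction is not commutative, so `__rsub__` cannot simply reuse `__sub__`.

```python
class SketchVar:
    """Hypothetical value/derivative pair illustrating reflected operators."""

    def __init__(self, val, der=1):
        self.val = val
        self.der = der

    @staticmethod
    def _lift(other):
        # Plain numbers are constants, so their derivative is 0
        return other if isinstance(other, SketchVar) else SketchVar(other, 0)

    def __add__(self, other):
        other = self._lift(other)
        return SketchVar(self.val + other.val, self.der + other.der)

    __radd__ = __add__  # addition commutes, so the reflected version is identical

    def __sub__(self, other):
        other = self._lift(other)
        return SketchVar(self.val - other.val, self.der - other.der)

    def __rsub__(self, other):
        # Called for `number - SketchVar`; operand order must be preserved
        return self._lift(other) - self

    def __neg__(self):
        return SketchVar(-self.val, -self.der)

    def __repr__(self):
        return f"SketchVar(val={self.val}, der={self.der})"

x = SketchVar(5, 1)
f = 10 - x + 2      # uses __rsub__ first, then __add__
print(f)
```

Here `f` has value 7 and derivative -1, since $f(x) = 12 - x$.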

### 4. What external dependencies will you rely on?
We currently rely on NumPy for trig functions and exponentiation, and we will eventually use its array and linear algebra functions as well. We are considering also using Sphinx for auto-rendering and organizing our documentation.

### 5. How will you deal with elementary functions like sin, sqrt, log, and exp (and all the others)?
As stated above, our package will rely on NumPy within class methods for these operations.


## Future Features

### Things to implement next

In this milestone, we treat the Jacobian as a scalar, so the package can only handle a single function of a single input. Moving forward, we want to generalize the package for broader use cases.

1. We will enable the forward mode automatic differentiation object to provide access to the Jacobian matrix.
2. We will enable the object to calculate partial derivatives.
    1. The challenging part is how to handle a varying number of variables, e.g., if we define a class to calculate derivatives for multivariable functions f(x,y,z) and f(x,y,z,m,n), which data structure should we use as the attribute of the class?
    2. If we use an array instead of a scalar as the attribute, how can we implement the differentiation in an array?
3. We might also think about how to calculate derivatives for multi-term functions like f(x) = x + sin(x) + cos(x).
    1. The challenging part here is how to implement dunder methods that handle the order of operands in addition and subtraction.
    2. Presumably, we might need to change the classes, change the data structures, or add new modules. We should consult the TAs and Prof. Sondak for more instruction and insights on implementation.
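
One possible answer to the array question, offered only as a sketch under our own assumptions (the class `GradVar` and the helper `make_variables` are hypothetical, and plain lists stand in for NumPy arrays), is to store one partial derivative per input variable and seed each variable with a unit vector.

```python
class GradVar:
    """Hypothetical variable that carries a list of partial derivatives."""

    def __init__(self, val, grad):
        self.val = val
        self.grad = list(grad)

    def _lift(self, other):
        # Constants get a zero gradient of matching length
        if isinstance(other, GradVar):
            return other
        return GradVar(other, [0.0] * len(self.grad))

    def __add__(self, other):
        other = self._lift(other)
        return GradVar(self.val + other.val,
                       [a + b for a, b in zip(self.grad, other.grad)])

    __radd__ = __add__

    def __mul__(self, other):
        other = self._lift(other)
        # Product rule applied to every partial derivative
        return GradVar(self.val * other.val,
                       [self.val * b + other.val * a
                        for a, b in zip(self.grad, other.grad)])

    __rmul__ = __mul__

def make_variables(*values):
    """Create one variable per value, each seeded with its own unit vector."""
    n = len(values)
    return [GradVar(v, [1.0 if i == j else 0.0 for j in range(n)])
            for i, v in enumerate(values)]

x, y, z = make_variables(1.0, 2.0, 3.0)
f = x * y + 2 * z
print(f.val, f.grad)  # 8.0 [2.0, 1.0, 2.0]
```

Because the gradient length is set when the variables are created, the same class handles f(x,y,z) and f(x,y,z,m,n) without modification.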




## Feedback

### Milestone 1

#### Comment

Good job! Consider using a different data structure apart from list for your main data structure. You may want to use something a bit more robust and structured i.e. dictionary, tree, hashmap. Hopefully the upcoming homework will give you all a bit of insight into the functionalities/pros and cons of a data structure like a BST

Oluwatosin Alliyu , Oct 27 at 9:27pm

#### Group Response

Per Oluwatosin’s feedback, we are considering a dictionary or a tree-like structure as our main data structure for our evaluation trace. Due to the binary nature of primitive operations (i.e., two inputs produce an output), we hope a tree could possibly be used in that respect. Each node in a trace will have an elementary function value and its derivative. If we use a dictionary it will allow us to add a key-value pair where the key is the name of the node and the value is a tuple in which the first element will be the elementary function’s value and the second, its derivative. We would greatly appreciate advice or direction as to whether a tree or a dictionary would best satisfy our use case.
