# Introduction 
_Describe the problem the software solves and why it's important to solve that problem_

Our software package, *AD-fbi*, computes gradients using the technique of automatic differentiation. Automatic differentiation is important because it computes the derivatives of complex functions at low cost while remaining accurate to machine precision and numerically stable. Its practicality and precision make it applicable to a wide range of tasks, such as machine learning optimization, calculating financial loss or profit, and minimizing the cost of vaccine distribution.

Before automatic differentiation, the two traditional computational techniques for computing derivatives were numerical differentiation and symbolic differentiation. Numerical differentiation approximates derivatives with finite differences, which introduces truncation and round-off error; it therefore lacks stability and precision and is not ideal for general-purpose software. Symbolic differentiation, on the other hand, produces exact expressions, but these can become very large and expensive to evaluate as the function grows more complex.

Automatic differentiation decomposes the entire computation into a sequence of elementary operations (addition, subtraction, multiplication, division, and elementary functions such as $\sin$ and $\exp$) and then applies the chain rule to these operations. Because each intermediate value and its derivative are computed once and reused in later steps instead of being recalculated, the computational cost stays low, and the result is exact up to floating-point round-off, matching the accuracy of symbolic differentiation.

Our software package computes derivatives using the forward mode of automatic differentiation and will later explore an extension that implements different optimizers.


# Background
_Describe (briefly) the mathematical background and concepts as you see fit._


* __Chain Rule__

The Chain Rule is the most important concept in automatic differentiation. To compute the derivative of a composite function, we use the Chain Rule to decompose the function into elementary operations and multiply the derivatives of these operations together.

For a given function $f(y(x))$, the derivative of $f$ with respect to $ x $ is the following:

$$
\begin{align}
\frac{\partial f}{\partial x} = \frac{\partial f}{\partial y} \frac{\partial y}{\partial x}\\
\end{align}
$$

When the input $x = (x_1, ..., x_n)$ is an $n$-dimensional vector, we use the gradient operator $\nabla$ to collect the partial derivatives of $y$ with respect to each component of $x$:

$$
\begin{align}
\nabla y(x) =
\begin{bmatrix}
{\frac {\partial y}{\partial x_{1}}}(x)
\\
\vdots 
\\
{\frac {\partial y}{\partial x_{n}}}(x)
\end{bmatrix}
\end{align}
$$

The above expression is for a single intermediate function $y$, but we typically have many of them (for example, in a neural network). Thus, for a given function $f(y_1(x), ..., y_m(x))$, the gradient of $f$ with respect to $x$ is:

$$
\begin{align}
\nabla_x f = \sum_{i=1}^m \frac{\partial f}{\partial y_i} \nabla y_i(x)\\
\end{align}
$$
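As a quick numerical check of the multivariable chain rule, the sketch below uses an illustrative composition of our own choosing (not taken from the package): $f = y_1 y_2$ with $y_1(x) = \sin(x_1)$ and $y_2(x) = x_1 x_2$. It assembles the gradient from the sum above and compares it with central finite differences; all function names are hypothetical.

```python
import math


def y1(x):
    return math.sin(x[0])


def y2(x):
    return x[0] * x[1]


def grad_y1(x):
    # dy1/dx1 = cos(x1), dy1/dx2 = 0
    return [math.cos(x[0]), 0.0]


def grad_y2(x):
    # dy2/dx1 = x2, dy2/dx2 = x1
    return [x[1], x[0]]


def f(x):
    # outer function f(y1, y2) = y1 * y2
    return y1(x) * y2(x)


def grad_f_chain_rule(x):
    # for f = y1 * y2: df/dy1 = y2 and df/dy2 = y1
    df_dy = [y2(x), y1(x)]
    grads = [grad_y1(x), grad_y2(x)]
    # sum over i of (df/dy_i) * grad y_i(x), one component k at a time
    return [sum(df_dy[i] * grads[i][k] for i in range(2)) for k in range(2)]


def grad_f_central_difference(x, h=1e-6):
    # reference values from central finite differences
    result = []
    for k in range(len(x)):
        xp = list(x)
        xp[k] += h
        xm = list(x)
        xm[k] -= h
        result.append((f(xp) - f(xm)) / (2 * h))
    return result


x = [0.7, 1.3]
print(grad_f_chain_rule(x))
print(grad_f_central_difference(x))
```

The two printed gradients agree to roughly the accuracy of the finite-difference step, which is the behavior the chain-rule sum predicts.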


* __Forward Mode automatic differentiation__

Here we will give an example of how to do forward mode automatic differentiation.

Given $x = \begin{bmatrix} {x_1} \\ \vdots \\ {x_m} \end{bmatrix}$, we introduce intermediate variables $v_j$ to hold the value computed at each elementary operation step. The inputs are stored as $v_{k-m} = x_k$ for $k = 1, 2, ..., m$, so with $m = 2$ we have $v_{-1} = x_1$ and $v_0 = x_2$.

For example, to compute the gradient $\nabla f$ of the function $f(x) = \log(x_1) + \sin(x_1 + x_2)$, we first derive it analytically:

$\nabla f = \begin{bmatrix} \frac {\partial f} {\partial x_1} \\ \frac {\partial f} {\partial x_2} \end{bmatrix}  = \begin{bmatrix} \frac {1} {x_1} + \cos(x_1 + x_2) \\ \cos(x_1 + x_2) \end{bmatrix}$

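The computational graph is shown below. Its intermediate nodes are $v_1 = v_{-1} + v_0$, $v_2 = \sin(v_1)$, $v_3 = \log(v_{-1})$, and $v_4 = v_3 + v_2 = f(x)$.
![alt text](graph.png "Computational Graph")

For a seed (direction) vector $p$, forward mode carries the directional derivative $D_p v_j = (\nabla v_j)^T p$ of every variable alongside its value: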

$D_p v_{-1} = \nabla v_{-1}^T p = (\frac {\partial v_{-1}} {\partial x_1} \nabla x_{1})^T p = (\nabla x_{1})^T p = p_1$

$D_p v_{0} = \nabla v_{0}^T p = (\frac {\partial v_{0}} {\partial x_2} \nabla x_{2})^T p = (\nabla x_{2})^T p = p_2$

$D_p v_{1} = \nabla v_{1}^T p = (\frac {\partial v_{1}} {\partial v_{-1}} \nabla v_{-1} + \frac {\partial v_{1}}{\partial v_{0}} \nabla v_{0})^T p = (\nabla v_{-1} + \nabla v_0)^T p = D_p v_{-1} + D_p v_0$

$D_p v_{2} = \nabla v_{2}^T p = (\frac {\partial v_{2}} {\partial v_{1}} \nabla v_1)^T p = \cos(v_1) (\nabla v_1)^T p = \cos(v_1) D_p v_1$

$D_p v_{3} = \nabla v_{3}^T p = (\frac {\partial v_{3}} {\partial v_{-1}} \nabla v_{-1})^T p = \frac {1} {v_{-1}} (\nabla v_{-1})^T p = \frac {1} {v_{-1}} D_p v_{-1}$

$D_p v_{4} = \nabla v_{4}^T p = (\frac {\partial v_{4}} {\partial v_3} \nabla v_{3} + \frac {\partial v_{4}}{\partial v_{2}} \nabla v_{2})^T p = (\nabla v_{3} + \nabla v_2)^T p = D_p v_{3} + D_p v_2$

In general, the directional derivative of each intermediate variable is a weighted sum of the directional derivatives of the variables it depends on:

$$ D_p v_j = (\nabla v_j)^T p = (\sum_{i < j} \frac{\partial{v_j}} {\partial{v_i}} \nabla v_i)^T p = \sum_{i < j} \frac{\partial{v_j}} {\partial{v_i}} (\nabla v_i)^T p = \sum_{i < j} \frac{\partial{v_j}} {\partial{v_i}} D_p v_i$$ 
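The trace above can be sketched in a few lines of Python. This is a hand-written illustration, not part of the *AD-fbi* API (the name `tangent_trace` is hypothetical). Each step updates a primal value $v_j$ and its tangent $D_p v_j$ together, exactly as in the equations above.

```python
import math


def tangent_trace(x1, x2, p):
    """Forward-mode trace of f(x) = log(x1) + sin(x1 + x2) along seed p."""
    # inputs: v_{-1} = x1 and v_0 = x2, with tangents p_1 and p_2
    v_m1, d_m1 = x1, p[0]
    v_0, d_0 = x2, p[1]
    # v_1 = v_{-1} + v_0
    v_1, d_1 = v_m1 + v_0, d_m1 + d_0
    # v_2 = sin(v_1)
    v_2, d_2 = math.sin(v_1), math.cos(v_1) * d_1
    # v_3 = log(v_{-1})
    v_3, d_3 = math.log(v_m1), d_m1 / v_m1
    # v_4 = v_3 + v_2 = f(x)
    v_4, d_4 = v_3 + v_2, d_3 + d_2
    return v_4, d_4


x1, x2 = 1.5, 0.5
value, df_dx1 = tangent_trace(x1, x2, p=(1.0, 0.0))
_, df_dx2 = tangent_trace(x1, x2, p=(0.0, 1.0))
# compare with the analytic gradient [1/x1 + cos(x1 + x2), cos(x1 + x2)]
print(df_dx1, 1 / x1 + math.cos(x1 + x2))
print(df_dx2, math.cos(x1 + x2))
```

Seeding with $p = e_1$ and then $p = e_2$ recovers the two components of the analytic gradient; any other $p$ yields $(\nabla f)^T p$ in a single pass.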


* __Jacobian Matrix__

Having derived the above system of equations, we want to use the Jacobian matrix to compute these derivatives systematically.

For a function $f: \mathbb{R}^n \to \mathbb{R}^m$ with outputs $y_1, ..., y_m$, the Jacobian matrix is defined as:

$$
J_{p}(x_{1},x_{2}, ..., x_{n}) = \left[ \begin{matrix}
\frac{\partial y_{1}}{\partial x_{1}} & ... & \frac{\partial y_{1}}{\partial x_{n}} \\
\vdots  & \ddots & \vdots  \\
\frac{\partial y_{m}}{\partial x_{1}} & ... & \frac{\partial y_{m}}{\partial x_{n}}
\end{matrix} \right] 
$$


For example, for a function with two inputs and two outputs, the Jacobian is the $2 \times 2$ matrix:

$$
J_{p}(x_{1},x_{2}) = \left[ \begin{matrix}
\frac{\partial y_{1}}{\partial x_{1}} & \frac{\partial y_{1}}{\partial x_{2}} \\
\frac{\partial y_{2}}{\partial x_{1}} & \frac{\partial y_{2}}{\partial x_{2}}
\end{matrix} \right] 
$$

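In forward mode, the Jacobian is not formed all at once: each pass through the evaluation trace with seed vector $p$ produces the product of the Jacobian with $p$, so $n$ passes seeded with the unit vectors $e_1, ..., e_n$ recover its $n$ columns.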

* __Seed Vector__

The seed vector $p$ sets the direction along which the forward pass differentiates: a pass seeded with $p$ computes the Jacobian-vector product $J p$, i.e. the directional derivative of $f$ along $p$. This provides an efficient way to retrieve parts of the Jacobian matrix. For example, to retrieve the $(i, j)$ element, we seed with the $j$-th unit vector $p = e_j$; the pass then returns the $j$-th column of the Jacobian, whose $i$-th entry is $\frac{\partial f_{i}}{\partial x_{j}}$.

We will introduce the seed vector in the forward trace so that any column, and hence any element, of the Jacobian matrix can be retrieved without computing the rest of it. The default value of the seed is 1, which for a scalar input yields the ordinary derivative $f'(x)$.
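To make the role of the seed concrete, here is a minimal sketch with a stripped-down dual number. The class `Dual`, the helper `jvp`, and the example function are hypothetical illustrations, not the planned `DualNumber`/`ForwardMode` API. Each input is initialized with its component of $p$ as its derivative part, so one forward pass returns $J p$.

```python
class Dual:
    """Minimal dual number: real part `val`, derivative part `der`."""

    def __init__(self, val, der=0.0):
        self.val = val
        self.der = der

    def __add__(self, other):
        if not isinstance(other, Dual):
            other = Dual(other)  # treat a real number as a constant
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __mul__(self, other):
        if not isinstance(other, Dual):
            other = Dual(other)
        # product rule: (a + a'e)(b + b'e) = ab + (a'b + ab')e
        return Dual(self.val * other.val, self.der * other.val + self.val * other.der)

    __rmul__ = __mul__


def f(x1, x2):
    # example f: R^2 -> R^2 with J = [[x2, x1], [1, 6 * x2]]
    return [x1 * x2, x1 + 3 * x2 * x2]


def jvp(func, x, p):
    """One forward pass seeded with p; returns (f(x), J p)."""
    inputs = [Dual(xk, pk) for xk, pk in zip(x, p)]
    outputs = func(*inputs)
    return [y.val for y in outputs], [y.der for y in outputs]


x = [2.0, 5.0]
# seed e_2 gives column 2 of J: [df1/dx2, df2/dx2] = [x1, 6 * x2]
_, col2 = jvp(f, x, [0.0, 1.0])
print(col2)  # [2.0, 30.0]; element (i=1, j=2) is col2[0] with 0-based indexing
# an arbitrary seed gives the directional derivative J p in one pass
_, jp = jvp(f, x, [1.0, 2.0])
print(jp)    # [9.0, 61.0]
```

Retrieving a single element therefore costs one pass, while the full $2 \times 2$ Jacobian would need one pass per input.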


# How to use AD-fbi
_How do you envision that a user will interact with your package? What should they import? How can they instantiate AD objects?_

1. __Installing the package:__
   * a. Users will install our package with the following line: 
    
    <code>python3 -m pip install AD-fbi </code>
      
   * b. Users will install dependencies from requirement.txt with the following command: 

    <code>python3 -m pip install -r requirement.txt </code>


2. __Importing the package:__

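   * Users import our package using the following line (Python module names cannot contain hyphens, so we assume the importable package will be named `AD_fbi`, while `AD-fbi` remains the name on PyPI):
    
    ```python
    from AD_fbi import dual_fbi, forward_mode, optimization
    ```
    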
    

3. __Instantiating AD objects:__
   * _In order to instantiate an AD object, we need_:
      - an input variable (either a scalar or a vector) holding the value at which the function and its derivative will be evaluated;
      - the function of the input variable.
   
   The following are three examples of instantiating AD objects:

   a. Instantiate an AD object for a scalar function of a single input ($\mathbb{R} \to \mathbb{R}$):

   ```python
   import numpy as np
   from AD_fbi.forward_mode import ForwardMode

   # input variable - a scalar
   x = np.array([0.5])

   # function of the input variable
   f_x = lambda x: 2 * x + 100

   # instantiate an AD object
   fm = ForwardMode(evaluate=x, function=f_x, seed=1)
   ```

   b. Instantiate an AD object for a vector function of multiple input variables ($\mathbb{R}^m \to \mathbb{R}^n$):

   ```python
   import numpy as np
   from AD_fbi.forward_mode import ForwardMode

   # input variable - a vector in R^3
   x = np.array([0.5, 0.5, 0.2])

   # output function - R^3 -> R^2
   f_x = lambda x, y, z: (x.log() + y + z, x * y + z)

   # instantiate an AD object
   fm = ForwardMode(evaluate=x, function=f_x, seed=np.array([1, 2, 3]))
   ```

   c. Instantiate an AD object for a scalar function of multiple input variables ($\mathbb{R}^m \to \mathbb{R}$):

   ```python
   import numpy as np
   from AD_fbi.forward_mode import ForwardMode

   # input variable - a vector in R^3
   x = np.array([0.5, 0.5, 0.2])

   # output function - R^3 -> R
   f_x = lambda x, y, z: x * y + z

   # instantiate an AD object
   fm = ForwardMode(evaluate=x, function=f_x, seed=np.array([1, 2, 3]))
   ```

4. __Interacting with our additional features:__
   We aim to develop optimizers, such as the Adam variant of stochastic gradient descent, to find the minima of a function. 
   Here is an example of how to interact with this feature:
   
   ```python
   from AD_fbi.optimization import Optimization

   x = 1
   
   # define the function whose minimum we want to find (a parabola with its minimum at x = -1.5)
   f_x = lambda x: x**2 + 3 * x

   # instantiate different optimizers
   opm_adam = Optimization.adam()
   opm_mom = Optimization.momentum()
   opm_ada = Optimization.ada()

   # find the minima and the running time of each optimizer
   # each optimizer returns the running time, the minimum value of the function found, and the x location(s) where it is attained
   time_adam, min_adam, x_vals_adam = opm_adam(x, f_x, num_iter=100, epsilon=1e-10)
   time_mom, min_mom, x_vals_mom = opm_mom(x, f_x, num_iter=100, epsilon=1e-10)
   time_ada, min_ada, x_vals_ada = opm_ada(x, f_x, num_iter=100, epsilon=1e-10)

   ```
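Because the `Optimization` API above is only planned, here is a standalone sketch, under our own assumptions, of how an Adam-style update could consume a forward-mode derivative. The derivative of the scalar objective comes from seeding a dual number with 1. The names `Dual`, `derivative`, `adam_minimize`, `lr`, `beta1`, `beta2` and `eps` are hypothetical; `beta1`, `beta2` and `eps` use the commonly cited Adam defaults, while `lr = 0.1` is simply chosen to suit this small example.

```python
import math


class Dual:
    """Minimal dual number supporting + and * (enough for polynomials)."""

    def __init__(self, val, der=0.0):
        self.val = val
        self.der = der

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val, self.der * other.val + self.val * other.der)

    __rmul__ = __mul__


def derivative(f, x):
    """Forward-mode derivative of a scalar function: seed the input with 1."""
    return f(Dual(x, 1.0)).der


def adam_minimize(f, x, num_iter=1000, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    m, v = 0.0, 0.0  # running first and second moment estimates
    for t in range(1, num_iter + 1):
        g = derivative(f, x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias correction
        v_hat = v / (1 - beta2 ** t)
        x = x - lr * m_hat / (math.sqrt(v_hat) + eps)
    return x, f(Dual(x)).val


f_x = lambda x: x * x + 3 * x  # minimum at x = -1.5
x_min, f_min = adam_minimize(f_x, 1.0)
print(x_min, f_min)
```

Keeping the derivative computation separate from the update rule means momentum or Adagrad variants would only need a different loop body.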

# Software Organization
_Discuss how you plan on organizing your software package._


* __What will the directory structure look like?__
   * Our package is called _AD-fbi_; the following is the planned directory structure.
  
    
```bash
   
   team06/
      AD_fbi/
         __init__.py
         dual_fbi.py
         forward_mode.py
         optimization.py
      docs/
        milestone1.md
        documentation.md
      tests/
        test_dual.py
        test_forwardmode.py
        test_optimiz.py
      requirement.txt
      setup.py
      setup.cfg
      README.md
      LICENSE

``` 

* __What modules do you plan on including? What is their basic functionality?__

   There will be two essential modules for forward mode automatic differentiation and one extension module that will be implemented later in the project. The following are descriptions of their basic functionality:

   * dual\_fbi.py: This module will contain the DualNumber class, which computes the function and derivative values of the elementary operations by overloading those operations for dual numbers. These methods are essential for computing the primal and tangent traces in the forward mode of AD. Examples of elementary operations include '+', '-', '*', '/', and 'sqrt(x)'.
    
   * forward\_mode.py: This module will contain a ForwardMode class that provides a method to initialize a forward mode AD object and a method to run the forward mode process and return the computed derivative of the function at the evaluation point.

   * optimization.py (extension module): This module will contain extensions to the basic automatic differentiation functionality, namely optimizers such as the Adam optimizer for stochastic gradient descent. 


*  __Where will your test suite live?__
   * The test suite will live in the "tests" directory, a subdirectory of the repository root. It will contain the unit, integration, system, and acceptance tests for our package. GitHub Actions, a CI provider, will run the test suite automatically so we can see which tests pass or fail. All of the CI setup, jobs, and dependencies will be defined in YAML files.
    
    
* __How will you distribute your package?__
   * The package will be distributed via PyPI. We will first add a _pyproject.toml_ file to our project, then install `build` (a PEP 517 package builder), build the package release, and finally upload it to PyPI. We also plan to add setup.py and setup.cfg files to configure the packaging backend.

   
* __Other Considerations?__
   * Package dependencies: We will rely on the following external functionalities:
      - numpy
      - pytest



# Implementation

_Discuss how you plan on implementing the forward mode of automatic differentiation._
    
* What classes do you need and what will you implement first?
  * a. DualNumbers: class for operations with a dual number.
  * b. ForwardMode: class for forward mode differentiation.
  * c. Optimizer (extension module): The class that creates an optimizer object. This object has no attributes, but each method within the
   class requires an x input, a function input, and the number of iterations for the specific optimizing method. Additionally, each method
   has its own optional hyperparameters, which the user can set instead of using our default values.

* What are the core data structures? How will you incorporate dual numbers?
  * Our primary core data structure is the numpy array, which we use to store both the variable list and the function list. Using the
   methods within the ForwardMode class, we then compute the Jacobian and the function value and store them (as scalars or arrays, depending on
   the input) in a tuple.

* What method and name attributes will your classes have?
  * a. DualNumbers: 
    * Method to initialize a DualNumber object with the real value and dual number value.
    * Method to return the object representation in string format.
    * Methods to overload the elementary operations for a dual number, e.g. `__add__`, `__sub__`, `__mul__`, `__truediv__`, `__pow__`, `__neg__`, `__radd__`, `__rsub__`, `__rmul__`, `__rtruediv__`, `__rpow__`, etc. (Python 3 uses `__truediv__` rather than `__div__` for `/`.)
    * Methods to compare dual numbers. e.g. `__ne__`, `__eq__`, etc.
    * Methods to transform a dual number. e.g. `sqrt`, `log`, `sin`, `cos`, `exp`, `tan`, etc.

  * b. ForwardMode:
    * Method to initialize a ForwardMode object with a point, a function, and a seed vector.
    * Method to run the forward mode process and return the function value at the evaluation point.
    * Method to run the forward mode process and return the Jacobian matrix at the evaluation point.
    * Method to run the forward mode process and return the directional derivative corresponding to a specific seed vector.

* Will you need some graph class to resemble the computational graph in forward mode or maybe later for reverse mode?  Note that in milestone 2 you propose an extension for your project, an example could be reverse mode.
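  * We do not plan to use a graph class to represent the computational graph: forward mode propagates each value together with its derivative, so intermediate results do not need to be stored.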
  
* Dealing With Operator Overloading and Elementary Functions

   * Each overloaded operator (such as `__add__` for our dual number class) should handle both the case where the other operand is another dual number and the case where it is a real number; both are needed to compute gradients in forward mode.

   *  As listed above, within the DualNumber class we’ve overloaded the basic arithmetic operations (addition, subtraction, multiplication,
   division, negation, power, equal, and not equal) to calculate both the real part (the value) and the dual part (the derivative). We’ve also defined our own
   elementary functions, such as sin(x) and sqrt(x) (see above for the full list), to compute the value and the derivative. This module
   generalizes each of these functions so that the ForwardMode class can handle both scalar and vector inputs. Each method also
   raises errors specific to the possible types of invalid input. The output is a tuple of the function value and the derivative, which
   is used in the ForwardMode class.
   
* Dealing With Operator Overloading on Reverse Mode

   * We currently are not interested in implementing a reverse mode on our package.

* Dealing With MultiDimensional Input and Output Function

   * Use try/except blocks to handle the multidimensional and single-dimensional cases separately. For the multidimensional case, we will design a helper function that loops through all of the function's inputs and reassigns the value/derivative as a vector.
   * We plan to treat functions as a list (so high-dimensional functions will be a list of functions).
   * The grad() function (or Jacobian function) will be generic to both the single-dimensional and multidimensional cases, since they are a list of one function or a list of multiple functions, respectively; we will decide how to format the output for each case once the core implementation produces it.
   
* External Dependencies:
   * We’ve used the numpy library to create our core data structures (the variable and function arrays) and to perform
   computations outside of those we defined in our DualNumber class.
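The operator-overloading and multidimensional plans above can be combined in one small sketch. It is a hypothetical illustration under our own assumptions, not the final `dual_fbi`/`forward_mode` code: the attribute names `real` and `dual`, the helpers `as_function_list` and `jacobian`, and the exception types are our choices, and only a subset of the planned methods is shown. Real numbers are promoted to constant dual numbers so either operand order works, invalid operands raise `TypeError`, a lone function is wrapped in a list via try/except, and the Jacobian is filled one column per unit seed vector and returned with the function values as a tuple.

```python
import math
import numpy as np


class DualNumber:
    """Sketch of a dual number real + dual * eps, where eps**2 = 0."""

    def __init__(self, real, dual=1.0):
        self.real = real
        self.dual = dual

    def __repr__(self):
        return f"DualNumber(real={self.real}, dual={self.dual})"

    @staticmethod
    def _lift(other):
        # promote a real number to a constant (derivative 0); reject anything else
        if isinstance(other, DualNumber):
            return other
        if isinstance(other, (int, float)):
            return DualNumber(other, 0.0)
        raise TypeError(f"unsupported operand type: {type(other).__name__}")

    def __add__(self, other):
        other = self._lift(other)
        return DualNumber(self.real + other.real, self.dual + other.dual)

    def __radd__(self, other):
        return self + other

    def __neg__(self):
        return DualNumber(-self.real, -self.dual)

    def __sub__(self, other):
        return self + (-self._lift(other))

    def __rsub__(self, other):
        return self._lift(other) - self

    def __mul__(self, other):
        other = self._lift(other)
        # product rule
        return DualNumber(self.real * other.real,
                          self.dual * other.real + self.real * other.dual)

    def __rmul__(self, other):
        return self * other

    def __truediv__(self, other):
        other = self._lift(other)
        if other.real == 0:
            raise ZeroDivisionError("divisor has a zero real part")
        # quotient rule
        return DualNumber(self.real / other.real,
                          (self.dual * other.real - self.real * other.dual) / other.real ** 2)

    def __eq__(self, other):
        # __ne__ is derived from __eq__ automatically in Python 3
        if not isinstance(other, (DualNumber, int, float)):
            return NotImplemented
        other = self._lift(other)
        return self.real == other.real and self.dual == other.dual

    def sqrt(self):
        if self.real <= 0:
            raise ValueError("sqrt needs a positive real part to be differentiable")
        root = math.sqrt(self.real)
        return DualNumber(root, self.dual / (2 * root))


def as_function_list(funcs):
    """Wrap a lone function in a list so single and multiple outputs share one path."""
    try:
        return list(funcs)
    except TypeError:  # a single callable is not iterable
        return [funcs]


def jacobian(funcs, x):
    """Return (values, J): one forward pass per input, seeded with a unit vector."""
    funcs = as_function_list(funcs)
    x = np.asarray(x, dtype=float)
    values = np.empty(len(funcs))
    J = np.empty((len(funcs), x.size))
    for j, seed in enumerate(np.eye(x.size)):
        inputs = [DualNumber(float(xk), float(pk)) for xk, pk in zip(x, seed)]
        outputs = [f(*inputs) for f in funcs]
        J[:, j] = [out.dual for out in outputs]
        values[:] = [out.real for out in outputs]
    return values, J


vals, J = jacobian([lambda x, y: x * y, lambda x, y: x / y - 1], [2.0, 4.0])
vals_s, J_s = jacobian(lambda x, y: (x * y).sqrt(), [2.0, 8.0])
print(vals, J)
print(vals_s, J_s)
```

For the two-output example, `J` holds $[[y, x], [1/y, -x/y^2]]$ evaluated at $(2, 4)$, i.e. `[[4, 2], [0.25, -0.125]]`; the single function $\sqrt{xy}$ yields a $1 \times 2$ Jacobian through the same code path.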


# License

*Briefly motivate your license choice*

Our *AD-fbi* package is licensed under the GNU General Public License v3.0. This free software license allows users to do just about anything they want with our project, except distribute closed source versions. This means that any improved versions of our package that individuals seek to release must also be free software. We find it essential to allow users to help each other share their bug fixes and improvements with other users. Our hope is that users of this package continually find ways to improve it and share these improvements within the broader scientific community that uses automatic differentiation.

# Feedback 

## Milestone 1

Introduction (2/2)

Great Job!

Response: Thanks!

Background (2/2)

Great job! Please include a discussion about the seed vector. This will be part of your final project as well. I expect to see this addition in the next milestone.

Response: Thank you so much for the comments! We have added the seed vector in the background.

How to use (3/3)

One suggestion I have is to use a conventional nomenclature in your example usage. Use x as your variable name instead of inp. (no point deduced for this)
Another thing to consider is how people would use your package to calculate R^m -> R^n, etc.
Additionally, I encourage you to also think about how people would interact with your additional features, and providing an example of that would be great.
I expect to see more about these implementations in your revision.

Response: 
We renamed the input variable to `x`, as suggested, in all of the examples in sections 3 (a/b/c) and 4. 
We also considered the circumstances where the input is a R^m vector, output dimension is either R^n (specified in section 3.b) or R (section 3.c). 
One example for the additional features of the package that we aim to implement is shown in section 4, which implements the optimization class. 

Software Organization (2/2)

Great job!

Response: Thanks

Implementation (explicit design considerations) (3.5/4)

Great job on the detailed implementation plan!
You mentioned that you want to be able to draw the computational graph in the forward model. However, as you could see in pair-programming and homework exercises, the forward mode implementation does not save the intermediate results, so if you want to construct the computational graph in a forward mode, please discuss more on how you plan to implement it in addition, to simply mention you will use a dictionary (-0.5). Alternatively, in lecture 13, we talked about the reverse mode (back-propagation) in AD, where the output from the previous step will be fed into the next step. This could be the way where you save your intermediate outputs.
I understand that you proposed to do the optimization as your additional feature, so I would suggest you discuss it with the group and pick one as your additional feature for this project. Please reflect on these in the next milestone.

Response: Thanks for your feedback on the difference between forward mode and reverse mode implementations with respect to the computational graph. We updated our implementation plan for the data structure and will not consider drawing the computational graph for now, as it will not be used in the forward mode implementation. Instead of a dict, we will now use numpy arrays to store both the variables and the functions (Jacobians) in the forward mode computation, and store both in a tuple. We will also stick with optimization as our additional feature and will not implement the reverse mode for this project. Please see the Implementation section above for more details about our updates.

License (conflict with dependencies?) (1/2)

Did not specifically mention what software license you want to use for the package. Moreover, you should have a dedicated section to talk about licensing (-1). Please add the licensing section in the next milestone.

Response: License Section added to the milestone1 document

Total (13.5/15)