# Report

This collection of notebooks constitutes a short report of the activities I conducted during the two-month position.

The [overview](overview.ipynb) presents some of the inner details of t-SNE: how it addresses the crowding problem and what the cost gradient looks like for a pair of points as a function of perplexity and early exaggeration. Force-based algorithms are then presented, along with the gradient for different types of cost. A short discussion of their similarities and differences follows, before a review of the optimisations that have been proposed and implemented for t-SNE. Finally, building on some findings from the literature, a short list of research opportunities is proposed.
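As a reference point for the gradient discussed in the overview, the sketch below computes the standard exact t-SNE gradient, $\partial C/\partial y_i = 4\sum_j (\alpha p_{ij} - q_{ij})(y_i - y_j)(1 + \lVert y_i - y_j\rVert^2)^{-1}$, where $\alpha$ is the early exaggeration factor. It is a minimal NumPy sketch that assumes a precomputed symmetric affinity matrix `P` (perplexity only enters through how `P` is built, which is not shown); `tsne_gradient` is a hypothetical helper, not code taken from the overview notebook.

```python
import numpy as np


def tsne_gradient(P, Y, exaggeration=1.0):
    """Exact t-SNE gradient of KL(P || Q) with respect to the embedding Y.

    P: symmetric (n, n) affinity matrix with zero diagonal, summing to 1.
    Y: (n, d) embedding coordinates.
    exaggeration: factor alpha multiplying P (early exaggeration phase).
    """
    diff = Y[:, None, :] - Y[None, :, :]      # (n, n, d): y_i - y_j
    sq_dist = np.sum(diff ** 2, axis=-1)      # squared embedding distances
    # Student-t (Cauchy) kernel: its heavy tails mitigate the crowding problem
    W = 1.0 / (1.0 + sq_dist)
    np.fill_diagonal(W, 0.0)
    Q = W / W.sum()                           # low-dimensional affinities
    # Attraction (alpha * p_ij) minus repulsion (q_ij), weighted by the kernel
    coeff = (exaggeration * P - Q) * W
    return 4.0 * np.sum(coeff[:, :, None] * diff, axis=1)
```

For exactly two points the normalised affinities satisfy $P = Q$ whatever their distance, so this gradient vanishes at $\alpha = 1$; with early exaggeration ($\alpha > 1$) the attractive term dominates and gradient descent pulls the two points together.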

The [experiment #0 notebook](experiment_0.ipynb) runs t-SNE on common datasets (MNIST in particular) and implements the kNN metrics, which provide one quantitative measure of the quality of an embedding. These results are then compared with two available Python-based force layout algorithms.
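The notebook's exact definition of the kNN metrics is not reproduced here. One common variant, sketched below under that assumption, is kNN preservation: the average fraction of each point's k nearest neighbours in the original space that are also among its k nearest neighbours in the embedding. The helpers `knn_indices` and `knn_preservation` are hypothetical names, and the brute-force distance matrix is only meant for small samples.

```python
import numpy as np


def knn_indices(X, k):
    """Indices of the k nearest neighbours of each row of X (self excluded)."""
    sq = np.sum(X ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # squared Euclidean distances
    np.fill_diagonal(dist, np.inf)                     # a point is not its own neighbour
    return np.argsort(dist, axis=1)[:, :k]


def knn_preservation(X_high, Y_low, k=10):
    """Average fraction of each point's k high-dimensional neighbours that are
    also among its k neighbours in the embedding (1.0 = perfectly preserved)."""
    nn_high = knn_indices(X_high, k)
    nn_low = knn_indices(Y_low, k)
    overlaps = [len(set(a) & set(b)) for a, b in zip(nn_high, nn_low)]
    return float(np.mean(overlaps)) / k
```

For the full MNIST (70K points) an $n \times n$ distance matrix would not fit comfortably in memory, so an approximate nearest-neighbour index such as `pynndescent` (listed in the requirements) would be needed instead.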

The [experiment #1 notebook](experiment_1.ipynb) proposes another implementation of force-based layout and confirms the results of the [experiment #0 notebook](experiment_0.ipynb): force-based layouts seem to have trouble dealing with large, highly connected datasets such as MNIST 70K.
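To make the comparison concrete, here is a minimal spring-electrical layout using Fruchterman-Reingold style forces: every pair of nodes repels with magnitude $k^2/d$, every edge attracts with magnitude $d^2/k$, and each displacement is capped by a cooling temperature. This is an illustrative sketch, not the implementation from experiment #1; `force_layout` and its parameters are hypothetical.

```python
import numpy as np


def force_layout(edges, n, iterations=200, k=1.0, step=0.1, seed=0):
    """Minimal spring-electrical layout (Fruchterman-Reingold style forces).

    edges: list of (i, j) pairs of an undirected graph with n nodes.
    k: ideal edge length; step: initial maximum displacement per iteration.
    """
    rng = np.random.default_rng(seed)
    pos = rng.standard_normal((n, 2))
    for it in range(iterations):
        diff = pos[:, None, :] - pos[None, :, :]      # p_i - p_j
        dist = np.linalg.norm(diff, axis=-1) + 1e-9
        # Repulsion between every pair of nodes, magnitude k^2 / d (O(n^2) work)
        disp = np.sum((k ** 2 / dist ** 2)[:, :, None] * diff, axis=1)
        # Attraction along edges, magnitude d^2 / k, pulling endpoints together
        for i, j in edges:
            delta = pos[i] - pos[j]
            d = np.linalg.norm(delta) + 1e-9
            pull = (d / k) * delta                    # = (d^2 / k) * delta / d
            disp[i] -= pull
            disp[j] += pull
        # Cap each node's move by a temperature that cools linearly to zero
        temp = step * (1.0 - it / iterations)
        length = np.linalg.norm(disp, axis=1, keepdims=True) + 1e-9
        pos += disp / length * np.minimum(length, temp)
    return pos
```

For two connected nodes the forces balance when their distance equals `k`. The all-pairs repulsion costs $O(n^2)$ per iteration, which is one reason naive force layouts struggle with tens of thousands of points; ForceAtlas2 (the `fa2` package) can approximate repulsion with a Barnes-Hut scheme.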

The [miscellaneous notebook](misceallenous.ipynb) contains independent or exploratory results that are not connected to the big picture.

The [literature notebook](literature.ipynb) compiles the material I have been reading and saving for this project. It gives a short description of each cited work, with references.

The [conclusion](conclusion.ipynb) contains a short wrap-up of the project, relating what has been found to the initial goals.


## What is needed to run the notebooks?

You will need a basic Python installation; I would recommend [Miniconda](https://docs.conda.io/en/latest/miniconda.html). All the code I used is publicly available online and can be installed from the command line.

## Testing that all the packages are available

You can install the packages needed in the following notebooks with:

```
conda install --file requirements/conda.txt
```

followed by:

```
pip install -r requirements/pip.txt
```

Then, we can try to import the packages in the current notebook:

```python
import numpy
import matplotlib
import openTSNE
import sklearn
import pynndescent
import numba
import pyamg
import tqdm
import networkx
import fa2
import forcelayout
```

If the cell above executes without errors, then all packages are installed.
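Instead of stopping at the first failing import, a check can also report every missing package at once. The sketch below uses `importlib.util.find_spec` from the standard library; it only checks that each module can be located, not that importing it succeeds, so the import cell above remains the stronger test. The `REQUIRED` list mirrors the module names imported above, and `missing_packages` is a hypothetical helper.

```python
import importlib.util

# Module names imported in the cell above
REQUIRED = ["numpy", "matplotlib", "openTSNE", "sklearn", "pynndescent",
            "numba", "pyamg", "tqdm", "networkx", "fa2", "forcelayout"]


def missing_packages(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
print("all packages found" if not missing else f"missing: {', '.join(missing)}")
```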

# Datasets
## Fashion-MNIST

[Fashion-MNIST](https://github.com/zalandoresearch/fashion-mnist.git) is included as a submodule in this repository. Check that its files have been downloaded under `datasets/fashion-mnist`. If not, run the following:

```
git submodule init 
git submodule update
```
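Once the submodule is present, the images can be read directly from the gzipped IDX files. The sketch below assumes the upstream repository layout, with the files under `data/fashion/` (so `datasets/fashion-mnist/data/fashion` here) named `train-images-idx3-ubyte.gz`, `train-labels-idx1-ubyte.gz` and their `t10k-` counterparts. `load_fashion_mnist` is a hypothetical helper; the upstream repository also ships its own reader in `utils/mnist_reader.py`.

```python
import gzip
import os

import numpy as np


def load_fashion_mnist(path, kind="train"):
    """Read {kind}-images-idx3-ubyte.gz and {kind}-labels-idx1-ubyte.gz from path.

    Returns (images, labels): images as an (n, 784) uint8 array, labels as (n,).
    """
    labels_path = os.path.join(path, f"{kind}-labels-idx1-ubyte.gz")
    images_path = os.path.join(path, f"{kind}-images-idx3-ubyte.gz")
    with gzip.open(labels_path, "rb") as f:
        # 8-byte header: magic number (2049) and item count, big-endian int32
        labels = np.frombuffer(f.read(), dtype=np.uint8, offset=8)
    with gzip.open(images_path, "rb") as f:
        # 16-byte header: magic number (2051), count, rows, columns
        images = np.frombuffer(f.read(), dtype=np.uint8, offset=16)
    return images.reshape(len(labels), -1), labels
```

For example, `load_fashion_mnist("datasets/fashion-mnist/data/fashion", kind="train")` should return a `(60000, 784)` array of pixel values and 60000 labels, and `kind="t10k"` the 10000 test images.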