# Exercise 1: t-SNE

## Do not start the exercise until you fully understand the submission guidelines.


* The homework assignments are executed automatically.
* Failure to comply with the following instructions will result in a significant penalty.
* Appeals regarding your failure to read these instructions will be denied.

## Read the following instructions carefully:

1. This Jupyter notebook contains all the step-by-step instructions needed for this exercise.
1. Write **efficient**, **vectorized** code whenever possible. Some calculations in this exercise may take several minutes when implemented efficiently, and might take much longer otherwise. Unnecessary loops will result in point deductions.
1. You are responsible for the correctness of your code and should add as many tests as you see fit to this jupyter notebook. Tests will not be graded nor checked.
1. You are allowed to use functions and methods from the [Python Standard Library](https://docs.python.org/3/library/).
1. Your code must run without errors. Use at least `numpy` 1.15.4. Any code that cannot run will not be graded.
1. Write your own code. Cheating will not be tolerated.
1. Submission includes a zip file that contains this notebook, with your ID as the file name. For example, `hw1_123456789_987654321.zip` if you submitted in pairs and `hw1_123456789.zip` if you submitted the exercise alone. The name of the notebook should follow the same structure.
   
Please use only a **zip** file in your submission.

---
## ❗❗❗❗❗❗❗❗❗**This is mandatory**❗❗❗❗❗❗❗❗❗
## Please write your RUNI emails in this cell:

### ***yonatan.greenshpan@post.runi.ac.il***
---

## Please sign that you have read and understood the instructions:

### ***204266191***  

---


In [None]:
# Import necessary libraries
import numpy as np
from sklearn.manifold import TSNE
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

np.random.seed(42)

# Design your algorithm
Make sure to describe the algorithm, its limitations, and describe use-cases.

## **Algorithm Description**

t-SNE is a nonlinear dimensionality-reduction algorithm designed to embed high-dimensional data into a low-dimensional space while preserving **local neighborhood structure**.

The core idea is to convert pairwise distances in the high-dimensional space into a probability distribution over pairs of points, and then find a low-dimensional embedding whose own pairwise distribution is as similar as possible, by minimizing the KL divergence between the two. It is important to note that t-SNE optimizes the low-dimensional coordinates of the data points directly, rather than the parameters of a model.

The algorithm consists of the following stages:

### **1. Compute Pairwise Affinities in High Dimension**

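For each point $x_i$, define conditional probabilities over all other points:

$$
p_{j|i} = \frac{\exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma_i^2}\right)}{\sum_{k \neq i} \exp\left(-\frac{\|x_i - x_k\|^2}{2\sigma_i^2}\right)}, \qquad p_{i|i} = 0
$$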

- $x_i$ — the $i$-th data point in the original high-dimensional space.  
- $x_j$ — a neighboring data point whose similarity to $x_i$ we measure.  
- $\|x_i - x_j\|^2$ — squared Euclidean distance between points $x_i$ and $x_j$.  
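- $\sigma_i$ — the bandwidth (standard deviation) of the Gaussian centered at $x_i$, chosen per point by binary search so that the perplexity $\mathrm{Perp}(P_i) = 2^{H(P_i)}$, with $H(P_i) = -\sum_j p_{j|i} \log_2 p_{j|i}$, equals the user-specified perplexity. Points in dense regions therefore get a small $\sigma_i$ and points in sparse regions a large one.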
- $p_{j|i}$ — the conditional probability that $x_i$ would pick $x_j$ as a neighbor.  
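A minimal NumPy sketch of this step, assuming the standard perplexity calibration from the original t-SNE paper. The helper names `squared_distances` and `conditional_probabilities` are hypothetical (they are not part of the `CustomTSNE` skeleton). The search runs over the precision $\beta_i = 1/(2\sigma_i^2)$ for all points at once, so there is no per-point Python loop:

```python
import numpy as np

def squared_distances(X):
    """Pairwise squared Euclidean distances ||x_i - x_j||^2, fully vectorized."""
    sq_norms = np.sum(X ** 2, axis=1)
    D = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    np.maximum(D, 0.0, out=D)  # clip tiny negative values caused by round-off
    np.fill_diagonal(D, 0.0)
    return D

def conditional_probabilities(X, perplexity=30.0, tol=1e-5, max_iter=100):
    """Return a matrix whose row i holds p_{j|i}, calibrating every sigma_i at once.

    Searches over beta_i = 1 / (2 sigma_i^2). Perp(P_i) = 2^{H in bits} = e^{H in nats},
    so the target entropy in nats is log(perplexity).
    """
    D = squared_distances(X)
    n = D.shape[0]
    target = np.log(perplexity)
    beta = np.ones(n)
    beta_lo = np.zeros(n)            # known lower bound for each beta_i
    beta_hi = np.full(n, np.inf)     # known upper bound (inf = none found yet)
    off_diag = ~np.eye(n, dtype=bool)
    for _ in range(max_iter):
        logits = np.where(off_diag, -D * beta[:, None], -np.inf)  # forces p_{i|i} = 0
        logits -= logits.max(axis=1, keepdims=True)              # numerical stability
        W = np.exp(logits)
        P = W / W.sum(axis=1, keepdims=True)
        H = -np.sum(P * np.log(np.maximum(P, 1e-300)), axis=1)   # entropy in nats
        if np.all(np.abs(H - target) < tol):
            break
        too_wide = H > target        # entropy too high -> sigma too large -> raise beta
        beta_lo = np.where(too_wide, beta, beta_lo)
        beta_hi = np.where(too_wide, beta_hi, beta)
        # double beta until an upper bound exists, then bisect the bracket
        beta = np.where(np.isinf(beta_hi), 2.0 * beta, 0.5 * (beta_lo + beta_hi))
    return P
```

Searching over $\beta_i$ rather than $\sigma_i$ keeps the exponent a simple product `-D * beta`; the tolerance and iteration cap are arbitrary choices for illustration.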

### **2. Symmetrize the Probabilities**

Because each point $x_i$ uses its own $\sigma_i$, the conditional probabilities $p_{j|i}$ and $p_{i|j}$ are measured relative to two different local neighborhoods. So even though the distance $\|x_i - x_j\|$ is the same in both directions, the two probabilities generally differ. t-SNE models a single joint distribution over pairs, so it needs **one shared, mutual similarity** between $x_i$ and $x_j$. We obtain it by averaging the two directional probabilities and dividing by $n$, so that all $p_{ij}$ sum to 1:

$$
p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n}
$$
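A short sketch of the symmetrization, assuming `P_cond` is a matrix whose row $i$ holds $p_{j|i}$. The function name `joint_probabilities` and the toy matrix are hypothetical. Since every row of `P_cond` sums to 1, the entries of $P$ sum to $2n/(2n) = 1$, and each point receives $\sum_j p_{ij} \ge 1/(2n)$, so even an outlier keeps a non-negligible share of the cost:

```python
import numpy as np

def joint_probabilities(P_cond):
    """Symmetrize conditional affinities: p_ij = (p_{j|i} + p_{i|j}) / (2n)."""
    n = P_cond.shape[0]
    return (P_cond + P_cond.T) / (2.0 * n)

# Toy example with 3 points: row i is the conditional distribution p_{.|i}
P_cond = np.array([[0.0, 0.9, 0.1],
                   [0.5, 0.0, 0.5],
                   [0.2, 0.8, 0.0]])
P = joint_probabilities(P_cond)
print(np.isclose(P.sum(), 1.0))  # True
print(np.allclose(P, P.T))       # True: p_ij == p_ji
```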

### **3. Initialize Low-Dimensional Embeddings**

Randomly initialize $y_i \in \mathbb{R}^2$ (or $\mathbb{R}^3$), depending on whether we want a 2D or a 3D visualization.
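A possible initialization, assuming the choice made in the original t-SNE paper: draw each $y_i$ from an isotropic Gaussian with small variance $10^{-4}$. The function `init_embedding` and its `seed` argument are hypothetical. Starting with all points close to the origin makes every $\|y_i - y_j\|^2$ tiny, so the initial $q_{ij}$ are nearly uniform:

```python
import numpy as np

def init_embedding(n_samples, n_components=2, seed=42):
    """Draw every y_i from N(0, 1e-4 * I), i.e. standard deviation 1e-2 per coordinate."""
    rng = np.random.RandomState(seed)
    return rng.normal(loc=0.0, scale=1e-2, size=(n_samples, n_components))

Y0 = init_embedding(500, n_components=2)
print(Y0.shape)  # (500, 2)
```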


### **4. Define Low-Dimensional Similarities Using a t-Distribution**

In the low-dimensional space we also need a similarity distribution between points: pairs that are close in 2D should get high similarity, and distant pairs should get low similarity.

If we used a **Gaussian** here as well (as in the high-dimensional space), we would run into the **crowding problem**: a 2D map has far less room at moderate distances than a high-dimensional space, so points that are moderately far apart in the original space cannot all be placed at matching distances, and many of them collapse together near the center of the map.

To avoid this, t-SNE uses a **heavy-tailed** distribution in the low-dimensional space: a pair that is moderately far apart in the original space can be placed much farther apart in the map while its similarity $q_{ij}$ still matches $p_{ij}$. We choose a **Student-t distribution with 1 degree of freedom** (also known as the Cauchy distribution), whose probability density function is:

$$
f(t) = \frac{\Gamma\!\left(\frac{\nu + 1}{2}\right)}
{\sqrt{\nu\pi}\,\Gamma\!\left(\frac{\nu}{2}\right)}
\left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu + 1}{2}}
$$

For $\nu = 1$ this simplifies to (up to a constant factor):

$$
f(t) \propto \frac{1}{1 + t^2}
$$
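A quick numerical check of this simplification, plus an illustration of why the heavy tail matters. The sketch uses only the Python standard library; `student_t_pdf` is a hypothetical helper that evaluates the density exactly as written above. The comparison uses the Gaussian kernel $\exp(-d^2)$ that the original SNE uses in its low-dimensional map and the Student-t kernel $(1 + d^2)^{-1}$, each divided by its value at $d = 1$:

```python
import math

def student_t_pdf(t, nu):
    """Student-t density, written term by term as in the formula above."""
    coef = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return coef * (1 + t ** 2 / nu) ** (-(nu + 1) / 2)

# For nu = 1 the constant is Gamma(1) / (sqrt(pi) * Gamma(1/2)) = 1 / pi
for t in [0.0, 1.0, 3.0]:
    assert math.isclose(student_t_pdf(t, 1), 1 / (math.pi * (1 + t ** 2)))

# Heavy tails: kernel value at distance d relative to its value at d = 1
for d in [1.0, 2.0, 3.0]:
    gauss = math.exp(-d ** 2) / math.exp(-1.0)
    cauchy = (1 / (1 + d ** 2)) / 0.5
    print(f"d={d}: gaussian ratio {gauss:.4f}, student-t ratio {cauchy:.4f}")
```

At $d = 3$ the Gaussian ratio has dropped to about $0.0003$, while the Student-t ratio is still $0.2$, which is what lets moderately distant pairs sit far apart in the map.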

In t-SNE we plug in the low-dimensional distance
$t = \|y_i - y_j\|$ and then normalize over all pairs to get a proper probability distribution:

$$
q_{ij} = \frac{(1 + \|y_i - y_j\|^2)^{-1}}
{\sum_{k \neq l} (1 + \|y_k - y_l\|^2)^{-1}}
$$

This $q_{ij}$ is the **low-dimensional similarity** between $y_i$ and $y_j$.  
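A vectorized sketch of $Q$, assuming the embedding is stored as an $n \times d$ NumPy array `Y`. The name `low_dim_affinities` is hypothetical; it also returns the unnormalized kernel matrix $W$, since the same values reappear in the gradient of the cost. The toy coordinates are made up for illustration:

```python
import numpy as np

def low_dim_affinities(Y):
    """Return Q (entries q_ij) and the kernel W_ij = (1 + ||y_i - y_j||^2)^(-1)."""
    sq_norms = np.sum(Y ** 2, axis=1)
    D = sq_norms[:, None] + sq_norms[None, :] - 2.0 * Y @ Y.T
    W = 1.0 / (1.0 + np.maximum(D, 0.0))
    np.fill_diagonal(W, 0.0)   # the normalizing sum runs over k != l, and q_ii = 0
    Q = W / W.sum()
    return Q, W

Y = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
Q, W = low_dim_affinities(Y)
print(np.round(Q, 4))
```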


### **5. Minimize KL-Divergence Between $P$ and $Q$**

We minimize the following cost iteratively with gradient-based optimization (the original paper uses gradient descent with momentum):

$$
\mathrm{KL}(P\|Q)
=
\sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}
$$
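Minimizing the cost needs its gradient with respect to each $y_i$. For symmetric $P$ and $Q$ as defined above, the original t-SNE paper gives

$$
\frac{\partial\,\mathrm{KL}(P\|Q)}{\partial y_i} = 4 \sum_{j} (p_{ij} - q_{ij})(y_i - y_j)\left(1 + \|y_i - y_j\|^2\right)^{-1}
$$

The sketch below evaluates this with matrix products and runs plain gradient descent with momentum. It assumes $P$ is symmetric with entries summing to 1, and it deliberately omits the extra optimization tricks of the original paper (early exaggeration, a momentum schedule, adaptive per-parameter gains). `kl_and_gradient`, `optimize_embedding`, and their default hyperparameters are hypothetical:

```python
import numpy as np

def kl_and_gradient(P, Y):
    """Return KL(P || Q) and dKL/dY for a symmetric P whose entries sum to 1."""
    sq_norms = np.sum(Y ** 2, axis=1)
    D = sq_norms[:, None] + sq_norms[None, :] - 2.0 * Y @ Y.T
    W = 1.0 / (1.0 + np.maximum(D, 0.0))   # Student-t kernel (1 + ||y_i - y_j||^2)^-1
    np.fill_diagonal(W, 0.0)               # exclude i == j from every sum
    Q = np.maximum(W / W.sum(), 1e-12)     # floor keeps log() finite
    mask = P > 0                           # terms with p_ij = 0 contribute nothing
    kl = np.sum(P[mask] * np.log(P[mask] / Q[mask]))
    # 4 * sum_j M_ij (y_i - y_j) = 4 * (y_i * sum_j M_ij - sum_j M_ij y_j)
    M = (P - Q) * W
    grad = 4.0 * (M.sum(axis=1)[:, None] * Y - M @ Y)
    return kl, grad

def optimize_embedding(P, n_components=2, n_iter=300, learning_rate=50.0,
                       momentum=0.5, seed=0):
    """Plain gradient descent with momentum on the embedding coordinates."""
    rng = np.random.RandomState(seed)
    Y = rng.normal(0.0, 1e-2, size=(P.shape[0], n_components))  # small random init
    velocity = np.zeros_like(Y)
    history = []
    for _ in range(n_iter):
        kl, grad = kl_and_gradient(P, Y)
        velocity = momentum * velocity - learning_rate * grad
        Y = Y + velocity
        history.append(kl)
    return Y, history
```

Comparing `kl_and_gradient` against a finite-difference estimate on a small random $P$ is a cheap way to catch sign or factor errors in the gradient.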

## **Limitations**

- Does **not scale well** to very large datasets: the naive algorithm needs $O(n^2)$ time and memory per iteration, because it works with all pairwise affinities.
- **Global structure is unreliable** — only local neighborhoods are meaningful.
- Because t-SNE optimizes the embedded coordinates directly, training **does not produce a function that maps new data** into the embedding space, unlike PCA.
- **Sensitive to hyperparameters** such as perplexity and learning rate.  
- **No inverse transform** — cannot reconstruct high-dimensional vectors from the 2D embedding.
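A back-of-the-envelope estimate of what $O(n^2)$ memory means in practice, assuming a dense `float64` $n \times n$ matrix and the full MNIST dataset (70,000 images: 60,000 train plus 10,000 test) as the example size:

```python
# Dense float64 pairwise matrix for n points: n^2 entries of 8 bytes each
n = 70_000                       # full MNIST (train + test), used here as an example size
bytes_per_matrix = n * n * 8
print(bytes_per_matrix / 1e9)    # 39.2 (GB) for a single n x n matrix
```

A naive implementation typically holds several such matrices at once (distances, $P$, $Q$), so full MNIST is out of reach without subsampling or approximation.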

---

## **Use-Cases**

- **Visualization of high-dimensional datasets** such as images, text embeddings, biological data, user behavior features, or any complex structured data.
- **Exploratory Data Analysis (EDA)** to reveal clusters, subgroups, anomalies, or hidden structure that may not be visible in the raw high-dimensional space.
- **Understanding model representations**, for example examining the latent space of neural networks, autoencoders, or transformer embeddings.
---


# Your implementations
You may add new cells, write helper functions or test code as you see fit.
Please use the cell below and include a description of your implementation.
Explain code design considerations, algorithmic choices, and any other details you think are relevant to understanding your implementation.
Failing to explain your code will lead to point deductions.

In [None]:
class CustomTSNE:
    def __init__(self, perplexity=30.0, n_components=2, n_iter=1000, learning_rate=200.0):
        self.perplexity = perplexity
        self.n_components = n_components
        self.n_iter = n_iter
        self.learning_rate = learning_rate
        # Note: You may add more attributes

    # Part 1: Implementing t-SNE
    def fit_transform(self, X):
        # Return Y, the transformed data
        pass

    # Part 2: Transformation of New Data Points
    def transform(self, X_original, Y_original, X_new):
        # Implement your method for incorporating new points into the existing t-SNE layout
        # Your code here

        # Return Y_new, the transformed data
        pass

# Load data
Please use the cell below to discuss your dataset choice and why it is appropriate (or not) for this algorithm.

In [None]:
# Load data

# Normalize data if necessary

# Split the data into train and test

# t-SNE demonstration
Demonstrate your t-SNE implementation.

Add plots and figures. The code below is just to help you get started, and should not be your final submission.

Please use the cell below to describe your results and tests.

Describe the difference between your implementation and the sklearn implementation. Hint: you can look at the documentation.

In [None]:
# Run your custom t-SNE implementation
custom_tsne = CustomTSNE(n_components=2, perplexity=N/10)
custom_Y = custom_tsne.fit_transform(X_train)

# Run sklearn t-SNE
sk_tsne = TSNE(n_components=2, init='random', perplexity=N/10)
sk_Y = sk_tsne.fit_transform(X_train)

# Visualization of the result
plt.figure()
plt.scatter(custom_Y[:, 0], custom_Y[:, 1], s=5, c=label_train.astype(int), cmap='tab10')
plt.colorbar()
plt.title('MNIST Data Embedded into 2D with Custom t-SNE')

plt.figure()
plt.scatter(sk_Y[:, 0], sk_Y[:, 1], s=5, c=label_train.astype(int), cmap='tab10')
plt.colorbar()
plt.title('MNIST Data Embedded into 2D with sklearn t-SNE')
plt.show()

# t-SNE extension - mapping new samples
Demonstrate your t-SNE transformation procedure.

Add plots and figures.

Please use the cell below to describe your suggested approach in detail. Use formal notation where appropriate.
Describe and discuss your results.

In [None]:
# Transform new data
custom_Y_new = custom_tsne.transform(X_train, custom_Y, X_test)

# Visualization of the result
plt.figure()
plt.scatter(custom_Y[:, 0], custom_Y[:, 1], s=5, c=label_train.astype(int), cmap='tab10')
plt.scatter(custom_Y_new[:, 0], custom_Y_new[:, 1], marker='*', s=50, linewidths=0.5, edgecolors='k', c=label_test.astype(int), cmap='tab10')
plt.colorbar()
plt.title('MNIST Data Embedded into 2D with Custom t-SNE')
plt.show()

# Use of generative AI
Please use the cell below to describe your use of generative AI in this assignment.

### Gen-AI Usage Summary

For this assignment I used three Gen-AI tools:

- **ChatGPT Pro** – for general conceptual questions, clarifications, phrasing, LaTeX conversion, and simple syntax questions.
- **Gemini Pro (Learning Mode)** – used only at the beginning to verify that I understood the presentation and algorithm step by step.  
- **Cursor** – used extensively for creating templates, documentation, refactoring, and writing tests.

With **Cursor**, I explicitly structured the work so that each section was developed and tested separately. I asked targeted, limited questions (e.g., *“Explain step X”*, *“Help me validate Y”*, *“Rewrite this specific function”*, *"give me template for function that does Z"*), and never asked it to produce a full end-to-end solution. All implementation decisions, testing, debugging, and integration were done by me.
