# Johnson-Lindenstrauss Lemma

**"Almost every high dimensional dataset is compressible"**

The Johnson-Lindenstrauss Lemma is a fundamental result in dimensionality reduction and high-dimensional geometry. It states that any set of $n$ points in a high-dimensional Euclidean space can be embedded into a space of dimension $O(\log n / \varepsilon^2)$ so that all pairwise distances are nearly preserved. The target dimension depends only on the number of points and the tolerated distortion $\varepsilon$, not on the original dimension, which is why the lemma matters for the efficiency of algorithms on high-dimensional data.

## Statement of the Lemma

Given $0 < \varepsilon < 1$ and an integer $n$, let $k$ be a positive integer such that:

$$
k \geq \frac{4 \cdot (\ln(n))}{\varepsilon^2 / 2 - \varepsilon^3 / 3}
$$

Then for any set $V$ of $n$ points in Euclidean space $\mathbb{R}^d$, there exists a Lipschitz function $f : \mathbb{R}^d \to \mathbb{R}^k$ such that for all $u, v$ in $V$,

$$
(1 - \varepsilon) \cdot \|u - v\|^2 \leq \|f(u) - f(v)\|^2 \leq (1 + \varepsilon) \cdot \|u - v\|^2
$$

This means that the squared distances between the points in $V$ are preserved up to a factor of $(1 \pm \varepsilon)$, and hence the distances themselves up to a factor of $\sqrt{1 \pm \varepsilon}$, when mapped from the high-dimensional space $\mathbb{R}^d$ to the lower-dimensional space $\mathbb{R}^k$.
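
To get a feel for the bound, the following minimal sketch evaluates the smallest integer $k$ allowed by the inequality above. The helper name `jl_min_dim` is hypothetical and chosen only for illustration. The result depends on $n$ and $\varepsilon$ but not on the original dimension $d$.

```python
import math


def jl_min_dim(n, eps):
    """Smallest integer k with k >= 4 ln(n) / (eps^2/2 - eps^3/3).

    Hypothetical helper for illustration; assumes n >= 2 and 0 < eps < 1.
    """
    if n < 2 or not 0 < eps < 1:
        raise ValueError("need n >= 2 and 0 < eps < 1")
    return math.ceil(4 * math.log(n) / (eps**2 / 2 - eps**3 / 3))


# Print the required dimension for a few point counts and two tolerances
for n in (100, 10_000, 1_000_000):
    print(n, jl_min_dim(n, 0.1), jl_min_dim(n, 0.5))
```

Squaring the number of points only doubles the required dimension (up to rounding), whereas tightening $\varepsilon$ is much more expensive because of the $\varepsilon^2$ term in the denominator.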

## Implications

The Johnson-Lindenstrauss Lemma has significant implications for dimensionality reduction: any finite dataset can be represented in a number of dimensions that grows only logarithmically with the number of points, without significantly distorting the distances between them. This is particularly useful in machine learning and data analysis, where high-dimensional datasets are common, because algorithms that depend only on pairwise distances, such as nearest neighbor search and clustering, can run on the reduced data with a controlled loss of accuracy.

## Applications

1. **Data Compression:** The lemma provides a theoretical foundation for compressing high-dimensional data into a lower-dimensional space, facilitating storage and processing efficiency.
2. **Machine Learning:** In machine learning, dimensionality reduction is crucial for reducing the complexity of models, improving training times, and mitigating the curse of dimensionality.
3. **Nearest Neighbor Search:** The preservation of distances allows for efficient approximation algorithms for nearest neighbor searches in high-dimensional spaces.

## Conclusion

The Johnson-Lindenstrauss Lemma demonstrates that high-dimensional data can be projected onto a much lower-dimensional space while keeping the distortion of every pairwise distance within a prescribed tolerance. This has profound implications for numerous applications in computer science, statistics, and data analysis.


# Constructive Proof of the Johnson-Lindenstrauss Lemma

The Johnson-Lindenstrauss Lemma asserts that a small set of points in a high-dimensional space can be projected into a lower-dimensional space in such a way that the distances between the points are nearly preserved. This proof outlines a method using random linear projections to demonstrate the lemma.

## Step 1: The Random Projection Matrix

- **Construct a Projection Matrix:** Consider a matrix $\Phi$ of dimensions $k \times d$ for mapping from $\mathbb{R}^d$ to $\mathbb{R}^k$. Each entry $\Phi_{ij}$ of this matrix is drawn independently from a normal distribution $\mathcal{N}(0, 1/k)$, i.e., with mean 0 and variance $1/k$. This scaling ensures that the squared length of a vector is preserved in expectation: $\mathbb{E}[\|\Phi v\|^2] = \|v\|^2$ for every $v \in \mathbb{R}^d$.
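
The effect of the $1/k$ variance can be checked numerically. The sketch below draws many independent matrices with $\mathcal{N}(0, 1/k)$ entries and averages $\|\Phi v\|^2 / \|v\|^2$, which should come out close to 1. The sizes `d`, `k`, and `trials` are arbitrary choices, and the seed is fixed only for reproducibility.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, trials = 50, 20, 2000  # arbitrary illustrative sizes

v = rng.normal(size=d)  # an arbitrary fixed test vector

ratios = np.empty(trials)
for t in range(trials):
    # Fresh k x d matrix with i.i.d. N(0, 1/k) entries (standard deviation 1/sqrt(k))
    Phi = rng.normal(0.0, 1.0 / np.sqrt(k), size=(k, d))
    ratios[t] = np.sum((Phi @ v) ** 2) / np.sum(v ** 2)

mean_ratio = ratios.mean()
print(f"mean of ||Phi v||^2 / ||v||^2 over {trials} draws: {mean_ratio:.3f}")
```

Individual draws still fluctuate around 1. Controlling that fluctuation is the job of the concentration argument.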

## Step 2: Properties of the Projection

- **Preservation of Distance:** Applying $\Phi$ to a vector $v \in \mathbb{R}^d$ yields $\Phi v \in \mathbb{R}^k$. Because $\Phi$ is random, the guarantee is probabilistic: for any fixed pair of vectors $u, v \in \mathbb{R}^d$ and any $0 < \varepsilon < 1$, the following holds with probability at least $1 - 2\exp(-C\varepsilon^2 k)$, where $C > 0$ is an absolute constant:

  $$
  (1 - \varepsilon)\|u - v\|^2 \leq \|\Phi u - \Phi v\|^2 \leq (1 + \varepsilon)\|u - v\|^2
  $$

## Step 3: Concentration Inequalities

- **Use of Concentration Inequalities:** To prove the distance preservation, concentration inequalities (such as Lévy's lemma or Gaussian concentration inequalities) are used. They show that the squared distance after projection deviates from its expected value by more than a relative error of $\varepsilon$ only with a probability that decreases exponentially with $k$.
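
As an empirical illustration of this exponential decay, the sketch below estimates how often the squared length of a fixed vector is distorted by more than a relative error of $\varepsilon$ for several values of $k$. It uses the fact that, for entries drawn from $\mathcal{N}(0, 1/k)$, the ratio $\|\Phi x\|^2 / \|x\|^2$ is distributed as a chi-squared variable with $k$ degrees of freedom divided by $k$, so it samples that ratio directly instead of building $\Phi$. The values of `eps`, `trials`, and the list of `k` are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
eps, trials = 0.2, 20_000  # arbitrary illustrative values

failure_rates = {}
for k in (50, 200, 800):
    # ||Phi x||^2 / ||x||^2 has the same law as chi2(k) / k
    ratio = rng.chisquare(k, size=trials) / k
    # Fraction of draws whose relative error exceeds eps
    failure_rates[k] = np.mean(np.abs(ratio - 1.0) > eps)

for k, rate in failure_rates.items():
    print(f"k = {k:4d}: estimated failure probability {rate:.4f}")
```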

## Step 4: Determining the Dimension $k$

- **Choosing $k$:** The dimension $k$ of the target space is selected based on the number of points $n$ and the desired accuracy $\varepsilon$; it does not depend on the original dimension $d$. Taking $k = O(\log n / \varepsilon^2)$ is enough for a random projection to preserve all pairwise distances among the $n$ points within a factor of $(1 \pm \varepsilon)$ with positive probability, and increasing the constant in front of $\log n$ makes this probability close to 1.

## Step 5: Probabilistic Guarantees

- **Probabilistic Analysis:** The proof concludes by showing that the random projection $\Phi$ satisfies the distance preservation property for all pairs of points simultaneously. The failure probability for any single pair is exponentially small in $k$, and a union bound over the $\binom{n}{2}$ pairs shows that the probability that any distance is distorted beyond the factor $(1 \pm \varepsilon)$ stays below 1, so a suitable $\Phi$ exists.
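
To make the union bound concrete, here is a worked version. It assumes that the per-pair failure probability is at most $2/n^2$, which is what the bound on $k$ in the statement of the lemma yields when combined with a standard chi-squared tail bound:

$$
\Pr\left(\text{some pair is distorted}\right) \leq \binom{n}{2} \cdot \frac{2}{n^2} = \frac{n(n-1)}{2} \cdot \frac{2}{n^2} = 1 - \frac{1}{n}
$$

A single random draw of $\Phi$ therefore succeeds with probability at least $1/n$. Since draws are independent, redrawing $\Phi$ until all distances check out takes at most $n$ attempts in expectation.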

## Conclusion

This proof utilizes random projections and probabilistic arguments to show the existence of a linear projection that can reduce the dimensionality of a space while approximately preserving pairwise distances. It highlights the feasibility and practicality of achieving such embeddings with an appropriate choice of parameters.


# Probabilistic Guarantee of Johnson-Lindenstrauss Lemma Using Hoeffding's Inequality

To establish the probabilistic guarantee for the Johnson-Lindenstrauss Lemma, we use a Hoeffding-type concentration bound to control the probability that the squared distance between any two points deviates significantly after projection to a lower-dimensional space. The core idea is to show that the projected squared distances are concentrated around their expected values.

## Hoeffding's Inequality

Hoeffding's Inequality gives us a powerful tool to estimate the probability of large deviations for the sum of independent bounded random variables. Specifically, for $n$ independent random variables $X_1, X_2, ..., X_n$ with bounds $a_i \leq X_i \leq b_i$, the sum $S_n = \sum_{i=1}^{n} X_i$ satisfies:

$$
\Pr\left(|S_n - \mathbb{E}[S_n]| \geq t\right) \leq 2\exp\left(-\frac{2t^2}{\sum_{i=1}^{n}(b_i - a_i)^2}\right)
$$

for any $t > 0$.
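
As a quick sanity check of the inequality, the sketch below compares the bound with an empirical tail frequency for sums of independent uniform variables on $[0, 1]$, so that $a_i = 0$ and $b_i = 1$. The choices of `n`, `t`, and `trials` are arbitrary, and the seed is fixed only for reproducibility.

```python
import numpy as np

rng = np.random.default_rng(0)
n, t, trials = 100, 10.0, 50_000  # arbitrary illustrative values

# Each X_i is uniform on [0, 1], so a_i = 0, b_i = 1 and E[S_n] = n / 2
sums = rng.random((trials, n)).sum(axis=1)
empirical = np.mean(np.abs(sums - n / 2) >= t)

# Hoeffding: 2 * exp(-2 t^2 / sum_i (b_i - a_i)^2); the sum equals n here
bound = 2 * np.exp(-2 * t**2 / n)

print(f"empirical tail probability: {empirical:.5f}")
print(f"Hoeffding bound:            {bound:.5f}")
```

The bound is valid but not tight. It uses only the range of each variable, not its variance.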

## Application in Johnson-Lindenstrauss Lemma

Consider two points $u, v$ in $\mathbb{R}^d$ and their projection into $\mathbb{R}^k$ using a random projection matrix $\Phi$, with each entry $\Phi_{ij} \sim \mathcal{N}(0, 1/k)$. Our goal is to show that the squared Euclidean distance between $u$ and $v$ is approximately preserved in the projected space with high probability.

### Step 1: Chi-Squared Distribution Variables

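The squared distance in the projected space can be expressed as a sum of $k$ variables, each corresponding to the squared difference in a single coordinate after projection. Writing $f(x) = \Phi x$ and $X_i = \left((\Phi u)_i - (\Phi v)_i\right)^2$ for the $i$-th coordinate, we have $\|f(u) - f(v)\|^2 = \sum_{i=1}^{k} X_i$. Since $(\Phi u)_i - (\Phi v)_i = \sum_{j=1}^{d} \Phi_{ij}(u_j - v_j) \sim \mathcal{N}(0, \|u - v\|^2 / k)$ and the rows of $\Phi$ are independent, the $X_i$ are independent and each equals $\frac{\|u - v\|^2}{k} Z_i^2$ with $Z_i \sim \mathcal{N}(0, 1)$, i.e., a scaled chi-squared variable with one degree of freedom.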

### Step 2: Expected Value

The expected value of the squared distance after projection is $\mathbb{E}[\|f(u) - f(v)\|^2] = \|u - v\|^2$. Each coordinate of $\Phi(u - v)$ is a Gaussian with mean 0 and variance $\|u - v\|^2 / k$, so $\mathbb{E}[X_i] = \|u - v\|^2 / k$, and summing over the $k$ coordinates by linearity of expectation gives the result.

### Step 3: Applying Hoeffding's Inequality

Hoeffding's Inequality as stated above requires bounded summands, but each $X_i$ is a scaled chi-squared variable and is therefore unbounded, so the inequality cannot be applied verbatim. The exponential-moment (Chernoff) argument behind Hoeffding's Inequality still goes through, because the moment generating function of a squared Gaussian is known in closed form, and it yields a tail bound of the same shape. We are interested in bounding the probability that $\|f(u) - f(v)\|^2$ deviates from its expected value $\|u - v\|^2$ by more than an $\varepsilon$-proportion, i.e., we set $t = \varepsilon\|u - v\|^2$.

### Step 4: Probabilistic Bound

Carrying out this argument with the chosen $t$, we obtain:

$$
\Pr\left((1 - \varepsilon)\|u - v\|^2 \leq \|f(u) - f(v)\|^2 \leq (1 + \varepsilon)\|u - v\|^2\right) \geq 1 - 2\exp\left(-C\varepsilon^2k\right)
$$

where $C > 0$ is an absolute constant that does not depend on $u$, $v$, $d$, or $k$. This shows that the probability of a significant deviation decreases exponentially with $k$, the dimension of the projected space.
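
For reference, one standard way to obtain an explicit constant is the exponential-moment (Chernoff) method applied to a chi-squared variable. Write $\|f(u) - f(v)\|^2 / \|u - v\|^2 = \frac{1}{k}\sum_{i=1}^{k} Z_i^2$ with independent $Z_i \sim \mathcal{N}(0, 1)$, and use $\mathbb{E}[e^{\lambda Z_i^2}] = (1 - 2\lambda)^{-1/2}$ for $\lambda < 1/2$. Markov's inequality then gives, for the upper tail:

$$
\Pr\left(\sum_{i=1}^{k} Z_i^2 \geq (1 + \varepsilon)k\right) \leq \min_{0 < \lambda < 1/2} \frac{e^{-\lambda(1 + \varepsilon)k}}{(1 - 2\lambda)^{k/2}} = \left((1 + \varepsilon)e^{-\varepsilon}\right)^{k/2} \leq \exp\left(-\left(\frac{\varepsilon^2}{4} - \frac{\varepsilon^3}{6}\right)k\right)
$$

The minimum is attained at $\lambda = \frac{\varepsilon}{2(1 + \varepsilon)}$, and the last step uses $\ln(1 + \varepsilon) \leq \varepsilon - \varepsilon^2/2 + \varepsilon^3/3$. The lower tail satisfies the same bound by a symmetric argument, so the two-sided failure probability is at most $2\exp\left(-(\varepsilon^2/4 - \varepsilon^3/6)k\right)$. Since $\varepsilon^2/4 - \varepsilon^3/6 \geq \varepsilon^2/12$ for $0 < \varepsilon < 1$, the displayed inequality holds with $C = 1/12$.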

## Conclusion

Using a Hoeffding-style exponential-moment bound for the sum of scaled chi-squared variables, we have shown that, for a suitably chosen dimension $k$, a random Gaussian projection approximately preserves the distance between any given pair of points with high probability. A union bound over all pairs extends this guarantee to the whole point set. This probabilistic guarantee underpins the lemma's utility in dimensionality reduction and data analysis.


```python
import numpy as np
from numpy.random import default_rng
from scipy.spatial.distance import pdist
from scipy.stats import ortho_group


def jl_projection_matrix(d, k, rng=None):
    """
    Generate a Johnson-Lindenstrauss random projection matrix.
    d: original dimension
    k: target dimension
    rng: optional numpy Generator (a fresh one is created if omitted)
    Returns an array of shape (k, d), or (d, d) when d < k.
    """
    rng = default_rng() if rng is None else rng
    if d < k:
        # No reduction is possible: a random orthogonal d x d matrix keeps
        # every distance exactly, so the output stays d-dimensional.
        return ortho_group.rvs(dim=d, random_state=rng)
    # Otherwise, use a Gaussian matrix with i.i.d. N(0, 1/k) entries
    return rng.normal(0, 1 / np.sqrt(k), size=(k, d))


def jl_transform(X, k, rng=None):
    """
    Apply the Johnson-Lindenstrauss transform to a dataset X.
    X: numpy array of shape (n_samples, n_features), the dataset to transform.
    k: target dimension.
    """
    n_features = X.shape[1]
    R = jl_projection_matrix(n_features, k, rng)
    return X @ R.T


def distance_preservation_test(X, k, epsilon=0.1, rng=None):
    """
    Test if distances are preserved after Johnson-Lindenstrauss transformation.
    X: numpy array of shape (n_samples, n_features), the dataset.
    k: target dimension.
    epsilon: tolerance for distance preservation.
    Returns the fraction of the n(n-1)/2 pairwise distances within (1 +- epsilon).
    """
    # pdist returns each pairwise distance once (no zero diagonal) and avoids
    # the (n, n, d) intermediate array that naive broadcasting would allocate.
    original = pdist(X)
    transformed = pdist(jl_transform(X, k, rng))

    # Check if distances are preserved within (1 +- epsilon)
    preserved = ((1 - epsilon) * original <= transformed) & (
        transformed <= (1 + epsilon) * original
    )
    return np.mean(preserved)


# Example usage
n_samples = 100  # Number of points
n_features = 100000  # Original dimension
epsilon = 0.1  # Tolerance for distance preservation
# int() truncates, so k can fall just below the bound; np.ceil would satisfy it strictly
k = int(4 * np.log(n_samples) / (epsilon**2 / 2 - epsilon**3 / 3))

# Generate a random dataset
rng = default_rng()
X = rng.random((n_samples, n_features))

# Test distance preservation
fraction_preserved = distance_preservation_test(X, k, epsilon, rng)
print(f"Original dimension: {n_features}")
print(f"Target dimension: {k}")
print(f"Fraction of distances preserved within (1 ± {epsilon}): {fraction_preserved}")
```


Output:

```
Original dimension: 100000
Target dimension: 3947
Fraction of distances preserved within (1 ± 0.1): 1.0
```
