# Installation

## 1. Setting up Virtual Environment

Note: other environment managers (e.g., conda) and IDE integrations (e.g., VS Code) may work, but we have not tested them extensively.

We tested DimBridge in a virtual environment created with `venv`.
For example, from a command-line console, create a virtual environment named `.venv` under the current working directory:
```
python -m venv .venv
```
and activate this environment
```
source .venv/bin/activate
```
Then install JupyterLab in this environment:
```
pip install jupyterlab
```
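
To confirm that a notebook kernel or Python console is actually running inside the activated environment, a small check like the following can help. This is only a sketch: `in_virtualenv` is a hypothetical helper name, and the check only covers environments created by `venv` or `virtualenv` (a conda environment will typically report `False`).

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("interpreter:", sys.executable)
print("inside a virtual environment:", in_virtualenv())
```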

## 2. Installing DimBridge

### Option 1: Install DimBridge from PyPI
```
pip install dimbridge
```

### Option 2: Install from source

Alternatively, clone the `dimbridge-jupyter` repository:
```
git clone https://github.com/tiga1231/dimbridge-jupyter.git
```

and, from the root of the cloned repository, install DimBridge in editable mode so that changes to the source code take effect without reinstalling:
```
pip install -e ".[dev]"
```

## 3. Installing a plotting library and UMAP for dimensionality reduction plots
The example notebook also needs additional packages for plotting (matplotlib) and for computing dimensionality reduction plots (UMAP):
```
pip install matplotlib umap-learn
```
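
After the three installation steps, one way to check that everything is importable before opening the notebook is sketched below. `missing_modules` is a hypothetical helper, not part of DimBridge. Note that the `umap-learn` package is imported under the module name `umap`.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


# Module names for the packages installed in steps 1 to 3.
required = ["jupyterlab", "dimbridge", "matplotlib", "umap"]
print("missing modules:", missing_modules(required) or "none")
```

An empty result means all four packages were found by the current interpreter.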

# Imports

In [None]:
# %load_ext autoreload
# %autoreload 2
# %env ANYWIDGET_HMR=1
# from glob import glob

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.animation import FuncAnimation
from umap import UMAP

from dimbridge import Dimbridge

# %matplotlib inline
plt.style.use("ggplot")
plt.style.use("seaborn-v0_8-colorblind")

# Example 1: Synthetic Data, 4D parametrization of a Klein bottle 
https://en.wikipedia.org/wiki/Klein_bottle#4-D_non-intersecting

## Generate a Klein bottle dataset

In [None]:
n = int(1e4)

## data
R = 2
P = 3
eps = 0.5
u = np.random.rand(n) * np.pi * 2
v = np.random.rand(n) * np.pi * 2

x = R * (np.cos(u / 2) * np.cos(v) - np.sin(u / 2) * np.sin(2 * v))
y = R * (np.sin(u / 2) * np.cos(v) + np.cos(u / 2) * np.sin(2 * v))
z = P * np.cos(u) * (1 + eps * np.sin(v))
w = P * np.sin(u) * (1 + eps * np.sin(v))

## construct pandas dataframe (computing UMAP is optional, see the commented line below)
df = pd.DataFrame(dict(x1=x, x2=y, x3=z, x4=w))
# xy = UMAP(n_neighbors=50, min_dist=0.3).fit_transform(df.to_numpy())

# Here we will use the 2D parametrization of the bottle as our "dimensionality reduction" plot
xy = np.c_[u, v]

## plot the 2D parameter space that stands in for the UMAP embedding
plt.figure(figsize=[12, 6], dpi=80)
plt.subplot(121)
plt.scatter(xy[:, 0], xy[:, 1], s=1)
plt.axis("equal")
plt.xlabel("u")
plt.ylabel("v")
plt.title("2D Parameter space")

## validate data
plt.subplot(122)
plt.scatter(df["x1"], df["x2"], s=1)
plt.axis("equal")
plt.xlabel("x1")
plt.ylabel("x2")
plt.title("First 2 coordinates of the 4D embedding space")
plt.show()

print("Data table")
df
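
The cell above draws `u` and `v` from NumPy's global random state, so each run produces a different sample. A possible reproducible variant is sketched below. The function name `klein_bottle` and the `seed` argument are assumptions made for illustration; the formulas are the ones used in the cell above. The two assertions check identities that follow from the parametrization: `(x, y)` is `(cos v, sin 2v)` rotated by `u / 2` and scaled by `R`, and `(z, w)` lies on a circle of radius `P * (1 + eps * sin v)`.

```python
import numpy as np


def klein_bottle(n, R=2.0, P=3.0, eps=0.5, seed=0):
    """Sample n points of the 4D Klein bottle; returns (params, points)."""
    rng = np.random.default_rng(seed)  # seeded generator for reproducibility
    u = rng.uniform(0.0, 2.0 * np.pi, size=n)
    v = rng.uniform(0.0, 2.0 * np.pi, size=n)

    x = R * (np.cos(u / 2) * np.cos(v) - np.sin(u / 2) * np.sin(2 * v))
    y = R * (np.sin(u / 2) * np.cos(v) + np.cos(u / 2) * np.sin(2 * v))
    z = P * np.cos(u) * (1 + eps * np.sin(v))
    w = P * np.sin(u) * (1 + eps * np.sin(v))
    return np.c_[u, v], np.c_[x, y, z, w]


params, points = klein_bottle(1000)
u, v = params.T
x, y, z, w = points.T

# Sanity checks derived from the formulas above (R=2, P=3, eps=0.5).
assert np.allclose(x**2 + y**2, 2.0**2 * (np.cos(v) ** 2 + np.sin(2 * v) ** 2))
assert np.allclose(z**2 + w**2, (3.0 * (1 + 0.5 * np.sin(v))) ** 2)
```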

## Using DimBridge on Klein bottle dataset

In [None]:
# for dev testing:
# from importlib import reload
# import dimbridge
# reload(dimbridge)

from dimbridge import Dimbridge

dimbridge = Dimbridge(
    data=df,  ## data table as a pandas DataFrame
    x=xy[:, 0],  ## x coordinate of DR plot
    y=xy[:, 1],  ## y coordinate of DR plot
    s=4,  # projection plot mark size
    splom_s=1,  # scatterplot matrix (SPLOM) mark size
    # predicate_mode:
    #   "data extent" - displays min and max of selection in the predicate view
    #   "predicate regression" - uses an ML method to balance false positives and false negatives
    predicate_mode="predicate regression",
    # brush_mode: brush interaction on the projection view; "single", "contrastive", or "curve"
    brush_mode="contrastive",
)
dimbridge
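
The comments above describe the `"data extent"` mode as showing the minimum and maximum of the selection. The sketch below illustrates that idea on a toy table, assuming a plain per-column min/max. It is an illustration only, not DimBridge's internal code; `data_extent`, `toy`, and `mask` are hypothetical names. It also shows why a pure extent box can be imprecise: unselected points that fall inside the box are false positives, the kind of error that `"predicate regression"` is described as balancing.

```python
import numpy as np
import pandas as pd


def data_extent(data: pd.DataFrame, mask) -> pd.DataFrame:
    """Per-column min and max of the selected rows (one row per attribute)."""
    mask = np.asarray(mask, dtype=bool)
    return data.loc[mask].agg(["min", "max"]).T


# Toy data: rows 1 and 2 are "brushed"; row 3 is not, but lies between them.
toy = pd.DataFrame({"x1": [0.0, 1.0, 2.0, 1.5], "x2": [10.0, 20.0, 30.0, 25.0]})
mask = np.array([False, True, True, False])

extent = data_extent(toy, mask)
print(extent)

# Points satisfying min <= value <= max on every attribute.
inside = (
    toy.ge(extent["min"], axis="columns") & toy.le(extent["max"], axis="columns")
).all(axis=1)
false_positives = inside.to_numpy() & ~mask
print("unselected points inside the extent box:", int(false_positives.sum()))
```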

## Showing brush selected points

The data selected in the DimBridge UI can now be exported for downstream analysis.

Here `dimbridge` is the UI widget created above. `dimbridge.selected` stores a 2D array of boolean values with shape `[n_boxes, n_points_in_dataset]`, indicating which points are selected by each brush box:

- For single box selection, `n_boxes = 1`
- For contrastive selection, `n_boxes = 2`
- For curve brush, `n_boxes = 12`

In [None]:
## Showing the selected points in the first box

print("dimbridge.selected shape", np.array(dimbridge.selected).shape)

selected = np.array(dimbridge.selected[0])
df[selected]
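
Because the selection is a plain boolean array, standard NumPy operations apply to it. The sketch below uses a small mock array in place of `dimbridge.selected` (two boxes, as in contrastive mode, over six points), so it runs without the widget; the values are made up for illustration.

```python
import numpy as np

# Mock stand-in for np.array(dimbridge.selected): shape [n_boxes, n_points].
selected_mock = np.array(
    [
        [True, True, False, False, False, True],  # box 0
        [False, True, True, False, False, False],  # box 1
    ]
)

counts = selected_mock.sum(axis=1)  # number of points in each box
in_any_box = selected_mock.any(axis=0)  # union of all boxes
only_first = selected_mock[0] & ~selected_mock[1]  # in box 0 but not in box 1
first_indices = np.flatnonzero(selected_mock[0])  # row positions of box 0

print("points per box:", counts)
print("row positions in box 0:", first_indices)
```

A row of the mask can be used directly to index the data table, as in `df[selected]` above, and the row positions work with `df.iloc[...]`.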

# Example 2: AI-generated animal images used in the paper (https://arxiv.org/abs/2404.07386)

- Download `animals5_remote.zip` (images of animals) to the current directory from
    - https://drive.google.com/drive/folders/1x1Ptvpoay4YsM6IrtuDr11iYtkrv8nzI
- Unzip it (`unzip animals5_remote.zip`) into a new directory, `static/`, under the root of this repo so that the file structure looks like:
```
example.ipynb (this notebook)
static/
└── animals5_remote/
    ├── animals5.csv
    └── images/
        ├── animal-0.jpg
        ├── animal-1.jpg
```
- Start an HTTP server to serve the images: `python -m http.server 9001`
    - DimBridge still works if the images are not served correctly; the bottom of the UI will then display a set of not-found images
- Run the following cells
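
Before launching the widget, it can be useful to check whether the image server responds. The helper below is a sketch using only the standard library; `is_reachable` is a hypothetical name, and it treats any 2xx response to a `HEAD` request as success.

```python
import urllib.request


def is_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if a HEAD request to `url` gets a 2xx response."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, ValueError):
        # OSError covers refused connections, timeouts, and HTTP errors such as 404;
        # ValueError covers malformed URLs.
        return False
```

Once `image_urls` has been loaded in the next cell, `is_reachable(image_urls[0])` gives a quick indication of whether the first image is being served.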

In [None]:
# dataset_name = "gait2"
# df = pd.read_csv(f"./datasets/{dataset_name}/{dataset_name}.csv")
# df = df.drop(columns=["x", "y"])
# for col in df.columns:
#     if df[col].dtype == "int64":
#         df[col] = df[col].astype("int32")
# # xy = np.c_[df["leg1.joint1.angle"].to_numpy(), df["leg1.joint2.angle"].to_numpy()]
# numeric_columns = [col for col in df.columns if df[col].dtype != "int32"]
# xy = UMAP(n_neighbors=30, min_dist=0.2).fit_transform(df[numeric_columns].to_numpy())


# dataset_name = "animals5_remote"
df2 = pd.read_csv("./static/animals5_remote/animals5.csv")
image_urls = df2["image_url"].to_list()

# either use the embedding stored in the dataset
xy2 = df2[["x", "y"]].to_numpy()
df2 = df2.drop(columns=["x", "y", "image_filename", "image_url"])
# or compute UMAP here:
# xy2 = UMAP(n_neighbors=50, min_dist=0.8).fit_transform(df2.to_numpy())

df2.columns

In [None]:
## check the UMAP
plt.figure(figsize=[3, 3])
plt.scatter(xy2[:, 0], xy2[:, 1], s=1)
plt.axis("equal")
plt.show()

In [None]:
# for dev testing:
# from importlib import reload

# import dimbridge

# reload(dimbridge)
# from dimbridge import Dimbridge

dimbridge2 = Dimbridge(
    data=df2,
    image_urls=image_urls,
    x=xy2[:, 0],
    y=xy2[:, 1],
    s=4,  # projection plot mark size
    splom_s=2,  # SPLOM plot mark size
    predicate_mode="predicate regression",  # "data extent", "predicate regression"
    brush_mode="curve",  # 'single', "contrastive", "curve",
)

dimbridge2

## Getting selected data points from UI
The data selected in the DimBridge UI can now be exported for downstream analysis.

`dimbridge2.selected` stores a 2D array of boolean values with shape `[n_boxes, n_points]`:

- For single box selection, `n_boxes = 1`
- For contrastive selection, `n_boxes = 2`
- For curve brush, `n_boxes = 12`


In [None]:
selected = np.array(dimbridge2.selected[0])
df2[selected]

In [None]:
## Plotting 2 given attributes of the selected points
plt.scatter(df2["furry"][selected], df2["excited"][selected])
plt.xlabel("furry")
plt.ylabel("excited")
plt.axis("equal")
plt.show()
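
With the curve brush, the selection contains several boxes, so one natural follow-up is to summarize each box separately, for example by its attribute means. The sketch below does this on a toy table with two boxes. `per_box_means` is a hypothetical helper, and the toy values are made up; only the column names `furry` and `excited` come from the dataset used above.

```python
import numpy as np
import pandas as pd


def per_box_means(data: pd.DataFrame, selected) -> pd.DataFrame:
    """Mean of every numeric attribute within each brush box (one row per box)."""
    masks = np.asarray(selected, dtype=bool)
    rows = {
        box: data.loc[mask].mean(numeric_only=True)
        for box, mask in enumerate(masks)
    }
    return pd.DataFrame(rows).T.rename_axis("box")


# Toy stand-ins for df2 and dimbridge2.selected.
toy = pd.DataFrame({"furry": [0.0, 1.0, 1.0, 0.0], "excited": [0.2, 0.4, 0.6, 0.8]})
toy_selected = [
    [True, True, False, False],  # box 0
    [False, False, True, True],  # box 1
]

summary = per_box_means(toy, toy_selected)
print(summary)
```

With the real widget, the call would be `per_box_means(df2, dimbridge2.selected)`; a box that contains no points yields a row of `NaN` values.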