# ProtSpace Google Colab Workflow

This workflow demonstrates how to use ProtSpace in Google Colab without local installation.


## Step 1: Set up the environment

First, install the dependencies that are not already available in the Colab runtime and clone the ProtSpace repository.

In [None]:
# Full dependency list, kept for reference (several of these are typically preinstalled in Colab):
# !pip install h5py numpy pandas scikit-learn umap-learn plotly dash dash-bio
!pip install umap-learn dash dash-bio
!git clone https://github.com/tsenoner/ProtSpace.git
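
Before moving on, it can help to confirm that every dependency is importable in the current runtime. The following is a minimal sketch using only the standard library. The helper name `missing_packages` is hypothetical and not part of ProtSpace; the import names (`sklearn` for scikit-learn, `umap` for umap-learn, `dash_bio` for dash-bio) are the modules those packages install.

```python
from importlib.util import find_spec


def missing_packages(import_names):
    """Return the top-level module names that cannot be found."""
    return [name for name in import_names if find_spec(name) is None]


# Import names for the packages listed in the install cell above.
required = ["h5py", "numpy", "pandas", "sklearn", "umap", "plotly", "dash", "dash_bio"]
missing = missing_packages(required)
print("Missing packages:", missing if missing else "none")
```

If anything is reported as missing, install it with `!pip install` before continuing.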

## Step 2: Prepare sample data

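For this example, we'll create two sample files: an HDF5 file with 100 random 1024-dimensional embeddings (one dataset per protein) and a CSV file that assigns two categorical features to each protein identifier. In a real scenario, you would use your own data files.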

In [None]:
import h5py
import numpy as np
import pandas as pd

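# Create sample embedding data: one random 1024-dimensional vector per protein,
# stored as an HDF5 dataset named after the protein identifier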
with h5py.File('sample_embeddings.h5', 'w') as f:
    for i in range(100):
        f[f'protein_{i}'] = np.random.rand(1024)

# Create sample feature data: the 'identifier' values match the HDF5 dataset names above
feature_data = {
    'identifier': [f'protein_{i}' for i in range(100)],
    'feature1': np.random.choice(['A', 'B', 'C'], 100),
    'feature2': np.random.choice(['Z', 'Y'], 100)
}
pd.DataFrame(feature_data).to_csv('sample_features.csv', index=False)
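
With your own data, the identifiers in the two files may not line up as neatly as they do in this sample. The sketch below shows one way to compare two collections of identifiers before running the preparation script. The function `compare_identifiers` is a hypothetical helper, not part of ProtSpace, and the ID lists are stand-ins for the values read from the files.

```python
def compare_identifiers(embedding_ids, feature_ids):
    """Return the identifiers that appear in only one of the two collections."""
    embedding_set, feature_set = set(embedding_ids), set(feature_ids)
    return {
        "missing_features": sorted(embedding_set - feature_set),
        "missing_embeddings": sorted(feature_set - embedding_set),
    }


# Stand-in ID lists; in the notebook these would come from the two sample files.
embedding_ids = [f"protein_{i}" for i in range(100)]
feature_ids = [f"protein_{i}" for i in range(100)]
report = compare_identifiers(embedding_ids, feature_ids)
print(report)
```

In the notebook, the first argument could be the dataset names of the open HDF5 file (for example `list(f.keys())`) and the second the `identifier` column of `pd.read_csv('sample_features.csv')`.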

## Step 3: Run the data preparation script

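Now, let's run the `prepare_json.py` script to process our sample data. In the command below, `-H` points to the embedding file, `-c` to the feature CSV, and `-o` to the output JSON file; `--methods pca2 pca3` requests PCA projections, presumably in two and three dimensions.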

In [None]:
!python ProtSpace/script/prepare_json.py -H sample_embeddings.h5 -c sample_features.csv -o sample_output.json --methods pca2 pca3 -v
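
To confirm that the script produced a readable file, you can take a quick look at its top-level structure. This is a minimal sketch that assumes nothing about the layout of the ProtSpace JSON beyond it being valid JSON; `summarize_json` is a hypothetical helper.

```python
import json
import os


def summarize_json(path):
    """Map each top-level key of a JSON file to the type name of its value."""
    with open(path) as handle:
        data = json.load(handle)
    if isinstance(data, dict):
        return {key: type(value).__name__ for key, value in data.items()}
    # Non-object root (e.g. a list): report only the root type.
    return {"<root>": type(data).__name__}


output_path = "sample_output.json"
if os.path.exists(output_path):
    for key, kind in summarize_json(output_path).items():
        print(f"{key}: {kind}")
else:
    print(f"{output_path} not found; check the script output above for errors.")
```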

## Step 4: Run the ProtSpace app

Finally, we'll load the generated JSON file into the ProtSpace app and start its server on port 8050.

In [None]:
from ProtSpace.protspace.app import ProtSpace

json_file = 'sample_output.json'
protspace = ProtSpace(json_file)
protspace.run_server(port=8050)
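
If you rerun the cell above while an earlier server is still holding port 8050, startup will typically fail with an "address already in use" error. The following sketch, using only the standard library, looks for a free port before starting the app. The helpers `port_is_free` and `first_free_port` are hypothetical and not part of ProtSpace.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def first_free_port(start=8050, attempts=20):
    """Return the first free port in [start, start + attempts)."""
    for port in range(start, start + attempts):
        if port_is_free(port):
            return port
    raise RuntimeError(f"No free port found in {start}-{start + attempts - 1}")


port = first_free_port()
print(f"Port {port} is available")
```

The result can then be passed on as `protspace.run_server(port=port)`.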