# Molecular Solubility Prediction with Graph Neural Networks

This notebook sets up the environment for running the modular solubility prediction project in Google Colab.

## 1. Clone the Repository

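First, clone the repository. If you are working from a fork, replace the URL with your own. Use the HTTPS URL: Colab has no SSH key registered with GitHub, so a `git@github.com:` URL fails there.

In [None]:
# Clone the repository (HTTPS works without SSH keys; private repos need a token)
!git clone https://github.com/viv-bad/solubility-predictor.git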

# Navigate into the repository directory 
%cd solubility-predictor
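Re-running the clone cell fails once the folder exists, because `git clone` refuses a non-empty destination. `%cd solubility-predictor` also fails if the kernel is already inside the repo. The following is a minimal, idempotent sketch. It assumes the repository is public over HTTPS. `clone_if_missing`, `REPO_URL`, and `REPO_DIR` are hypothetical names and are not part of the project:

```python
import os
import subprocess

# Hypothetical constants; HTTPS is used because Colab has no GitHub SSH key.
REPO_URL = "https://github.com/viv-bad/solubility-predictor.git"
REPO_DIR = "solubility-predictor"


def clone_if_missing(url: str, dest: str) -> bool:
    """Clone `url` into `dest` unless `dest` already holds a git checkout.

    Returns True if a clone was performed, False if it was skipped.
    """
    if os.path.isdir(os.path.join(dest, ".git")):
        return False
    # check=True raises CalledProcessError if git exits with a non-zero status.
    subprocess.run(["git", "clone", url, dest], check=True)
    return True
```

For example, `clone_if_missing(REPO_URL, f"/content/{REPO_DIR}")` followed by `os.chdir(f"/content/{REPO_DIR}")` lands in the repo root no matter how many times the cell runs. `/content` is Colab's default working directory.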

## 2. Set Up the Project for Colab

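Install the project's dependencies and the `solpred` package itself (in editable mode, so it can be imported). Then install the PyG extensions `torch-scatter` and `torch-sparse`, whose wheels must match the installed PyTorch version and its CUDA build.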

In [None]:
# Install dependencies from requirements.txt
!pip install -r requirements.txt -q

# Install the 'solpred' package itself (from pyproject.toml)
!pip install -e . -q

# Install PyG dependencies (torch-scatter, torch-sparse), necessary for Colab.
# Their prebuilt wheels must match both the PyTorch version and its CUDA build.
import subprocess
import sys

import torch

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

# On Colab, torch.__version__ looks like "2.x.y+cu121". Strip the local "+..." tag
# so it is not duplicated in the wheel URL below.
torch_version = torch.__version__.split("+")[0]

# torch.version.cuda is e.g. "12.1" for CUDA builds and None for CPU-only builds.
# The suffix follows the PyTorch build, not whether a GPU is attached right now.
if torch.version.cuda:
    cuda_suffix = "cu" + torch.version.cuda.replace(".", "")
else:
    print("CPU-only PyTorch build detected, using CPU wheels.")
    cuda_suffix = "cpu"

pyg_whl_url = f"https://data.pyg.org/whl/torch-{torch_version}+{cuda_suffix}.html"
print(f"\nInstalling PyG dependencies (scatter/sparse) from {pyg_whl_url}...")

# `!pip` in a notebook never raises on failure, so call pip through subprocess
# with check=True to make a failed install catchable.
pip_cmd = [sys.executable, "-m", "pip", "install", "-q", "torch-scatter", "torch-sparse"]
try:
    subprocess.run(pip_cmd + ["-f", pyg_whl_url], check=True)
    print("\nPyG dependencies installed successfully.")
except subprocess.CalledProcessError as e:
    print(f"\nError installing PyG dependencies: {e}")
    print("Please check the URL and compatibility:", pyg_whl_url)
    print("Retrying without the wheel index (may build from source: slow, can fail)...")
    try:
        subprocess.run(pip_cmd, check=True)
        print("Installed torch-scatter and torch-sparse without the wheel index.")
    except subprocess.CalledProcessError as e2:
        print(f"Failed to install torch-scatter/torch-sparse: {e2}. Training might fail.")

print("\nDependency installation complete.")
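A failed install often surfaces only later as an import error. A quick check right after this cell can catch it early. This is a sketch. The module list is an assumption based on the packages named above (`torch_geometric` is PyG's import name) and should be adjusted to match `requirements.txt`. `importlib.util.find_spec` locates a top-level module without importing it:

```python
import importlib.util

# Assumed list of modules the project needs; adjust to match requirements.txt.
REQUIRED_MODULES = ["torch", "torch_geometric", "torch_scatter", "torch_sparse"]


def missing_modules(names):
    """Return the subset of `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```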

In [None]:
# If your data is not in the Colab runtime, mount Google Drive and load it from there.
# NOTE: remember to point --data in the training command at the data folder on Drive.
from google.colab import drive
drive.mount('/content/drive')
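You can also choose the data folder programmatically instead of editing the `--data` flag by hand. The approach below prefers the repo-local copy and falls back to Drive. This is a sketch: `pick_data_dir` is a hypothetical helper, and the Drive path is the same placeholder used in the training cell. Replace it with your own path:

```python
import os

# Repo-local folder used by the training cell, plus a placeholder Drive folder.
LOCAL_DATA_DIR = "./data"
DRIVE_DATA_DIR = "/content/drive/MyDrive/path/to/your/data_folder"


def pick_data_dir(*candidates):
    """Return the first candidate that exists as a directory, else raise."""
    for path in candidates:
        if os.path.isdir(path):
            return path
    raise FileNotFoundError(f"None of these data directories exist: {candidates}")
```

For example, you could set `DATA_DIR = pick_data_dir(LOCAL_DATA_DIR, DRIVE_DATA_DIR)`. IPython expands `{DATA_DIR}` inside `!` shell commands, so the training cell could then pass `--data "{DATA_DIR}"`.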

## 3. Test Molecule Visualization

First make the `src/` directory importable in this kernel. Then check that molecule visualization works by drawing aspirin.

In [None]:
import sys
import os

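# Add the repo's src/ directory to sys.path. The editable install from step 2 is
# typically registered through a .pth file, which Python only reads at startup,
# so this already-running kernel cannot see `solpred` without a restart.
# The path is relative to the current working directory (the repo root after %cd).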
src_path = os.path.abspath('./src')

# Check if the path is already in sys.path, and add it if not
if src_path not in sys.path:
    print(f"Adding {src_path} to sys.path")
    sys.path.insert(0, src_path)
else:
    print(f"{src_path} already in sys.path")

In [None]:
# Test molecule visualization
from solpred.data.test_graph import visualize_molecule_graph

# Visualize aspirin
visualize_molecule_graph('CC(=O)OC1=CC=CC=C1C(=O)O')

## 4. Run the Training Script

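Now run the training script as a module. Because `!python` starts a new Python process, it finds `solpred` through the editable install from step 2, not through the `sys.path` change made in this kernel.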

In [None]:
# Create the default output directory if it doesn't exist
!mkdir -p models

# Run training (adjust parameters as needed)
# Assumes your data is in './data/raw/solubility_data.csv' within the repo
!python -m solpred.train \
    --data ./data \
    --output_dir ./models \
    --epochs 50 \
    --batch_size 64 \
    --lr 0.001 \
    --hidden_dim 128

# --- OR ---
# If using Google Drive (replace with your actual path):
# !python -m solpred.train \
#     --data /content/drive/MyDrive/path/to/your/data_folder \
#     --output_dir ./models \
#     --epochs 50 \
#     --batch_size 64
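The two variants above differ only in their flag values, so you can also build the command in Python. This sketch assumes `solpred.train` accepts exactly the flags shown in the training cell; `build_train_cmd` is a hypothetical helper. Using `sys.executable` runs the same interpreter that the notebook's packages were installed into:

```python
import shlex
import sys


def build_train_cmd(data_dir, output_dir="./models", **hparams):
    """Build the argv list for `python -m solpred.train`.

    Each keyword argument becomes a `--name value` flag, in the order given.
    """
    cmd = [sys.executable, "-m", "solpred.train",
           "--data", str(data_dir), "--output_dir", str(output_dir)]
    for name, value in hparams.items():
        cmd += [f"--{name}", str(value)]
    return cmd


cmd = build_train_cmd("./data", epochs=50, batch_size=64, lr=0.001, hidden_dim=128)
print(shlex.join(cmd))  # shell-quoted preview of the command
# To launch training (raises CalledProcessError on a non-zero exit):
# import subprocess; subprocess.run(cmd, check=True)
```

Passing a list instead of a shell string avoids quoting problems when a Drive path contains spaces, such as `MyDrive/My Data`.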