## Tumor Elastic Modulus Prediction using MechanoGEPred
This tutorial demonstrates how to predict tumor elastic modulus from TCGA-COAD gene expression data with the MechanoGEPred model.

### 1. Load MechanoGEPred Model
**Load trained model and required resources**

In [1]:
import os
import numpy as np
import pandas as pd
import joblib

# Change to the repository root (adjust this path to your local clone)
os.chdir("/mnt/Storage/home/zhouxiaoyan/github/MechanoGEPred")

# Verify model paths
model_path = 'model/model.joblib'
scaler_path = 'model/scaler.joblib'
gene_list_path = 'data/mechano_genes_list.txt'

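# Collect any missing files so the error names them explicitly
missing_files = [p for p in [model_path, scaler_path, gene_list_path] if not os.path.exists(p)]
if missing_files:
    raise FileNotFoundError(f"Required files missing: {missing_files}. Check repository structure")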

# Load model components
model = joblib.load(model_path)
scaler = joblib.load(scaler_path)
mechanosensitive_genes = pd.read_table(gene_list_path, header=None)[0].tolist()

print(f"Loaded model with {len(mechanosensitive_genes)} mechanosensitive genes")

Loaded model with 344 mechanosensitive genes
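
Before moving on, it can be useful to confirm that the loaded scaler expects as many features as the gene list provides. The sketch below assumes the scaler is a scikit-learn estimator, which exposes `n_features_in_` after fitting. That is an assumption about how the model was saved, not something this tutorial states. The helper name `feature_count_matches` and the stub objects are hypothetical and used only for illustration.

```python
from types import SimpleNamespace

def feature_count_matches(fitted, genes):
    """Return True/False if the estimator exposes n_features_in_, else None."""
    expected = getattr(fitted, "n_features_in_", None)
    if expected is None:
        return None  # attribute not available; nothing to compare
    return int(expected) == len(genes)

# Stand-ins for illustration only: a stub "scaler" and a three-gene list
stub_scaler = SimpleNamespace(n_features_in_=3)
toy_genes = ["GENE_A", "GENE_B", "GENE_C"]

print(feature_count_matches(stub_scaler, toy_genes))      # True
print(feature_count_matches(stub_scaler, toy_genes[:2]))  # False
print(feature_count_matches(object(), toy_genes))         # None
```

With the objects loaded in this section, the equivalent call would be `feature_count_matches(scaler, mechanosensitive_genes)`.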


### 2. Load TCGA Dataset
**Import and validate gene expression data**

In [2]:
data_path = 'data/example_data.csv'

if not os.path.exists(data_path):
    raise FileNotFoundError(f"Data file {data_path} not found")

tcga_data = pd.read_csv(data_path, index_col=0)

# Data validation
print(f"Loaded {tcga_data.shape[0]} samples with {tcga_data.shape[1]} features")

Loaded 453 samples with 343 features
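
The cell above reports only the table's dimensions. A few additional checks may be worth running before preprocessing. The following is a minimal sketch on toy data; the function name `validate_expression` and the gene and sample labels are hypothetical, and the choice of checks is an assumption about what matters for a samples-by-genes table.

```python
import numpy as np
import pandas as pd

def validate_expression(df: pd.DataFrame) -> dict:
    """Return simple quality indicators for a samples x genes expression table."""
    numeric = df.select_dtypes(include=[np.number])
    return {
        "duplicate_samples": int(df.index.duplicated().sum()),
        "duplicate_genes": int(df.columns.duplicated().sum()),
        "non_numeric_columns": [c for c in df.columns if c not in numeric.columns],
        "missing_values": int(df.isna().sum().sum()),
    }

# Toy table with one duplicated sample ID and one missing value
toy = pd.DataFrame(
    {"GENE_A": [1.0, np.nan, 3.0], "GENE_B": [0.5, 0.7, 0.9]},
    index=["S1", "S2", "S2"],
)
report = validate_expression(toy)
print(report)
```

On the real data, `validate_expression(tcga_data)` would return the same dictionary of indicators.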


### 3. Data Preprocessing
**Prepare data for prediction**

Steps:
1. Select mechanosensitive genes
2. Handle missing genes (zero-fill)
3. Apply feature scaling

In [3]:
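# Select the model's genes in training order. reindex() does not raise on
# absent labels; genes missing from the input become all-zero columns.
# Inspect missing_genes to see which genes were zero-filled.
missing_genes = [g for g in mechanosensitive_genes if g not in tcga_data.columns]
processed_data = tcga_data.reindex(columns=mechanosensitive_genes, fill_value=0)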

# Apply standardization
# Bulk RNA-seq: use the scaler from model training for consistency
scaled_data = scaler.transform(processed_data)
print(f"Data preprocessing complete. Matrix shape: {scaled_data.shape}")

# If using single-cell RNA-seq, refit a new scaler instead, because the
# expression distribution differs from bulk data:
# from sklearn.preprocessing import StandardScaler
# scaler = StandardScaler()
# scaled_data = scaler.fit_transform(processed_data)

Data preprocessing complete. Matrix shape: (453, 344)
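
One consequence of zero-filling before scaling is that a zero-filled gene does not stay at zero: after standardization it becomes `(0 - mean) / std` for that gene. The sketch below illustrates this on toy data. It standardizes by hand with NumPy (per-gene mean and population standard deviation) as a stand-in for a standard scaler, and the gene names and values are hypothetical.

```python
import numpy as np
import pandas as pd

genes = ["GENE_A", "GENE_B", "GENE_C"]  # hypothetical gene names
train = pd.DataFrame(
    [[1.0, 10.0, 5.0], [2.0, 12.0, 7.0], [3.0, 14.0, 9.0], [4.0, 16.0, 11.0]],
    columns=genes,
)
# Per-gene mean and population standard deviation, as a standard scaler would store
mean = train.mean(axis=0).to_numpy()
std = train.std(axis=0, ddof=0).to_numpy()

# New sample lacks GENE_C; reindex adds it as a zero-filled column
new = pd.DataFrame([[2.5, 13.0]], index=["S1"], columns=["GENE_A", "GENE_B"])
missing = [g for g in genes if g not in new.columns]
aligned = new.reindex(columns=genes, fill_value=0)

scaled = (aligned.to_numpy(dtype=float) - mean) / std
print("Missing genes:", missing)
print("Scaled row:", np.round(scaled[0], 2))
```

Here the two observed genes sit at their training means and scale to 0, while the zero-filled `GENE_C` scales to about -3.58. A zero-filled gene can therefore look like a strongly under-expressed one to the model, so it is worth knowing which genes were filled.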


### 4. Elastic Modulus Prediction
**Run model inference**

In [4]:
predictions = model.predict(scaled_data)
print(f"Generated {len(predictions)} predictions")
print(f"Prediction range: {np.min(predictions):.2f}-{np.max(predictions):.2f} kPa")

Generated 453 predictions
Prediction range: 0.68-3.31 kPa
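
A quick sanity check on the predictions can catch problems early, since an elastic modulus should be finite and positive and there should be one value per sample. This is a minimal sketch on toy values; the helper name `check_predictions` is hypothetical and the checks are assumptions about what a reasonable output looks like, not part of MechanoGEPred.

```python
import numpy as np

def check_predictions(pred, n_samples):
    """Return basic plausibility flags for a vector of predicted moduli."""
    pred = np.asarray(pred, dtype=float)
    return {
        "length_ok": pred.shape == (n_samples,),
        "all_finite": bool(np.isfinite(pred).all()),
        "all_positive": bool((pred > 0).all()),
    }

# Toy values for illustration only
good = check_predictions([0.68, 1.29, 3.31], n_samples=3)
bad = check_predictions([0.68, float("nan"), -0.1], n_samples=3)
print(good)
print(bad)
```

For the run above, the call would be `check_predictions(predictions, len(tcga_data))`.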


### 5. Results Analysis
**Explore and save prediction results**

In [5]:
# Create results dataframe, keeping the input row labels as the index
results = pd.DataFrame(predictions, index=tcga_data.index, columns=['Predicted_Modulus_kPa'])

# Statistical summary
print("\nPrediction Statistics:")
print(results['Predicted_Modulus_kPa'].describe())

# Save results, including the row index (create the output directory if needed)
output_dir = 'output'
os.makedirs(output_dir, exist_ok=True)
results.to_csv(f'{output_dir}/TCGA_COAD_predictions.csv')
print(f"\nResults saved to: {output_dir}/TCGA_COAD_predictions.csv")


Prediction Statistics:
count    453.000000
mean       1.439787
std        0.441944
min        0.680361
25%        1.161718
50%        1.290134
75%        1.549432
max        3.306985
Name: Predicted_Modulus_kPa, dtype: float64

Results saved to: output/TCGA_COAD_predictions.csv
