# Variant Effect Analysis with AlphaGenome

This notebook demonstrates how to analyze the effects of genetic variants using AlphaGenome.

## What you'll learn:
- Define genetic variants (SNPs, insertions, deletions)
- Compare reference vs alternate alleles
- Calculate variant effect scores
- Generate comparison visualizations

---

## 1. Import Libraries and Setup

In [None]:
# AlphaGenome libraries
from alphagenome.data import genome
from alphagenome.models import dna_client

# Visualization
import matplotlib.pyplot as plt
import numpy as np

# Custom tools
import sys
sys.path.insert(0, '/shared/tools')
from alphagenome_tools import (
    plot_variant_comparison,
    monitor_api_quota,
    save_results
)

print("✓ All libraries imported successfully")

## 2. Connect to AlphaGenome

In [None]:
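# Create model connection.
# dna_client.create() expects an API key. Reading it from an environment
# variable named ALPHAGENOME_API_KEY is an assumption about your setup.
import os
model = dna_client.create(os.environ['ALPHAGENOME_API_KEY'])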

print("✓ AlphaGenome connection established")

# Check API quota
monitor = monitor_api_quota()
print(f"API Status: {monitor}")

## 3. Define a Genetic Variant

Let's analyze a single nucleotide polymorphism (SNP) on chromosome 22.

In [None]:
# Define the variant
variant = genome.Variant(
    chromosome='chr22',
    position=36201698,    # 1-based genomic position
    reference_bases='A',  # Reference allele
    alternate_bases='C'   # Alternate allele
)

print(f"Variant: {variant.chromosome}:{variant.position}")
print(f"Reference: {variant.reference_bases}")
print(f"Alternate: {variant.alternate_bases}")
print("Type: SNP (single nucleotide polymorphism)")
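
The same `reference_bases` and `alternate_bases` fields can describe insertions and deletions as well as SNPs; what changes is the relative length of the two allele strings. As a minimal sketch (the helper `classify_variant` is hypothetical and not part of AlphaGenome), the variant type can be inferred from those lengths:

```python
def classify_variant(reference_bases: str, alternate_bases: str) -> str:
    """Infer a coarse variant type from the two allele strings.

    Hypothetical helper for illustration; not part of the AlphaGenome API.
    """
    if reference_bases == alternate_bases:
        raise ValueError("Reference and alternate alleles are identical")
    ref_len, alt_len = len(reference_bases), len(alternate_bases)
    if ref_len == 1 and alt_len == 1:
        return "SNP"
    if ref_len == alt_len:
        return "MNP"  # multi-nucleotide substitution of equal length
    if ref_len < alt_len:
        return "insertion"
    return "deletion"


print(classify_variant("A", "C"))     # the SNP defined above
print(classify_variant("A", "ACGT"))  # alternate allele is longer
print(classify_variant("ACGT", "A"))  # alternate allele is shorter
```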

## 4. Define Genomic Context

The model makes its predictions over a genomic interval, so we define a window centered on the variant.

In [None]:
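# Create interval centered on the variant
# ~100kb window. Assumption: the model accepts only fixed input lengths,
# of which 2**17 = 131,072 bp is the "100KB" option; 100,000 bp is not one
window_size = 2**17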

interval = genome.Interval(
    chromosome=variant.chromosome,
    start=max(0, variant.position - window_size // 2),
    end=variant.position + window_size // 2
)

print(f"Analysis interval: {interval.chromosome}:{interval.start:,}-{interval.end:,}")
print(f"Variant position: {variant.position:,} (centered)")
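
One detail of the cell above is that `max(0, ...)` shortens the window when a variant lies close to the start of a chromosome, which matters if the model requires a fixed input length. The sketch below shows one way to keep the length constant by shifting the window instead. It assumes a 1-based variant position and a 0-based, half-open interval; `centered_window` is a hypothetical helper, not an AlphaGenome function.

```python
def centered_window(position: int, length: int) -> tuple[int, int]:
    """Return a 0-based, half-open (start, end) window of `length` bp.

    `position` is treated as 1-based. If the window would start before
    coordinate 0 it is shifted right rather than truncated, so its length
    is preserved. Chromosome ends are not checked.
    """
    if position < 1 or length < 1:
        raise ValueError("position and length must be positive")
    index = position - 1                 # convert to a 0-based index
    start = max(0, index - length // 2)  # shift instead of truncating
    return start, start + length


start, end = centered_window(36201698, 2**17)
print(f"chr22:{start:,}-{end:,} ({end - start:,} bp)")
```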

## 5. Run Variant Prediction

Now let's compare predictions for the reference and alternate alleles.

In [None]:
# Run variant prediction
print("Running variant prediction...")
print("This will generate predictions for both reference and alternate alleles")
print("Please wait...\n")

outputs = model.predict_variant(
    interval=interval,
    variant=variant,
    ontology_terms=['UBERON:0001157'],  # Optional tissue filter: UBERON:0001157 = transverse colon
    requested_outputs=[dna_client.OutputType.RNA_SEQ]
)

print("✓ Prediction complete!")

# Update API quota monitor (reuses the `monitor` object created in section 2)
monitor.increment()
print(f"\nAPI Status: {monitor}")
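
`monitor_api_quota` comes from the local `/shared/tools` package, so its behavior is not documented in this notebook. Purely to illustrate the count-then-report pattern used above, a hypothetical stand-in might look like the following; the class name, the fields, and the limit of 100 are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class QuotaCounter:
    """Hypothetical stand-in for a quota monitor; not the real tool."""
    limit: int     # assumed maximum number of calls allowed
    used: int = 0  # calls recorded so far

    def increment(self, calls: int = 1) -> None:
        """Record `calls` additional API requests."""
        self.used += calls

    @property
    def remaining(self) -> int:
        """Calls left before the assumed limit is reached."""
        return max(0, self.limit - self.used)

    def __str__(self) -> str:
        return f"{self.used}/{self.limit} calls used, {self.remaining} remaining"


quota = QuotaCounter(limit=100)  # the limit of 100 is an arbitrary example
quota.increment()                # one predict_variant call
print(f"API Status: {quota}")
```

Keeping a single counter object and incrementing it after each request avoids losing the count between cells.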

## 6. Compare Reference vs Alternate Alleles

In [None]:
# Extract reference and alternate outputs
ref_outputs = outputs.reference
alt_outputs = outputs.alternate

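# Fields for outputs that were not requested are expected to be None,
# so test the value instead of relying on hasattr()
print("Reference allele outputs:")
print(f"  - Has RNA-Seq prediction: {getattr(ref_outputs, 'rna_seq', None) is not None}")

print("\nAlternate allele outputs:")
print(f"  - Has RNA-Seq prediction: {getattr(alt_outputs, 'rna_seq', None) is not None}")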
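
A note on checking for a prediction: if the output container declares one field per output type and leaves unrequested ones empty (an assumption about its structure), then `hasattr` reports `True` for every field, whether or not it holds data. The sketch below uses a hypothetical `FakeOutput` class to show the difference between the two checks.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeOutput:
    """Hypothetical stand-in for a prediction output container."""
    rna_seq: Optional[list] = None  # populated only when requested
    atac: Optional[list] = None     # left as None in this example


outputs_demo = FakeOutput(rna_seq=[0.1, 0.4, 0.2])

# hasattr() is True for every declared field, requested or not.
print("hasattr atac:     ", hasattr(outputs_demo, "atac"))
# Comparing against None separates populated fields from empty ones.
print("atac populated:   ", outputs_demo.atac is not None)
print("rna_seq populated:", outputs_demo.rna_seq is not None)
```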

## 7. Visualize Variant Effect

Create a side-by-side comparison of reference and alternate predictions.

In [None]:
# Generate comparison plot
fig = plot_variant_comparison(
    ref_outputs=ref_outputs,
    alt_outputs=alt_outputs,
    variant=variant,
    figsize=(16, 6)
)

if fig is not None:
    plt.show()
else:
    print("Visualization unavailable")

## 8. Calculate Effect Scores

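This cell collects the variant's metadata into a summary dictionary. The `predicted_effect` field is a placeholder: no score is computed here, because the calculation depends on the structure of the AlphaGenome outputs.

In [None]:
# Build the effect summary; 'predicted_effect' stays a placeholder for now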

effect_score = {
    'variant': f"{variant.chromosome}:{variant.position}",
    'ref_allele': variant.reference_bases,
    'alt_allele': variant.alternate_bases,
    'predicted_effect': 'Unknown'  # Will be calculated from actual data
}

print("Variant Effect Summary:")
print(f"  Location: {effect_score['variant']}")
print(f"  Change: {effect_score['ref_allele']} → {effect_score['alt_allele']}")
print(f"  Effect: {effect_score['predicted_effect']}")
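
To make the idea of an effect score concrete, the sketch below compares two per-position signal tracks using plain Python lists. The metrics (mean and maximum absolute difference, and a log2 fold change of total signal) are illustrative choices, not AlphaGenome's own scoring method, and the input values are made up. Extracting real tracks from `ref_outputs` and `alt_outputs` depends on the output structure and is not shown.

```python
import math


def effect_summary(ref_track, alt_track, pseudocount=1.0):
    """Summarize ALT minus REF for two equal-length per-position tracks.

    Illustrative metrics only. Assumes non-negative signal values.
    """
    if len(ref_track) != len(alt_track) or not ref_track:
        raise ValueError("tracks must be non-empty and the same length")
    diffs = [alt - ref for ref, alt in zip(ref_track, alt_track)]
    return {
        "mean_abs_diff": sum(abs(d) for d in diffs) / len(diffs),
        "max_abs_diff": max(abs(d) for d in diffs),
        # The pseudocount keeps the ratio finite when a track sums to zero.
        "log2_fold_change": math.log2(
            (sum(alt_track) + pseudocount) / (sum(ref_track) + pseudocount)
        ),
    }


# Made-up values standing in for predicted signal at four positions.
ref_demo = [1.0, 2.0, 3.0, 2.0]
alt_demo = [1.0, 4.0, 3.0, 2.0]
print(effect_summary(ref_demo, alt_demo))
```

A positive `log2_fold_change` means the alternate allele has more total predicted signal than the reference over the compared positions.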

## 9. Save Analysis Results

In [None]:
from pathlib import Path
import json

# Create results directory
results_dir = Path.home() / 'work' / 'results' / 'variant_analysis'
results_dir.mkdir(parents=True, exist_ok=True)

# Save figure
if fig is not None:
    fig_path = results_dir / 'variant_comparison.png'
    fig.savefig(fig_path, bbox_inches='tight', dpi=300)
    print(f"✓ Figure saved: {fig_path}")

# Save variant info
variant_info_path = results_dir / 'variant_info.json'
with open(variant_info_path, 'w') as f:
    json.dump(effect_score, f, indent=2)
print(f"✓ Variant info saved: {variant_info_path}")

print(f"\nAll results saved to: {results_dir}")
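
The fixed file names above are overwritten each time the notebook is rerun with a different variant. One possible way around that, sketched here with hypothetical helpers `variant_slug` and `result_path`, is to build the file name from the variant itself. The sketch only constructs paths and writes nothing to disk.

```python
from pathlib import Path


def variant_slug(chromosome: str, position: int, ref: str, alt: str) -> str:
    """Build a filename-safe identifier such as 'chr22_36201698_A_C'."""
    return f"{chromosome}_{position}_{ref}_{alt}"


def result_path(results_dir: Path, slug: str, suffix: str) -> Path:
    """Return results_dir/<slug><suffix>; no file is created."""
    return results_dir / f"{slug}{suffix}"


slug = variant_slug("chr22", 36201698, "A", "C")
print(result_path(Path("results") / "variant_analysis", slug, ".json"))
print(result_path(Path("results") / "variant_analysis", slug, ".png"))
```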

## 10. Try Your Own Variants

Copy and modify this code to analyze different variants:

In [None]:
# Example: Define your own variant
# my_variant = genome.Variant(
#     chromosome='chr22',      # Change chromosome
#     position=36202000,       # Change position
#     reference_bases='G',     # Change reference allele
#     alternate_bases='A'      # Change alternate allele
# )

# Then run the prediction with your variant
# outputs = model.predict_variant(...)

print("Ready to analyze custom variants!")
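
Before spending API quota on a custom variant, it can help to sanity-check the values you typed in. The following is a minimal sketch with a hypothetical `validate_variant` helper. The rules it applies (a `chr` prefix, a positive 1-based position, alleles made of A, C, G, T) are assumptions based on the example in this notebook, and it cannot confirm that the reference allele matches the genome at that position.

```python
VALID_BASES = set("ACGT")


def validate_variant(chromosome, position, reference_bases, alternate_bases):
    """Return a list of problems found; an empty list means none were detected.

    Hypothetical checks for illustration only.
    """
    problems = []
    if not chromosome.startswith("chr"):
        problems.append("chromosome should look like 'chr22'")
    if position < 1:
        problems.append("position must be a positive, 1-based coordinate")
    for label, bases in (("reference", reference_bases),
                         ("alternate", alternate_bases)):
        if not set(bases.upper()) <= VALID_BASES:
            problems.append(f"{label} allele contains non-ACGT characters")
    if reference_bases.upper() == alternate_bases.upper():
        problems.append("reference and alternate alleles are identical")
    return problems


print(validate_variant("chr22", 36202000, "G", "A"))  # the template values
print(validate_variant("22", 0, "G", "X"))            # several mistakes
```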

## Summary

In this notebook, you learned how to:

✓ Define genetic variants  
✓ Set up genomic context for analysis  
✓ Run variant effect predictions  
✓ Compare reference vs alternate alleles  
✓ Visualize variant effects  
✓ Save analysis results  

### Related Notebooks:
- **01_quickstart.ipynb** - Basic sequence prediction
- **03_batch_analysis.ipynb** - Analyze multiple variants at once
- **04_visualization.ipynb** - Advanced visualization techniques

### Tips:
- Always define an interval that contains your variant, ideally centered on it
- Larger intervals give the model more sequence context but take longer to run
- Monitor your API quota to avoid hitting limits
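
The trade-off between context and run time can be handled by picking the smallest window that still covers the region you care about. The sketch below assumes the model accepts a fixed set of power-of-two input lengths; the values in `ASSUMED_LENGTHS` and the helper `smallest_covering_length` are assumptions to verify against the AlphaGenome documentation.

```python
# Assumed set of accepted input lengths in bp (powers of two).
# Verify against the AlphaGenome documentation before relying on them.
ASSUMED_LENGTHS = (2**11, 2**14, 2**17, 2**19, 2**20)


def smallest_covering_length(span_bp: int, lengths=ASSUMED_LENGTHS) -> int:
    """Return the smallest listed length that is at least `span_bp`."""
    for length in sorted(lengths):
        if length >= span_bp:
            return length
    raise ValueError(
        f"{span_bp:,} bp exceeds the largest listed length {max(lengths):,}"
    )


for span in (1_500, 100_000, 600_000):
    chosen = smallest_covering_length(span)
    print(f"{span:>9,} bp of context needs a {chosen:,} bp window")
```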