# Batch Analysis with AlphaGenome

This notebook shows how to run AlphaGenome predictions in batch over multiple variants or genomic intervals.

## What you'll learn:
- Prepare batch input data (CSV files)
- Run predictions on multiple variants
- Monitor API quota during batch jobs
- Aggregate, visualize, and export batch results

---

## 1. Import Libraries

In [None]:
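# Standard library and data handling
import sys
from pathlib import Path
import pandas as pd

# AlphaGenome libraries
from alphagenome.data import genome
from alphagenome.models import dna_client

# Visualization
import matplotlib.pyplot as plt

# Our custom tools (assumption: the batch helpers used later in this notebook
# live in alphagenome_tools alongside quick_plot and monitor_api_quota)
sys.path.insert(0, '/shared/tools')
from alphagenome_tools import (
    quick_plot, monitor_api_quota, load_variants_from_csv,
    batch_predict_variants, batch_predict_sequences,
    plot_batch_summary, export_to_csv, export_to_excel,
)

print("OK - All libraries imported successfully")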

## 2. Connect to AlphaGenome and Check Quota

In [None]:
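# Create model connection
# dna_client.create() requires an API key. Reading it from an environment
# variable is an assumption here; ALPHAGENOME_API_KEY is a placeholder name.
import os
model = dna_client.create(os.environ['ALPHAGENOME_API_KEY'])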

print("✓ AlphaGenome connection established")

# Check API quota
monitor = monitor_api_quota()
print(f"\nAPI Status: {monitor}")
print(f"Remaining calls: {monitor.get_usage()['remaining']:,}")
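
Before launching a batch, it can help to compare the remaining quota with the number of calls you plan to make. The following is a minimal sketch, assuming roughly one API call per variant and that the `remaining` value returned by `monitor.get_usage()` is an integer. The names `has_enough_quota` and `reserve` are hypothetical and not part of `alphagenome_tools`.

```python
def has_enough_quota(remaining: int, planned_calls: int, reserve: int = 0) -> bool:
    """Return True if planned_calls fit in remaining while keeping reserve calls spare."""
    if remaining < 0 or planned_calls < 0 or reserve < 0:
        raise ValueError("counts must be non-negative")
    return remaining - reserve >= planned_calls


# Example with made-up numbers: 1,000 calls left, 5 variants planned,
# and 100 calls kept in reserve for follow-up work.
print(has_enough_quota(remaining=1000, planned_calls=5, reserve=100))
```

In the notebook, `remaining` would come from `monitor.get_usage()['remaining']` and `planned_calls` from `len(variants)`.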

## 3. Prepare Batch Input Data

### Option A: Use Example Data

In [None]:
# Load example variants from shared data
example_csv = Path('/shared/data/examples/known_variants.csv')

if example_csv.exists():
    variants = load_variants_from_csv(example_csv)
    print(f"✓ Loaded {len(variants)} example variants from {example_csv}")
else:
    # Create example variants
    print("Creating example variants for demonstration...")
    variants = [
        genome.Variant('chr22', 36201698, 'A', 'C'),
        genome.Variant('chr22', 36202000, 'G', 'T'),
        genome.Variant('chr22', 36202500, 'C', 'G'),
        genome.Variant('chr21', 33000000, 'A', 'G'),
        genome.Variant('chr21', 33000500, 'T', 'C'),
    ]
    print(f"✓ Created {len(variants)} example variants")

### Option B: Load Your Own CSV File

In [None]:
# To use your own variants, create a CSV with columns:
# chromosome,position,reference_bases,alternate_bases
#
# Example:
# chromosome,position,reference_bases,alternate_bases
# chr22,36201698,A,C
# chr22,36202000,G,T
#
# Then load it:

# variants = load_variants_from_csv('path/to/your/variants.csv')

print("Uncomment the code above to load your own CSV file")
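
To make the expected CSV layout concrete, here is a rough sketch of the kind of parsing a loader could perform. It is not the implementation of `load_variants_from_csv`, whose internals are not shown in this notebook. `VariantRow` is a hypothetical stand-in for `genome.Variant` so the example runs without AlphaGenome installed, and `read_variant_rows` is an illustrative name.

```python
import csv
import io
from typing import NamedTuple


class VariantRow(NamedTuple):
    """Stand-in for genome.Variant (hypothetical, for illustration only)."""
    chromosome: str
    position: int
    reference_bases: str
    alternate_bases: str


REQUIRED_COLUMNS = ("chromosome", "position", "reference_bases", "alternate_bases")


def read_variant_rows(handle) -> list[VariantRow]:
    """Parse variant rows from an open CSV handle after validating the header."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"CSV is missing columns: {missing}")
    rows = []
    for record in reader:
        rows.append(VariantRow(
            chromosome=record["chromosome"].strip(),
            position=int(record["position"]),
            reference_bases=record["reference_bases"].strip().upper(),
            alternate_bases=record["alternate_bases"].strip().upper(),
        ))
    return rows


# Usage with the two example rows from the cell above, read from an in-memory file.
example = io.StringIO(
    "chromosome,position,reference_bases,alternate_bases\n"
    "chr22,36201698,A,C\n"
    "chr22,36202000,G,T\n"
)
rows = read_variant_rows(example)
print(rows[0])
```

Validating the header up front means a misnamed column fails immediately instead of partway through a long file.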

### Option C: Create Variants Programmatically

In [None]:
# You can also create variants programmatically.
# For example, generate variants at random positions in a specific region.
# Note: this sketch hard-codes 'A' as the reference base, so the variants are
# placeholders; real reference bases must match the genome at each position.

# import numpy as np
#
# def generate_variants_in_region(chrom, start, end, count=10):
#     variants = []
#     positions = np.random.randint(start, end, count)  # end is exclusive
#     for pos in positions:
#         variants.append(genome.Variant(
#             chromosome=chrom,
#             position=int(pos),  # convert NumPy integer to a plain int
#             reference_bases='A',
#             alternate_bases='G'
#         ))
#     return variants

# variants = generate_variants_in_region('chr22', 36000000, 36500000, count=20)

print("Uncomment to generate variants programmatically")

## 4. Preview Your Variants

In [None]:
# Convert to DataFrame for easy viewing
variants_df = pd.DataFrame([{
    'chromosome': v.chromosome,
    'position': v.position,
    'reference': v.reference_bases,
    'alternate': v.alternate_bases
} for v in variants])

print(f"\nTotal variants to analyze: {len(variants)}")
print("\nFirst 5 variants:")
display(variants_df.head())

## 5. Run Batch Prediction

Now let's run predictions on all variants with progress tracking.

In [None]:
# Set analysis parameters
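# Note: window_size is only reported below; it is not passed to
# batch_predict_variants, which is assumed to choose its own window.
window_size = 100000  # 100 kb window around each variant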
ontology_terms = ['UBERON:0001157']  # Optional: tissue type
requested_outputs = [dna_client.OutputType.RNA_SEQ]

print(f"Starting batch analysis of {len(variants)} variants...")
print(f"Window size: {window_size:,} bp")
print(f"This will use approximately {len(variants)} API calls")
print("\nPlease wait...\n")

# Run batch prediction
results_df = batch_predict_variants(
    variants=variants,
    model=model,
    ontology_terms=ontology_terms,
    requested_outputs=requested_outputs,
    show_progress=True,  # Show progress bar
    monitor=True  # Track API usage
)

print("\n✓ Batch analysis complete!")
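
The `success` column used in the next section suggests that each variant is attempted independently and failures are recorded rather than raised. A minimal sketch of that pattern follows. It is an assumption about the general approach, not the actual code of `batch_predict_variants`, and `run_batch` and `toy_predict` are hypothetical names.

```python
from typing import Any, Callable, Iterable


def run_batch(items: Iterable[Any], predict_fn: Callable[[Any], Any]) -> list[dict]:
    """Call predict_fn on each item, recording success or the error message."""
    records = []
    for item in items:
        try:
            output = predict_fn(item)
            records.append({"item": item, "success": True, "output": output, "error": None})
        except Exception as exc:  # keep going so one bad item does not abort the batch
            records.append({"item": item, "success": False, "output": None, "error": str(exc)})
    return records


# Toy predictor standing in for a model call: it rejects negative positions.
def toy_predict(position: int) -> int:
    if position < 0:
        raise ValueError("position must be non-negative")
    return position * 2


records = run_batch([10, -1, 30], toy_predict)
print([r["success"] for r in records])
```

Storing the error message next to each failed item makes it straightforward to inspect or retry only the failures afterwards.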

## 6. Check Results and API Usage

In [None]:
# Display summary statistics
print("\n=== Batch Analysis Summary ===")
print(f"Total variants: {len(results_df)}")
print(f"Successful: {results_df['success'].sum()}")
print(f"Failed: {(~results_df['success']).sum()}")

# Check API quota
monitor = monitor_api_quota()
print(f"\n{monitor}")

# Show failed predictions if any
if (~results_df['success']).any():
    print("\nFailed predictions:")
    display(results_df[~results_df['success']])

## 7. Visualize Batch Results

In [None]:
# Create batch summary visualization
fig = plot_batch_summary(results_df, metric='success', figsize=(10, 6))

if fig:
    plt.show()

# Chromosome distribution
fig2 = plot_batch_summary(results_df, metric='chromosome', figsize=(12, 6))
if fig2:
    plt.show()

## 8. Export Results

In [None]:
# Create results directory
results_dir = Path.home() / 'work' / 'results' / 'batch_analysis'
results_dir.mkdir(parents=True, exist_ok=True)

# Export to CSV
csv_path = results_dir / 'batch_results.csv'
export_to_csv(results_df, csv_path)
print(f"✓ CSV exported: {csv_path}")

# Export to Excel
excel_path = results_dir / 'batch_results.xlsx'
export_to_excel(results_df, excel_path)
print(f"✓ Excel exported: {excel_path}")

# Save figures
if fig:
    fig.savefig(results_dir / 'batch_summary.png', dpi=300, bbox_inches='tight')
    print("✓ Summary figure saved")

if fig2:
    fig2.savefig(results_dir / 'chromosome_distribution.png', dpi=300, bbox_inches='tight')
    print("✓ Distribution figure saved")

print(f"\nAll results saved to: {results_dir}")

## 9. Analyze Successful Predictions

In [None]:
# Filter successful predictions
successful = results_df[results_df['success']].copy()

print(f"Analyzing {len(successful)} successful predictions")
print("\nSummary statistics by chromosome:")
print(successful.groupby('chromosome').size())

## 10. Batch Interval Analysis (Optional)

In [None]:
# You can also batch analyze genomic intervals
# Define intervals to analyze

intervals = [
    genome.Interval('chr22', 35677410, 35777410),
    genome.Interval('chr22', 36000000, 36100000),
    genome.Interval('chr21', 33000000, 33100000),
]

print(f"Running batch sequence analysis on {len(intervals)} intervals...")

# Run batch prediction
interval_results = batch_predict_sequences(
    intervals=intervals,
    model=model,
    show_progress=True,
    monitor=True
)

print("\n✓ Interval analysis complete!")
display(interval_results)
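
Each interval above is 100,000 bp wide. If you start from a single position instead, a fixed-width window can be computed as in the sketch below. `centered_window` is a hypothetical helper; it does not check chromosome lengths, coordinate conventions, or which input lengths the model accepts.

```python
def centered_window(position: int, width: int) -> tuple[int, int]:
    """Return (start, end) of a width-bp window centered on position, clamped at 0."""
    if width <= 0:
        raise ValueError("width must be positive")
    start = max(0, position - width // 2)
    return start, start + width


# A 100,000 bp window around the first example variant position.
start, end = centered_window(36201698, 100_000)
print(start, end, end - start)
```

The resulting pair could then be passed as the start and end of a `genome.Interval` on the relevant chromosome.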

## Summary

In this notebook, you learned how to:

✓ Prepare batch input data (CSV or programmatic)  
✓ Run predictions on multiple variants  
✓ Monitor API quota during batch jobs  
✓ Visualize batch results  
✓ Export results to multiple formats  

### Best Practices:
- Start with small batches (10-20 variants) to test
- Check the remaining API quota before and after each batch
- Save intermediate results frequently
- Use progress tracking for long-running jobs
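
One way to save intermediate results is to process variants in small chunks and append each chunk's results to a checkpoint file. The sketch below uses only the standard library. `append_records` and `chunked` are hypothetical helpers, and the placeholder results stand in for real predictions.

```python
import csv
import os
import tempfile


def append_records(path: str, records: list[dict], fieldnames: list[str]) -> None:
    """Append records to a CSV file, writing the header only if the file is new or empty."""
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        if write_header:
            writer.writeheader()
        writer.writerows(records)


def chunked(items: list, size: int):
    """Yield consecutive slices of items with at most size elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Usage: process 5 toy items in chunks of 2, checkpointing after every chunk.
checkpoint = os.path.join(tempfile.mkdtemp(), "checkpoint.csv")
for chunk in chunked(list(range(5)), size=2):
    results = [{"item": i, "success": True} for i in chunk]  # placeholder for real predictions
    append_records(checkpoint, results, fieldnames=["item", "success"])

with open(checkpoint, newline="") as handle:
    saved = list(csv.DictReader(handle))
print(len(saved))
```

If a job is interrupted, the rows already in the checkpoint file show which items were completed, so only the remainder needs to be rerun.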

### Related Notebooks:
- **02_variant_analysis.ipynb** - Single variant analysis
- **04_visualization.ipynb** - Advanced visualization