# Stanford RNA 3D Folding 2 — Baseline | WandB Offline Sync via kaggle-wandb-sync

## Synced with kaggle-wandb-sync

This notebook uses **[kaggle-wandb-sync](https://pypi.org/project/kaggle-wandb-sync/)** to track experiments with Weights & Biases (W&B), even with internet disabled.

Kaggle competition notebooks run with internet **disabled**, so W&B metrics cannot be pushed in real time.
`kaggle-wandb-sync` works around this with a simple offline sync pipeline:

```
Notebook (WANDB_MODE=offline)
    → W&B logs saved to /kaggle/working/wandb/
    → kaggle kernels output  (download via GitHub Actions)
    → wandb sync             (push to W&B cloud)
```
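
For orientation, the offline step leaves one directory per run under the W&B log folder. The sketch below lists those directories. It assumes W&B's usual `offline-run-*` directory naming, and `find_offline_runs` is a hypothetical helper written for illustration, not part of kaggle-wandb-sync.

```python
from pathlib import Path


def find_offline_runs(wandb_dir: Path) -> list[Path]:
    """Return offline run directories under wandb_dir, sorted by name.

    Assumption: each offline run is stored in a directory whose name
    starts with 'offline-run-'.
    """
    if not wandb_dir.is_dir():
        return []
    return sorted(p for p in wandb_dir.glob('offline-run-*') if p.is_dir())


# On Kaggle the logs land in /kaggle/working/wandb/ (see the pipeline above).
for run_dir in find_offline_runs(Path('/kaggle/working/wandb')):
    print(run_dir)  # each of these paths can be passed to `wandb sync`
```

Outside Kaggle the folder does not exist, so the helper returns an empty list and the loop prints nothing.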

### How to use

```bash
pip install kaggle-wandb-sync

# All-in-one: push notebook → poll → download output → wandb sync
export WANDB_API_KEY=your_api_key
kaggle-wandb-sync run stanford-rna-3d-folding-2/

# Or step by step:
kaggle-wandb-sync push  stanford-rna-3d-folding-2/
kaggle-wandb-sync poll  yasunorim/stanford-rna-3d-folding-2-baseline
kaggle-wandb-sync output yasunorim/stanford-rna-3d-folding-2-baseline
kaggle-wandb-sync sync  ./kaggle_output
```

→ **kaggle-wandb-sync GitHub**: https://github.com/yasumorishima/kaggle-wandb-sync  
→ **kaggle-wandb-sync PyPI**: https://pypi.org/project/kaggle-wandb-sync/  
→ **Notebook source**: https://github.com/yasumorishima/kaggle-competitions/tree/main/stanford-rna-3d-folding-2

---

## Competition Overview

**Task**: Predict the 3D structure of RNA molecules from sequence alone.  
**Metric**: TM-score (best of 5 predictions per sequence, averaged across targets)  
**Output**: x, y, z coordinates of the C1' atom for every nucleotide — 5 structures per sequence.

**This baseline**: Simple helical geometry — places nucleotides along an A-form RNA helix.  
All 5 predictions are identical (no diversity). Intended as a minimum working submission.
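
To make the metric aggregation concrete, here is a minimal sketch of the best-of-5 averaging described above. The scores are made-up placeholders, not competition results, and `best_of_n_mean` is a hypothetical helper; the TM-score itself is computed by the competition's scorer.

```python
import numpy as np


def best_of_n_mean(scores) -> float:
    """Mean over targets of the best score among each target's predictions.

    scores has shape (n_targets, n_predictions).
    """
    scores = np.asarray(scores, dtype=float)
    return float(scores.max(axis=1).mean())


# Placeholder scores for 2 targets x 5 predictions (illustrative only).
example = np.array([
    [0.10, 0.30, 0.20, 0.25, 0.15],  # best of 5: 0.30
    [0.40, 0.40, 0.40, 0.40, 0.40],  # identical predictions, best of 5: 0.40
])
print(round(best_of_n_mean(example), 3))  # 0.35
```

With five identical structures, as in this baseline, the best-of-5 value equals the single-structure score, so the extra slots add nothing.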

In [None]:
# Set WANDB_MODE=offline before importing wandb so the run never tries to reach the network
import os
os.environ['WANDB_MODE'] = 'offline'
os.environ['WANDB_PROJECT'] = 'stanford-rna-3d-folding-2'
os.environ['WANDB_RUN_GROUP'] = 'baseline'

import numpy as np
import pandas as pd
from pathlib import Path
import warnings
warnings.filterwarnings('ignore')

import wandb

print('Libraries loaded.')
print(f'WANDB_MODE: {os.environ["WANDB_MODE"]}')

In [None]:
# --- Data path detection ---
INPUT_ROOT = Path('/kaggle/input')
SLUG = 'stanford-rna-3d-folding-2'

print('=== /kaggle/input/ structure ===')
for p in sorted(INPUT_ROOT.iterdir()):
    if not p.is_dir():
        print(f'  {p.name}')  # plain file: nothing to list underneath
        continue
    print(f'  {p.name}/')
    for sub in sorted(p.iterdir())[:5]:
        print(f'    {sub.name}')

# Auto-detect DATA_DIR
DATA_DIR = None
for p in INPUT_ROOT.rglob('test_sequences.csv'):
    DATA_DIR = p.parent
    break

if DATA_DIR is None:
    raise FileNotFoundError(f'test_sequences.csv not found under {INPUT_ROOT}')

print(f'\nDATA_DIR: {DATA_DIR}')
print('\nCSV files:')
for f in sorted(DATA_DIR.glob('*.csv')):
    print(f'  {f.name}')

## 1. Load Data

In [None]:
test_df = pd.read_csv(DATA_DIR / 'test_sequences.csv')
sample_sub = pd.read_csv(DATA_DIR / 'sample_submission.csv')
print(f'Test sequences:     {len(test_df)} rows')
print(f'Sample submission:  {len(sample_sub)} rows')
print(f'Test columns: {list(test_df.columns)}')
test_df.head(3)

In [None]:
# Detect the ID and sequence columns (the sequence is used for length statistics only)
SEQ_COL = 'sequence' if 'sequence' in test_df.columns else test_df.columns[1]
ID_COL  = 'target_id' if 'target_id' in test_df.columns else test_df.columns[0]

seq_lengths = test_df[SEQ_COL].str.len()
print(f'ID column:       {ID_COL}')
print(f'Sequence column: {SEQ_COL}')
print(f'Sequence length — min: {seq_lengths.min()}, max: {seq_lengths.max()}, mean: {seq_lengths.mean():.1f}')

## 2. W&B Initialization

In [None]:
run = wandb.init(
    project='stanford-rna-3d-folding-2',
    name='baseline-v1',
    config={
        'approach': 'helical_geometry',
        'n_structures': 5,
        'helix_rise': 2.81,
        'helix_radius': 9.0,
        'helix_twist_deg': 32.7,
    }
)

wandb.log({
    'n_test_sequences': len(test_df),
    'seq_len_min': int(seq_lengths.min()),
    'seq_len_max': int(seq_lengths.max()),
    'seq_len_mean': float(seq_lengths.mean()),
})

print(f'W&B run: {run.name} (mode: {os.environ["WANDB_MODE"]})')

## 3. Baseline Prediction — A-form RNA Helix

Place the C1' atom of each nucleotide along an idealized A-form RNA helix with these parameters:
- Rise per residue: 2.81 Å  
- Twist per residue: 32.7°  
- Helix radius: 9.0 Å
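
As a sanity check on these parameters, the spacing between consecutive points on such a helix follows from the chord length in the x-y plane combined with the rise along z. A minimal sketch, where `neighbor_distance` is a hypothetical helper and not part of the notebook's pipeline:

```python
import math


def neighbor_distance(rise: float = 2.81, radius: float = 9.0, twist_deg: float = 32.7) -> float:
    """Distance between two consecutive points on a helix.

    The x-y chord between points one twist step apart is
    2 * radius * sin(twist / 2); the z offset is the rise.
    """
    chord = 2.0 * radius * math.sin(math.radians(twist_deg) / 2.0)
    return math.hypot(chord, rise)


print(f'{neighbor_distance():.2f} Å')
```

With the defaults above this comes to roughly 5.8 Å between neighboring C1' positions, and the spacing is the same for every pair of neighbors because the helix is uniform.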

In [None]:
def helix_coords(seq_len: int, rise: float = 2.81, radius: float = 9.0, twist_deg: float = 32.7) -> np.ndarray:
    """Generate C1' coordinates for an idealized A-form RNA helix. Returns an array of shape (seq_len, 3)."""
    twist_rad = np.radians(twist_deg)
    indices   = np.arange(seq_len)
    x = radius * np.cos(indices * twist_rad)
    y = radius * np.sin(indices * twist_rad)
    z = indices * rise
    return np.stack([x, y, z], axis=1)


# Use sample_submission as template — guarantees correct ID format and row count
submission = sample_sub.copy()

# Extract target_id from the ID column (format: <target_id>_<resid>, e.g. 8ZNQ_1)
submission['_target'] = submission['ID'].str.rsplit('_', n=1).str[0]

for target_id, group in submission.groupby('_target', sort=False):
    seq_len = len(group)
    coords  = helix_coords(seq_len)
    idx     = group.index
    # Write the same helix into all 5 prediction slots (no diversity)
    for s in range(1, 6):
        submission.loc[idx, f'x_{s}'] = coords[:, 0].round(3)
        submission.loc[idx, f'y_{s}'] = coords[:, 1].round(3)
        submission.loc[idx, f'z_{s}'] = coords[:, 2].round(3)

submission = submission.drop(columns=['_target'])
print(f'Submission shape: {submission.shape}')
print(f'ID examples: {submission["ID"].head(3).tolist()}')
submission.head(3)

## 4. Save Submission

In [None]:
OUTPUT_PATH = Path('/kaggle/working/submission.csv')
submission.to_csv(OUTPUT_PATH, index=False)
print(f'Saved: {OUTPUT_PATH}  ({OUTPUT_PATH.stat().st_size / 1024:.1f} KB)')

wandb.log({
    'n_submission_rows': len(submission),
    'submission_columns': len(submission.columns),
})

wandb.finish()
print('\nW&B run finished (offline). Sync with:')
print('  kaggle-wandb-sync run stanford-rna-3d-folding-2/ --skip-push')

## Summary

| Item | Detail |
|---|---|
| Approach | A-form RNA helix geometry |
| Structures | 5 (identical) |
| W&B mode | offline → synced via kaggle-wandb-sync |

### Sync W&B runs after execution

```bash
# All-in-one: push the notebook, wait for it to finish, download the output, and sync:
kaggle-wandb-sync run stanford-rna-3d-folding-2/

# Or to re-sync without re-running:
kaggle-wandb-sync run stanford-rna-3d-folding-2/ --skip-push
```