# JWST MAST Data Download

Download JWST data products from the MAST archive for a specified proposal ID.

## How to Use

1. **Run Cell 2** - Imports and helper functions
2. **Edit Cell 3** - Set your configuration:
   - `PROPOSAL_ID` - JWST proposal/program number
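   - `PRODUCT_LEVEL` - Choose `"uncal"`, `"rate"`, `"cal"`, another level listed in Cell 3, or `"all"`
   - `INSTRUMENT` - Filter by instrument or `None` for all
   - `OUTPUT_DIR` - Download destination
   - `PUBLIC_ONLY` - Whether to download only public data
   - `MAST_TOKEN` - MAST API token, needed only for proprietary data
   - `MAX_ROWS` - Maximum number of observations to process (`None` for no limit)
   - `USE_CURL` - Save a curl script instead of downloading the files directly
3. **Run Cell 4** (optional) - Quick test on up to 5 observations (downloads nothing)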
4. **Run Cell 5** - Execute the full download

## JWST Data Product Levels

- **`uncal`** - Stage 0: Uncalibrated raw detector data (the input to pipeline Stage 1)
- **`rate`** - Stage 1: Detector-level calibrated data (count-rate images)
- **`cal`** - Stage 2: Fully calibrated per-exposure science products
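Because these levels are distinguished by filename suffix, a product's level can be read directly from its name. The sketch below is a minimal, hypothetical helper (`SUFFIX_TO_LEVEL` and `level_from_filename` are illustrative names, not part of astroquery) that uses the same suffixes as `filter_products_by_level` further down.

```python
from __future__ import annotations

# Filename suffixes used in this notebook, mapped to level names (illustrative)
SUFFIX_TO_LEVEL = {
    "_uncal.fits": "uncal",        # Stage 0 (raw)
    "_rate.fits": "rate",          # Stage 1
    "_rateints.fits": "rateints",  # Stage 1, per integration
    "_cal.fits": "cal",            # Stage 2
    "_i2d.fits": "i2d",            # Stage 3 mosaics
    "_x1d.fits": "x1d",            # 1D spectra
    "_s2d.fits": "s2d",            # 2D spectra
    "_s3d.fits": "s3d",            # 3D spectral cubes
}


def level_from_filename(filename: str) -> str | None:
    """Return the level name for a product filename, or None if no suffix matches."""
    name = filename.lower()
    for suffix, level in SUFFIX_TO_LEVEL.items():
        if name.endswith(suffix):
            return level
    return None
```

For an illustrative filename, `level_from_filename("jw02136001001_03101_00001_nrs1_uncal.fits")` returns `"uncal"`; preview images and other non-FITS products return `None`.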

## Instrument Names

MAST uses specific instrument names with modes:
- `NIRCAM/IMAGE` - NIRCam Imaging
- `NIRSPEC/MSA` - NIRSpec Multi-Shutter Array
- `NIRSPEC/IFU` - NIRSpec Integral Field Unit
- `NIRISS` - NIRISS
- `MIRI/IMAGE` - MIRI Imaging
- `FGS` - Fine Guidance Sensor

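**Tip:** Set `INSTRUMENT = None` to download all instruments. A bare name without a mode (e.g. `NIRSPEC`) that has no exact match falls back to a case-insensitive substring match on `instrument_name`.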
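In `query_jwst_observations` the exact match is performed by MAST itself, and substring matching is applied locally only when a bare name returns nothing. The function below is a hypothetical local mimic of that rule over a plain list of names, assuming exact matching is case-sensitive (an assumption about MAST's behavior, not confirmed here).

```python
from __future__ import annotations


def match_instruments(available: list[str], requested: str | None) -> list[str]:
    """Mimic the notebook's instrument filter on a list of instrument names."""
    if requested is None:
        return list(available)  # no filter: keep every instrument
    exact = [name for name in available if name == requested]
    if exact or "/" in requested:
        return exact  # exact hit, or a mode was given: no fallback
    # Bare name such as "NIRSPEC": case-insensitive substring match
    needle = requested.upper()
    return [name for name in available if needle in name.upper()]
```

With `["NIRSPEC/MSA", "NIRSPEC/IFU", "NIRCAM/IMAGE"]`, requesting `"nirspec"` keeps both NIRSpec modes, while `"MIRI/IMAGE"` returns an empty list because a name with a mode never falls back.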

## Tips for Large Datasets

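- Start with `MAX_ROWS = 10` to test
- Use `USE_CURL = True` to save a curl script instead of downloading files directly, then run that script outside Python (useful on unstable networks)
- Consider downloading in batches, e.g. by passing slices of the filtered product table to `download_products`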
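Astropy tables support `len()` and slicing, so a generic chunking helper is enough to split a product table into batches. `batches` is a hypothetical helper sketched here, not part of the notebook or astroquery.

```python
from __future__ import annotations

from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def batches(rows: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `rows`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]
```

A possible use in the download cell, with an arbitrary batch size of 50: `for chunk in batches(filtered_prods, 50): download_products(chunk, outdir, curl=USE_CURL)`. A failed batch can then be retried without restarting the whole transfer.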

In [1]:
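"""
JWST MAST Data Download - Helper Functions
"""
from __future__ import annotations  # allows `str | None` hints on Python < 3.10

import os
import sys
import time
from pathlib import Path
from astroquery.mast import Observations


def mast_login_if_needed(token: str | None):
    """Authenticate with MAST if a token is provided."""
    if token:
        # Log in on the Observations instance, which also performs the downloads
        Observations.login(token=token)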


def query_jwst_observations(proposal_id: str, instrument: str | None = None, max_rows: int | None = None):
    """
    Query MAST for JWST observations by proposal ID.
    
    Args:
        proposal_id: JWST proposal/program number
        instrument: Instrument name (e.g., "NIRCAM/IMAGE") or None for all
        max_rows: Maximum number of observations to return
        
    Returns:
        Astropy Table of observations
    """
    criteria = {
        "obs_collection": "JWST",
        "proposal_id": str(proposal_id),
    }
    
    if instrument:
        criteria["instrument_name"] = instrument
        print(f"   Querying with: {criteria}")
        obs = Observations.query_criteria(**criteria)
        
        # If no results and simple name given, try pattern matching
        if len(obs) == 0 and '/' not in instrument:
            print(f"   No exact match for '{instrument}', trying pattern match...")
            criteria.pop("instrument_name")
            obs = Observations.query_criteria(**criteria)
            if len(obs) > 0:
                mask = [instrument.upper() in str(inst).upper() for inst in obs['instrument_name']]
                obs = obs[mask]
                print(f"   Found {len(obs)} observations matching '{instrument}'")
    else:
        print(f"   Querying with: {criteria}")
        obs = Observations.query_criteria(**criteria)
    
    if max_rows is not None and len(obs) > max_rows:
        print(f"   Limiting from {len(obs)} to {max_rows} observations")
        obs = obs[:max_rows]
    
    return obs


def filter_products_by_level(products, product_level: str):
    """
    Filter products by JWST processing level.
    
    Args:
        products: Product table from MAST
        product_level: A suffix level ("uncal", "rate", "rateints", "cal", "i2d",
            "x1d", "s2d", "s3d"), "all", or a productSubGroupDescription value
        
    Returns:
        Filtered product table
    """
    if len(products) == 0 or product_level == "all":
        return products
    
    # Define suffix patterns for each level
    suffix_map = {
        "uncal": "_uncal.fits",      # Stage 0
        "rate": "_rate.fits",         # Stage 1
        "rateints": "_rateints.fits", # Stage 1 (integrations)
        "cal": "_cal.fits",           # Stage 2
        "i2d": "_i2d.fits",           # Stage 3 (mosaics)
        "x1d": "_x1d.fits",           # Stage 2 (spectra)
        "s2d": "_s2d.fits",           # Stage 2 (2D spectra)
        "s3d": "_s3d.fits",           # Stage 3 (3D spectra)
    }
    
    # Handle flexible level specifications
    if product_level.lower() in suffix_map:
        suffix = suffix_map[product_level.lower()]
        filename_match = [str(fn).endswith(suffix) for fn in products["productFilename"]]
        filtered = products[filename_match]
    else:
        # Try matching by productSubGroupDescription
        if "productSubGroupDescription" in products.colnames:
            subgroup_match = [str(x).upper() == product_level.upper() 
                            for x in products["productSubGroupDescription"]]
            filtered = products[subgroup_match]
        else:
            print(f"   Warning: Unknown product level '{product_level}', returning all products")
            return products
    
    # Keep only SCIENCE and AUXILIARY product types (drops previews and similar)
    if "productType" in filtered.colnames and len(filtered) > 0:
        sci_mask = [("SCIENCE" in str(t).upper() or "AUXILIARY" in str(t).upper()) 
                   for t in filtered["productType"]]
        filtered = filtered[sci_mask]
    
    return filtered


def download_products(products, outdir: Path, curl: bool = False, retries: int = 3):
    """
    Download MAST products with retry logic.
    
    Args:
        products: Product table to download
        outdir: Output directory
        curl: If True, save a curl script for downloading later instead of the files
        retries: Number of retry attempts
        
    Returns:
        Download manifest table
    """
    outdir = outdir.resolve()
    outdir.mkdir(parents=True, exist_ok=True)
    
    kwargs = {
        "download_dir": str(outdir),
        "curl_flag": curl,
        "cache": True,
        "mrp_only": False,
    }
    
    for attempt in range(1, retries + 1):
        try:
            manifest = Observations.download_products(products, **kwargs)
            return manifest
        except Exception as exc:
            if attempt < retries:
                print(f"   [WARNING] Attempt {attempt} failed: {exc}")
                print(f"   Retrying in {2.5 * attempt}s...")
                time.sleep(2.5 * attempt)
            else:
                raise  # re-raise the last error with its original traceback
    
    raise RuntimeError("Download failed after all retry attempts")
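When several observations share member exposures, the table returned by `Observations.get_product_list` may list the same file more than once, which inflates product counts and the size estimate (an assumption worth checking for your proposal). The sketch below keeps the first row for each key; `first_occurrence_indices` is a hypothetical helper, and `astropy.table.unique(prods, keys="dataURI")` does the same job if you prefer astropy.

```python
from __future__ import annotations

from typing import Iterable


def first_occurrence_indices(keys: Iterable[str]) -> list[int]:
    """Return row indices of the first occurrence of each distinct key."""
    seen: set[str] = set()
    keep: list[int] = []
    for index, key in enumerate(keys):
        key = str(key)
        if key not in seen:
            seen.add(key)
            keep.append(index)
    return keep
```

Applied to a product table, this could read `prods = prods[first_occurrence_indices(prods["dataURI"])]`, since astropy tables accept a list of row indices.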

In [5]:
# ================================================================
# USER CONFIGURATION
# ================================================================

# JWST proposal/program ID
PROPOSAL_ID = "2136"

# Product level to download:
#   "uncal"    - Stage 0: Uncalibrated detector data
#   "rate"     - Stage 1: Calibrated count-rate images
#   "rateints" - Stage 1: Per-integration count-rate images
#   "cal"      - Stage 2: Fully calibrated per-exposure products
#   "i2d"      - Stage 3: Mosaicked images
#   "x1d"      - Stage 2: Extracted 1D spectra
#   "s2d"      - Stage 2: 2D spectra
#   "s3d"      - Stage 3: 3D spectral cubes
#   "all"      - Download all available products
PRODUCT_LEVEL = "uncal"

# Output directory
OUTPUT_DIR = "./jwst_data"

# Instrument filter (None = all instruments)
# Examples: None, "NIRCAM/IMAGE", "NIRSPEC/MSA", "NIRSPEC/IFU", "MIRI/IMAGE"
INSTRUMENT = None

# Download only public data (no authentication required)
PUBLIC_ONLY = True

# MAST API token for proprietary data (used only when PUBLIC_ONLY = False)
# Get token from: https://auth.mast.stsci.edu/token
MAST_TOKEN = None

# Limit number of observations (None = no limit)
# Recommended: Start with 10 for testing large datasets
MAX_ROWS = None

# If True, save a curl script instead of downloading the files themselves;
# run the script later (e.g. from a shell) to fetch the data
USE_CURL = False

# ================================================================
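A quick sanity check of these settings can catch typos before a long download. The hypothetical `validate_config` below mirrors how `filter_products_by_level` interprets `PRODUCT_LEVEL` (exactly `"all"`, otherwise a case-insensitive suffix level, otherwise a `productSubGroupDescription` value); it is a sketch, not part of the notebook.

```python
from __future__ import annotations

# Suffix-based level names understood by filter_products_by_level
SUFFIX_LEVELS = {"uncal", "rate", "rateints", "cal", "i2d", "x1d", "s2d", "s3d"}


def validate_config(product_level: str, max_rows: int | None,
                    public_only: bool, token: str | None) -> list[str]:
    """Return human-readable warnings about the configuration."""
    problems: list[str] = []
    if product_level != "all" and product_level.lower() not in SUFFIX_LEVELS:
        problems.append(
            f"PRODUCT_LEVEL {product_level!r} is not a suffix level; it will be "
            "matched against productSubGroupDescription instead"
        )
    if max_rows is not None and max_rows < 1:
        problems.append("MAX_ROWS should be None or a positive integer")
    if not public_only and not token:
        problems.append("PUBLIC_ONLY is False but MAST_TOKEN is not set; "
                        "proprietary files will likely not be downloadable")
    return problems
```

It could be run right after this cell: `for msg in validate_config(PRODUCT_LEVEL, MAX_ROWS, PUBLIC_ONLY, MAST_TOKEN): print("Config warning:", msg)`.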

In [6]:
# ============================================================
# QUICK TEST: Verify the query and product filter on a small sample
# ============================================================

print(f"Testing query for proposal {PROPOSAL_ID}...")
print(f"Instrument filter: {INSTRUMENT if INSTRUMENT else 'All instruments'}")
print("-" * 60)

# Query a small sample of observations (nothing is downloaded here)
test_obs = query_jwst_observations(PROPOSAL_ID, INSTRUMENT, max_rows=5)
print(f"\n✓ Query successful! Found {len(test_obs)} observations (limited to 5 for testing)")

if len(test_obs) > 0:
    print("\nSample observations:")
    for i in range(min(3, len(test_obs))):
        print(f"  {i+1}. {test_obs['obs_id'][i]}")
        print(f"     Instrument: {test_obs['instrument_name'][i]}")
        print(f"     Target: {test_obs['target_name'][i]}")

    # List products for the first observation only
    print("\nGetting products for first observation...")
    test_prods = Observations.get_product_list(test_obs[:1])
    print(f"  Found {len(test_prods)} total products")

    # Apply the same level filter that the full download uses
    test_level = filter_products_by_level(test_prods, PRODUCT_LEVEL)
    print(f"  Found {len(test_level)} {PRODUCT_LEVEL} products")

    if len(test_level) > 0:
        print(f"\nSample {PRODUCT_LEVEL} files:")
        for i in range(min(3, len(test_level))):
            print(f"  - {test_level['productFilename'][i]}")
            print(f"    Size: {test_level['size'][i] / 1e6:.1f} MB")

print("\n" + "=" * 60)
print("✓ Test complete! The query is working correctly.")
print("You can now run the full download below.")
print("=" * 60)

Testing query for proposal 2136...
Instrument filter: All instruments
------------------------------------------------------------
   Querying with: {'obs_collection': 'JWST', 'proposal_id': '2136'}
   Limiting from 766 to 5 observations

✓ Query successful! Found 5 observations (limited to 5 for testing)

Sample observations:
  1. jw02136-o001_s03198_nirspec_f100lp-g140h
     Instrument: NIRSPEC/MSA
     Target: CAT.LMASSGT9P0.LSFRGT-UPDATED
  2. jw02136-o001_s01751_nirspec_f100lp-g140h
     Instrument: NIRSPEC/MSA
     Target: CAT.LMASSGT9P0.LSFRGT-UPDATED
  3. jw02136-o001_s02406_nirspec_f100lp-g140h
     Instrument: NIRSPEC/MSA
     Target: CAT.LMASSGT9P0.LSFRGT-UPDATED

Getting products for first observation...

In [None]:
# ================================================================
# EXECUTE FULL DOWNLOAD
# ================================================================

print(f"JWST Data Download")
print(f"Proposal ID: {PROPOSAL_ID}")
print(f"Product Level: {PRODUCT_LEVEL}")
print(f"Instrument: {INSTRUMENT if INSTRUMENT else 'All'}")
print(f"Output: {OUTPUT_DIR}")
print(f"Public Only: {PUBLIC_ONLY}")
print("=" * 60)

outdir = Path(OUTPUT_DIR)

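# Authenticate only when proprietary access is requested and a token is set
token = None if PUBLIC_ONLY else MAST_TOKEN
mast_login_if_needed(token)
print("✓ Logged in to MAST\n" if token else "✓ Using anonymous (public) access\n")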

# Step 1: Query observations
print(f"[1/4] Querying observations...")
obs = query_jwst_observations(PROPOSAL_ID, INSTRUMENT, MAX_ROWS)

if len(obs) == 0:
    print("\n⚠ No observations found!")
    print("\nPossible reasons:")
    print("  • Incorrect proposal ID")
    print("  • Instrument filter too restrictive")
    print("  • Data not yet released")
    sys.exit()

print(f"✓ Found {len(obs)} observations\n")

# Step 2: Get product list
print(f"[2/4] Fetching product list...")
print("   (This may take a while for large datasets)")

try:
    prods = Observations.get_product_list(obs)
    print(f"✓ Found {len(prods)} total products\n")
except Exception as e:
    print(f"\n⚠ Error fetching products: {e}")
    print("   Try setting MAX_ROWS to a smaller value (e.g., 10)")
    sys.exit()

# Step 3: Filter by product level
print(f"[3/4] Filtering for {PRODUCT_LEVEL.upper()} products...")
filtered_prods = filter_products_by_level(prods, PRODUCT_LEVEL)

if len(filtered_prods) == 0:
    print(f"\n⚠ No {PRODUCT_LEVEL.upper()} products found!")
    print(f"   Try a different PRODUCT_LEVEL setting.")
    sys.exit()

print(f"✓ Found {len(filtered_prods)} {PRODUCT_LEVEL.upper()} products")

# Filter for public data if requested
if PUBLIC_ONLY and "dataRights" in filtered_prods.colnames:
    print("   Filtering for public data only...")
    is_public = [str(dr).upper() == "PUBLIC" for dr in filtered_prods["dataRights"]]
    filtered_prods = filtered_prods[is_public]
    
    if len(filtered_prods) == 0:
        print(f"\n⚠ All {PRODUCT_LEVEL.upper()} products are proprietary!")
        print("   Set PUBLIC_ONLY=False and provide MAST_TOKEN to download.")
        sys.exit()
    
    print(f"✓ {len(filtered_prods)} public products available")

# Calculate total size
total_size_gb = sum(filtered_prods['size']) / 1e9
print(f"   Total download size: {total_size_gb:.2f} GB\n")

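# Step 4: Download (with USE_CURL=True, MAST returns a curl script instead of the files)
action = "Fetching a curl script for" if USE_CURL else "Downloading"
print(f"[4/4] {action} {len(filtered_prods)} files to {outdir.resolve()}...")
print("   This may take a while depending on size and network speed...\n")

try:
    manifest = download_products(filtered_prods, outdir, curl=USE_CURL)

    print("\n" + "=" * 60)
    print("✓ CURL SCRIPT SAVED" if USE_CURL else "✓ DOWNLOAD COMPLETE!")
    print("=" * 60)
    # One manifest row per item; only rows with Status COMPLETE succeeded
    n_complete = sum(str(s).upper() == "COMPLETE" for s in manifest["Status"])
    print(f"Manifest entries: {len(manifest)} ({n_complete} COMPLETE)")
    print(f"Location: {outdir.resolve()}")

    # Show sample entries with their status
    if len(manifest) > 0:
        print("\nSample entries:")
        for i in range(min(3, len(manifest))):
            print(f"  • {Path(manifest['Local Path'][i]).name}  [{manifest['Status'][i]}]")

    print("\n" + "=" * 60)

except Exception as e:
    print(f"\n⚠ Download failed: {e}")
    print("   Try again later, lower MAX_ROWS, or set USE_CURL=True to save a curl script")
    print("   and run it outside Python")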

print("\nProcess complete.")
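`Observations.download_products` returns a manifest table with `Local Path`, `Status`, `Message`, and `URL` columns. The sketch below lists rows that did not finish, assuming status values such as `COMPLETE`, `SKIPPED`, and `ERROR`; `summarize_manifest` is a hypothetical helper that works on plain column sequences.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable


def summarize_manifest(paths: Iterable, statuses: Iterable, messages: Iterable
                       ) -> tuple[Counter, list[tuple[str, str, str]]]:
    """Count manifest statuses and collect rows that did not finish as COMPLETE."""
    counts: Counter = Counter()
    failures: list[tuple[str, str, str]] = []
    for path, status, message in zip(paths, statuses, messages):
        status = str(status).upper()
        counts[status] += 1
        if status != "COMPLETE":
            failures.append((str(path), status, str(message)))
    return counts, failures
```

After the download cell, `counts, failures = summarize_manifest(manifest["Local Path"], manifest["Status"], manifest["Message"])` gives a status tally plus the paths and messages worth retrying.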


## Example: Testing Different Product Levels

Run the cell below to see what products are available at each processing level for your proposal.

In [7]:
# ================================================================
# OPTIONAL: Compare product counts at different levels
# ================================================================

print(f"Comparing product levels for proposal {PROPOSAL_ID}")
print("=" * 60)

# Get a small sample of observations
sample_obs = query_jwst_observations(PROPOSAL_ID, INSTRUMENT, max_rows=3)
sample_prods = Observations.get_product_list(sample_obs[:1])

print(f"\nChecking first observation: {sample_obs['obs_id'][0]}")
print(f"Total products: {len(sample_prods)}\n")

# Check each product level
levels = ["uncal", "rate", "rateints", "cal", "i2d", "x1d", "s2d"]
results = {}

for level in levels:
    filtered = filter_products_by_level(sample_prods, level)
    count = len(filtered)
    results[level] = count
    if count > 0:
        total_size_mb = sum(filtered['size']) / 1e6
        print(f"  {level:10s} : {count:4d} files ({total_size_mb:8.1f} MB)")

print("\n" + "=" * 60)
print("Use these level names in the PRODUCT_LEVEL setting above.")
print("=" * 60)

Comparing product levels for proposal 2136
   Querying with: {'obs_collection': 'JWST', 'proposal_id': '2136'}
   Limiting from 766 to 3 observations

Checking first observation: jw02136-o001_s03198_nirspec_f100lp-g140h
Total products: 50927

  uncal      : 16500 files (268705.3 MB)
  rate       :  132 files ( 11084.5 MB)
  rateints   :  132 files ( 11082.0 MB)
  cal        : 16501 files (233190.7 MB)
  x1d        :  126 files (  1829.7 MB)
  s2d        :  126 files (  8428.8 MB)

Use these level names in the PRODUCT_LEVEL setting above.
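The comparison cell calls `filter_products_by_level` once per level. The same counts can be gathered in a single pass over the filenames; the hypothetical `level_summary` below checks suffixes only and skips the `productType` filter, so its counts may be higher than those printed above.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable

# Level names checked, in display order (illustrative)
LEVELS = ("uncal", "rate", "rateints", "cal", "i2d", "x1d", "s2d", "s3d")


def level_summary(filenames: Iterable, sizes_bytes: Iterable) -> dict[str, tuple[int, float]]:
    """Map each level to (file count, total size in MB), omitting empty levels."""
    counts: dict[str, int] = defaultdict(int)
    totals: dict[str, float] = defaultdict(float)
    for name, size in zip(filenames, sizes_bytes):
        for level in LEVELS:
            if str(name).endswith(f"_{level}.fits"):
                counts[level] += 1
                totals[level] += float(size)
                break
    return {level: (counts[level], totals[level] / 1e6)
            for level in LEVELS if counts[level]}
```

On a product table this could be called as `level_summary(sample_prods["productFilename"], sample_prods["size"])`.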


---

## Summary of Improvements

This notebook now provides:

✅ **Flexible Product Selection** - Choose from uncal, rate, cal, i2d, x1d, s2d, or all  
✅ **Clean, Organized Code** - Separated functions, config, and execution  
✅ **Smart Instrument Matching** - Handles both exact and pattern-based instrument names  
✅ **Better Error Handling** - Clear messages and retry logic  
✅ **Progress Tracking** - Step-by-step progress indicators  
✅ **Size Estimates** - Shows download sizes before starting  
✅ **Testing Tools** - Quick test cells to verify before downloading large datasets

### Quick Reference

**Product Levels:**
- `uncal` - Raw detector data (largest files)
- `rate` - Stage 1 count-rate images
- `cal` - Stage 2 fully calibrated per-exposure products
- `i2d` - Mosaicked images (stage 3)
- `x1d` - Extracted 1D spectra
- `s2d` - 2D spectral products

**Workflow:**
1. Run Cell 2, then set `PROPOSAL_ID` and `PRODUCT_LEVEL` in Cell 3
2. Run Cell 4 to test (downloads nothing)
3. Run Cell 5 to download full dataset
4. Optional: Run Cell 7 to compare all product levels