# OVRO-LWA Data Loading Examples

This notebook demonstrates how to use the `open_dataset()` function to load OVRO-LWA data from various sources.

In [None]:
import ovro_lwa_portal
import matplotlib.pyplot as plt
import numpy as np

## Example 1: Load from Local Path

The simplest use case: loading a Zarr store from your local filesystem.

In [None]:
# Load a local zarr store
ds = ovro_lwa_portal.open_dataset("/path/to/observation.zarr")

# Display dataset info
print(ds)
# ds.sizes maps dimension names to lengths (ds.dims is deprecated for this use in recent xarray)
print(f"\nDimensions: {dict(ds.sizes)}")
print(f"Variables: {list(ds.data_vars)}")

## Example 2: Load with Custom Chunking

For large datasets, pass explicit chunk sizes to control how much data is held in memory at once.

In [None]:
# Load with explicit chunk sizes
ds = ovro_lwa_portal.open_dataset(
    "/path/to/large_observation.zarr",
    chunks={"time": 100, "frequency": 50, "l": 512, "m": 512}
)

# Check chunking
print(f"SKY chunks: {ds.SKY.chunks}")
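
To judge whether a chunk specification is reasonable, it can help to estimate the in-memory size of a single chunk. The sketch below does this with plain arithmetic. It assumes 4-byte (`float32`) samples and that any dimension not listed in `chunks` (here `polarization`) has a chunk length of 1; both are assumptions about the data, and `chunk_nbytes` is a hypothetical helper, not part of `ovro_lwa_portal`.

```python
import math


def chunk_nbytes(chunks, itemsize=4):
    """Return the size in bytes of one chunk with the given per-dimension lengths.

    itemsize=4 assumes float32 samples; adjust it for the actual dtype.
    """
    return math.prod(chunks.values()) * itemsize


chunks = {"time": 100, "frequency": 50, "l": 512, "m": 512}
size = chunk_nbytes(chunks)
print(f"{size / 1e9:.2f} GB per chunk")  # 5.24 GB per chunk
```

If the estimate is much larger than the memory available to each worker, reduce the chunk lengths along `time` or `frequency`. Under the same assumptions, `{"time": 10, "frequency": 10, "l": 512, "m": 512}` comes to about 105 MB per chunk.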

## Example 3: Load from Remote URL

Load data from cloud storage or HTTP/HTTPS URLs.

**Note**: Requires the `remote` extra, installed with `pip install 'ovro_lwa_portal[remote]'`.

In [None]:
# Load from S3
ds = ovro_lwa_portal.open_dataset("s3://ovro-lwa-data/obs_12345.zarr")

# Load from HTTPS
ds = ovro_lwa_portal.open_dataset("https://data.ovro.caltech.edu/obs_12345.zarr")

# Load from Google Cloud Storage
ds = ovro_lwa_portal.open_dataset("gs://ovro-lwa-data/obs_12345.zarr")
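
Because remote sources need the optional `remote` extra, it can be useful to check up front whether a source string is remote at all. The following is a minimal sketch using only the standard library. `is_remote` and `REMOTE_SCHEMES` are hypothetical names, and the scheme list is taken from the examples above; it is not necessarily the full set that `open_dataset()` accepts, nor how the library itself decides.

```python
from urllib.parse import urlsplit

# Schemes that appear in the examples above (an assumption, not an exhaustive list).
REMOTE_SCHEMES = {"s3", "gs", "http", "https"}


def is_remote(source):
    """Return True if the source string uses one of the remote URL schemes."""
    return urlsplit(source).scheme.lower() in REMOTE_SCHEMES


for src in ["s3://ovro-lwa-data/obs_12345.zarr", "/path/to/observation.zarr"]:
    print(src, "->", "remote" if is_remote(src) else "local")
```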

## Example 4: Load via DOI

Load published datasets using their DOI identifier.

**Note**: Requires the `remote` extra, installed with `pip install 'ovro_lwa_portal[remote]'`.

In [None]:
# Load via DOI (with prefix)
ds = ovro_lwa_portal.open_dataset("doi:10.5281/zenodo.1234567")

# Load via DOI (without prefix)
ds = ovro_lwa_portal.open_dataset("10.5281/zenodo.1234567")
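
Since both the prefixed and bare forms are accepted, a script that takes DOIs from user input may want to normalize them first. The sketch below checks only the general shape of a DOI (`10.`, a numeric registrant code, a slash, a suffix); it does not confirm that the DOI resolves. `normalize_doi` is a hypothetical helper and does not reflect how `open_dataset()` parses its argument.

```python
import re

# "10." + 4 to 9 digit registrant code + "/" + non-empty suffix, with optional "doi:" prefix.
DOI_PATTERN = re.compile(r"^(?:doi:)?(10\.\d{4,9}/\S+)$", re.IGNORECASE)


def normalize_doi(text):
    """Return the bare DOI without a "doi:" prefix, or None if text is not DOI-shaped."""
    match = DOI_PATTERN.match(text.strip())
    return match.group(1) if match else None


print(normalize_doi("doi:10.5281/zenodo.1234567"))  # 10.5281/zenodo.1234567
print(normalize_doi("10.5281/zenodo.1234567"))      # 10.5281/zenodo.1234567
print(normalize_doi("/path/to/observation.zarr"))   # None
```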

## Example 5: Basic Data Analysis

Average the sky intensity over time, then plot the first frequency channel and polarization.

In [None]:
# Load data
ds = ovro_lwa_portal.open_dataset("/path/to/observation.zarr")

# Compute mean intensity over time
mean_intensity = ds.SKY.mean(dim="time")

# Plot
plt.figure(figsize=(10, 8))
mean_intensity.isel(frequency=0, polarization=0).plot()
plt.title("Mean Sky Intensity")
plt.xlabel("l (pixels)")
plt.ylabel("m (pixels)")
plt.show()
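
For readers less familiar with reductions over named dimensions, the same steps can be written in plain NumPy on a synthetic array. This is only an illustration: the dimension order `(time, frequency, polarization, l, m)` and the array sizes are assumptions, and a real `SKY` variable may be laid out differently.

```python
import numpy as np

# Synthetic stand-in for ds.SKY with an assumed dimension order
# (time, frequency, polarization, l, m); the sizes are arbitrary.
rng = np.random.default_rng(0)
sky = rng.normal(size=(4, 3, 1, 8, 8))

# Counterpart of SKY.mean(dim="time"): average over axis 0.
mean_intensity = sky.mean(axis=0)

# Counterpart of .isel(frequency=0, polarization=0): a 2-D (l, m) image.
image = mean_intensity[0, 0]

print(mean_intensity.shape)  # (3, 1, 8, 8)
print(image.shape)           # (8, 8)
```

One difference to keep in mind: xarray skips NaN values by default when averaging floating-point data, whereas `ndarray.mean` propagates them, so `np.nanmean` is the closer match if the images contain NaN pixels.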

## Example 6: WCS-Aware Plotting

Create plots with celestial coordinates using the stored WCS information.

In [None]:
from astropy.wcs import WCS

# Load data
ds = ovro_lwa_portal.open_dataset("/path/to/observation.zarr")

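# Reconstruct WCS from the stored header: prefer the dataset attribute,
# otherwise fall back to the `wcs_header_str` variable (stored as bytes)
wcs_header_str = ds.attrs.get('fits_wcs_header') or ds.wcs_header_str.item().decode('utf-8')
wcs = WCS(wcs_header_str)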

# Create WCS-aware plot
fig = plt.figure(figsize=(10, 8))
ax = fig.add_subplot(111, projection=wcs)
ax.imshow(ds.SKY.isel(time=0, frequency=0, polarization=0).values, origin='lower')
ax.set_xlabel('Right Ascension')
ax.set_ylabel('Declination')
ax.grid(color='white', ls='dotted')
plt.title('Sky Intensity with Celestial Coordinates')
plt.show()

## Example 7: Lazy Loading and Computation

Work with large datasets using Dask's lazy loading: selections and reductions are recorded first, and data is read only when a result is computed.

In [None]:
# Load data lazily (default behavior)
ds = ovro_lwa_portal.open_dataset("/path/to/large_observation.zarr")

# No data has been read yet; the size comes from metadata, so this is fast
print(f"Dataset size: {ds.nbytes / 1e9:.2f} GB")

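# Select the first 10 time steps and 5 frequency channels by position (still lazy).
# isel() indexes by integer position; sel() would match coordinate labels instead.
subset = ds.isel(time=slice(0, 10), frequency=slice(0, 5))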

# Compute only when needed
result = subset.SKY.mean().compute()
print(f"Mean intensity: {result.values}")
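
When taking subsets, note that xarray distinguishes positional selection (`isel`) from label-based selection (`sel`). The sketch below shows the difference on a small synthetic array; the frequency values are made up for illustration and do not come from an OVRO-LWA dataset.

```python
import numpy as np
import xarray as xr

# Synthetic 1-D array with a frequency coordinate in Hz (values are illustrative).
freq = np.array([30e6, 40e6, 50e6, 60e6])
da = xr.DataArray(np.arange(4.0), coords={"frequency": freq}, dims=["frequency"])

by_position = da.isel(frequency=slice(0, 2))  # first two channels
by_label = da.sel(frequency=slice(0, 5))      # channels with 0 <= frequency <= 5 Hz

print(by_position.sizes["frequency"])  # 2
print(by_label.sizes["frequency"])     # 0
```

The label-based slice is empty because no channel has a frequency between 0 and 5 Hz, which is why positional slices are usually what you want for "the first N" selections.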

## Example 8: Parallel Processing with Dask

Use Dask's distributed scheduler to parallelize computation on large datasets.

In [None]:
from dask.distributed import Client

# Start a local Dask cluster and connect a client to it
client = Client()
print(client)

# Load data
ds = ovro_lwa_portal.open_dataset("/path/to/large_observation.zarr")

# Parallel computation
time_series = ds.SKY.mean(dim=["l", "m"]).compute()

# Plot time series
plt.figure(figsize=(12, 6))
time_series.isel(frequency=0, polarization=0).plot()
plt.title("Mean Intensity Time Series")
plt.xlabel("Time")
plt.ylabel("Intensity")
plt.show()

# Clean up
client.close()

## Example 9: Error Handling

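Catch the errors that commonly occur when loading: a missing file, a source that cannot be resolved, and a missing optional dependency.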

In [None]:
from ovro_lwa_portal.io import DataSourceError

try:
    ds = ovro_lwa_portal.open_dataset("/nonexistent/path.zarr")
except FileNotFoundError as e:
    print(f"File not found: {e}")

try:
    ds = ovro_lwa_portal.open_dataset("doi:10.invalid/doi")
except DataSourceError as e:
    print(f"Failed to load data: {e}")

try:
    ds = ovro_lwa_portal.open_dataset("s3://bucket/data.zarr")
except ImportError as e:
    print(f"Missing dependency: {e}")
    print("Install with: pip install 'ovro_lwa_portal[remote]'")
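
These exceptions also make it possible to fall back from one source to another, for example from a local copy to a remote one. The following is a sketch of that pattern, not a feature of `ovro_lwa_portal`: `open_first_available` is a hypothetical helper, and `fake_opener` stands in for a real loader so the example runs without any data.

```python
def open_first_available(opener, sources, errors=(OSError, ImportError)):
    """Try each source in order and return (source, result) for the first success.

    opener is any callable that takes a source string. Exceptions listed in
    errors are collected and reported together; anything else propagates.
    """
    failures = []
    for source in sources:
        try:
            return source, opener(source)
        except errors as exc:
            failures.append(f"{source}: {exc!r}")
    raise RuntimeError("No source could be opened:\n" + "\n".join(failures))


def fake_opener(source):
    # Stand-in loader: pretend that paths under /missing do not exist.
    if source.startswith("/missing"):
        raise FileNotFoundError(source)
    return {"source": source}


source, ds = open_first_available(fake_opener, ["/missing/a.zarr", "/data/b.zarr"])
print(source)  # /data/b.zarr
```

With the real loader, the call would pass `ovro_lwa_portal.open_dataset` as `opener` and extend `errors` with `DataSourceError`. `FileNotFoundError` is already covered because it is a subclass of `OSError`.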

## Example 10: Disable Validation

Skip validation to load faster, or to open data that does not follow the standard layout.

In [None]:
# Load without validation (faster)
ds = ovro_lwa_portal.open_dataset(
    "/path/to/observation.zarr",
    validate=False
)

print("Dataset loaded without validation")
print(ds)
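
With validation disabled, a lightweight check of your own can catch obvious layout problems before analysis code fails on them. The sketch below is an assumption-laden example, not the library's validation logic: the required dimension and variable names are inferred from the examples in this notebook, and `missing_parts` is a hypothetical helper.

```python
# Names inferred from the examples in this notebook (an assumption).
REQUIRED_DIMS = {"time", "frequency", "polarization", "l", "m"}
REQUIRED_VARS = {"SKY"}


def missing_parts(dims, data_vars):
    """Return sorted lists of the required dimensions and variables that are absent."""
    return sorted(REQUIRED_DIMS - set(dims)), sorted(REQUIRED_VARS - set(data_vars))


# With a loaded dataset this would be: missing_parts(ds.sizes, ds.data_vars)
missing_dims, missing_vars = missing_parts(["time", "frequency", "l", "m"], ["SKY"])
print(missing_dims, missing_vars)  # ['polarization'] []
```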