# Running `fftvis` with the GPU Backend

In this notebook, we'll demonstrate how to use `fftvis` with GPU acceleration to simulate visibilities. The GPU backend can significantly accelerate simulations, especially for large catalogs of point sources or extended observations with many frequencies and time steps.

<div class="alert alert-info">

__Note__

This tutorial requires a CUDA-capable GPU and the `cupy` package installed. If you don't have a compatible GPU, you can use the CPU backend instead.
</div>

<div class="alert alert-warning">

__Warning__

Before running this tutorial, make sure you have installed `fftvis` with GPU support. You can check if your GPU is properly configured by running a simple test with `cupy`.
</div>

In [1]:
# Check if GPU is available
try:
    import cupy as cp
    print(f"CuPy version: {cp.__version__}")
    print(f"CUDA version: {cp.cuda.runtime.runtimeGetVersion()}")
    print(f"Available GPU(s): {cp.cuda.runtime.getDeviceCount()}")
    for i in range(cp.cuda.runtime.getDeviceCount()):
        # Query the name and total memory of each visible device
        props = cp.cuda.runtime.getDeviceProperties(i)
        print(f"  Device {i}: {props['name'].decode()}, Memory: {props['totalGlobalMem']/1e9:.2f} GB")
    gpu_available = True
except ImportError:
    print("CuPy is not installed. Please install it to use the GPU backend.")
    gpu_available = False
except cp.cuda.runtime.CUDARuntimeError:
    print("CUDA runtime error: No CUDA-capable device is detected or CUDA driver is not installed properly.")
    gpu_available = False

CuPy version: 13.4.1
CUDA version: 12080
Available GPU(s): 1
  Device 0: Quadro P600, Memory: 2.08 GB


In [2]:
# Standard imports
import numpy as np
import healpy as hp
from astropy.time import Time
import matplotlib.pyplot as plt
import time

# HERA-stack imports
import fftvis
from hera_sim.antpos import hex_array
from pyuvdata.telescopes import Telescope
from pyuvdata.analytic_beam import AiryBeam



## Setup Telescope / Observation Parameters

We'll set up the same observation parameters as in the CPU backend tutorial for easy comparison.

In [3]:
# Define antenna array positions
antpos = hex_array(3, split_core=True, outriggers=0)

In [4]:
# Define the antenna beam as a pyuvdata.analytic_beam.AiryBeam with a dish diameter of 14 meters
beam = AiryBeam(diameter=14.0)

In [5]:
# Define a list of frequencies in units of Hz
nfreqs = 20
freqs = np.linspace(100e6, 120e6, nfreqs)

In [6]:
# Define a list of times with an astropy time.Time object
ntimes = 30
times = Time(np.linspace(2459845, 2459845.05, ntimes), format='jd', scale='utc')

In [7]:
# Define the telescope location
telescope_loc = Telescope.from_known_telescopes('hera').location

## Setup Sky Model

In [8]:
# Set up a sky model using HEALPix
nside = 64
nsource = hp.nside2npix(nside)

# Get HEALPix pixel coordinates (pix2ang returns co-latitude and longitude, in radians)
colat, ra = hp.pix2ang(nside, np.arange(nsource))
dec = np.pi / 2 - colat  # Convert from co-latitude to declination

# Define the flux of the sources as a function of frequency
flux = np.random.uniform(0, 1, nsource)                # flux of each source at 100 MHz (in Jy)
alpha = np.ones(nsource) * -0.8                        # spectral index of each source

# Now get the (Nsource, Nfreq) array of the flux of each source at each frequency:
# flux * (freq / freqs[0]) ** alpha, broadcast over sources (rows) and frequencies (columns)
flux_allfreq = flux[:, np.newaxis] * (freqs[np.newaxis, :] / freqs[0]) ** alpha[:, np.newaxis]
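
The power-law scaling and the coordinate conversion used in the sky model can be checked on a tiny example. The sketch below uses only NumPy and made-up inputs (three sources, four channels, three co-latitudes); the names `flux_ref` and `colat` are illustrative and not part of `fftvis`.

```python
import numpy as np

# Hypothetical toy inputs: 3 sources, 4 frequency channels
freqs = np.linspace(100e6, 120e6, 4)          # Hz
flux_ref = np.array([1.0, 0.5, 2.0])          # Jy at the reference frequency freqs[0]
alpha = np.full(flux_ref.shape, -0.8)         # spectral index per source

# S_i(nu) = S_i(nu_0) * (nu / nu_0) ** alpha_i, broadcast to (Nsource, Nfreq)
flux_allfreq = flux_ref[:, np.newaxis] * (freqs[np.newaxis, :] / freqs[0]) ** alpha[:, np.newaxis]

# Co-latitude (0 at the north pole) to declination (+pi/2 at the north pole)
colat = np.array([0.0, np.pi / 2, np.pi])
dec = np.pi / 2 - colat

print(flux_allfreq.shape)   # one row per source, one column per frequency
print(np.degrees(dec))      # north pole, equator, south pole
```

The first column reproduces the reference fluxes, and with a negative spectral index every source dims toward higher frequencies.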

## Run `fftvis` with GPU Backend

Unlike the CPU backend, which uses `finufft` for non-uniform FFT operations, the GPU backend in `fftvis` uses `cufinufft` through CuPy. This can yield significant speedups, especially for large numbers of sources.

To use the GPU backend, we simply specify `backend="gpu"` in the `simulate_vis` function call. All other parameters remain the same as in the CPU version.
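
Because the backend is selected by a plain string, a script can pick it at run time and fall back to the CPU when no GPU is usable. The following is a minimal sketch of that idea; `choose_backend` is a hypothetical helper written for this tutorial, not an `fftvis` function, and it only checks that CuPy imports and reports at least one CUDA device.

```python
import importlib.util


def choose_backend(prefer_gpu=True):
    """Return "gpu" if CuPy is importable and sees a CUDA device, else "cpu"."""
    if not prefer_gpu or importlib.util.find_spec("cupy") is None:
        return "cpu"
    try:
        import cupy as cp

        # getDeviceCount raises a CUDA runtime error when no device is usable
        return "gpu" if cp.cuda.runtime.getDeviceCount() > 0 else "cpu"
    except Exception:
        # Import or CUDA runtime failure: fall back to the CPU backend
        return "cpu"


backend = choose_backend()
print(f"Selected backend: {backend}")
```

The returned string could then be passed as the `backend` argument of `simulate_vis` in place of a hard-coded value.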

In [9]:
# Define the baselines to simulate: here, every ordered antenna pair (including autocorrelations)
baselines = [(i, j) for i in range(len(antpos)) for j in range(len(antpos))]
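
The list above contains every ordered pair, so each cross-correlation appears twice, as `(i, j)` and `(j, i)`. For an unpolarized simulation those two visibilities are complex conjugates of each other, so a smaller list is often enough. The sketch below, using a hypothetical antenna count of four, shows one way to build the reduced list with the standard library.

```python
from itertools import combinations_with_replacement

nants = 4  # hypothetical antenna count, for illustration only

# Every ordered pair, as in the cell above
all_pairs = [(i, j) for i in range(nants) for j in range(nants)]

# Only pairs with i <= j: autocorrelations plus one copy of each cross-correlation
unique_pairs = list(combinations_with_replacement(range(nants), 2))

print(len(all_pairs), len(unique_pairs))  # prints: 16 10
```

Passing the reduced list as `baselines` would roughly halve the number of simulated baselines; this tutorial keeps the full list to match the CPU tutorial.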

In [10]:
%%time
# Simulate visibilities using the GPU backend
if gpu_available:
    vis_gpu = fftvis.simulate_vis(
        ants=antpos,
        fluxes=flux_allfreq,
        ra=ra,
        dec=dec,
        freqs=freqs,
        times=times.jd,
        telescope_loc=telescope_loc,
        beam=beam,
        polarized=False,
        precision=2,
        nprocesses=1,  # Use single process for GPU simulation
        baselines=baselines,
        backend="gpu"  # Use GPU backend
    )
else:
    print("GPU not available. Skipping GPU simulation.")

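ValueError: GPU beam evaluation only supports UVBeam objects with interp method

In the recorded run above, the GPU call raised this error: the analytic `AiryBeam` is not a `UVBeam`, and the message indicates that GPU beam evaluation expects a `UVBeam` object. The remaining cells are marked `In [None]` and are shown without output.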

## Compare GPU and CPU Backends

Let's run the same simulation with the CPU backend for comparison. We expect the results to be nearly identical, with the main difference being the computation time.

In [None]:
%%time
# Simulate visibilities using the CPU backend
vis_cpu = fftvis.simulate_vis(
    ants=antpos,
    fluxes=flux_allfreq,
    ra=ra,
    dec=dec,
    freqs=freqs,
    times=times.jd,
    telescope_loc=telescope_loc,
    beam=beam,
    polarized=False,
    precision=2,
    nprocesses=1,
    baselines=baselines,
    backend="cpu"  # Use CPU backend
)

In [None]:
# Check that results from GPU and CPU are equivalent
if gpu_available and 'vis_gpu' in globals():  # only compare if the GPU simulation produced a result
    # The results should be very close but not exactly the same due to floating-point differences
    print(f"Maximum absolute difference: {np.max(np.abs(vis_gpu - vis_cpu))}")
    print(f"Are GPU and CPU results close? {np.allclose(vis_gpu, vis_cpu, rtol=1e-5, atol=1e-7)}")
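
An absolute tolerance such as `atol=1e-7` depends on the overall scale of the visibilities, so a scale-free metric can be a useful complement. The sketch below defines a hypothetical helper, `max_relative_error`, and exercises it on synthetic arrays that merely stand in for `vis_cpu` and `vis_gpu`; the numbers are not simulation output.

```python
import numpy as np


def max_relative_error(test, reference):
    """Largest |test - reference|, normalised by the peak |reference|."""
    scale = np.max(np.abs(reference))
    if scale == 0:
        # Degenerate case: fall back to the absolute error
        return float(np.max(np.abs(test)))
    return float(np.max(np.abs(test - reference)) / scale)


# Synthetic stand-ins with a (Nfreq, Ntime, Nbl)-like shape
rng = np.random.default_rng(0)
reference = rng.normal(size=(5, 4, 3)) + 1j * rng.normal(size=(5, 4, 3))
perturbed = reference * (1 + 1e-9)

print(max_relative_error(perturbed, reference))
```

With real outputs, the call would be `max_relative_error(vis_gpu, vis_cpu)`, and the result could be compared against the accuracy requested from the non-uniform FFT.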

## Benchmark Performance: GPU vs CPU

Let's benchmark the GPU and CPU backends as the number of sources increases. The GPU advantage typically becomes more apparent with larger datasets.

In [None]:
def benchmark_performance(nsides, ntimes=10, nfreqs=5):
    """Benchmark GPU vs CPU performance for different HEALPix nsides."""
    
    # Shorter time and frequency arrays for benchmarking
    short_freqs = np.linspace(100e6, 120e6, nfreqs)
    short_times = Time(np.linspace(2459845, 2459845.02, ntimes), format='jd', scale='utc')
    
    results = []
    
    for nside in nsides:
        nsource = hp.nside2npix(nside)
        print(f"Running benchmark with nside={nside}, nsource={nsource}")
        
        # Create sky model (pix2ang returns co-latitude, so convert to declination)
        colat, ra = hp.pix2ang(nside, np.arange(nsource))
        dec = np.pi / 2 - colat
        flux = np.random.uniform(0, 1, nsource)
        alpha = np.ones(nsource) * -0.8
        flux_allfreq = ((short_freqs[:, np.newaxis] / short_freqs[0]) ** alpha.T * flux.T).T
        
        # Time CPU simulation
        t0 = time.time()
        _ = fftvis.simulate_vis(
            ants=antpos,
            fluxes=flux_allfreq,
            ra=ra,
            dec=dec,
            freqs=short_freqs,
            times=short_times.jd,
            telescope_loc=telescope_loc,
            beam=beam,
            polarized=False,
            precision=2,
            nprocesses=1,
            baselines=baselines[:10],  # Use fewer baselines for speed
            backend="cpu"
        )
        cpu_time = time.time() - t0
        
        # Time GPU simulation
        gpu_time = None
        if gpu_available:
            t0 = time.time()
            _ = fftvis.simulate_vis(
                ants=antpos,
                fluxes=flux_allfreq,
                ra=ra,
                dec=dec,
                freqs=short_freqs,
                times=short_times.jd,
                telescope_loc=telescope_loc,
                beam=beam,
                polarized=False,
                precision=2,
                nprocesses=1,
                baselines=baselines[:10],  # Use fewer baselines for speed
                backend="gpu"
            )
            gpu_time = time.time() - t0
            # Clear GPU memory
            if 'cp' in globals():
                cp.cuda.runtime.deviceSynchronize()
                cp.get_default_memory_pool().free_all_blocks()
        
        results.append((nside, nsource, cpu_time, gpu_time))
    
    return results
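
A single `time.time()` measurement can be dominated by one-off costs such as CUDA context creation or kernel compilation on the first GPU call. One common remedy is to run a warm-up call and keep the best of several repeats. The helper below, `time_call`, is a hypothetical sketch of that pattern and is demonstrated on a plain Python function rather than on `simulate_vis`.

```python
import time


def time_call(func, *args, repeats=3, warmup=1, sync=None, **kwargs):
    """Return the best wall-clock time (s) of `func` over `repeats` runs.

    `sync` is an optional zero-argument callable invoked before starting and
    after finishing each timed run, e.g. a GPU device synchronisation.
    """
    for _ in range(warmup):
        func(*args, **kwargs)  # untimed warm-up run
    best = float("inf")
    for _ in range(repeats):
        if sync is not None:
            sync()
        t0 = time.perf_counter()
        func(*args, **kwargs)
        if sync is not None:
            sync()
        best = min(best, time.perf_counter() - t0)
    return best


elapsed = time_call(sum, range(100_000))
print(f"best of 3: {elapsed:.6f} s")
```

For the GPU backend, `sync=cp.cuda.runtime.deviceSynchronize` could be passed so that any pending device work is included in the measurement.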

In [None]:
# Run benchmarks with increasing HEALPix nside values
# Skip if GPU is not available
if gpu_available:
    benchmark_results = benchmark_performance([8, 16, 32, 64])
else:
    print("GPU not available. Skipping benchmarks.")

In [None]:
# Plot the benchmark results
if gpu_available and 'benchmark_results' in locals():
    nsides, nsources, cpu_times, gpu_times = zip(*benchmark_results)
    
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 6))
    
    # Plot execution times
    ax1.plot(nsources, cpu_times, 'o-', label='CPU')
    ax1.plot(nsources, gpu_times, 's-', label='GPU')
    ax1.set_xlabel('Number of Sources')
    ax1.set_ylabel('Execution Time (s)')
    ax1.set_title('Execution Time Comparison')
    ax1.legend()
    ax1.grid(True)
    
    # Plot speedup ratios
    speedups = [cpu/gpu for cpu, gpu in zip(cpu_times, gpu_times)]
    ax2.plot(nsources, speedups, 'o-')
    ax2.set_xlabel('Number of Sources')
    ax2.set_ylabel('Speedup (CPU time / GPU time)')
    ax2.set_title('GPU Speedup Factor')
    ax2.grid(True)
    
    plt.tight_layout()
    plt.show()

## Plot Visibility Results

We'll plot the visibility amplitude and phase from the GPU simulation, similar to what we did in the CPU tutorial.

In [None]:
# Use GPU results if the GPU simulation produced them, otherwise CPU results
vis_to_plot = vis_gpu if (gpu_available and 'vis_gpu' in globals()) else vis_cpu

fig, axs = plt.subplots(1, 2, figsize=(10, 6))
for bl_index, bl in enumerate(baselines[:3]):
    axs[0].semilogy(freqs / 1e6, np.abs(vis_to_plot[:, 0, bl_index]))
    axs[1].plot(freqs / 1e6, np.angle(vis_to_plot[:, 0, bl_index]), label=f"bl = {bl}")

axs[1].legend()
axs[0].set_xlabel('Frequency [MHz]')
axs[1].set_xlabel('Frequency [MHz]')
axs[0].set_ylabel('Amplitude [Jy]')
axs[1].set_ylabel('Phase [rad]')
axs[1].set_ylim(-np.pi * 1.1, np.pi * 1.1)
axs[0].grid()
axs[1].grid()
plt.show()

In [None]:
fig, axs = plt.subplots(1, 2, figsize=(10, 6))
for bl_index, bl in enumerate(baselines[:3]):
    axs[0].semilogy(times.unix - times.unix[0], np.abs(vis_to_plot[0, :, bl_index]))
    axs[1].plot(times.unix - times.unix[0], np.angle(vis_to_plot[0, :, bl_index]), label=f"bl = {bl}")

axs[1].legend()
axs[0].set_xlabel('Time [s]')
axs[1].set_xlabel('Time [s]')
axs[0].set_ylabel('Amplitude [Jy]')
axs[1].set_ylabel('Phase [rad]')
axs[1].set_ylim(-np.pi * 1.1, np.pi * 1.1)
axs[0].grid()
axs[1].grid()
plt.show()

## Conclusion

The GPU backend of `fftvis` can provide a significant speedup over the CPU backend, especially for larger sky models with many sources. The main advantages are:

1. Accelerated non-uniform FFT operations using `cufinufft`
2. Parallel processing of source computations on the GPU
3. Efficient beam interpolation using GPU-accelerated map coordinates

When working with large simulations, the GPU backend is recommended if suitable hardware is available. For smaller simulations, the overhead of data transfer between CPU and GPU might reduce the performance advantage.
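
The trade-off between fixed overhead and per-source speed can be illustrated with a toy cost model. The sketch below assumes CPU time grows linearly with the number of sources and GPU time is a fixed overhead plus a smaller linear term. The function `break_even_sources` and all numbers are hypothetical and are not measurements of `fftvis`.

```python
def break_even_sources(overhead_s, cpu_s_per_source, gpu_s_per_source):
    """Source count above which a fixed-overhead GPU run beats the CPU.

    Toy model: t_cpu = n * cpu_s_per_source
               t_gpu = overhead_s + n * gpu_s_per_source
    """
    gain = cpu_s_per_source - gpu_s_per_source
    if gain <= 0:
        return float("inf")  # the GPU never catches up in this model
    return overhead_s / gain


# Purely illustrative numbers, not measurements
n_star = break_even_sources(overhead_s=0.5, cpu_s_per_source=1e-5, gpu_s_per_source=1e-6)
print(f"break-even at about {n_star:.0f} sources")
```

Real runs will not be exactly linear, so a benchmark like the one above remains the reliable way to locate the crossover on a given machine.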

In [None]:
# Clean up GPU memory if we used it
if gpu_available and 'cp' in globals():
    cp.cuda.runtime.deviceSynchronize()
    cp.get_default_memory_pool().free_all_blocks()