# HPC Tutorial

Welcome to the **HPC (High-Performance Computing) Tutorial**. This guide explains how to access NYU's Cloud Burst HPC cluster, manage your files, run interactive and batch jobs, set up software environments, run Jupyter notebooks, and more.


## Table of Contents

1. [Logging In](#logging-in)
2. [Understanding the Filesystem](#understanding-the-filesystem)
3. [Running Interactive Jobs](#running-interactive-jobs)
4. [Setting Up Singularity and Conda](#setting-up-singularity-and-conda)
5. [Running Batch Jobs](#running-batch-jobs)
6. [SCP for copying files around](#scp-tutorial-how-to-copy-files-around)
7. [Running Jupyter notebooks](#running-jupyter-notebook)

---

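**Quick Reference:** the commands below summarize the workflow covered in this tutorial (replace `yz5944` with your NetID).

```bash
# Request an interactive GPU session (1 GPU, 1 hour)
srun --account=csci_ga_3033_077-2025sp --partition=n1s8-v100-1 --gres=gpu:1 --time=1:00:00 --pty /bin/bash
# ...or a CPU-only interactive session
srun --account=csci_ga_3033_077-2025sp --partition=interactive --time=1:00:00 --pty /bin/bash

# Start the container with the overlay mounted read-write
cd /scratch/yz5944
singularity exec --bind /scratch --nv --overlay /scratch/yz5944/overlay-25GB-500K.ext3:rw /scratch/yz5944/cuda11.8.86-cudnn8.7-devel-ubuntu22.04.2.sif /bin/bash

# Inside the container: load conda and activate the course environment
source /ext3/env.sh
conda activate bdml_env

# If the overlay cannot be mounted because a leftover container still holds it,
# find that Singularity process and kill it
ps -f -u $USER | grep Singularity
kill -9 <PID>
```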

## Logging In

To access the Greene HPC cluster, you need to be on the NYU network. If you're off-campus, connect via the [NYU VPN](https://www.nyu.edu/life/information-technology/infrastructure/network-services/vpn.html).

### Steps to Log In

1. **Open a Terminal** on your local machine.

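2. **Connect via SSH** (replace `yz5944` with your NYU NetID). First log in to the Greene login node, then hop to the Burst node:

```bash
# On your local machine
ssh yz5944@greene.hpc.nyu.edu
# On the Greene login node
ssh burst
```

The connection path looks like this; this course uses the Burst route to GCP compute nodes, not Greene's own compute nodes:

```text
Local ---> Greene login node ---> Greene compute node (NOT USED IN THIS COURSE)
                            |
                             ---> Burst node    ---> GCP compute node
```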
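If you log in often, you can chain both hops into a single command run from your local machine. A minimal sketch; `burst_login` is a hypothetical helper name, and `-t` forces a terminal on Greene so the nested `ssh burst` can be interactive:

```bash
# Hypothetical helper: hop from your local machine through the Greene
# login node to the Burst node in one step.
# Usage: burst_login <NetID>
burst_login() {
  local netid="${1:?usage: burst_login <NetID>}"
  # -t allocates a terminal on Greene so the inner "ssh burst" is interactive
  ssh -t "${netid}@greene.hpc.nyu.edu" ssh burst
}
```

For example, `burst_login yz5944` runs the two `ssh` commands above in sequence.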


## Understanding the Filesystem

The Greene HPC cluster provides several storage areas, each with its own purpose, purge policy, and quota:

| Directory  | Variable   | Purpose                | Purged        | Quota             |
|------------|------------|------------------------|---------------|-------------------|
| `/archive` | `$ARCHIVE` | Long-term storage      | No            | 2TB / 20K inodes  |
| `/home`    | `$HOME`    | Configuration files    | No            | 50GB / 30K inodes |
| `/scratch` | `$SCRATCH` | Temporary data storage | Yes (60 days) | 5TB / 1M inodes   |


- **Check Your Quota:**

  ```bash
  myquota
  ```

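- **Recommended:** Keep your working data (code, overlay images, datasets) in `/scratch/yz5944` and purely temporary files in `/tmp`. Because `/scratch` is subject to the 60-day purge, back up anything you cannot afford to lose.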
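To see which files are at risk of the 60-day purge, you can list files that have not been accessed recently. A sketch, assuming the purge is based on last-access time; `stale_files` is a hypothetical helper, and its 50-day default leaves a 10-day margin:

```bash
# Hypothetical helper: list regular files under a directory (default $SCRATCH)
# whose last-access time is more than N days ago (default 50).
stale_files() {
  local dir="${1:-$SCRATCH}"
  local days="${2:-50}"
  find "$dir" -type f -atime +"$days" -print
}
```

For example, `stale_files "$SCRATCH" 50 | head` shows the first few candidates.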


## Running Interactive Jobs

When you need to run scripts or perform debugging interactively, follow the steps below.

### Typical Workflow

1. Log in to Greene's login node.
2. Log in to the Burst node.
3. Request a job (computational resources) and wait until Slurm grants it.
   - You always need to request a job to use GPUs.
4. Start a Singularity container on the allocated node.
5. Activate the conda environment that contains your deep learning libraries.
6. Run, modify, and debug your code.

### Accounts and Partitions

- **Account:** `csci_ga_3033_077-2025sp`

- **Partitions:**
  - `interactive` (for lightweight tasks)
  - `n1s8-v100-1` (for GPU tasks)
  - `n1s16-v100-2`
  - `n2c48m24`
  - `g2-standard-12`
  - `g2-standard-24`
  - `c12m85-a100-1`
  - `c24m170-a100-2`

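#### Understanding Partitions

- **Partitions** are the queues you submit jobs to; each one corresponds to a specific type of node, selected with `--partition`. For the V100 and A100 partitions, the name ends with the GPU model and count, e.g. `n1s8-v100-1` has one V100 GPU and `c24m170-a100-2` has two A100 GPUs.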

---

### Simple Scripts and File Operations

For non-GPU tasks, use the `interactive` partition.

**Requesting an Interactive Session:**

```bash
srun --account=csci_ga_3033_077-2025sp --partition=interactive --pty /bin/bash
```

- **Options:**
  - `--account`: Specify account.
  - `--partition`: Choose partition.
  - `--pty /bin/bash`: Open an interactive shell.

> **Tip:** After allocation, verify the node:

```bash
hostname
```
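Slurm also exports job information as environment variables inside an allocation, so checking `SLURM_JOB_ID` tells you whether you are on a compute node or still on the login node. A minimal sketch; `where_am_i` is a hypothetical helper name:

```bash
# Hypothetical helper: report whether this shell runs inside a Slurm job.
# srun/sbatch set SLURM_JOB_ID for the processes they launch.
where_am_i() {
  if [ -n "${SLURM_JOB_ID:-}" ]; then
    echo "Inside Slurm job ${SLURM_JOB_ID} on $(hostname)"
  else
    echo "Not inside a Slurm job (login node?) on $(hostname)"
  fi
}
```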

---

### GPU Access

For GPU tasks, request resources from a GPU partition. Each student's Slurm account has a budget of 200 GPU hours (12,000 minutes) plus sufficient CPU time.

**Requesting a GPU Session:**

```bash
srun --account=csci_ga_3033_077-2025sp --partition=n1s8-v100-1 --gres=gpu:1 --time=1:00:00 --pty /bin/bash
```

- **Options:**
  - `--gres=gpu:1`: Request one GPU.
  - `--time=1:00:00`: Set the wall-clock time limit (hours:minutes:seconds).
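If you request GPU sessions often, a small shell function (for example in your `~/.bashrc` on the Burst node) saves retyping. A sketch, assuming the course account and the `n1s8-v100-1` partition shown above; `gpu_session` and its single hours argument are hypothetical:

```bash
# Hypothetical helper: request an interactive GPU shell for N hours (default 1)
# using the same account, partition, and GPU request as the command above.
gpu_session() {
  local hours="${1:-1}"
  srun --account=csci_ga_3033_077-2025sp \
       --partition=n1s8-v100-1 \
       --gres=gpu:1 \
       --time="${hours}:00:00" \
       --pty /bin/bash
}
```

For example, `gpu_session 2` requests a two-hour session. Keeping the time short helps you stay within the GPU-hour budget.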

**Verify GPU Allocation:**

```bash
nvidia-smi
```
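For a compact view, `nvidia-smi` can print selected fields as CSV, and Slurm typically sets `CUDA_VISIBLE_DEVICES` to the GPUs assigned to your job. A sketch; `show_gpus` is a hypothetical helper name:

```bash
# Hypothetical helper: show which GPUs this job can see.
show_gpus() {
  # Typically set by Slurm when GPUs are allocated with --gres
  echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-<unset>}"
  # One CSV line per visible GPU: index, model name, total memory
  nvidia-smi --query-gpu=index,name,memory.total --format=csv
}
```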
---

### Monitoring Jobs

Check the status of your jobs in the Slurm queue.

**Check Your Jobs:**

```bash
squeue -u yz5944
```

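**Cancel a Job:**

- **Interactive session:** Press `Ctrl+D` or type `exit` to end the session and release its resources.
- **Any job by ID:** Run `scancel <JOBID>`, using the ID from the `JOBID` column of `squeue`.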

---

## Setting Up Singularity and Conda

### Copying the Filesystem Image

Copy the empty filesystem image (once per semester).

**Get on a GPU Node:**

```bash
srun --account=csci_ga_3033_077-2025sp --cpus-per-task=2 --mem=16GB --partition=n1s8-v100-1 --gres=gpu:v100:1 --time=04:00:00 --pty /bin/bash
```

**Navigate to Scratch Directory:**

```bash
cd /scratch/yz5944
```

**Download Overlay Filesystem:**

```bash
scp greene-dtn:/scratch/work/public/overlay-fs-ext3/overlay-25GB-500K.ext3.gz .
```

Overlay filesystems can be mounted read-write (`rw`) or read-only (`ro`) when used with Singularity:

- **read-write:** use this when setting up your environment (installing conda, libraries, and other static files).
- **read-only:** use this when running jobs. Several processes can mount the same image read-only at once, but mounting fails if any job already has it mounted read-write.

### Unzipping the Image

Unzip the ext3 filesystem (takes about 5 minutes).

```bash
gunzip -vvv ./overlay-25GB-500K.ext3.gz
```

**Copy the Singularity Image:**

```bash
# Copy the Singularity image into the current working directory
scp -rp greene-dtn:/scratch/work/public/singularity/cuda11.8.86-cudnn8.7-devel-ubuntu22.04.2.sif .
```
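Since these files are copied only once per semester, a guard that skips files already present avoids accidentally replacing an overlay you have already set up. A sketch, assuming the public paths and file names above and that your scratch directory is `/scratch/$USER`; `fetch_images` is a hypothetical helper name:

```bash
# Hypothetical helper: fetch the overlay and Singularity image into a
# directory (default /scratch/$USER), skipping files that already exist.
# The ( ... ) body runs in a subshell, so the cd does not change your shell's directory.
fetch_images() (
  dest="${1:-/scratch/$USER}"
  overlay="overlay-25GB-500K.ext3"
  sif="cuda11.8.86-cudnn8.7-devel-ubuntu22.04.2.sif"
  cd "$dest" || exit 1
  if [ -e "$overlay" ]; then
    echo "skip: $overlay already exists"
  else
    scp "greene-dtn:/scratch/work/public/overlay-fs-ext3/$overlay.gz" . &&
      gunzip -v "$overlay.gz"
  fi
  if [ -e "$sif" ]; then
    echo "skip: $sif already exists"
  else
    scp -p "greene-dtn:/scratch/work/public/singularity/$sif" .
  fi
)
```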


### Installing Conda

Install Conda inside the Singularity container.

**Start Singularity:**

```bash
singularity exec --bind /scratch --nv --overlay /scratch/yz5944/overlay-25GB-500K.ext3:rw /scratch/yz5944/cuda11.8.86-cudnn8.7-devel-ubuntu22.04.2.sif /bin/bash
```
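Because the same command is used with `:rw` for setup and `:ro` for running jobs, a small wrapper can make the mode explicit. A sketch, assuming the overlay and image live in `/scratch/$USER` (i.e. your username is your NetID); `start_container` is a hypothetical helper that defaults to read-only:

```bash
# Hypothetical helper: start the course container with the overlay mounted
# read-write ("rw", for installing packages) or read-only ("ro", the default).
start_container() {
  local mode="${1:-ro}"
  local dir="/scratch/$USER"
  case "$mode" in
    rw|ro) ;;
    *) echo "mode must be rw or ro" >&2; return 1 ;;
  esac
  singularity exec --bind /scratch --nv \
    --overlay "$dir/overlay-25GB-500K.ext3:$mode" \
    "$dir/cuda11.8.86-cudnn8.7-devel-ubuntu22.04.2.sif" /bin/bash
}
```

For the setup steps below, you would run `start_container rw`.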

**Inside Singularity:**

Download and install Miniforge (a conda installer that uses the conda-forge channel) into the overlay:
```bash
cd /ext3/
wget --no-check-certificate https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh
sh Miniforge3-Linux-x86_64.sh -b -p /ext3/miniforge3
```

Create a wrapper script that sets up conda inside the container:
```bash
touch /ext3/env.sh
echo '#!/bin/bash' >> /ext3/env.sh
echo 'unset -f which' >> /ext3/env.sh
echo 'source /ext3/miniforge3/etc/profile.d/conda.sh' >> /ext3/env.sh
echo 'export PATH=/ext3/miniforge3/bin:$PATH'         >> /ext3/env.sh
echo 'export PYTHONPATH=/ext3/miniforge3/bin:$PATH'   >> /ext3/env.sh
```
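If you ever need to recreate the wrapper, writing it with a single heredoc avoids a pitfall of the `>>` lines above: running them twice appends duplicate lines. A sketch producing the same file content; `write_env_sh` is a hypothetical helper name:

```bash
# Hypothetical helper: write the same wrapper script as above in one step.
# ">" overwrites the file, so re-running does not append duplicate lines.
# The quoted 'EOF' keeps $PATH literal, like the single-quoted echo lines.
write_env_sh() {
  local target="${1:-/ext3/env.sh}"
  cat > "$target" <<'EOF'
#!/bin/bash
unset -f which
source /ext3/miniforge3/etc/profile.d/conda.sh
export PATH=/ext3/miniforge3/bin:$PATH
export PYTHONPATH=/ext3/miniforge3/bin:$PATH
EOF
}
```

Run it inside the container while the overlay is mounted read-write.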
Load the wrapper script (this makes `conda` available in your shell):

```bash
source /ext3/env.sh
```
Update conda and clean the package cache:
```bash
conda config --remove channels defaults
conda update -n base conda -y
conda clean --all --yes
```

Create the course environment and install packages into it:

```bash
conda create -n bdml_env python=3.9 -y
conda activate bdml_env
conda install pip --yes
conda install ipykernel --yes
conda install pytorch --yes
```

### Testing the Setup

Test PyTorch and GPU access:

```bash
python
```

```python
>>> import torch
>>> torch.cuda.is_available()
True
>>> x = torch.tensor([1, 2])
>>> x
tensor([1, 2])
```

---


## Running Batch Jobs

For longer experiments or multiple jobs, use batch jobs.

### Batch Job Workflow

1. **Log in** to Greene, then to the Burst node (`ssh burst`).

2. **Submit an `sbatch` script** from the Burst node.

### Submitting a Job Script

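Request an interactive shell (see [Running Interactive Jobs](#running-interactive-jobs)) or stay on the Burst login node, and create the batch script there.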

**Write the Batch Script** (saved here as `gpu_job.slurm`):

```bash
#!/bin/bash
#SBATCH --job-name=job_wgpu
#SBATCH --account=csci_ga_3033_077-2025sp
#SBATCH --partition=n1s8-v100-1
#SBATCH --open-mode=append
# Log file names: %j = job ID, %x = job name
#SBATCH --output=./%j_%x.out
#SBATCH --error=./%j_%x.err
#SBATCH --export=ALL
#SBATCH --time=00:10:00
#SBATCH --gres=gpu:1
#SBATCH --requeue

# Mount the overlay read-only (:ro) so several jobs can share it
singularity exec --bind /scratch --nv --overlay /scratch/yz5944/overlay-25GB-500K.ext3:ro /scratch/yz5944/cuda11.8.86-cudnn8.7-devel-ubuntu22.04.2.sif /bin/bash -c "
source /ext3/env.sh
conda activate bdml_env
cd /scratch/yz5944/bdml/
python ./test_script.py
"
```

**Submit Batch Job:**

```bash
sbatch gpu_job.slurm
```

**Check Job Status:**

```bash
squeue -u yz5944
```
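To check a specific job right after submitting it, you can capture its ID with `sbatch --parsable`, which prints only the job ID (optionally followed by `;cluster`). A sketch; `submit_and_check` is a hypothetical helper name:

```bash
# Hypothetical helper: submit a batch script and show its queue status.
submit_and_check() {
  local script="${1:-gpu_job.slurm}"
  local jobid
  jobid=$(sbatch --parsable "$script") || return 1
  jobid="${jobid%%;*}"   # strip an optional ";cluster" suffix
  echo "Submitted job ${jobid}"
  squeue -j "$jobid"
}
```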

### Checking Job Output

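After the job finishes, check its log files. Because of the `--output` and `--error` settings above, they are written to `<JOBID>_job_wgpu.out` and `<JOBID>_job_wgpu.err` in the directory you submitted from.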
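When several runs have accumulated, it helps to print just the newest log. A sketch, assuming the `%j_%x.out` naming above and file names without newlines; `latest_log` is a hypothetical helper name:

```bash
# Hypothetical helper: print the most recently modified *.out log
# in a directory (default: the current directory).
latest_log() {
  local dir="${1:-.}"
  local newest
  newest=$(ls -t "$dir"/*.out 2>/dev/null | head -n 1)
  if [ -z "$newest" ]; then
    echo "No .out files in $dir" >&2
    return 1
  fi
  echo "== $newest"
  cat "$newest"
}
```

While a job is still running, `tail -f <JOBID>_job_wgpu.out` follows the log as it grows.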


## SCP Tutorial: How to Copy Files Around

* For code, use Git and you won't need most of these :)

* From local to Greene (run on your local machine):

```bash
scp [optional flags] [file-path] yz5944@greene.hpc.nyu.edu:[greene-destination-path]
```

* From Greene to local (run on your local machine):

```bash
scp [optional flags] yz5944@greene.hpc.nyu.edu:[file-path] [local-destination-path]
```

* From Greene to GCP (run on GCP):

```bash
scp [optional flags] greene-dtn:[file-path] [gcp-destination-path]
```

* From GCP to Greene (run on GCP):

```bash
scp [optional flags] [file-path] greene-dtn:[greene-destination-path]
```

* From local to GCP: copy in two hops, local → Greene → GCP.
* From GCP to local: copy in two hops, GCP → Greene → local.

## Running Jupyter Notebook

Create a Jupyter kernel from the template:
```bash
mkdir -p ~/.local/share/jupyter/kernels
cd ~/.local/share/jupyter/kernels
scp -r greene-dtn:/share/apps/mypy/src/kernel_template ./my_env
cd ./my_env

ls
#kernel.json  logo-32x32.png  logo-64x64.png  python
```
In the `python` file, replace the `singularity` command at the bottom with:
```bash
singularity exec $nv \
  --overlay /scratch/yz5944/overlay-25GB-500K.ext3:ro \
  /scratch/yz5944/cuda11.8.86-cudnn8.7-devel-ubuntu22.04.2.sif \
  /bin/bash -c "source /ext3/env.sh; conda activate bdml_env; $cmd $args"
```
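Edit the default `kernel.json` file: set the first `argv` entry (PYTHON_LOCATION) to the full path of the `python` wrapper above, and `display_name` (KERNEL_DISPLAY_NAME) to the name you want to see in Jupyter. JSON does not allow comments, so the edited file should look like this:
```json
{
 "argv": [
  "/home/yz5944/.local/share/jupyter/kernels/my_env/python",
  "-m",
  "ipykernel_launcher",
  "-f",
  "{connection_file}"
 ],
 "display_name": "my_env",
 "language": "python"
}
```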
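Instead of editing `kernel.json` by hand, you can generate it. A minimal sketch, assuming the `my_env` kernel directory and display name used above; `write_kernel_json` is a hypothetical helper, and the unquoted heredoc expands `$kdir` and `$name` but leaves `{connection_file}` untouched for Jupyter to fill in:

```bash
# Hypothetical helper: write kernel.json for the wrapper-based kernel.
# Args: kernel directory (default ~/.local/share/jupyter/kernels/my_env),
#       display name (default my_env).
write_kernel_json() {
  local kdir="${1:-$HOME/.local/share/jupyter/kernels/my_env}"
  local name="${2:-my_env}"
  cat > "$kdir/kernel.json" <<EOF
{
 "argv": [
  "$kdir/python",
  "-m",
  "ipykernel_launcher",
  "-f",
  "{connection_file}"
 ],
 "display_name": "$name",
 "language": "python"
}
EOF
}
```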
Then go to https://ood-burst-001.hpc.nyu.edu/ to run Jupyter Notebook and VS Code, and select the `my_env` kernel in your notebook.

Troubleshooting: https://sites.google.com/nyu.edu/nyu-hpc/training-support/general-hpc-topics/tunneling-and-x11-forwarding

Acknowledgement: Thanks to Divyam Madan for providing the base version for this tutorial.