<a href="https://colab.research.google.com/github/JacobDowns/CSCI-491-591/blob/main/jetstream2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<img src="https://docs.jetstream-cloud.org/images/JS2-Logo-Transparent.png" alt="Jetstream2 Logo">

# Jetstream2
* Anvil is an example of a fairly traditional HPC cluster that uses Slurm for job management
* The Open OnDemand interface is a nice addition for interacting with Anvil via a GUI, but for the most part Anvil is a good template for how you interact with many supercomputers


* Jetstream2 uses a cloud computing environment based on [OpenStack](https://www.openstack.org/)
* In some sense it operates similarly to Amazon Web Services or Google Cloud Platform where you can spin up compute instances to accomplish certain workloads

## Logging In
* To log in, you can use the web interface [Exosphere](https://docs.jetstream-cloud.org/getting-started/login/)
* Use your normal ACCESS credentials

## Instances
* To get started on Jetstream2 you first need to create an instance
* An instance is like a virtual computer in the cloud
* There are many instance types in Jetstream2 that have different computational [resources](https://docs.jetstream-cloud.org/general/instance-flavors/)
* Unlike Anvil, which uses Slurm for job scheduling, once an instance is created on Jetstream2 you use it more like you would your own computer
* That is, once an instance is created, its resources are reserved for your use
* Check out the user guide on how to set up an [instance](https://docs.jetstream-cloud.org/getting-started/first-instance/#navigating-the-user-interface)
* Instances can run one of a few different operating systems, such as Ubuntu or Rocky Linux

## Jetstream2 Instance Flavors  

This document describes the available instance “flavors” (i.e., VM size options) on Jetstream2 and what each is suitable for. Each flavor incurs a cost in Service Units (SUs) according to how many vCPUs (cores) and how much RAM the instance uses.

---

### 1. Basics  
- On Jetstream2, **1 SU = 1 vCPU-hour** for CPU flavors; large-memory and GPU flavors cost more per vCPU-hour (see the tables below)
- JS2 is divided into **three resource types** (each a separate ACCESS resource):  
  1. CPU (general compute)  
  2. Large Memory  
  3. GPU  
  Having access to one doesn't automatically guarantee access to the others.
- Root disk sizes are fixed per flavor, but you may use “volume-backed” options for larger root disks.
- When selecting a flavor, consider the number of cores, RAM, local storage size, and the SU cost per hour.
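
As a rough illustration of how SU charges add up, the sketch below multiplies vCPUs, hours, and a per-vCPU rate. The function name `su_cost` and its parameters are made up for this example; the rates used (1 SU per vCPU-hour for CPU flavors, 2 for large-memory flavors) are the ones quoted in these notes.

```python
def su_cost(vcpus, hours, su_per_vcpu_hour=1.0):
    """Estimate the SUs charged for an instance that stays active for `hours`."""
    return vcpus * hours * su_per_vcpu_hour


# An 8-vCPU CPU flavor left running for one full day
print(su_cost(8, 24))        # 192.0

# A 64-vCPU large-memory flavor (2 SUs per vCPU-hour) running for one hour
print(su_cost(64, 1, 2.0))   # 128.0
```

The estimate only covers time the instance is active, so it is best read as an upper bound for planning rather than an exact bill.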

---

### 2. CPU Flavors  
| Flavor         | vCPUs | RAM (GB) | Local Storage (GB) | Cost per hour (SUs) |
|----------------|-------|----------|--------------------|----------------------|
| `m3.tiny`      | 1     |   3      | 20                 | 1                    |
| `m3.small`     | 2     |   6      | 20                 | 2                    |
| `m3.quad`      | 4     |  15      | 20                 | 4                    |
| `m3.medium`    | 8     |  30      | 60                 | 8                    |
| `m3.large`     | 16    |  60      | 60                 | 16                   |
| `m3.xl`        | 32    | 125      | 60                 | 32                   |
| `m3.2xl`       | 64    | 250      | 60                 | 64                   |
| `m3.3xl*`      | 128   | 500      | 60                 | 128                  |


**Use case:** Great for general CPU jobs, teaching labs, prototyping scripts, Python/NumPy work, moderate parallelism.
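
One way to apply the table is to pick the cheapest flavor that satisfies a job's core and memory needs. The sketch below is only an illustration: the dictionary `CPU_FLAVORS` and the helper `cheapest_cpu_flavor` are hypothetical names, and the numbers are copied by hand from the table above, so they would need updating if the flavors change.

```python
# (vCPUs, RAM in GB, SUs per hour), copied from the CPU flavor table above
CPU_FLAVORS = {
    "m3.tiny": (1, 3, 1),
    "m3.small": (2, 6, 2),
    "m3.quad": (4, 15, 4),
    "m3.medium": (8, 30, 8),
    "m3.large": (16, 60, 16),
    "m3.xl": (32, 125, 32),
    "m3.2xl": (64, 250, 64),
    "m3.3xl": (128, 500, 128),
}


def cheapest_cpu_flavor(min_vcpus, min_ram_gb):
    """Return the lowest-cost flavor meeting both requirements, or None."""
    candidates = [
        (cost, name)
        for name, (vcpus, ram, cost) in CPU_FLAVORS.items()
        if vcpus >= min_vcpus and ram >= min_ram_gb
    ]
    return min(candidates)[1] if candidates else None


# 4 cores would fit m3.quad, but 20 GB of RAM pushes the choice up one size
print(cheapest_cpu_flavor(4, 20))   # m3.medium
```

Note that memory, not core count, often decides the flavor, which is why the example lands on `m3.medium` rather than `m3.quad`.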

---

### 3. Large Memory Flavors  
Large-memory instances have roughly **double the memory** of equivalently-resourced CPU flavors and cost **2 SUs per vCPU-hour**.

| Flavor        | vCPUs | RAM (GB) | Local Storage (GB) | Cost per hour (SUs) |
|---------------|-------|----------|---------------------|----------------------|
| `r3.large*`   | 64    | 500      | 60                  | 128                  |
| `r3.xl*`      | 128   | 1000     | 60                  | 256                  |

\* Flavors marked with an asterisk (including `m3.3xl` in the CPU table) require justification and have limited capacity.

**Use case:** Memory-intensive workloads: large in-memory datasets (e.g., big NumPy arrays, in-memory databases), data-science, graph-analytics.

---

### 4. GPU Flavors  
These flavors provide NVIDIA GPUs and are appropriate if you'll use CUDA, PyTorch, TensorFlow with GPU support, or other accelerated computing. There are two sub-categories: Partial GPU (vGPU) and Full GPU (PCI-passthrough).

#### 4.1 Partial GPU  
| Flavor       | vCPUs | RAM (GB) | Local Storage (GB) | GPU Compute | GPU RAM (GB) | Cost per hour (SUs) |
|--------------|-------|----------|---------------------|-------------|-------------|---------------------|
| `g3.medium`  | 8     | 30       | 60                  | ~25% GPU    | 10          | 16                  |
| `g3.large*`  | 16    | 60       | 60                  | ~50% GPU    | 20          | 32                  |

**Note:** “~25% GPU” means a portion of an A100, provided as a vGPU. The actual available GPU performance may exceed this minimum allotment if the rest of the physical GPU is idle.

#### 4.2 Full GPU  
| Flavor         | vCPUs | RAM (GB) | Local Storage (GB) | GPU Type     | GPU RAM (GB) | Cost per hour (SUs) |
|----------------|-------|----------|---------------------|--------------|--------------|----------------------|
| `g3.xl*`       | 32    | 120      | 60                  | A100         | 40           | 64                   |
| `g3.2xl*`      | 64    | 240      | 60                  | A100         | 80           | 128                  |
| `g3.4xl*`      | 128   | 480      | 60                  | A100         | 160          | 256                  |
| `g4.xl*`       | 12    | 120      | 60                  | L40S         | 48           | 84                   |
| `g4.2xl*`      | 24    | 240      | 60                  | L40S         | 96           | 168                  |
| `g4.4xl*`      | 48    | 480      | 60                  | L40S         | 192          | 336                  |
| `g5.xl*`       | 20    | 240      | 60                  | H100         | 80           | 128                  |
| `g5.2xl*`      | 40    | 480      | 60                  | H100         | 160          | 256                  |
| `g5.4xl*`      | 80    | 960      | 60                  | H100         | 320          | 512                  |

* Many of these GPU flavors (especially L40S, H100, multi-GPU A100) are **not available by default** and require justification
* We have access to one full-GPU flavor, `g3.xl`, but for testing and debugging stick with `g3.medium` and `g3.large`, then scale up as appropriate

**Use case:** Machine learning, deep learning, GPU-accelerated compute, benchmarking, inference, student labs using PyTorch/TF.



## Using an Instance
* To use an instance, you can open a shell to the instance on Exosphere
* You run commands just like you would on your own terminal or the Anvil shell
* Jetstream2 has modules similar to Anvil
* However, you also have sudo access on your instance, so you can install whatever software you [desire](https://docs.jetstream-cloud.org/getting-started/software/)
* Web desktop is also available.
* See the following for [accessing instances](https://docs.jetstream-cloud.org/getting-started/access-instance/)

## SSH Access
* You can also set up access to an instance using SSH, similar to Anvil
* To do this, note the following when creating a new instance:
  1. Click to show advanced options and **create a public IP address**. This lets you SSH into the instance using that static IP
  2. Jetstream2 supports [SSH public key authentication](https://docs.jetstream-cloud.org/ui/exo/access-instance/#accessing-an-instance-with-native-ssh)
  3. As an alternative, it also supports passphrase SSH authentication, and instances are automatically assigned a passphrase when created

* If you set up public key authentication, SSH'ing into the instance will look something like this:
```
ssh -i /path/to/key/private_key_file exouser@<PUBLIC_IP>
```
* For example
```
ssh -i ~/.ssh/id_rsa exouser@149.165.0.0
```
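
If you want to script the connection from Python (for example, to run one command on the instance and return), the same `ssh` call can be assembled as an argument list. This is a minimal sketch: `ssh_command` is a hypothetical helper, the IP is the placeholder from the example above, and `nvidia-smi` is just an illustrative remote command.

```python
import shlex
from pathlib import Path


def ssh_command(public_ip, key_path="~/.ssh/id_rsa", user="exouser", remote_cmd=None):
    """Build the argument list for an ssh call to an instance."""
    args = ["ssh", "-i", str(Path(key_path).expanduser()), f"{user}@{public_ip}"]
    if remote_cmd:
        # With a trailing command, ssh runs it on the instance and then exits
        args.append(remote_cmd)
    return args


cmd = ssh_command("149.165.0.0", remote_cmd="nvidia-smi")
print(shlex.join(cmd))  # shows the command line without running it

# To actually connect, pass the list to subprocess:
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Building a list rather than a single string avoids shell-quoting problems when the key path contains spaces.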

## Managing Instances
* Perhaps the most important thing to know about instances is that they burn through SUs whether you are actively using them or not
* Different instances have different burn rates as mentioned above
* Hence, when you're not working with an instance you should [shelve it](https://docs.jetstream-cloud.org/general/instancemgt/)
* When you unshelve the instance, you can resume working with it. The instance's disk contents are preserved, but any running programs will have exited
> **Remember to shelve your instances!**
* You can also image an instance, which allows you to share it with other users
* Hence, a software environment can be shared
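
To make the cost of forgetting to shelve concrete, the sketch below compares one week of SU burn for an instance that is left running against one that is shelved whenever it is idle. It assumes a shelved instance is charged no SUs at all, which is the premise of the advice above but should be checked against the Jetstream2 documentation; `weekly_su_burn` is a hypothetical helper, and the rate of 16 SUs per hour is the `g3.medium` figure from the flavor tables.

```python
HOURS_PER_WEEK = 7 * 24  # 168


def weekly_su_burn(su_per_hour, active_hours_per_week, shelve_when_idle):
    """SUs burned in one week by a single instance.

    Assumption: a shelved instance is not charged any SUs.
    """
    hours_charged = active_hours_per_week if shelve_when_idle else HOURS_PER_WEEK
    return su_per_hour * hours_charged


rate = 16  # g3.medium, SUs per hour
print(weekly_su_burn(rate, 10, shelve_when_idle=False))  # 2688
print(weekly_su_burn(rate, 10, shelve_when_idle=True))   # 160
```

For someone who works on the instance about 10 hours a week, shelving cuts the weekly burn from 2688 SUs to 160 under these assumptions.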





## HPC Class Image
* We have a prebuilt class image that has `numba-cuda`, `cupy`, and `pytorch` installed in a Docker container
* To use it, when creating an instance, choose to select the instance source by image and search for `hpc_um_gpu_instance`
* To use GPUs, choose a GPU flavor: `g3.medium`, `g3.large`, or `g3.xl`
* The class HPC image has a few tests for `numba`, `cupy`, and `pytorch` in the `tests` directory
* For example, to run the Numba test:
```
docker run --rm -it --gpus all -v "$PWD":/workspace -w /workspace my/hpc-class python numba_test.py
```
* The included docker image `my/hpc-class` has many dependencies for GPU programming already installed
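
The test files shipped with the class image are not reproduced here. As a stand-in, the following is a minimal sketch of what a GPU sanity check could look like with PyTorch; the function name `gpu_report` is made up, and the script falls back to a message instead of failing when PyTorch or a GPU is missing.

```python
import importlib.util


def gpu_report():
    """Return a one-line summary of whether PyTorch can see a CUDA GPU."""
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed in this environment"

    import torch  # imported lazily so the script still runs without PyTorch

    if not torch.cuda.is_available():
        return "PyTorch is installed, but no CUDA device is visible"
    return f"CUDA device 0: {torch.cuda.get_device_name(0)}"


print(gpu_report())
```

Saved under a hypothetical name such as `gpu_check.py`, it could be run with the same `docker run` pattern shown above by replacing `numba_test.py`. If it reports that no CUDA device is visible, check that the instance uses a GPU flavor and that `--gpus all` was passed to Docker.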

## Storage
* Jetstream2 has a few different mechanisms for storage
* The main one is a **volume**
* Volumes of different sizes can be created and mounted on an instance
* Volumes are persistent, so they remain even if the instance they are attached to is deleted



| Mechanism | Description | Access method | Multi-attach | Best for | Advantages | Limitations |
|------------|--------------|----------------|---------------|-----------|--------------|--------------|
| **Volume (Cinder)** | Block storage attached to a single instance | Attach to one VM, mount as a filesystem | No | Per-VM data, databases, custom root disks | Simple, persistent, supports snapshots | Single attach only, requires a running instance |
| **Manila Share** | Shared file storage (typically NFS) | Mount via NFS on multiple VMs | Yes | Shared data for classes or teams | Multi-user access, resizable, access control | Must be mounted to use, not ideal for metadata-heavy workloads |
| **Object Store (S3/Swift)** | S3-compatible object storage service | Access via HTTPS or S3 API | Yes (API-based) | Web-accessible datasets, archives, public data | Accessible from anywhere, highly scalable, no VM required | Not POSIX filesystem, slower random I/O, needs API tools |

### When to Use
- Use a **Volume** for per-instance, high-performance, local data.
- Use a **Manila Share** for shared access within your project or class.
- Use an **Object Store** for distributing large or public datasets.

Reference: [Jetstream2 Storage Documentation](https://docs.jetstream-cloud.org/general/storage/#when-to-use-manila-shares)
