# Note from Baleen authors

You can skip this and use the shared JupyterHub alone to run the simple tests and plotting, but if you would like to reproduce Baleen's results more extensively, you will need at least a dedicated server, if not a cluster.

Use this notebook to reserve and set up a dedicated server. Otherwise, you will be limited to the shared JupyterHub, which has very little storage and CPU. This notebook is modified from the Jupyter artifact -- any modifications are labelled with "BALEEN-SPECIFIC" comments.

# Jupyter Notebook

A Jupyter notebook is what you're looking at right now. It contains a mix of Markdown and code, which is a great interface for annotating code and allowing it to be run interactively. This notebook is (probably) running on Chameleon's JupyterHub server. This means that the code is being executed on a very resource-light instance which is only designed to interface with Chameleon APIs. 

Because Jupyter notebooks are such a great way to present experiment code with results and documentation, it makes sense that one would want to execute their experiment's code from within a Jupyter notebook _directly_ on a powerful compute host. That is what this notebook accomplishes!

## Steps
1. Reserve a node for your experiments
2. Create an instance on that node
3. Spawn a Jupyter server on that instance
4. Connect to that Jupyter server

## Experiment configuration

We'll be running this server on a single node. The node will still be able to connect to other nodes on the same network if your experiment requires multiple nodes. Because this setup is so simple, you can configure the variables below to any valid configuration you want.

In [1]:
import os

project_name = "CHI-231080" # Change this if necessary
site_name = "CHI@TACC"
node_type = "compute_cascadelake_r"
image_name = "CC-Ubuntu22.04"
network_name = "sharednet1"

user = os.getenv("USER")
# Leases can be between 1 and 7 days
lease_length = 7
lease_name = f"{user}-jupyter-server"
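
Before submitting anything, it can help to sanity-check these values. The following is a minimal sketch, not part of the original artifact: the helper names `build_lease_name` and `check_lease_length` are hypothetical, and the 1 to 7 day bounds are taken from the comment in the cell above.

```python
import getpass


def build_lease_name(user, suffix="jupyter-server"):
    """Return a lease name such as 'alice-jupyter-server'."""
    # os.getenv("USER") can return None in some environments,
    # so fall back to the login name reported by getpass.
    if not user:
        user = getpass.getuser()
    return f"{user}-{suffix}"


def check_lease_length(days, minimum=1, maximum=7):
    """Raise ValueError if the lease length is outside the allowed range."""
    if not minimum <= days <= maximum:
        raise ValueError(
            f"lease_length must be between {minimum} and {maximum} days, got {days}"
        )
    return days
```

For example, `check_lease_length(lease_length)` could be called right after the configuration cell so that an out-of-range value fails early rather than at lease submission.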

With this configuration, we'll log into Chameleon so we can start provisioning our resources.

In [2]:
import chi

chi.use_site(site_name)
chi.set("project_name", project_name)

Now using CHI@TACC:
URL: https://chi.tacc.chameleoncloud.org
Location: Austin, Texas, USA
Support contact: help@chameleoncloud.org


## Reserve a host

With our configuration, let's reserve a host to run our notebook server on.

In [None]:
import chi.lease

# Reserve a host for the Jupyter server
reservation = []
chi.lease.add_node_reservation(
    reservation,
    node_type=node_type,
    count=1
)
# We need to use a floating IP 
# in order to access the Jupyter server from the computer you're using
chi.lease.add_fip_reservation(reservation, count=1)

start_date, end_date = chi.lease.lease_duration(days=lease_length)

# Create the lease on Chameleon
print("Submitting lease...")
lease = chi.lease.create_lease(
    lease_name,
    reservation,
    start_date=start_date,
    end_date=end_date
)
print("Waiting for lease to become active...")
lease = chi.lease.wait_for_active(lease["id"])
print("Lease is active!")
lease
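
To make the lease window concrete, here is a small sketch of how a start and end date for a lease of `lease_length` days could be computed by hand. This is an illustration only, not python-chi's implementation: the function name `lease_window`, the one-minute start offset, and the date format string are all assumptions.

```python
from datetime import datetime, timedelta, timezone


def lease_window(days, now=None, fmt="%Y-%m-%d %H:%M"):
    """Return (start, end) strings for a lease lasting `days` days."""
    if now is None:
        now = datetime.now(timezone.utc)
    # Assumption: start slightly in the future so the request is not
    # rejected for having a start time in the past.
    start = now + timedelta(minutes=1)
    end = start + timedelta(days=days)
    return start.strftime(fmt), end.strftime(fmt)
```

Passing an explicit `now` makes the result reproducible, which is convenient when checking that a 7-day lease ends on the date you expect.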

## Spawning an instance

With our resources in hand, we'll spawn an instance to run the Jupyter server on.

In order to connect to the server from the computer you're using right now, you'll need to set up an SSH keypair on Chameleon. If you haven't done this yet, please check out [the docs](https://chameleoncloud.readthedocs.io/en/latest/getting-started/index.html#accessing-your-instance).

In [None]:
import chi.network
import chi.server

network_id = chi.network.get_network_id(network_name)
server_name = f"{user}-jupyter-notebook-server"
node_reservation = chi.lease.get_node_reservation(
    lease["id"], 
    node_type=node_type,
    count=1,
)
print(f"Spawning server at {site_name}...")
notebook_server = chi.server.create_server(
    server_name,
    reservation_id=node_reservation,
    image_name=image_name,
    network_id=network_id,
    count=1,
)
print("Waiting for server to become active...")
chi.server.wait_for_active(notebook_server.id)
print(f"Server at {site_name} is active!")

We've created a server to run Jupyter on. In order to interact with the server from here on out, we'll need to connect via SSH over a floating IP address. So let's assign the floating IP we reserved and wait for SSH to be available.

In [None]:
floating_ip = chi.lease.get_reserved_floating_ips(lease["id"])[0]
chi.server.associate_floating_ip(notebook_server.id, floating_ip)
print("Associated floating IP with server.")
print("Waiting for SSH to become active...")
chi.server.wait_for_tcp(floating_ip, port=22)
print(f"Notebook server now accessible via SSH at {floating_ip}")
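
Waiting for SSH amounts to polling port 22 until it accepts a TCP connection. The sketch below shows that idea using only the standard library. It is not the code behind `chi.server.wait_for_tcp`; the function name `wait_for_port` and its default timeout and interval are assumptions chosen for illustration.

```python
import socket
import time


def wait_for_port(host, port, timeout=300.0, interval=5.0):
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: wait and try again.
            time.sleep(interval)
    return False
```

A call such as `wait_for_port(floating_ip, 22)` could serve as a manual check if the cell above appears to hang.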

## Connecting to the server

Now that we can access the server, let's connect to it so that we can install Jupyter.

In [None]:
import chi.ssh

remote = chi.ssh.Remote(floating_ip)
remote.run("echo Hello from $(hostname)!")

## Setting up Jupyter

Now that the server is ready, we will install and configure Jupyter.

In [None]:
# Install Jupyter package
if image_name.lower().startswith("cc-ubuntu"):
    remote.run("sudo apt update && sudo apt install -y jupyter-notebook python3-jupyterlab-server")
else:
    remote.run("python3 -m pip install --upgrade pip && python3 -m pip install jupyter jupyterlab")

In [None]:
# BALEEN-SPECIFIC
remote.run("git clone --recurse-submodules https://github.com/wonglkd/Baleen-FAST24.git")

In [None]:
# BALEEN-SPECIFIC
remote.run("python3 -m pip install --user -r Baleen-FAST24/BCacheSim/install/requirements.txt")

In [None]:
# Check that the jupyter executable is on the PATH
remote.run("which jupyter")

In [None]:
# Generate config
remote.run("jupyter notebook --generate-config")

### Creating a Jupyter service

In order to have Jupyter run in the background and not interrupt the rest of this notebook, we'll install it as a service rather than run it directly.

In [None]:
# Copy the systemd service manifest onto the server
remote.put("jupyter.service")
remote.run("sudo mv jupyter.service /etc/systemd/system")

In [None]:
# Start the service
remote.run("sudo systemctl daemon-reload")
remote.run("sudo systemctl enable jupyter.service")
remote.run("sudo systemctl start jupyter.service")
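
The contents of `jupyter.service` are not shown in this notebook. As a rough idea of what such a manifest might contain, the sketch below renders a hypothetical systemd unit that binds Jupyter to localhost only. Everything in it is an assumption for illustration: the function name `render_jupyter_unit`, the `/usr/bin/jupyter` path, the working directory, and the port may all differ from the file shipped with the artifact.

```python
def render_jupyter_unit(user="cc", port=8888, jupyter_bin="/usr/bin/jupyter"):
    """Return the text of a hypothetical systemd unit for a Jupyter server."""
    lines = [
        "[Unit]",
        "Description=Jupyter Notebook server",
        "After=network.target",
        "",
        "[Service]",
        "Type=simple",
        f"User={user}",
        f"WorkingDirectory=/home/{user}",
        # Bind to 127.0.0.1 so the server only accepts local connections.
        f"ExecStart={jupyter_bin} notebook --no-browser --ip=127.0.0.1 --port={port}",
        "Restart=on-failure",
        "",
        "[Install]",
        "WantedBy=multi-user.target",
    ]
    return "\n".join(lines) + "\n"


unit_text = render_jupyter_unit()
```

Printing `unit_text` gives a file you could compare against the real `jupyter.service` to confirm that the server listens on localhost only.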

### Connecting to the server securely

**PLEASE READ, DO NOT IGNORE**

The server we've set up is in a **very insecure** configuration. This, however, is fine, because it is only accepting connections from the host it's running on. This means that random people on the internet can't connect to it and exploit it. In order to maintain this security, it's imperative that you adhere to the following rules:

1. **DO NOT, UNDER ANY CIRCUMSTANCES, MODIFY THE FIREWALL**
2. **DO NOT, UNDER ANY CIRCUMSTANCES, CHANGE THE PORT, IP, OR HOST THAT THE JUPYTER SERVER IS LISTENING ON**

If you break either rule, you may allow malicious actors to gain access to your Jupyter server, which will allow them to take complete control over your instance. They will absolutely ruin your experiment in order to mine Bitcoin and seed torrents. Your server will be found and exploited very quickly if you make bad changes to the configuration. If this notebook is having issues, and you're not sure you can fix it in a secure way, please submit a ticket to the Chameleon help desk.

However, if we block anyone from connecting to the server remotely, how will we use it?

Via an [SSH tunnel](https://www.ssh.com/academy/ssh/tunneling)! We will create a secure, encrypted tunnel to the Jupyter host, which will allow us to connect to the notebook server as if we are on the same host. This is the most secure way to remotely access services on Chameleon.

#### Creating an SSH tunnel

First, upload an SSH key to this folder. If you don't have one, you can generate one with the command: `ssh-keygen -t rsa -b 4096`

Then open a terminal **on your local machine, not the Jupyter interface**, and run the command output by the cell below:

In [None]:
print(f"ssh -NT -o ServerAliveInterval=60 -L 8888:localhost:8888 cc@{floating_ip} -i <path/to/sshkey>")

If the above command didn't work, it's probably because your local machine's SSH public key has not been uploaded to Chameleon. If that's the case, upload `~/.ssh/id_rsa.pub` to the same folder as this notebook, and run the cell further below that copies `./id_rsa.pub` onto the server.

In [None]:
# You may also use this command to open a terminal to the server
print(f"ssh cc@{floating_ip} -i <path/to/sshkey>")

In [None]:
import os

local_keyfile_path = "./id_rsa.pub"
if os.path.exists(local_keyfile_path):
    remote.put(local_keyfile_path, "/tmp/id_rsa.pub")
    remote.run("cat /tmp/id_rsa.pub >> ~/.ssh/authorized_keys")
    print("Loaded SSH key onto remote host")
else:
    print("No key uploaded. Skipping")

If you're able to run the `ssh` command from above without it exiting with an error, then you have successfully created an SSH tunnel! Now, you will be able to _securely_ access your Jupyter server at [http://localhost:8888](http://localhost:8888).
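
If you would rather assemble the tunnel command in a script on your local machine, the following sketch builds the same kind of `ssh` invocation with safe quoting. The function name `tunnel_command` and its parameters are hypothetical; the flags mirror the command printed earlier, and the `cc` user comes from that command as well.

```python
import os
import shlex


def tunnel_command(host, key_path, local_port=8888, remote_port=8888, user="cc"):
    """Return a shell-safe ssh command that forwards local_port to the server."""
    # Expand "~" ourselves, because a quoted tilde is not expanded by the shell.
    key_path = os.path.expanduser(key_path)
    args = [
        "ssh", "-NT",
        "-o", "ServerAliveInterval=60",
        "-i", key_path,
        "-L", f"{local_port}:localhost:{remote_port}",
        f"{user}@{host}",
    ]
    return shlex.join(args)
```

The `-L` option forwards the local port to `localhost` as seen from the server, which is why the Jupyter server can keep listening only on its loopback interface.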

## Teardown

When we're done with the host we've loaded Jupyter on, we can free the resources we've reserved.

**Warning: This will permanently delete your instance and all the data on it. Only do this if you've ensured that your work has been backed up.**

We usually recommend experiment data be backed up to the [object store](https://chameleoncloud.readthedocs.io/en/latest/technical/swift.html).

In [16]:
do_teardown = False

if do_teardown:
    chi.lease.delete_lease(lease["id"])