# Run a single-user notebook server on Chameleon

This notebook describes how to run a single-user Jupyter notebook server on Chameleon. This allows you to run experiments that require bare metal access, storage, memory, GPU, and compute resources on Chameleon through a Jupyter notebook interface.

## Provision the resource


### Check resource availability

This notebook will try to reserve a bare metal Ubuntu server on CHI@UC, subject to availability. Before you begin, check the host calendar at [https://chi.uc.chameleoncloud.org/project/leases/calendar/host/](https://chi.uc.chameleoncloud.org/project/leases/calendar/host/) to see what node types are available.

### Chameleon configuration

You can change your Chameleon project name (if not using the one that is automatically configured in the JupyterHub environment) and the site on which to reserve resources (depending on availability) in the following cell.

In [None]:
import chi, os

PROJECT_NAME = os.getenv('OS_PROJECT_NAME')
chi.use_site("CHI@UC")
chi.set("project_name", PROJECT_NAME)

If you need to change the details of the Chameleon server, e.g. use a different OS image, or a different node type depending on availability, you can do that in the following cell.

For our sequence of notebooks on "On Warm-starting neural networks training", we will use a single RTX 6000 GPU node with Ubuntu 20.04.

In [None]:
chi.set("image", "CC-Ubuntu20.04")
# note: we use base Ubuntu because we want a newer CUDA than is in Chameleon's Ubuntu+CUDA image
NODE_TYPE = "gpu_rtx_6000"

### Reservation

The following cell will create a reservation. You can modify the start and end date as needed.

In [None]:
from chi import lease
from datetime import datetime, timedelta

res = []
lease.add_node_reservation(res, node_type=NODE_TYPE, count=1)
lease.add_fip_reservation(res, count=1)

# Set the start date as an offset from now (edit the timedelta to choose when the lease starts)
start_date = datetime.now() + timedelta(days=0, hours=16)

# Set the end date (edit the timedelta to choose how long the reservation lasts)
end_date = start_date + timedelta(hours=24)

# Format the start date using %Y-%m-%d %H:%M
start_date = start_date.strftime('%Y-%m-%d %H:%M')

# Format the end date using %Y-%m-%d %H:%M
end_date = end_date.strftime('%Y-%m-%d %H:%M')

print(f"start date: {start_date}, end date: {end_date}")

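Note that `datetime.now()` returns the local time of the machine running this notebook. The following is a minimal sketch of the same date arithmetic done explicitly in UTC, under the assumption (worth verifying for your site) that the reservation service interprets lease times as UTC. The helper name `lease_window` and its parameters are hypothetical and not part of `python-chi`; the optional `now` argument exists only to make the result reproducible.

```python
from datetime import datetime, timedelta, timezone

def lease_window(start_in_hours=0, duration_hours=24, now=None):
    """Return (start, end) as '%Y-%m-%d %H:%M' strings, computed in UTC.

    Hypothetical helper: assumes lease times should be expressed in UTC.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    start = now + timedelta(hours=start_in_hours)
    end = start + timedelta(hours=duration_hours)
    fmt = '%Y-%m-%d %H:%M'
    return start.strftime(fmt), end.strftime(fmt)

# Example with a fixed reference time, so the output does not depend on the clock
fixed = datetime(2024, 1, 1, 8, 0, tzinfo=timezone.utc)
print(lease_window(start_in_hours=16, duration_hours=24, now=fixed))
# ('2024-01-02 00:00', '2024-01-03 00:00')
```
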
Now create the lease with the reservation details specified above:

In [None]:
# Create the lease (don't run this cell again if you have already run it)
l = lease.create_lease(f"{os.getenv('USER')}-{NODE_TYPE}-warm-starting", res, start_date=start_date, end_date=end_date)

# Print the lease ID so you can refer to it later (you can also find your lease ID in the Chameleon web UI)
print(f"Lease ID: {l['id']}")

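The lease and the server in this notebook are looked up by the same name, built from the `USER` environment variable and the node type. As a sketch, that naming convention could be collected in one place so that every cell produces the identical string. The helper `resource_name` is hypothetical, and the `"unknown-user"` fallback is an assumption added for the case where `USER` is not set.

```python
import os

def resource_name(node_type, suffix="warm-starting", user=None):
    """Build the '<user>-<node_type>-<suffix>' name used for the lease and server.

    Hypothetical helper; falls back to a placeholder if USER is not set,
    instead of silently producing a name that starts with 'None-'.
    """
    user = user or os.getenv("USER") or "unknown-user"
    return f"{user}-{node_type}-{suffix}"

# Example with an explicit (made-up) user name
print(resource_name("gpu_rtx_6000", user="alice"))
# alice-gpu_rtx_6000-warm-starting
```
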
Run the next cell once your lease start time has arrived:

In [None]:
from chi import lease
# Look up the lease again in case the kernel restarted (you can also use the lease ID printed above instead of the name)
l = lease.get_lease(f"{os.getenv('USER')}-{NODE_TYPE}-warm-starting")

# Wait for the lease to become active
l = lease.wait_for_active(l["id"])

### Provisioning resources

This cell provisions resources. It will take approximately 10 minutes. You can check on its status in the Chameleon web-based UI: [https://chi.uc.chameleoncloud.org/project/instances/](https://chi.uc.chameleoncloud.org/project/instances/), then come back here when the instance is in the Active state.

In [None]:
from chi import server

# Create a server
reservation_id = lease.get_node_reservation(l["id"])
server.create_server(
    f"{os.getenv('USER')}-{NODE_TYPE}-warm-starting", 
    reservation_id=reservation_id,
    image_name=chi.get("image")
)

In [None]:
# Wait for the server to activate
server_id = server.get_server_id(f"{os.getenv('USER')}-{NODE_TYPE}-warm-starting")
server.wait_for_active(server_id)

Associate an IP address with this server:

In [None]:
reserved_fip = lease.get_reserved_floating_ips(l["id"])[0]
server.associate_floating_ip(server_id, reserved_fip)

and wait for it to come up:

In [None]:
server.wait_for_tcp(reserved_fip, port=22)

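Waiting for TCP port 22 amounts to repeatedly attempting a connection until the SSH service accepts it. The following sketch illustrates that idea with the standard library only; it is not the implementation used by `server.wait_for_tcp`, and the function name `wait_for_port` and its parameters are hypothetical.

```python
import socket
import time

def wait_for_port(host, port, timeout=300.0, interval=5.0):
    """Poll until a TCP connection to (host, port) succeeds.

    Returns True on the first successful connection, or False if `timeout`
    seconds pass without one. Hypothetical helper for illustration.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused, unreachable, or timed out: retry until the deadline
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)

# Usage (not executed here): wait_for_port(reserved_fip, 22)
```

A successful connection only shows that the port is open. It does not guarantee that login will succeed, so a short delay or a retry around the first SSH command can still be useful.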
## Install stuff

The following cells will install some basic packages in order to connect your Colab frontend to your Chameleon server. However, you may want to log in to your Chameleon server in order to access its terminal and install or configure packages outside of Colab.

To log in to the resource, use File > New > Terminal in the Chameleon JupyterHub environment, or your local terminal, and run:


In [None]:
print("cc@" + reserved_fip)

Meanwhile, install an updated CUDA, Python, and Jupyter on your resource:

In [None]:
from chi import ssh

node = ssh.Remote(reserved_fip)

Install the Python 3 packages and upgrade pip:

In [None]:
node.run('sudo apt update')
node.run('sudo apt -y install python3-pip python3-dev')
node.run('sudo pip3 install --upgrade pip')

Install the NVIDIA driver and the CUDA version required to use the GPU:

In [None]:
node.run('wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.0-1_all.deb')
node.run('sudo dpkg -i cuda-keyring_1.0-1_all.deb')
node.run('sudo apt update')
node.run('sudo apt -y install linux-headers-$(uname -r)')
node.run('sudo apt-mark hold cuda-toolkit-12-config-common nvidia-driver-535') # don't let it install this cuda
node.run('sudo apt -y install nvidia-driver-520') # this driver likes CUDA 11.8

In [None]:
try:
    node.run('sudo reboot') # reboot and wait for it to come up
except Exception:
    # the SSH connection is expected to drop while the server reboots
    pass
server.wait_for_tcp(reserved_fip, port=22)
node = ssh.Remote(reserved_fip)

In [None]:
node.run('sudo apt -y install cuda-11-8 cuda-runtime-11-8 cuda-drivers=520.61.05-1')
node.run('sudo apt -y install nvidia-gds-11-8') # install instructions say to do this separately!
node.run('sudo apt -y install libcudnn8=8.9.3.28-1+cuda11.8 nvidia-cuda-toolkit') # make sure to get the CUDA 11.8 build of cuDNN

In [None]:
node.run("echo 'PATH=\"/usr/local/cuda-11.8/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin\"' | sudo tee /etc/environment")

Now we have to reboot, and then check that the expected driver and CUDA version (11.8) are in use:

In [None]:
try:
    node.run('sudo reboot')
except Exception:
    # the SSH connection is expected to drop while the server reboots
    pass
server.wait_for_tcp(reserved_fip, port=22)
node = ssh.Remote(reserved_fip) # note: need a new SSH session to get new PATH
node.run('nvidia-smi')
node.run('nvcc --version')

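Instead of reading the `nvcc --version` output by eye, the release number can be extracted and compared against the version installed above. This is a minimal sketch: the helper `cuda_release` is hypothetical, and the sample line imitates the usual `nvcc` output format rather than being captured from this server.

```python
import re

def cuda_release(nvcc_output):
    """Return the 'X.Y' release from `nvcc --version` output, or None if absent."""
    match = re.search(r"release\s+(\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None

# Illustrative sample line, not real output from this server
sample = "Cuda compilation tools, release 11.8, V11.8.89"
print(cuda_release(sample))
# 11.8

# Usage (not executed here), assuming node.run returns a result object with a
# .stdout attribute, as Fabric connections do:
#   assert cuda_release(node.run('nvcc --version').stdout) == "11.8"
```
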
### Install Python packages

Install the packages required to run our experiment by running the next cell:

In [None]:
node.run('python3 -m pip install --user Cython==0.29.32')
node.run('wget https://raw.githubusercontent.com/teaching-on-testbeds/re_warm_start_nn/main/chameleon_requirements.txt -O chameleon_requirements.txt')
node.run('python3 -m pip install --user -r chameleon_requirements.txt --extra-index-url https://download.pytorch.org/whl/cu113')

Test that everything went well by running the following cell:

In [None]:
node.run('python3 -c \'import torch; print(torch.cuda.get_device_name(0))\'')
# should say: Quadro RTX 6000

## Connect Colab to the server, or run a Jupyter server on it

## (Option 1) Set up and run a Jupyter server on the Chameleon server

Install Jupyter:

In [None]:
node.run('python3 -m pip install --user  jupyter-core jupyter-client jupyter -U --force-reinstall')

### Retrieve the materials

Finally, get a copy of the notebooks that you will run:

In [None]:
node.run('git clone https://github.com/teaching-on-testbeds/re_warm_start_nn')

### Run a Jupyter server

Run the following cell:

In [None]:
print('ssh -L 127.0.0.1:8888:127.0.0.1:8888 cc@' + reserved_fip) 

then paste its output into a *local* terminal on your own device, to set up a tunnel to the Jupyter server. If your Chameleon key is not in the default location, you should also specify the path to your key as an argument, using `-i`. Leave this SSH session open.

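If your key is not in the default location, the tunnel command needs the `-i` option. As a sketch, the command string could be assembled as follows. The helper `tunnel_command` is hypothetical, and the IP address and key path in the example are placeholders. The key path refers to a file on your own device, so it is left unexpanded and unquoted here (a path containing spaces would need quoting).

```python
def tunnel_command(host, key_path=None, local_port=8888, remote_port=8888, user="cc"):
    """Build the ssh port-forwarding command to paste into a local terminal.

    Hypothetical helper: key_path is a path on the local device, not on the
    machine running this notebook, so it is inserted as-is.
    """
    parts = ["ssh", "-L", f"127.0.0.1:{local_port}:127.0.0.1:{remote_port}"]
    if key_path:
        parts += ["-i", key_path]
    parts.append(f"{user}@{host}")
    return " ".join(parts)

# Placeholder address and key path, for illustration only
print(tunnel_command("192.0.2.10", key_path="~/.ssh/id_rsa_chameleon"))
# ssh -L 127.0.0.1:8888:127.0.0.1:8888 -i ~/.ssh/id_rsa_chameleon cc@192.0.2.10
```
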
Then, run the following cell, which will start a command that does not terminate: 

In [None]:
node.run("/home/cc/.local/bin/jupyter notebook --port=8888 --notebook-dir='/home/cc/re_warm_start_nn/notebooks'")

In the output of the cell above, look for a URL in this format:
    
```
http://localhost:8888/?token=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
```

Copy this URL and open it in a browser. Then, you can run the sequence of notebooks that you'll see there, in order.

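If the server output is long, the URL can be hard to spot. Assuming you have the output available as text (for example, copied from the cell output), a regular expression along these lines could pick it out. The helper `find_jupyter_url` is hypothetical, and the sample log text is made up to match the URL format shown above.

```python
import re

def find_jupyter_url(log_text):
    """Return the first tokenized Jupyter URL found in log_text, or None."""
    pattern = r"http://(?:localhost|127\.0\.0\.1):\d+/\S*\?token=\w+"
    match = re.search(pattern, log_text)
    return match.group(0) if match else None

# Made-up sample output, for illustration only
sample_log = "To access the notebook, open:\n    http://localhost:8888/?token=abc123\n"
print(find_jupyter_url(sample_log))
# http://localhost:8888/?token=abc123
```
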
If you need to stop and re-start your Jupyter server, 

- Use Kernel > Interrupt Kernel *twice* to stop the cell above
- Then run the following cell to kill whatever may be left running in the background.

In [None]:
node.run("sudo killall jupyter-notebook")

## (Option 2) Connect Colab to the server

Install `jupyter_http_over_ws`, which is required in order to connect Colab to this Jupyter instance:

In [None]:
node.run('python3 -m pip install --user  jupyter-core jupyter-client jupyter_http_over_ws traitlets -U --force-reinstall')

Then, activate `jupyter_http_over_ws`:

In [None]:
node.run('/home/cc/.local/bin/jupyter serverextension enable --py jupyter_http_over_ws')

In a **local terminal on your own laptop**, run the command printed by the following cell

In [None]:
print('ssh -L 127.0.0.1:8888:127.0.0.1:8888 cc@' + reserved_fip) 

to set up a tunnel to the Jupyter server. If your Chameleon key is not in the default location, you should also specify the path to your key as an argument, using `-i`. Leave this SSH session open.

Then, run the following cell, which will run a command that does not terminate: 

In [None]:
node.run("/home/cc/.local/bin/jupyter notebook --NotebookApp.allow_origin='https://colab.research.google.com' --port=8888 --NotebookApp.port_retries=0")

In the output of the cell above, look for a URL in this format:

http://localhost:8888/?token=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Copy this URL - you will need it in the next step.

Now, you can open Colab in a browser. Click on the drop-down menu for "Connect" in the top right and select "Connect to a local runtime". Paste the URL you copied earlier into the space and click "Connect". Your notebook should now be running on your Chameleon server (you can put `!hostname` in a cell and run it to verify!)

## Release resources

If you finish your experiment before your lease expires, release your resources and tear down your environment by running the following (the deletion is disabled by default to prevent accidental deletions).

This section is designed to work as a "standalone" portion: you can come back to this notebook, ignore the top part, and just run this section to delete your resources.

In [None]:
# setup environment - if you made any changes in the top part, make the same changes here
import chi, os
from chi import lease, server

PROJECT_NAME = os.getenv('OS_PROJECT_NAME')
chi.use_site("CHI@UC")
chi.set("project_name", PROJECT_NAME)
NODE_TYPE = "gpu_rtx_6000"

lease = chi.lease.get_lease(f"{os.getenv('USER')}-{NODE_TYPE}-warm-starting")

In [None]:
DELETE = False
# DELETE = True 

if DELETE:
    # delete server
    server_id = chi.server.get_server_id(f"{os.getenv('USER')}-{NODE_TYPE}-warm-starting")
    chi.server.delete_server(server_id)

    # release floating IP
    reserved_fip =  chi.lease.get_reserved_floating_ips(lease["id"])[0]
    ip_info = chi.network.get_floating_ip(reserved_fip)
    chi.neutron().delete_floatingip(ip_info["id"])

    # delete lease
    chi.lease.delete_lease(lease["id"])
