# SC23 MLEC Artifact

---

## Create experiment container

This container provides the following:

- One bare metal node (any node type; this notebook reserves a zen3 compute node)
- One public IP

### Configuration

Here we use the project "SC23 Reproducibility" with ID "CHI-221071". If you belong to a different Chameleon project, replace it with your own project ID.
We will reserve a zen3 compute node at the CHI@TACC site.

In [1]:
import chi

chi.use_site("CHI@TACC")
chi.set("project_name", "CHI-221071")

print(f'Using Project {chi.get("project_name")}')

Now using CHI@TACC:
URL: https://chi.tacc.chameleoncloud.org
Location: Austin, Texas, USA
Support contact: help@chameleoncloud.org
Using Project CHI-221071
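If you work under a different Chameleon project, one way to avoid editing the cell by hand is to read the project from an environment variable. The sketch below assumes the standard OpenStack `OS_PROJECT_NAME` variable as the override and falls back to `CHI-221071`; the helper name `resolve_project_name` is hypothetical.

```python
import os

# Default project used by this notebook; change it if you are in another project.
DEFAULT_PROJECT = "CHI-221071"


def resolve_project_name(env=None, default=DEFAULT_PROJECT):
    """Return the project to use, preferring a non-empty environment override.

    Using OS_PROJECT_NAME as the override is an assumption for this sketch.
    """
    env = os.environ if env is None else env
    value = env.get("OS_PROJECT_NAME", "").strip()
    return value or default


print(resolve_project_name({}))  # CHI-221071
```

You could then call `chi.set("project_name", resolve_project_name())` in place of the hard-coded ID.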


In [2]:
import os

USER = os.getenv('USER')

### Create reservation

Chameleon resources must be reserved before they can be used.
We will reserve one bare metal node and one public IP address.

For MLEC simulation, we need a compute node with **ideally >200 cores**. Here we reserve the **zen3 compute node**.
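The `resource_properties` argument in the reservation cell is a Blazar JSON filter expression such as `["=", "$node_type", "compute_zen3"]`. The sketch below builds such terms programmatically; the helper names are hypothetical, and combining terms with `"and"` assumes Blazar's JSON filter syntax, so check the Blazar documentation before relying on it.

```python
import json


def prop_eq(name, value):
    """Equality term in Blazar's JSON filter syntax, e.g. ["=", "$node_type", "compute_zen3"]."""
    return ["=", f"${name}", value]


def all_of(*terms):
    """Combine terms with "and" (assumed syntax); a single term is returned unchanged."""
    return terms[0] if len(terms) == 1 else ["and", *terms]


zen3 = prop_eq("node_type", "compute_zen3")
print(json.dumps(zen3))  # ["=", "$node_type", "compute_zen3"]
```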

If you get an error such as "no host available", all of the requested nodes may already be reserved. Check the availability calendar to confirm:
https://chi.tacc.chameleoncloud.org/project/leases/calendar/host/

It may take around a minute or so for your lease to become active.
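The reservation cell asks `lease.lease_duration(days=5)` for a 5-day window. The arithmetic behind such a window is sketched below; the `YYYY-MM-DD HH:MM` UTC format and the one-minute start offset are assumptions for illustration, not a description of python-chi's internals, and `lease_window` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

# Assumed date format ("YYYY-MM-DD HH:MM", UTC); python-chi may differ by version.
BLAZAR_TIME_FORMAT = "%Y-%m-%d %H:%M"


def lease_window(days=5, hours=0, now=None):
    """Return (start, end) strings for a lease starting one minute after `now`."""
    if now is None:
        now = datetime.now(timezone.utc)
    start = now + timedelta(minutes=1)  # small offset so the start is not in the past
    end = start + timedelta(days=days, hours=hours)
    return start.strftime(BLAZAR_TIME_FORMAT), end.strftime(BLAZAR_TIME_FORMAT)


fixed = datetime(2023, 6, 28, 21, 0, tzinfo=timezone.utc)
print(lease_window(now=fixed))  # ('2023-06-28 21:01', '2023-07-03 21:01')
```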

In [3]:
import time
import blazarclient.exception
from chi import lease

lease_node_type = "compute_zen3"  # node type to reserve
reservations = []
reservation_req_time = int(time.time())
LEASE_KEY = f"{USER}-sc23-mlec-{reservation_req_time}"

try:
    print(f"Creating lease with name = {LEASE_KEY}...")
    lease.add_fip_reservation(reservations, count=1)  # one public (floating) IP
    lease.add_node_reservation(reservations,
                               resource_properties=["=", "$node_type", lease_node_type],  # reserve a zen3 compute node
                               count=1)

    start_date, end_date = lease.lease_duration(hours=0, days=5)  # reserve the node for 5 days

    l = lease.create_lease(
        LEASE_KEY,
        reservations,
        start_date=start_date,
        end_date=end_date
    )
    lease_id = l["id"]

    print("Waiting for lease to start ...")
    lease.wait_for_active(lease_id)
    print("Lease is now active!")
except blazarclient.exception.BlazarClientException as e:
    # Typically raised when no node of the requested type is free for the whole window.
    print(f"There is an issue making the reservation. Check the calendar to make sure a {lease_node_type} node is available.")
    print("https://chi.tacc.chameleoncloud.org/project/leases/calendar/host/")
    print(e)
except Exception as e:
    print("An unexpected error happened.")
    print(e)

Creating lease with name = wangm12-sc23-mlec-1687986082...
Waiting for lease to start ...
Lease is now active!


### Provision bare metal node

Next, we will launch the reserved node with the image "CC-Ubuntu20.04".
It will take approximately 10 minutes for the bare metal node to be successfully provisioned. 

During these 10 minutes, the requested image is configured, downloaded, and copied onto the hard drive, and the node is configured to reboot into the new OS.

In [5]:
from chi import server

image = "CC-Ubuntu20.04"

s = server.create_server(
    LEASE_KEY, 
    image_name=image,
    reservation_id=lease.get_node_reservation(lease_id)
)

print("Waiting for server to start ... This could take ~10 minutes.")
print("Waiting...")
server.wait_for_active(s.id)
print("Done")

Waiting for server to start ... This could take ~10 minutes.
Waiting...
Done


By default, the node is connected only to a private network, so it is not reachable over the internet or from this Jupyter environment. We need to associate a floating IP with the node, which gives it the public address we reserved.
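The floating IP only becomes useful once the node's SSH daemon answers. The polling idea can be factored into a helper with a hard overall deadline so the loop always terminates; `wait_for_tcp` and its parameters are hypothetical names for this sketch.

```python
import socket
import time


def wait_for_tcp(host, port=22, deadline_s=600.0, attempt_timeout_s=5.0, sleep_s=10.0):
    """Poll host:port until a TCP connection succeeds or the overall deadline passes.

    Returns True on success and False on timeout, leaving the next step to the caller.
    """
    start = time.monotonic()
    while True:
        try:
            # Each attempt is bounded by attempt_timeout_s, not by the whole deadline.
            with socket.create_connection((host, port), timeout=attempt_timeout_s):
                return True
        except OSError:
            if time.monotonic() - start >= deadline_s:
                return False
            time.sleep(sleep_s)
```

For example, `if not wait_for_tcp(floating_ip, 22, deadline_s=600): print("SSH not ready yet")`. Returning a boolean instead of printing inside the loop keeps the timeout message from repeating.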

In [6]:
import socket
import time

floating_ip = lease.get_reserved_floating_ips(lease_id)[0]
server.associate_floating_ip(s.id, floating_ip_address=floating_ip)

print(f"Waiting for SSH connectivity on {floating_ip} ...")
timeout = 60 * 2  # overall time budget in seconds
# Repeatedly try to reach the SSH port until it answers or the budget runs out.
start_time = time.perf_counter()
while True:
    try:
        with socket.create_connection((floating_ip, 22), timeout=10):
            print("Connection successful")
            break
    except OSError:
        if time.perf_counter() - start_time >= timeout:
            print(f"Timeout: after {timeout} seconds, could not connect via SSH. Please wait until SSH is up and ready, then re-run this cell.")
            break  # stop instead of printing the timeout message forever
        time.sleep(10)

Waiting for SSH connectivity on 129.114.109.238 ...
Timeout: after 120 seconds, could not connect via SSH. please wait until SSH is up and ready

<br />

## Saving Session

<div class="alert alert-block alert-info">Note: The session stores variables such as <b>floating_ip</b> and <b>reservation_id</b> so that other notebook files can reuse them.</div>

In [7]:
import scripts.session as session

session.clear()
session.save({ "floating_ip": floating_ip, "reservation_id": lease_id, "server_id": s.id })
session.load()

{'floating_ip': '129.114.109.238',
 'reservation_id': '8792b8b3-f55e-48fd-ab2f-8d36d6d58ee1',
 'server_id': '1238e209-9331-4e20-b556-3969158441e5'}
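`scripts/session.py` ships with the artifact and is not reproduced here. If you want the same save/load/clear pattern in your own notebooks, a minimal JSON-file version might look like this; the file name `.session.json` and the merge-on-save behavior are assumptions, not a description of the artifact's module.

```python
import json
from pathlib import Path

# Hypothetical location; the artifact's scripts/session.py may store data elsewhere.
SESSION_FILE = Path(".session.json")


def load(path=SESSION_FILE):
    """Return the saved session as a dict (empty if nothing was saved)."""
    return json.loads(path.read_text()) if path.exists() else {}


def save(data, path=SESSION_FILE):
    """Merge data into the session file, creating the file if needed."""
    current = load(path)
    current.update(data)
    path.write_text(json.dumps(current, indent=2))


def clear(path=SESSION_FILE):
    """Delete the session file if it exists."""
    path.unlink(missing_ok=True)
```

Because the data lives in a plain file, a second notebook can call `load()` to get back `floating_ip`, `reservation_id`, and `server_id` without re-running the provisioning cells.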

---

## Experiment Preparation

### Setup node

First, before running the experiments, we need to install the packages listed in `scripts/setup-node.sh`.

This will take around 10 minutes to complete.

<div class="alert alert-block alert-info">Note: all warnings in these cells can be safely ignored.</div>

In [6]:
from chi import ssh
import scripts.session as session
import scripts.ssh_helper as ssh_helper

session_data = session.load()
floating_ip = session_data["floating_ip"]
print("Got floating_ip: {}".format(floating_ip))

Got floating_ip: 129.114.109.238


In [9]:
import scripts.ssh_helper as ssh_helper

with ssh.Remote(floating_ip) as conn:
    conn.run('rm -rf sc23-mlec')
    conn.run('mkdir -p sc23-mlec')
    ssh_helper.put_dir(conn, ".", "sc23-mlec")
    print("We've copied this directory to the remote server. \"ls\" gives: ")
    conn.run("ls")

We've copied this directory to the remote server. "ls" gives: 
README
ReedSolomonEC
anaconda3
mlec-sim
my_mounting_point
openrc
sc23-mlec
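The copy step relies on `ssh_helper.put_dir` from the artifact's `scripts/` directory, which is not shown. A recursive upload can be sketched as follows, assuming the connection behaves like a `fabric.Connection` with `run(cmd)` and `put(local, remote)`; this is an illustration, not the artifact's implementation.

```python
import os
import posixpath
import shlex


def put_dir(conn, local_dir, remote_dir):
    """Recursively upload local_dir into remote_dir over a Fabric-style connection."""
    for root, dirs, files in os.walk(local_dir):
        dirs.sort()  # deterministic traversal order
        rel = os.path.relpath(root, local_dir)
        # Remote paths always use "/", regardless of the local OS separator.
        target = remote_dir if rel == "." else posixpath.join(remote_dir, *rel.split(os.sep))
        conn.run(f"mkdir -p {shlex.quote(target)}")  # create the directory before uploading
        for name in sorted(files):
            conn.put(os.path.join(root, name), posixpath.join(target, name))
```

Uploading file by file is simple but slow for large trees; archiving the directory first and extracting it remotely is a common alternative.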


In [13]:
with ssh.Remote(floating_ip) as conn:
    print("Now we set up and install the simulator")
    conn.run("cd sc23-mlec/scripts && bash setup-node.sh")

Now we set up and install the simulator
1. Installing Conda

Anaconda has already installed.
2. Using conda to install "numpy matplotlib mpmath pandas"

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

  current version: 23.3.1
  latest version: 23.5.0

Please update conda by running

    $ conda update -n base -c defaults conda

Or to minimize the number of packages updated during conda update use

     conda install conda=23.5.0

# All requested packages already installed.

3. Downloading mlec-sim from github

Successfully setup mlec-sim!

fatal: destination path '/home/cc/mlec-sim' already exists and is not an empty directory.

4. Downloading isa-l to evaluate encoding throughput

fatal: destination path '/home/cc/ReedSolomonEC' already exists and is not an empty directory.

Hit:1 http://archive.ubuntu.com/ubuntu focal InRelease
Hit:2 http://security.ubuntu.com/ubuntu focal-security InRelease
Hit:3 http://archive.ubuntu.com/ubuntu focal-updates InRelease
Hit:4 http://archive.ubuntu.com/ubuntu focal-backports InRelease
Reading package lists...
Building dependency tree...
Reading state information...
60 packages can be upgraded. Run 'apt list --upgradable' to see them.

Reading package lists...
Building dependency tree...
Reading state information...
autoconf is already the newest version (2.69-11.1).
g++ is already the newest version (4:9.3.0-1ubuntu2).
gcc is already the newest version (4:9.3.0-1ubuntu2).
libtool is already the newest version (2.4.6-14).
make is already the newest version (4.2.1-1.2).
nasm is already the newest version (2.14.02-1).
0 upgraded, 0 newly installed, 0 to remove and 60 not upgraded.
Reading package lists...
Building dependency tree...
Reading state information...
nasm is already the newest version (2.14.02-1).
yasm is already the newest version (1.3.0-2ubuntu1).
0 upgraded, 0 newly installed, 0 to remove and 60 not upgraded.
Configuring isa-l... 

configure.ac:23: installing 'build-aux/compile'
configure.ac:12: installing 'build-aux/missing'
Makefile.am: installing 'build-aux/depcomp'

Compiling EC performance evaluation scripts...
Successfully setup isa-l!
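The `fatal: destination path ... already exists` lines above are harmless: they appear because `git clone` refuses to clone into a non-empty directory when the setup script is re-run. The usual guard is to skip the clone when the target already has content. The sketch below shows that check in Python rather than reproducing `setup-node.sh`; the helper name `clone_command` is hypothetical.

```python
from pathlib import Path


def clone_command(repo_url, dest):
    """Return the git clone argv to run, or None if dest already has content.

    git refuses to clone into an existing non-empty directory, so skipping
    that case keeps re-runs free of "fatal: destination path" errors.
    """
    dest = Path(dest)
    if dest.is_dir() and any(dest.iterdir()):
        return None
    return ["git", "clone", repo_url, str(dest)]
```

A caller would run the command only when one is returned, for example `cmd = clone_command(url, dest)` followed by `subprocess.run(cmd, check=True)` if `cmd` is not `None`.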
