# Job Page Loading Helper

This notebook provides tooling to populate a jobs page from scratch using images from the [wandb Docker Hub account](https://hub.docker.com/u/wandb).

Tooling includes:
1. Create a job from a Docker image
2. Rename the job to a user-friendly name
3. Add example runs that auto-populate the run's `Clone from...` menu
4. Delete the dummy run used to create the job initially

Notes:
1. This notebook uses a [special branch of the SDK](https://github.com/wandb/wandb/tree/andrew/helpers) with helpful GQL mutations added. Please install that branch until it is merged into main.
2. Jobs prefixed with `gpu_` require a GPU to run and are added to a GPU queue by default. Make sure a GPU agent is available to run these jobs; otherwise no runs will be populated.
3. You must have queues running to populate jobs!
4. The `sql_query` job currently does not work on M1. This is due to upstream issues with emulation and the lack of linux/arm64 support in the `connectorx` package. The job should still work on an `amd64` machine.
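
Notes 2 and 4 amount to a small routing rule: choose the queue from the job-name prefix, and skip jobs known not to work on Apple Silicon. The following is a minimal sketch of that rule. The function names (`pick_queue`, `plan_jobs`) and the sample job names are illustrative assumptions; they are not part of this notebook or the SDK.

```python
import platform


def pick_queue(jobname, cpu_queue, gpu_queue):
    """Route `gpu_`-prefixed jobs to the GPU queue, everything else to the CPU queue."""
    return gpu_queue if jobname.startswith("gpu_") else cpu_queue


def is_apple_silicon():
    """True on an arm64 macOS host (for example an M1 machine)."""
    return platform.machine() == "arm64" and platform.system() == "Darwin"


def plan_jobs(jobnames, cpu_queue, gpu_queue, on_apple_silicon=None,
              skip_on_apple_silicon=("sql_query",)):
    """Return (jobname, queue) pairs, dropping jobs known to fail on Apple Silicon."""
    if on_apple_silicon is None:
        on_apple_silicon = is_apple_silicon()
    plan = []
    for jobname in jobnames:
        if on_apple_silicon and jobname in skip_on_apple_silicon:
            continue
        plan.append((jobname, pick_queue(jobname, cpu_queue, gpu_queue)))
    return plan


# Hypothetical job names, for illustration only.
example = plan_jobs(
    ["hello_world", "gpu_example", "sql_query"],
    cpu_queue="andrew-cpu",
    gpu_queue="andrew-gpu",
    on_apple_silicon=True,
)
print(example)
```

Passing `on_apple_silicon` explicitly makes the rule easy to check on any machine; leaving it as `None` falls back to detecting the host platform.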

## Settings

In [1]:
# Repo settings
JOB_REPO_ENTITY = 'launch-test'
JOB_REPO_PROJECT = 'jobs'

# Queue settings
CPU_QUEUE_NAME = 'andrew-cpu'
GPU_QUEUE_NAME = 'andrew-gpu'

# Job/image settings
DOCKER_IMAGE_TAG = '134fcaf3d4b1499e69b426fad803b7e2cca85ab5'
JOBS_DIR = 'jobs'

In [2]:
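def get_env(envlist):
    """Parse a docker-style env file (one KEY=VALUE per line) into a dict."""
    env = {}
    with open(envlist) as f:
        for line in f.read().splitlines():
            if not line.strip() or line.lstrip().startswith('#'):
                continue  # skip blank lines and comments
            k, v = line.split('=', 1)  # values may themselves contain '='
            env[k] = v
    
    return env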

job_repo_base_env = get_env("/Users/andrewtruong/.wandb_launch/env.list")
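
The env file is a plain list of `KEY=VALUE` lines. As a hedged illustration of that format, the sketch below parses an in-memory sample instead of a file. The helper name `parse_env_text` and all keys and values in `sample` are made up for the example; the point is that splitting on the first `=` only keeps values such as `--flag=value` intact.

```python
def parse_env_text(text):
    """Parse KEY=VALUE lines into a dict, skipping blank lines and `#` comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")  # split on the first '=' only
        env[key] = value
    return env


# Hypothetical file contents; real values live in your own env.list.
sample = """
# credentials
WANDB_API_KEY=not-a-real-key
WANDB_BASE_URL=https://api.wandb.ai
EXTRA_ARGS=--flag=value
"""
parsed = parse_env_text(sample)
print(parsed["EXTRA_ARGS"])
```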

In [3]:
from functools import partial
from pathlib import Path

import platform
import click
import docker
import yaml

import wandb
from wandb.sdk.internal.internal_api import Api as InternalApi
from wandb.sdk.launch import launch_add


api = wandb.Api()
iapi = InternalApi()
LOADER_STR = "__loader-delete-me__"


def load_job(jobname, queue_name, entity=JOB_REPO_ENTITY, project=JOB_REPO_PROJECT, tag=DOCKER_IMAGE_TAG):
    img = jobname2img(jobname, tag)
    wandb.termlog(f"Creating job: {entity}/{project}/{img}")
    create_job(img, entity, project)
    
    registry = get_registry()
    ui_name, ui_desc = registry[jobname]['name'], registry[jobname]['desc']
    artname = jobname2artname(jobname, tag)
    artpath = artname2artpath(artname, entity, project, tag='latest')
    wandb.termlog(f"Renaming job to: {ui_name}")
    rename_job(artpath, ui_name, ui_desc)
    
    new_artpath = artname2artpath(ui_name, entity, project, tag='latest')
    wandb.termlog("Adding new example runs...")
    add_example_runs(new_artpath, jobname, entity, project, queue_name)


def create_job(img, entity=JOB_REPO_ENTITY, project=JOB_REPO_PROJECT, env=job_repo_base_env):
    """
    Create a job by running the docker image.
    The run will show as failed because there is no config, but that's ok. It will get deleted later.
    """
    env = dict(env)  # copy so the shared default dict is not mutated between calls
    env["WANDB_ENTITY"] = entity
    env["WANDB_PROJECT"] = project
    env["WANDB_NAME"] = LOADER_STR
    env["WANDB_DOCKER"] = img

    client = docker.from_env()

    # Hardcoded switch: set to False to pin the container to linux/amd64.
    emulation = True
    run_kwargs = dict(environment=env, detach=True, auto_remove=True, network_mode='host')
    if not emulation:
        run_kwargs['platform'] = 'linux/amd64'
    container = client.containers.run(img, **run_kwargs)
    
    output = container.attach(stdout=True, stream=True, logs=True)
    for line in output:
        click.echo(line.decode('utf-8'), nl=False)

                    
                    
def rename_job(job_path, new_name, new_desc):
    """
    Rename the job from the default name to a pretty name and description we define in `registry.yaml`
    """
    art = api.artifact(job_path)
    asid = art._attrs['artifactSequence']['id']
    
    iapi.update_artifact_collection(asid, new_name, new_desc)


def add_example_runs(job_art_path, jobname, entity, project, queue_name):
    """
    Add example runs for the user to see and easily `Clone from...` in the UI.
    """
    base_launcher = partial(launch_add.launch_add, job=job_art_path, project=project, entity=entity, queue_name=queue_name)
    config_paths = Path(f'{JOBS_DIR}/{jobname}/configs').glob('*.yml')
    
    for p in config_paths:
        with p.open() as f:
            config = yaml.safe_load(f)
        base_launcher(config={"overrides": {"run_config": config['config']}}, name=config['run_name'])
        

def delete_loader_runs():
    """
    Delete the unsightly "loader" run
    """
    api = wandb.Api()
    for run in api.runs(f"{JOB_REPO_ENTITY}/{JOB_REPO_PROJECT}"):
        if run.name == LOADER_STR:
            run.delete()
    

def get_registry():
    with open('registry.yaml') as f:
        return yaml.safe_load(f)

def jobname2img(jobname, tag):
    return f"wandb/job_{jobname}:{tag}"

def get_jobnames(jobs_dir):
    # Only directories are jobs; skip stray files such as .DS_Store.
    return [p.name for p in Path(jobs_dir).iterdir() if p.is_dir()]

def jobname2artname(jobname, tag):
    return f"job-wandb_job_{jobname}_{tag}"

def artname2artpath(artname, entity, project, tag="latest"):
    return f'{entity}/{project}/{artname}:{tag}'
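
To make the naming helpers above concrete, here is a small worked example that reproduces them in isolation and prints what they yield for a single job. The job directory name `hello_world` is an assumption (only the pretty name "Hello World" is visible in the notebook output), and the `config` dict stands in for one parsed `configs/*.yml` file with the `run_name` and `config` keys that `add_example_runs` reads.

```python
def jobname2img(jobname, tag):
    return f"wandb/job_{jobname}:{tag}"


def jobname2artname(jobname, tag):
    return f"job-wandb_job_{jobname}_{tag}"


def artname2artpath(artname, entity, project, tag="latest"):
    return f"{entity}/{project}/{artname}:{tag}"


entity, project = "launch-test", "jobs"
tag = "134fcaf3d4b1499e69b426fad803b7e2cca85ab5"
jobname = "hello_world"  # assumed directory name under jobs/
ui_name = "Hello World"  # pretty name, as it would come from registry.yaml

# Docker image that is run to create the job.
img = jobname2img(jobname, tag)
# Artifact path of the job before it is renamed.
default_path = artname2artpath(jobname2artname(jobname, tag), entity, project)
# Artifact path after renaming, used when queueing example runs.
renamed_path = artname2artpath(ui_name, entity, project)

# Stand-in for one configs/*.yml file after yaml.safe_load.
config = {"run_name": "Hello World Example", "config": {}}
launch_config = {"overrides": {"run_config": config["config"]}}

print(img)
print(default_path)
print(renamed_path)
print(launch_config)
```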

## Spin up helper resources

In [4]:
# !docker run -p 3307:3306 -d sakiladb/mysql:latest
# !docker build -t tritonserver-wandb jobs/deploy_to_nvidia_triton/server && \
#     docker run --rm --net=host -p 8000:8000 -v $HOME/.aws:/root/.aws:ro -d tritonserver-wandb

## Deploy jobs

In [5]:
jobnames = get_jobnames(JOBS_DIR)
is_m1 = platform.machine() == 'arm64' and platform.system() == "Darwin"

for jobname in jobnames:
    if is_m1 and jobname == 'sql_query':
        continue  # connectorx seems to cause issues with emulation on M1.
    if jobname.startswith('gpu_'):
        load_job(jobname, GPU_QUEUE_NAME)
    else:
        load_job(jobname, CPU_QUEUE_NAME)

delete_loader_runs()

wandb: Creating job: launch-test/jobs/wandb/job_openai_evals:134fcaf3d4b1499e69b426fad803b7e2cca85ab5


wandb: Thanks for trying out the Report API!
wandb: For a tutorial, check out https://colab.research.google.com/drive/1CzyJx1nuOS4pdkXa2XPaRQyZdmFmLmXV
wandb: 
wandb: Try out tab completion to see what's available.
wandb:   ∟ everything:    `wr.<tab>`
wandb:       ∟ panels:    `wr.panels.<tab>`
wandb:       ∟ blocks:    `wr.blocks.<tab>`
wandb:       ∟ helpers:   `wr.helpers.<tab>`
wandb:       ∟ templates: `wr.templates.<tab>`
wandb:       
wandb: For bugs/feature requests, please create an issue on github: https://github.com/wandb/wandb/issues
wandb: Currently logged in as: megatruong (launch-test). Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.15.0
wandb: Run data is saved locally in /launch/wandb/run-20230428_140001-wn8h49pq
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run __loader-delete-me__
wandb: ⭐️ View project at https://wandb.ai/launch-test/jobs
wandb: 🚀 View run at https://wandb.ai/launch-test/jobs/runs/wn8h49pq
wand

wandb: Renaming job to: OpenAI Evals
wandb: Adding new example runs...
wandb: launch: 🚀 Launching run into launch-test/jobs
wandb: launch: Added run to queue andrew-cpu.
wandb: launch: Launch spec:
wandb: {'docker': {},
wandb:  'entity': 'launch-test',
wandb:  'git': {},
wandb:  'job': 'launch-test/jobs/OpenAI Evals:latest',
wandb:  'name': 'Emotional Intelligence - 01',
wandb:  'overrides': {'run_config': {'eval': 'emotional-intelligence',
wandb:                               'model': {'name': 'gpt-3.5-turbo',
wandb:                                         'override_prompt': 'You are an '
wandb:                                                            'emotionally '
wandb:                                                            'intelligent AI. '
wandb

wandb: Currently logged in as: megatruong (launch-test). Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.15.0
wandb: Run data is saved locally in /workspace/wandb/run-20230428_140016-l1s1aywa
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run __loader-delete-me__
wandb: ⭐️ View project at https://wandb.ai/launch-test/jobs
wandb: 🚀 View run at https://wandb.ai/launch-test/jobs/runs/l1s1aywa
wandb: downloading model
wandb: Waiting for W&B process to finish... (failed 1). Press Control-C to abort syncing.
wandb: 🚀 View run __loader-delete-me__ at: https://wandb.ai/launch-test/jobs/runs/l1s1aywa
wandb: Synced 5 W&B file(s), 0 media file(s), 2 artifact file(s) and 1 other file(s)
wandb: Find logs at: ./wandb/run-20230428_140016-l1s1aywa/logs
Traceback (most recent call last):
  File "job.py", line 27, in <module>
    model_dir = run.config["model"].download()
  File "/usr/local/lib/python3.8/dist-packages/wandb/sdk/wandb_config.py", line

wandb: Renaming job to: Optimize with NVIDIA TensorRT
wandb: Adding new example runs...
wandb: launch: 🚀 Launching run into launch-test/jobs
wandb: launch: Added run to queue andrew-gpu.
wandb: launch: Launch spec:
wandb: {'docker': {},
wandb:  'entity': 'launch-test',
wandb:  'git': {},
wandb:  'job': 'launch-test/jobs/Optimize with NVIDIA TensorRT:latest',
wandb:  'name': 'TensorFlow',
wandb:  'overrides': {'run_config': {'benchmark': {'benchmarking_rounds': 1000,
wandb:                                             'input_shape': [32, 299, 299, 3],
wandb:                                             'warmup_rounds': 50},
wandb:                               'model': 'wandb-artifact://megatruong/trt-testing/inceptionv3:latest',
wandb:                               'pre

wandb: Currently logged in as: megatruong (launch-test). Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.15.0
wandb: Run data is saved locally in /launch/wandb/run-20230428_140026-uovsuji6
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run __loader-delete-me__
wandb: ⭐️ View project at https://wandb.ai/launch-test/jobs
wandb: 🚀 View run at https://wandb.ai/launch-test/jobs/runs/uovsuji6
wandb: Waiting for W&B process to finish... (success).
wandb: 
wandb: Run summary:
wandb: hello world
wandb: 
wandb: 🚀 View run __loader-delete-me__ at: https://wandb.ai/launch-test/jobs/runs/uovsuji6
wandb: Synced 4 W&B file(s), 0 media file(s), 2 artifact file(s) and 1 other file(s)
wandb: Find logs at: ./wandb/run-20230428_140026-uovsuji6/logs


wandb: Renaming job to: Hello World
wandb: Adding new example runs...
wandb: launch: 🚀 Launching run into launch-test/jobs
wandb: launch: Added run to queue andrew-cpu.
wandb: launch: Launch spec:
wandb: {'docker': {},
wandb:  'entity': 'launch-test',
wandb:  'git': {},
wandb:  'job': 'launch-test/jobs/Hello World:latest',
wandb:  'name': 'Hello World Example',
wandb:  'overrides': {'run_config': {}},
wandb:  'project': 'jobs',
wandb:  'resource': 'local-container',
wandb:  'resource_args': {'local-container': {'env-file': '/Users/andrewtruong/.wandb_launch/env.list',
wandb:                                        'net': 'host',
wandb:                                        'volume': ['/Users/andrewtruong/.aws:/home/andrewtruong/.aws:ro',
wandb

wandb: Currently logged in as: megatruong (launch-test). Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.15.0
wandb: Run data is saved locally in /launch/wandb/run-20230428_140044-80629u0m
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run __loader-delete-me__
wandb: ⭐️ View project at https://wandb.ai/launch-test/jobs
wandb: 🚀 View run at https://wandb.ai/launch-test/jobs/runs/80629u0m
wandb: Waiting for W&B process to finish... (failed 1). Press Control-C to abort syncing.
wandb: 🚀 View run __loader-delete-me__ at: https://wandb.ai/launch-test/jobs/runs/80629u0m
wandb: Synced 4 W&B file(s), 0 media file(s), 2 artifact file(s) and 1 other file(s)
wandb: Find logs at: ./wandb/run-20230428_140044-80629u0m/logs
Traceback (most recent call last):
  File "/launch/job.py", line 47, in <module>
    if run.config["framework"] not in supported_frameworks:
  File "/usr/local/lib/python3.9/site-packages/wandb/sdk/wandb_config.py", line 130, i

wandb: Renaming job to: Deploy to Sagemaker Endpoints
wandb: Adding new example runs...
wandb: launch: 🚀 Launching run into launch-test/jobs
wandb: launch: Added run to queue andrew-cpu.
wandb: launch: Launch spec:
wandb: {'docker': {},
wandb:  'entity': 'launch-test',
wandb:  'git': {},
wandb:  'job': 'launch-test/jobs/Deploy to Sagemaker Endpoints:latest',
wandb:  'name': 'Deploy PyTorch Model',
wandb:  'overrides': {'run_config': {'artifact': 'wandb-artifact://megatruong/ptl-testing2/model-vgw632i7:v0',
wandb:                               'framework': 'pytorch',
wandb:                               'framework_version': '1.12',
wandb:                               'instance_count': 1,
wandb:                               'instance_type': 'ml.c5.xlarge',
wa

wandb: Currently logged in as: megatruong (launch-test). Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.15.0
wandb: Run data is saved locally in /launch/wandb/run-20230428_140053-e5svo2zj
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run __loader-delete-me__
wandb: ⭐️ View project at https://wandb.ai/launch-test/jobs
wandb: 🚀 View run at https://wandb.ai/launch-test/jobs/runs/e5svo2zj
wandb: Waiting for W&B process to finish... (failed 1). Press Control-C to abort syncing.
wandb: 🚀 View run __loader-delete-me__ at: https://wandb.ai/launch-test/jobs/runs/e5svo2zj
wandb: Synced 4 W&B file(s), 0 media file(s), 2 artifact file(s) and 1 other file(s)
wandb: Find logs at: ./wandb/run-20230428_140053-e5svo2zj/logs
Traceback (most recent call last):
  File "/launch/job.py", line 12, in <module>
    msg = pymsteams.connectorcard(run.config["webhook_url"])
  File "/usr/local/lib/python3.9/site-packages/wandb/sdk/wandb_config.py", line 130, 

wandb: Renaming job to: Microsoft Teams Webhook
wandb: Adding new example runs...
wandb: launch: 🚀 Launching run into launch-test/jobs
wandb: launch: Added run to queue andrew-cpu.
wandb: launch: Launch spec:
wandb: {'docker': {},
wandb:  'entity': 'launch-test',
wandb:  'git': {},
wandb:  'job': 'launch-test/jobs/Microsoft Teams Webhook:latest',
wandb:  'name': 'Example',
wandb:  'overrides': {'run_config': {'alias': '${alias}',
wandb:                               'artifact': 'wandb-artifact://wandb/pytorch-lightning-e2e/nature-e1d5dg6m:latest',
wandb:                               'color': '#00FF00',
wandb:                               'link_button': {'text': 'Review Deployed Model',
wandb:                                               'url': 'https://www.wandb.ai

wandb: Currently logged in as: megatruong (launch-test). Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.15.0
wandb: Run data is saved locally in /launch/wandb/run-20230428_140102-1c5nvg85
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run __loader-delete-me__
wandb: ⭐️ View project at https://wandb.ai/launch-test/jobs
wandb: 🚀 View run at https://wandb.ai/launch-test/jobs/runs/1c5nvg85
wandb: Waiting for W&B process to finish... (failed 1). Press Control-C to abort syncing.
wandb: 🚀 View run __loader-delete-me__ at: https://wandb.ai/launch-test/jobs/runs/1c5nvg85
wandb: Synced 4 W&B file(s), 0 media file(s), 2 artifact file(s) and 1 other file(s)
wandb: Find logs at: ./wandb/run-20230428_140102-1c5nvg85/logs
Traceback (most recent call last):
  File "/launch/job.py", line 24, in <module>
    token = os.getenv(run.config['github_api_token_env_var'])
  File "/usr/local/lib/python3.9/site-packages/wandb/sdk/wandb_config.py", line 130,

wandb: Renaming job to: Send Webhook
wandb: Adding new example runs...
wandb: launch: 🚀 Launching run into launch-test/jobs
wandb: launch: Added run to queue andrew-cpu.
wandb: launch: Launch spec:
wandb: {'docker': {},
wandb:  'entity': 'launch-test',
wandb:  'git': {},
wandb:  'job': 'launch-test/jobs/Send Webhook:latest',
wandb:  'name': 'Example',
wandb:  'overrides': {'run_config': {'github_api_token_env_var': 'GITHUB_API_TOKEN',
wandb:                               'payload_inputs': {'template-file': 'workflow_helpers/template.py'},
wandb:                               'ref': 'main',
wandb:                               'repo': 'wandb/launch-jobs',
wandb:                               'retry_settings': {'attempts': 3,
wandb:

wandb: Currently logged in as: megatruong (launch-test). Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.15.0
wandb: Run data is saved locally in /launch/wandb/run-20230428_140112-vdo01emq
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run __loader-delete-me__
wandb: ⭐️ View project at https://wandb.ai/launch-test/jobs
wandb: 🚀 View run at https://wandb.ai/launch-test/jobs/runs/vdo01emq
wandb: Waiting for W&B process to finish... (failed 1). Press Control-C to abort syncing.
wandb: 🚀 View run __loader-delete-me__ at: https://wandb.ai/launch-test/jobs/runs/vdo01emq
wandb: Synced 4 W&B file(s), 0 media file(s), 2 artifact file(s) and 1 other file(s)
wandb: Find logs at: ./wandb/run-20230428_140112-vdo01emq/logs
Traceback (most recent call last):
  File "/launch/job.py", line 23, in <module>
    token = os.getenv(run.config["github_api_token_env_var"])
  File "/usr/local/lib/python3.9/site-packages/wandb/sdk/wandb_config.py", line 130,

wandb: Renaming job to: Github Actions Workflow Dispatch
wandb: Adding new example runs...
wandb: launch: 🚀 Launching run into launch-test/jobs
wandb: launch: Added run to queue andrew-cpu.
wandb: launch: Launch spec:
wandb: {'docker': {},
wandb:  'entity': 'launch-test',
wandb:  'git': {},
wandb:  'job': 'launch-test/jobs/Github Actions Workflow Dispatch:latest',
wandb:  'name': 'Generate Report Action',
wandb:  'overrides': {'run_config': {'github_api_token_env_var': 'GITHUB_API_TOKEN',
wandb:                               'owner': 'wandb',
wandb:                               'ref': 'main',
wandb:                               'repo': 'launch-jobs',
wandb:                               'retry_settings': {'attempts': 3,
wandb:

wandb: Currently logged in as: megatruong (launch-test). Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.15.0
wandb: Run data is saved locally in /launch/wandb/run-20230428_140130-amzh0ase
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run __loader-delete-me__
wandb: ⭐️ View project at https://wandb.ai/launch-test/jobs
wandb: 🚀 View run at https://wandb.ai/launch-test/jobs/runs/amzh0ase
wandb: Waiting for W&B process to finish... (failed 1). Press Control-C to abort syncing.
wandb: Network error (TransientError), entering retry loop.
wandb: 🚀 View run __loader-delete-me__ at: https://wandb.ai/launch-test/jobs/runs/amzh0ase
wandb: Synced 4 W&B file(s), 0 media file(s), 2 artifact file(s) and 1 other file(s)
wandb: Find logs at: ./wandb/run-20230428_140130-amzh0ase/logs
Traceback (most recent call last):
  File "/launch/job.py", line 79, in <module>
    model_name, model_ver = run.config["artifact"].name.split(":v")
  File "/usr/local

wandb: Renaming job to: Deploy to NVIDIA Triton Inference Server
wandb: Adding new example runs...
wandb: launch: 🚀 Launching run into launch-test/jobs
wandb: launch: Added run to queue andrew-cpu.
wandb: launch: Launch spec:
wandb: {'docker': {},
wandb:  'entity': 'launch-test',
wandb:  'git': {},
wandb:  'job': 'launch-test/jobs/Deploy to NVIDIA Triton Inference Server:latest',
wandb:  'name': 'Deploy PyTorch Model',
wandb:  'overrides': {'run_config': {'artifact': 'wandb-artifact://megatruong/ptl-testing2/my_model:v0',
wandb:                               'framework': 'pytorch',
wandb:                               'triton_bucket': 'andrew-triton-bucket',
wandb:                               'triton_model_config_overrides': {'input': [{'data_type': 'TYPE_FP32',
wandb

## Delete SageMaker endpoints that were spun up
- You may have to run this manually, because the jobs above need to actually run before the endpoints are created.

In [6]:
import boto3

sagemaker = boto3.client('sagemaker')

response = sagemaker.list_endpoints()
endpoints = response['Endpoints']

for endpoint in endpoints:
    try:
        sagemaker.delete_endpoint(EndpointName=endpoint['EndpointName'])
    except Exception as e:
        print(e)
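
Note that the cell above calls `list_endpoints` without a filter and reads only the first page of results, so it targets whatever endpoints that single response returns, whether or not this notebook created them. A more conservative variant, sketched below, restricts deletion to endpoints whose name contains a given fragment and follows pagination. It assumes `client` behaves like `boto3.client('sagemaker')`; the function name `delete_matching_endpoints` and the name fragment in the usage comment are hypothetical.

```python
def delete_matching_endpoints(client, name_contains):
    """Delete SageMaker endpoints whose name contains `name_contains`.

    `client` is expected to behave like `boto3.client("sagemaker")`.
    Returns (deleted, failed) lists of endpoint names.
    """
    deleted, failed = [], []
    paginator = client.get_paginator("list_endpoints")
    for page in paginator.paginate(NameContains=name_contains):
        for endpoint in page["Endpoints"]:
            name = endpoint["EndpointName"]
            try:
                client.delete_endpoint(EndpointName=name)
                deleted.append(name)
            except Exception as e:  # keep going, as the cell above does
                print(f"Could not delete {name}: {e}")
                failed.append(name)
    return deleted, failed


# Usage (requires boto3 and AWS credentials; the name fragment is hypothetical):
# import boto3
# delete_matching_endpoints(boto3.client("sagemaker"), name_contains="my-launch-demo")
```

Returning the two lists makes it easy to confirm what was removed before moving on.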

## Check to see if any setup runs failed

In [7]:
api = wandb.Api()  # re-create the client to refresh the list of runs
for run in api.runs(f"{JOB_REPO_ENTITY}/{JOB_REPO_PROJECT}"):
    if run.state == 'failed':
        job = None  # reset so a previous run's job name is never reused
        for art in run.used_artifacts():
            if art.type == 'job':
                job = art.name
                break
        print(f"{job}::{run.name} || {run}")
        
        

Deploy to NVIDIA Triton Inference Server:v0::Deploy TensorFlow Model || <Run megatruong/jobs/7qmjh103 (failed)>
Deploy to NVIDIA Triton Inference Server:v0::Deploy PyTorch Model || <Run megatruong/jobs/uo07ijwp (failed)>
Github Actions Workflow Dispatch:v0::Generate Report Action || <Run megatruong/jobs/ntic18z3 (failed)>
job-wandb_job_optimize_with_tensor_rt_98f24741ebd810806def11fc2499236274147f50:v0::TensorFlow || <Run megatruong/jobs/bfufwq23 (failed)>
Send Webhook:v0::Example || <Run megatruong/jobs/bdfeudxn (failed)>
SQL Query (table):v0::Table || <Run megatruong/jobs/wzpzwyuk (failed)>
SQL Query (artifact):v0::Artifact || <Run megatruong/jobs/h7bntbjl (failed)>
OpenAI Evals:v0::American Bar Association - 03 || <Run megatruong/jobs/vo54r7mj (failed)>
job-wandb_job_deploy_to_nvidia_triton_98f24741ebd810806def11fc2499236274147f50:v0::good-violet-143 || <Run megatruong/jobs/020ch8b4 (failed)>
job-wandb_job_github_actions_workflow_dispatch_98f24741ebd810806def11fc2499236274147f50:v0:
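
A flat listing like the one above can be easier to scan when grouped per job. The sketch below shows one way to do that. It relies only on the attributes the check already uses (`run.state`, `run.name`, `run.used_artifacts()`, `art.type`, `art.name`); the helper names and the stand-in run objects are assumptions made so the example runs without network access.

```python
from collections import defaultdict
from types import SimpleNamespace


def failed_runs_by_job(runs):
    """Map job artifact name -> names of failed runs (key None if no job artifact)."""
    grouped = defaultdict(list)
    for run in runs:
        if run.state != "failed":
            continue
        job = next((art.name for art in run.used_artifacts() if art.type == "job"), None)
        grouped[job].append(run.name)
    return dict(grouped)


def _fake_run(name, state, artifacts):
    """Stand-in exposing only the attributes used above; not a real wandb Run."""
    arts = [SimpleNamespace(type=t, name=n) for t, n in artifacts]
    return SimpleNamespace(name=name, state=state, used_artifacts=lambda: arts)


runs = [
    _fake_run("Example", "failed", [("job", "Send Webhook:v0")]),
    _fake_run("Table", "failed", [("dataset", "raw:v1"), ("job", "SQL Query (table):v0")]),
    _fake_run("Hello World Example", "finished", [("job", "Hello World:v0")]),
    _fake_run("good-violet-143", "failed", []),
]
summary = failed_runs_by_job(runs)
print(summary)
```

With a live client, the same function could be called as `failed_runs_by_job(wandb.Api().runs(f"{JOB_REPO_ENTITY}/{JOB_REPO_PROJECT}"))`.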