# Uploading a Legacy Zeno Project

Upload an existing Zeno project to a Zeno backend using the Zeno client.

This notebook assumes the project was created with an older version of Zeno.
It therefore relies on that version's CSV layout: model outputs live in
`OUTPUToutput<model>` columns, and derived values live in columns prefixed with
`PREDISTILL`, `POSTDISTILL`, or `EMBEDDING`.

This example uploads the audio transcription project.
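
To make the column-name assumptions concrete, here is a minimal sketch of how a legacy column name can be split into its type prefix, its name, and its model. The column list is a hypothetical sample that follows the pattern in this project's CSV, and `split_legacy_column` is an illustrative helper, not part of `zeno_client`.

```python
# Hypothetical sample of column names following the legacy naming pattern.
columns = [
    "id",
    "label",
    "OUTPUToutputwhisper-base",
    "POSTDISTILLwer_mwhisper-base",
    "EMBEDDINGembeddingwhisper-base",
    "OUTPUToutputwhisper-tiny.en",
    "POSTDISTILLwer_mwhisper-tiny.en",
]

OUTPUT_PREFIX = "OUTPUToutput"

# Model names are whatever follows the output prefix.
models = sorted(c[len(OUTPUT_PREFIX):] for c in columns if c.startswith(OUTPUT_PREFIX))


def split_legacy_column(column, models):
    """Return (kind, name, model) for a legacy column, or None for plain metadata."""
    for kind in ("OUTPUT", "PREDISTILL", "POSTDISTILL", "EMBEDDING"):
        if column.startswith(kind):
            rest = column[len(kind):]
            # Try longer model names first so that a model name which is a
            # suffix of another one is not matched by mistake.
            for model in sorted(models, key=len, reverse=True):
                if rest.endswith(model):
                    return kind, rest[: len(rest) - len(model)], model
            # Prefixed column that does not belong to any known model.
            return kind, rest, None
    return None


parsed = {c: split_legacy_column(c, models) for c in columns}
for column, parts in parsed.items():
    print(column, "->", parts)
```

Columns that return `None` are treated as dataset metadata; everything else belongs to a specific model and is uploaded with that system.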

In [2]:
from zeno_client import ZenoClient, ZenoMetric
import pandas as pd
import os

In [9]:
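# Read the key from the environment instead of committing it to the notebook.
# ZENO_API_KEY is an assumed variable name; use whatever your setup defines.
API_KEY = os.environ["ZENO_API_KEY"]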
PROJECT_NAME = "Audio Transcription Accents"
PROJECT_VIEW = "audio-transcription"
DATA_URL = "https://zenoml.s3.amazonaws.com/accents/"
EXISTING_PROJECT_PATH = "accents.csv"

In [4]:

# Load the legacy export and replace missing values with empty strings.
data_frame = pd.read_csv(EXISTING_PROJECT_PATH)
data_frame = data_frame.fillna("")

# Legacy Zeno encodes the column type as a prefix on the column name.
output_cols = [col for col in data_frame.columns if str(col).startswith("OUTPUToutput")]
# The model name is whatever follows the "OUTPUToutput" prefix; sort for a stable order.
models = sorted({str(col).replace("OUTPUToutput", "") for col in output_cols})
predistill_cols = [col for col in data_frame.columns if str(col).startswith("PREDISTILL")]
postdistill_cols = [col for col in data_frame.columns if str(col).startswith("POSTDISTILL")]
embedding_cols = [col for col in data_frame.columns if str(col).startswith("EMBEDDING")]

In [5]:
data_frame.head()

Unnamed: 0,id,age,age_onset,birthplace,native_language,sex,speakerid,country,id.1,label,...,POSTDISTILLspecial_charswhisper-tiny.en,POSTDISTILLwer_mwhisper-tiny.en,OUTPUToutputwhisper-base,EMBEDDINGembeddingwhisper-base,POSTDISTILLspecial_charswhisper-base,POSTDISTILLwer_mwhisper-base,OUTPUToutputwhisper-base.en,EMBEDDINGembeddingwhisper-base.en,POSTDISTILLspecial_charswhisper-base.en,POSTDISTILLwer_mwhisper-base.en
0,afrikaans1.wav,27.0,9.0,"virginia, south africa",afrikaans,female,1,south africa,afrikaans1.wav,Please call Stella. Ask her to bring these th...,...,8,0.275362,"Please call Stella, ask her to bring these th...",,7,0.202899,"Please call Stella, ask her to bring these th...",,6,0.130435
1,afrikaans2.wav,40.0,5.0,"pretoria, south africa",afrikaans,male,2,south africa,afrikaans2.wav,Please call Stella. Ask her to bring these th...,...,7,0.057971,"Please call Stella, ask her to bring these th...",,7,0.057971,"Please call Stella, ask her to bring these th...",,7,0.057971
2,afrikaans3.wav,43.0,4.0,"pretoria, transvaal, south africa",afrikaans,male,418,south africa,afrikaans3.wav,Please call Stella. Ask her to bring these th...,...,9,0.043478,Please call Stella. Ask her to bring these th...,,6,0.043478,"Please call Stella, ask her to bring these th...",,6,0.101449
3,afrikaans4.wav,26.0,8.0,"pretoria, south africa",afrikaans,male,1159,south africa,afrikaans4.wav,Please call Stella. Ask her to bring these th...,...,6,0.086957,"Please call Stella, ask her to bring these th...",,7,0.072464,Please call Stella. Ask her to bring these th...,,6,0.086957
4,afrikaans5.wav,19.0,6.0,"cape town, south africa",afrikaans,male,1432,south africa,afrikaans5.wav,Please call Stella. Ask her to bring these th...,...,7,0.144928,Please call Stella. Ask her to bring these th...,,6,0.101449,Please call Stella. Ask her to bring these th...,,6,0.144928


In [13]:
# Keep only dataset-level columns; per-model columns are uploaded with each system.
df_dataset = data_frame.drop(columns=output_cols + predistill_cols + postdistill_cols + embedding_cols)
# Assign US speakers to North America.
df_dataset.loc[df_dataset["country"] == "usa", "continent"] = "North America"

In [14]:
df_dataset.head()

Unnamed: 0,id,age,age_onset,birthplace,native_language,sex,speakerid,country,id.1,label,continent,data
0,afrikaans1.wav,27.0,9.0,"virginia, south africa",afrikaans,female,1,south africa,afrikaans1.wav,Please call Stella. Ask her to bring these th...,Africa,afrikaans1.wav
1,afrikaans2.wav,40.0,5.0,"pretoria, south africa",afrikaans,male,2,south africa,afrikaans2.wav,Please call Stella. Ask her to bring these th...,Africa,afrikaans2.wav
2,afrikaans3.wav,43.0,4.0,"pretoria, transvaal, south africa",afrikaans,male,418,south africa,afrikaans3.wav,Please call Stella. Ask her to bring these th...,Africa,afrikaans3.wav
3,afrikaans4.wav,26.0,8.0,"pretoria, south africa",afrikaans,male,1159,south africa,afrikaans4.wav,Please call Stella. Ask her to bring these th...,Africa,afrikaans4.wav
4,afrikaans5.wav,19.0,6.0,"cape town, south africa",afrikaans,male,1432,south africa,afrikaans5.wav,Please call Stella. Ask her to bring these th...,Africa,afrikaans5.wav


In [10]:
zeno_client = ZenoClient(API_KEY)
project = zeno_client.create_project(
    PROJECT_NAME, 
    view=PROJECT_VIEW,
    data_url=DATA_URL,
    metrics=[ZenoMetric(name="avg_wer", type="mean", columns=["wer"])]
)

Successfully updated project  62ec4e74-7358-4801-b80c-d19e51ff2a4f
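
The `avg_wer` metric above is declared as a `mean` over the `wer` column. As a rough local check, the sketch below averages the five `POSTDISTILLwer_mwhisper-base.en` values shown in the `data_frame.head()` preview. It assumes that a `mean` metric is a plain arithmetic mean; how the backend actually aggregates is not confirmed here.

```python
from statistics import mean

# WER values for whisper-base.en, copied from the five rows shown by data_frame.head().
wer_values = [0.130435, 0.057971, 0.101449, 0.086957, 0.144928]

# Assumption: a "mean" metric is the arithmetic mean of the selected column.
avg_wer = mean(wer_values)
print(f"avg_wer over {len(wer_values)} rows: {avg_wer:.4f}")
```

The value covers only the previewed rows, so it will differ from the metric computed over the full dataset.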


In [15]:
project.upload_dataset(df_dataset, id_column="id", label_column="label", data_column="id.1")

Successfully uploaded data


In [16]:
for model in models:
    # Select this model's legacy columns and rename them to the generic
    # names used by the project ("output" and the "wer" metric column).
    df_to_upload = data_frame[
        ["id", f"OUTPUToutput{model}", f"POSTDISTILLwer_m{model}"]
    ].rename(
        columns={f"OUTPUToutput{model}": "output", f"POSTDISTILLwer_m{model}": "wer"}
    )
    project.upload_system(
        model,
        df_to_upload,
        output_column="output",
        id_column="id",
    )

Successfully uploaded system
Successfully uploaded system
Successfully uploaded system
Successfully uploaded system
Successfully uploaded system
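
The upload loop assumes every model has both an `OUTPUToutput<model>` and a `POSTDISTILLwer_m<model>` column; selecting a missing column from the data frame would raise a `KeyError` partway through the uploads. A possible pre-flight check is sketched below. `missing_system_columns` is a hypothetical helper, not part of `zeno_client`, and the sample column list is made up for illustration.

```python
def missing_system_columns(columns, models, required=("OUTPUToutput", "POSTDISTILLwer_m")):
    """Map each model to the required legacy columns that are absent."""
    present = set(map(str, columns))
    missing = {}
    for model in models:
        absent = [prefix + model for prefix in required if prefix + model not in present]
        if absent:
            missing[model] = absent
    return missing


# Made-up column list: whisper-tiny.en lacks its WER column.
sample_columns = [
    "id",
    "OUTPUToutputwhisper-base",
    "POSTDISTILLwer_mwhisper-base",
    "OUTPUToutputwhisper-tiny.en",
]
problems = missing_system_columns(sample_columns, ["whisper-base", "whisper-tiny.en"])
print(problems)
```

In the notebook this could be called as `missing_system_columns(data_frame.columns, models)` before the loop, stopping early if the result is not empty.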
