Uploading an existing Zeno project to a Zeno backend using the Zeno API.

We assume that the existing project was created with an older version of Zeno.
This script therefore makes assumptions about the column names and data structure of
the project's CSV file.
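
Because the script depends on specific column names, it can help to check them before uploading anything. The following is a minimal sketch, not part of the original notebook: the helper `missing_columns` and the toy frame are hypothetical, and the required names are the ones passed to `upload_dataset` further down.

```python
import pandas as pd

# Columns that the upload cell later in this notebook refers to by name.
REQUIRED_COLUMNS = ("id", "label", "id.1")


def missing_columns(frame, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the frame (hypothetical helper)."""
    return [col for col in required if col not in frame.columns]


# Toy frame standing in for the real CSV; the values are made up for illustration.
toy = pd.DataFrame(
    {"id": ["a.wav"], "label": ["hello"], "OUTPUToutputmodel_a": ["helo"]}
)
print(missing_columns(toy))
```

An empty list means the frame has every column the later cells rely on; anything else is worth resolving before the upload.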

In [53]:
USERNAME = "test"
PASSWORD = "Test12345!"
ENDPOINT = "http://localhost:8000"
PROJECT_NAME = "accent-project"
PROJECT_VIEW = "audio-transcription"
DATA_URL = "https://zenoml.s3.amazonaws.com/accents/"
EXISTING_PROJECT_PATH = "accents.csv"

In [46]:
%load_ext autoreload
%autoreload 2

%env PUBLIC_BACKEND_ENDPOINT=http://localhost:8000

from zeno_api import ZenoClient
import pandas as pd


The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
env: PUBLIC_BACKEND_ENDPOINT=http://localhost:8000


In [47]:
zeno_client = ZenoClient(
    username=USERNAME, password=PASSWORD, endpoint=ENDPOINT
)

In [48]:
project = zeno_client.create_project(
    PROJECT_NAME,
    view=PROJECT_VIEW,
    data_url=DATA_URL,
)

Successfully created project test/accent-project


In [49]:

# Load the legacy export and replace missing values with empty strings.
data_frame = pd.read_csv(EXISTING_PROJECT_PATH)
data_frame = data_frame.fillna("")

# Group columns by the name prefixes this script assumes the legacy export uses.
output_cols = [col for col in data_frame.columns if str(col).startswith("OUTPUToutput")]
# Model names are whatever follows the output prefix; sorted for a stable order.
models = sorted({str(col).replace("OUTPUToutput", "") for col in output_cols})
predistill_cols = [col for col in data_frame.columns if str(col).startswith("PREDISTILL")]
postdistill_cols = [col for col in data_frame.columns if str(col).startswith("POSTDISTILL")]
embedding_cols = [col for col in data_frame.columns if str(col).startswith("EMBEDDING")]
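
The prefix-based split in the cell above can be tried on a small frame without the real CSV. This is an illustrative sketch only: `group_columns_by_prefix` and the toy column names are hypothetical, and the prefixes are the ones this notebook assumes.

```python
import pandas as pd

# Prefixes assumed by this notebook for columns written by the older Zeno version.
PREFIXES = ("OUTPUToutput", "PREDISTILL", "POSTDISTILL", "EMBEDDING")


def group_columns_by_prefix(frame, prefixes=PREFIXES):
    """Map each prefix to the columns whose names start with it (hypothetical helper)."""
    groups = {prefix: [] for prefix in prefixes}
    for col in frame.columns:
        name = str(col)
        for prefix in prefixes:
            if name.startswith(prefix):
                groups[prefix].append(col)
                break  # a column belongs to at most one group
    return groups


# Empty toy frame; only the column names matter here.
toy = pd.DataFrame(
    columns=["id", "label", "OUTPUToutputmodel_a", "OUTPUToutputmodel_b", "EMBEDDINGmodel_a"]
)
groups = group_columns_by_prefix(toy)

# Strip the output prefix to recover the model names.
models = sorted(col[len("OUTPUToutput"):] for col in groups["OUTPUToutput"])
print(groups["OUTPUToutput"])
print(models)
```

Columns that match none of the prefixes (here `id` and `label`) are the ones that remain in the dataset frame after the drop in the next cell.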

In [50]:
df_dataset = data_frame.drop(
    columns=output_cols + predistill_cols + postdistill_cols + embedding_cols
)

In [51]:
df_dataset.head()

Unnamed: 0,id,age,age_onset,birthplace,native_language,sex,speakerid,country,id.1,label,continent,data
0,afrikaans1.wav,27.0,9.0,"virginia, south africa",afrikaans,female,1,south africa,afrikaans1.wav,Please call Stella. Ask her to bring these th...,Africa,afrikaans1.wav
1,afrikaans2.wav,40.0,5.0,"pretoria, south africa",afrikaans,male,2,south africa,afrikaans2.wav,Please call Stella. Ask her to bring these th...,Africa,afrikaans2.wav
2,afrikaans3.wav,43.0,4.0,"pretoria, transvaal, south africa",afrikaans,male,418,south africa,afrikaans3.wav,Please call Stella. Ask her to bring these th...,Africa,afrikaans3.wav
3,afrikaans4.wav,26.0,8.0,"pretoria, south africa",afrikaans,male,1159,south africa,afrikaans4.wav,Please call Stella. Ask her to bring these th...,Africa,afrikaans4.wav
4,afrikaans5.wav,19.0,6.0,"cape town, south africa",afrikaans,male,1432,south africa,afrikaans5.wav,Please call Stella. Ask her to bring these th...,Africa,afrikaans5.wav


In [52]:
project.upload_dataset(df_dataset, "id", label_column="label", data_column="id.1")

Successfully uploaded data

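
The notebook collects the output columns and model names but only uploads the dataset. As a possible preparation step for the model outputs, the sketch below builds one small frame per model using pandas alone. It deliberately makes no Zeno API call, since the upload method for model outputs is not shown in this notebook. The helper `per_model_frames`, the `"output"` column name, and the toy data are all hypothetical.

```python
import pandas as pd

# Prefix assumed by this notebook for model output columns in the legacy CSV.
OUTPUT_PREFIX = "OUTPUToutput"


def per_model_frames(frame, id_column="id"):
    """Return {model_name: frame with the id column and that model's output} (hypothetical helper)."""
    frames = {}
    for col in frame.columns:
        name = str(col)
        if name.startswith(OUTPUT_PREFIX):
            model = name[len(OUTPUT_PREFIX):]
            # Keep only the id and this model's output, under a uniform column name.
            frames[model] = frame[[id_column, col]].rename(columns={col: "output"})
    return frames


# Toy frame standing in for the full legacy CSV; the values are made up.
toy = pd.DataFrame(
    {
        "id": ["a.wav", "b.wav"],
        "label": ["hello", "world"],
        "OUTPUToutputmodel_a": ["helo", "world"],
    }
)
frames = per_model_frames(toy)
print(list(frames))
print(frames["model_a"].columns.tolist())
```

Each resulting frame shares the `id` column with the uploaded dataset, so the outputs can be matched back to their instances.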