## Python SDK

Available on [PyPI](https://pypi.org/project/orchestracto-sdk/).

Some examples are available [on GitHub](https://github.com/tractoai/tracto-examples/tree/main/orchestracto).


## Creating workflows from local files

Run `WF_BASE_PATH=//home/some_map_node YT_PROXY=... YT_TOKEN=... orc sdk process ./orchestracto/example_yt/example.py`. It creates a workflow config, builds the required Docker images, and uploads both to Tracto.

**Create the required secret stores (in this case, `//home/some_map_node/secrets`) in advance.**
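If you prefer to trigger this step from a Python script (for example, in CI), one option is to wrap the same command with `subprocess`. This is a minimal sketch that assumes the `orc` CLI is on `PATH`; `build_orc_process_command` and `run_orc_process` are hypothetical helpers, not part of orchestracto-sdk.

```python
import os
import subprocess


def build_orc_process_command(
    script_path: str,
    wf_base_path: str,
    yt_proxy: str,
    yt_token: str,
) -> tuple[list[str], dict[str, str]]:
    """Return the argv and environment for `orc sdk process <script_path>`."""
    # Start from the current environment so PATH and friends are preserved.
    env = dict(os.environ)
    # The same three variables as in the shell one-liner.
    env["WF_BASE_PATH"] = wf_base_path
    env["YT_PROXY"] = yt_proxy
    env["YT_TOKEN"] = yt_token
    argv = ["orc", "sdk", "process", script_path]
    return argv, env


def run_orc_process(script_path: str, wf_base_path: str, yt_proxy: str, yt_token: str) -> None:
    """Run `orc sdk process` (hypothetical wrapper); raises on a non-zero exit code."""
    argv, env = build_orc_process_command(script_path, wf_base_path, yt_proxy, yt_token)
    # check=True raises subprocess.CalledProcessError if `orc` fails.
    subprocess.run(argv, env=env, check=True)
```

For example, `run_orc_process("./orchestracto/example_yt/example.py", "//home/some_map_node", "<your-proxy>", os.environ["YT_TOKEN"])`, where `<your-proxy>` is a placeholder for your cluster's proxy. Reading the token from the environment keeps it out of the script.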

## Creating workflows from Tracto notebooks

```python
# Configure the environment to run this notebook
import uuid

import yt.wrapper as yt

username = yt.get_user_name()
if yt.exists(f"//sys/users/{username}/@user_info/home_path"):
    # Prepare a working directory on the distributed file system
    user_info = yt.get(f"//sys/users/{username}/@user_info")
    homedir = user_info["home_path"]
    # Find available pool trees (VM presets) by name suffix
    cpu_pool_trees = [pool_tree for pool_tree in user_info["available_pool_trees"] if pool_tree.endswith("cpu")] or ["default"]
    h100_pool_trees = [pool_tree for pool_tree in user_info["available_pool_trees"] if pool_tree.endswith("h100")]
    h100_8_pool_trees = [pool_tree for pool_tree in user_info["available_pool_trees"] if pool_tree.endswith("h100-8")]
    workdir = f"{homedir}/tmp/demo_workdir/{uuid.uuid4().hex}"
else:
    # Fallback defaults when the user has no home_path attribute
    cpu_pool_trees = ["default"]
    h100_pool_trees = ["gpu_h100"]
    h100_8_pool_trees = ["gpu_h100"]
    workdir = f"//tmp/examples/{uuid.uuid4().hex}"

yt.create("map_node", workdir, recursive=True, ignore_existing=True)
print("Current working directory:", workdir)
```

```text
Current working directory: //tmp/examples/90c71122dee04006bba7a169c7d37b98
```
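The setup cell selects pool trees purely by name suffix. Pulling that rule into a plain function makes it easy to check without a cluster. This is a sketch: `select_pool_trees` is a hypothetical helper, and the pool-tree names used in the test are made up.

```python
def select_pool_trees(available_pool_trees: list[str]) -> dict[str, list[str]]:
    """Group pool trees by name suffix, using the same rule as the setup cell."""
    return {
        # Fall back to the "default" pool tree when no "*cpu" tree is available.
        "cpu": [t for t in available_pool_trees if t.endswith("cpu")] or ["default"],
        # "x-h100-8" does not end with "h100", so these two groups never overlap.
        "h100": [t for t in available_pool_trees if t.endswith("h100")],
        "h100_8": [t for t in available_pool_trees if t.endswith("h100-8")],
    }
```

Note that the suffix match is strict: a tree named, say, `gpu_h100_large` would land in none of the groups.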


Install the latest `orchestracto-sdk` (if it is not already installed in your kernel image):

```python
!pip install -Uq orchestracto-sdk
```

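After installing, you can confirm which SDK version the kernel actually sees. A minimal sketch using the standard library's `importlib.metadata`; `installed_version` is a hypothetical helper. It reads the installed package metadata, so if `orc_sdk` was already imported before the upgrade, restart the kernel to load the new code.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed distribution version of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print("orchestracto-sdk:", installed_version("orchestracto-sdk"))
```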
Define the workflow tasks:

```python
import random
import uuid
from typing import Any, Iterable

import yt.wrapper as yt

from orc_sdk import workflow, task


@task(retval_names=["table_path"])
def generate_table(dir_path: str, num_rows: int) -> str:
    # Create a new table with num_rows random rows and return its path
    table_path = f"{dir_path}/table_{uuid.uuid4()}"
    yt_cli = yt.YtClient(config=yt.default_config.get_config_from_env())
    yt_cli.create("table", table_path)
    yt_cli.write_table(
        table_path,
        [{"id": str(uuid.uuid4()), "value": random.randint(0, 1000)} for _ in range(num_rows)],
    )
    return table_path


@task(retval_names=["table_path"])
def add_random_value(table_path: str) -> str:
    # Add a random number in [0, 100] to every row's "value"
    def mapper(row: dict[str, Any]) -> Iterable[dict[str, Any]]:
        row["value"] = row["value"] + random.randint(0, 100)
        yield row

    # Read table_path and overwrite it with the mapped rows
    yt_cli = yt.YtClient(config=yt.default_config.get_config_from_env())
    yt_cli.run_map(mapper, table_path, table_path)

    return table_path


@task(retval_names=["table_path"])
def filter_values(table_path: str, threshold: int) -> str:
    # Keep only rows whose "value" is greater than threshold
    def mapper(row: dict[str, Any]) -> Iterable[dict[str, Any]]:
        if row["value"] > threshold:
            yield row

    # Read table_path and overwrite it with the filtered rows
    yt_cli = yt.YtClient(config=yt.default_config.get_config_from_env())
    yt_cli.run_map(mapper, table_path, table_path)

    return table_path


@task()
def merge_tables(table_1_path: str, table_2_path: str) -> None:
    # Merge both tables into a new table named after the first one
    yt_cli = yt.YtClient(config=yt.default_config.get_config_from_env())
    yt_cli.run_merge([table_1_path, table_2_path], f"{table_1_path}_merged")
```
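Each mapper in these tasks takes one row and yields zero or more rows: the one in `add_random_value` yields every row with a modified `value`, while the one in `filter_values` yields only rows above the threshold. To unit-test that logic without a cluster, one option is to factor the mappers into plain functions and apply them to in-memory rows. This is a local approximation only (it does not reproduce how YT serializes mappers or distributes jobs), and `make_filter_mapper`, `make_add_random_value_mapper`, and `run_map_locally` are hypothetical names.

```python
import random
from typing import Any, Callable, Iterable

Row = dict[str, Any]
Mapper = Callable[[Row], Iterable[Row]]


def make_filter_mapper(threshold: int) -> Mapper:
    # Same rule as the mapper in filter_values: keep rows whose value exceeds threshold.
    def mapper(row: Row) -> Iterable[Row]:
        if row["value"] > threshold:
            yield row
    return mapper


def make_add_random_value_mapper(rng: random.Random) -> Mapper:
    # Same rule as the mapper in add_random_value, with an injectable RNG for reproducibility.
    def mapper(row: Row) -> Iterable[Row]:
        row = dict(row)  # copy so the caller's rows are not mutated
        row["value"] = row["value"] + rng.randint(0, 100)
        yield row
    return mapper


def run_map_locally(mapper: Mapper, rows: Iterable[Row]) -> list[Row]:
    """Apply a row-level mapper to each row and concatenate everything it yields."""
    return [out for row in rows for out in mapper(row)]
```

Unlike the in-cluster version, the local mapper copies each row before modifying it, so assertions can compare inputs and outputs side by side.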

Define the workflow:

```python
@workflow(
    f"{workdir}/the_workflow_pickling",
)
def the_workflow(wfro, num_rows: int = 42):
    # First branch: generate a table, then add random values to it.
    # Memory limit: 512 * 1024 * 1024 (512 MiB, assuming the limit is in bytes).
    gen_table_1_step = generate_table(workdir, num_rows).with_memory_limit(512 * 1024 * 1024)
    wfro.register_first_step(gen_table_1_step)

    table_1_rand_val_step = add_random_value(gen_table_1_step.outputs.table_path)

    # Second branch: generate another table and keep only rows with value > 500
    gen_table_2_step = generate_table(workdir, num_rows)
    wfro.register_first_step(gen_table_2_step)

    table_2_rand_val_step = filter_values(gen_table_2_step.outputs.table_path, 500)

    # Final step: merge both branches; this step gets an extra pip requirement
    merge_tables(
        table_1_rand_val_step.outputs.table_path,
        table_2_rand_val_step.outputs.table_path,
    ).with_additional_requirements(["requests==2.25.1"])
```
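Passing one step's `outputs` into another is what wires the steps together, so `the_workflow` describes a small dependency graph: two independent branches that join at `merge_tables`. The two steps passed to `wfro.register_first_step` are exactly the ones with no upstream step. The sketch below uses the standard library's `graphlib` (Python 3.9+) to make that graph explicit. It is only an illustration and says nothing about how Tracto actually schedules steps; the step labels are made up for readability.

```python
from graphlib import TopologicalSorter

# Each key is a step; its value is the set of steps whose outputs it consumes.
deps: dict[str, set[str]] = {
    "generate_table#1": set(),
    "generate_table#2": set(),
    "add_random_value": {"generate_table#1"},
    "filter_values": {"generate_table#2"},
    "merge_tables": {"add_random_value", "filter_values"},
}

# Steps with no upstream dependencies (the ones registered as first steps).
first_steps = sorted(step for step, upstream in deps.items() if not upstream)

# Walk the graph in dependency order, collecting steps that become ready together.
sorter = TopologicalSorter(deps)
sorter.prepare()
batches: list[list[str]] = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())
    batches.append(ready)
    sorter.done(*ready)

print(first_steps)  # ['generate_table#1', 'generate_table#2']
print(batches)  # [['generate_table#1', 'generate_table#2'], ['add_random_value', 'filter_values'], ['merge_tables']]
```

Steps in the same batch do not depend on each other, so in principle they could run concurrently.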

Finally, run `process_workflow_object`. It builds and pushes the required Docker images, then creates (or updates) the workflow in Tracto:

```python
from orc_sdk import process_workflow_object

process_workflow_object(the_workflow)
```

```text
Preparing workflow config
Building and pushing images
Build 1 of 2 is done
Build 2 of 2 is done
Execution time [build_and_push_docker_images]: 103.849 seconds
Images are built and pushed
Execution time [prepare_workflow_config]: 104.575 seconds
Execution time [update_workflow_config_on_yt]: 0.459 seconds
Workflow is updated
```
