##### This notebook demonstrates how to deploy your first job with TrueFoundry.

---


After you complete the notebook, you will have successfully deployed a job that trains a model on the Iris dataset. Your job's deployment dashboard will look like this:

![](https://files.readme.io/2f6871c-Screenshot_2022-11-16_at_11.43.31_PM.png)


## Project structure

To complete this guide, you are going to create the following **files**:

- `train.py` : contains our training code
- `requirements.txt` : contains our dependencies
- `deploy.py` : contains our deployment code (you can also deploy using a YAML deployment configuration file)

Your **final file structure** is going to look like this:

```
.
├── train.py
├── deploy.py
└── requirements.txt
```

As you can see, all of these files are created in the same folder/directory.


# Setup

Let's first set up everything we need to deploy our job.

- Sign up or log in on TrueFoundry
- Set up a Workspace


Let's start by installing `truefoundry`, along with the client libraries used in the data-loading examples.


In [6]:
%pip install -U truefoundry boto3 neo4j weaviate-client

Collecting weaviate-client
  Downloading weaviate_client-4.15.3-py3-none-any.whl.metadata (3.7 kB)
Downloading weaviate_client-4.15.3-py3-none-any.whl (433 kB)
Installing collected packages: weaviate-client
  Attempting uninstall: weaviate-client
    Found existing installation: weaviate-client 4.15.2
    Uninstalling weaviate-client-4.15.2:
      Successfully uninstalled weaviate-client-4.15.2
Successfully installed weaviate-client-4.15.3
Note: you may need to restart the kernel to use updated packages.


**Log in to TrueFoundry**

In [7]:
!tfy login --host "https://platform-admin.live-demo.truefoundry.cloud"

Already logged in to 'https://platform-admin.live-demo.truefoundry.cloud' as 'tfy-user'
To relogin, use `tfy login --host https://platform-admin.live-demo.truefoundry.cloud --relogin` or `truefoundry.login(host='https://platform-admin.live-demo.truefoundry.cloud', relogin=True)` function


**Select the `Workspace` in which you want to deploy your application.**

The cell below sets your Workspace FQN; replace the value with your own. Follow the docs to:

**Create a Workspace**: https://docs.truefoundry.com/docs/key-concepts#creating-a-workspace

**Get Existing Workspace FQN**: https://docs.truefoundry.com/docs/key-concepts#get-workspace-fqn

In [3]:
WORKSPACE_FQN = "tfy-aws:dev-ws"
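
A malformed FQN only surfaces as an error at deploy time, so a quick check up front can save a round trip. The sketch below assumes the FQN has the form `<cluster>:<workspace>`; that reading is inferred from the example value `tfy-aws:dev-ws` and is not taken from the TrueFoundry docs. The helper name `split_workspace_fqn` is hypothetical.

```python
def split_workspace_fqn(fqn: str) -> tuple[str, str]:
    """Split an FQN such as 'tfy-aws:dev-ws' into (cluster, workspace).

    Assumption: the FQN is '<cluster>:<workspace>' with exactly one colon.
    """
    cluster, sep, workspace = fqn.strip().partition(":")
    if not sep or not cluster or not workspace or ":" in workspace:
        raise ValueError(f"Expected '<cluster>:<workspace>', got {fqn!r}")
    return cluster, workspace


cluster, workspace = split_workspace_fqn("tfy-aws:dev-ws")
print(cluster, workspace)
```

If the check raises, fix `WORKSPACE_FQN` before running the deployment cell.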

## Load Data
We will show multiple ways to load the dataset here:
1. Load data from an external S3 bucket
2. Load data from Neo4j
3. Load data from Weaviate
4. Load data from the TrueFoundry ML Repository

### 1. Load data from an external S3 bucket

An external S3 bucket can be accessed using the IAM role attached to the service account of the Notebook.

In [4]:
# !pip install boto3
import boto3

s3_client = boto3.client("s3")

# List the objects in the bucket (a single call returns at most 1,000 keys)
response = s3_client.list_objects(Bucket='tfy-usea1-ext-ctl20250623064606752500000001')
# "Contents" is missing from the response when the bucket is empty
paths = [obj['Key'] for obj in response.get('Contents', [])]

print(paths)

['dataset/', 'dataset/dataset/dataset-A.csv', 'dataset/dataset/dataset-B.csv', 'dataset/dataset/dataset-C.csv', 'dataset/dataset/dataset-D.csv', 'dataset/dataset/dataset-E.csv', 'dataset/dataset/dataset-F.csv', 'sample.txt']
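
The listing above mixes a folder marker (`dataset/`), CSV files, and an unrelated text file, and a single listing call is capped at 1,000 keys. As a minimal sketch, the helper below pages through a bucket with boto3's `list_objects_v2` paginator and keeps only the CSV keys. The helper name `list_csv_keys` is hypothetical, and it takes the client as an argument so it makes no assumptions about how credentials are obtained.

```python
def list_csv_keys(s3_client, bucket: str, prefix: str = "") -> list[str]:
    """Return every key under `prefix` that ends in '.csv'.

    `s3_client` is expected to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    paginator = s3_client.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent when a page holds no objects.
        for obj in page.get("Contents", []):
            if obj["Key"].endswith(".csv"):
                keys.append(obj["Key"])
    return keys
```

With the client from the cell above, `list_csv_keys(s3_client, bucket_name, prefix="dataset/")` should return just the dataset files, and each key can then be fetched with `s3_client.download_file(bucket_name, key, local_filename)`.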


### 2. Load data from Neo4j

Connect to a Neo4j instance running on TrueFoundry or on any managed service.

In [5]:
from neo4j import GraphDatabase

URI = "neo4j://neo4j.dev-ws.svc.cluster.local"
AUTH = ("neo4j", "51ce8fd4-1c9a-44e8-ad3b-b9d26ea6d074")

with GraphDatabase.driver(URI, auth=AUTH) as driver:
    driver.verify_connectivity()
    # Create graph
    # summary = driver.execute_query("""
    #     CREATE (a:Person {name: $name})
    #     CREATE (b:Person {name: $friendName})
    #     CREATE (a)-[:KNOWS]->(b)
    #     """,
    #     name="Alice", friendName="David",
    #     database_="neo4j",
    # ).summary
    # print("Created {nodes_created} nodes in {time} ms.".format(
    #     nodes_created=summary.counters.nodes_created,
    #     time=summary.result_available_after
    # ))
    # Query a graph
    records, summary, keys = driver.execute_query("""
        MATCH (p:Person)-[:KNOWS]->(:Person)
        RETURN p.name AS name
        """,
        database_="neo4j",
    )
    # Loop through results and do something with them
    for record in records:
        print(record.data())  # obtain record as dict
    # Summary information
    print("The query `{query}` returned {records_count} records in {time} ms.".format(
        query=summary.query, records_count=len(records),
        time=summary.result_available_after
    ))

{'name': 'Alice'}
{'name': 'Alice'}
{'name': 'Alice'}
The query `
        MATCH (p:Person)-[:KNOWS]->(:Person)
        RETURN p.name AS name
        ` returned 3 records in 2 ms.
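
The query returns one row per matched `KNOWS` pattern, which is why the same name can appear several times in the output. Adding `DISTINCT` to the Cypher `RETURN` clause would collapse the rows on the server side; alternatively, the rows can be summarized in Python. The sketch below works on plain dicts shaped like the output of `record.data()`; the helper name `count_names` is hypothetical.

```python
from collections import Counter


def count_names(rows: list[dict]) -> dict[str, int]:
    """Count how often each `name` appears in a list of record dicts."""
    return dict(Counter(row["name"] for row in rows))


# Rows shaped like the output of `record.data()` above.
rows = [{"name": "Alice"}, {"name": "Alice"}, {"name": "Alice"}]
print(count_names(rows))  # {'Alice': 3}
```

In the notebook, `rows` would be built as `[record.data() for record in records]`.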


### 3. Load data from Weaviate

Connect to a Weaviate instance running on TrueFoundry or on any managed service.

In [15]:
import weaviate

http_host = "weaviate.dev-ws.svc.cluster.local"
grpc_host = "weaviate-grpc.dev-ws.svc.cluster.local"

client = weaviate.connect_to_custom(
    http_host=http_host,
    http_port=8080,
    http_secure=False,
    grpc_host=grpc_host,
    grpc_port=50051,
    grpc_secure=False,
)
# Create a new collection
# from weaviate.classes.config import Property, DataType

# # Note that you can use `client.collections.create_from_dict()` to create a collection from a v3-client-style JSON object
# client.collections.create(
#     "Article",
#     properties=[
#         Property(name="title", data_type=DataType.TEXT),
#         Property(name="body", data_type=DataType.TEXT),
#     ]
# )

# List all collections
resp = client.collections.list_all()
print(resp)
client.close()

{'Article': _CollectionConfigSimple(name='Article', description=None, generative_config=None, properties=[_Property(name='title', description=None, data_type=<DataType.TEXT: 'text'>, index_filterable=True, index_range_filters=False, index_searchable=True, nested_properties=None, tokenization=<Tokenization.WORD: 'word'>, vectorizer_config=None, vectorizer='none', vectorizer_configs=None), _Property(name='body', description=None, data_type=<DataType.TEXT: 'text'>, index_filterable=True, index_range_filters=False, index_searchable=True, nested_properties=None, tokenization=<Tokenization.WORD: 'word'>, vectorizer_config=None, vectorizer='none', vectorizer_configs=None)], references=[], reranker_config=None, vectorizer_config=None, vectorizer=<Vectorizers.NONE: 'none'>, vector_config=None)}
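
In the cell above, `client.close()` is only reached if the listing call succeeds. One way to make sure the connection is always released is `contextlib.closing`, sketched below. The helper name `list_collection_names` and its `connect` parameter are hypothetical: `connect` stands in for a zero-argument function that wraps the `weaviate.connect_to_custom(...)` call shown above.

```python
from contextlib import closing


def list_collection_names(connect) -> list[str]:
    """Open a client, list collection names, and always close the client.

    `connect` is a zero-argument callable returning a client object that
    exposes `collections.list_all()` and `close()`.
    """
    with closing(connect()) as client:
        # list_all() returns a dict keyed by collection name.
        return sorted(client.collections.list_all())
```

Because the client is closed in the `with` block's exit path, an exception raised while listing does not leave the connection open.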


### 4. Load data from TrueFoundry ML Repository

In [25]:
# Download dataset from ML Repo

from truefoundry.ml import get_client
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning, module="truefoundry")
client = get_client()

# Log Artifact
# artifact_version = client.log_artifact(
#     ml_repo="bank-customer-churn",
#     name="locally-uploaded",
#     artifact_paths=[
#         # Add all your files or folders here
#         # The second element in the tuple is the destination path in the artifact relative to the root
#         ("./dataset-A.csv", None),
#     ],
# )
# print(f"Artifact version {artifact_version.fqn} created successfully")


# Get the artifact version directly
artifact_version = client.get_artifact_version_by_fqn("artifact:live-demo/bank-customer-churn/locally-uploaded:3")

# download it to disk
# `download_path` points to a directory that has all contents of the artifact
download_path = artifact_version.download(path=".", overwrite=True)

[truefoundry.ml] 2025-06-23T13:50:11+0000 INFO Downloading artifact version contents, this might take a while ...


Output()

[truefoundry.ml] 2025-06-23T13:50:11+0000 INFO Downloading dataset-A.csv to /home/jovyan/train-model/dataset-A.csv
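
Since `download_path` points to a directory rather than a single file, a quick way to confirm what was downloaded is to scan it for CSV files and peek at their header rows. The sketch below uses only the standard library; the helper name `read_csv_headers` is hypothetical, and no assumption is made about the column names in `dataset-A.csv`.

```python
import csv
from pathlib import Path


def read_csv_headers(download_path: str) -> dict[str, list[str]]:
    """Map each CSV file under `download_path` to its header row."""
    root = Path(download_path)
    headers = {}
    for csv_file in sorted(root.rglob("*.csv")):
        with csv_file.open(newline="") as handle:
            # next(..., []) yields an empty header for an empty file.
            headers[csv_file.relative_to(root).as_posix()] = next(csv.reader(handle), [])
    return headers
```

Calling `read_csv_headers(download_path)` after the download cell should list `dataset-A.csv` together with its columns.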


# Step 1: Implement the training code

The first step is to create a job that trains a scikit-learn model on the Iris dataset.

We start with a `train.py` containing our training code and `requirements.txt` with our dependencies.

```
.
├── train.py
└── requirements.txt
```


### **`train.py`**


In [8]:
%%writefile train.py
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# load the dataset
X, y = load_iris(as_frame=True, return_X_y=True)
X = X.rename(columns={
        "sepal length (cm)": "sepal_length",
        "sepal width (cm)": "sepal_width",
        "petal length (cm)": "petal_length",
        "petal width (cm)": "petal_width",
})

# NOTE:- You can pass these configurations via command line
# arguments, config file, environment variables.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Initialize the model
clf = LogisticRegression(solver="liblinear")
# Fit the model
clf.fit(X_train, y_train)

preds = clf.predict(X_test)
print(classification_report(y_true=y_test, y_pred=preds))

Overwriting train.py
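
The NOTE in `train.py` mentions that the split settings could come from command-line arguments instead of being hard-coded. A minimal sketch of that idea with `argparse` follows; the flag names `--test_size` and `--random_state` and the helper `parse_train_args` are assumptions chosen to mirror the values used above, not part of the original script.

```python
import argparse


def parse_train_args(argv=None) -> argparse.Namespace:
    """Parse training options; defaults mirror the values in train.py."""
    parser = argparse.ArgumentParser(description="Train a classifier on Iris")
    parser.add_argument("--test_size", type=float, default=0.2)
    parser.add_argument("--random_state", type=int, default=42)
    return parser.parse_args(argv)


args = parse_train_args([])  # pass None to read sys.argv in a real script
print(args.test_size, args.random_state)
```

Inside `train.py`, the parsed values would replace the literals in `train_test_split(...)`, and the job command could then become something like `python train.py --test_size 0.3`.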


See this [link](https://docs.truefoundry.com/v0.1.1/recipes/training-a-scikit-learn-model) for a walkthrough of **`train.py`**.


### **`requirements.txt`**


In [9]:
%%writefile requirements.txt
pandas==1.5.3
numpy==1.23.2
scikit-learn==1.5.0

Overwriting requirements.txt
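
The job image is built from these pins, so the packages in the job may differ from whatever is installed in the notebook kernel. As a rough sketch, the helper below extracts the `name==version` pairs from a requirements file so they can be compared against the local environment; the helper name `parse_pins` is hypothetical and it deliberately ignores other specifier forms such as `>=`.

```python
def parse_pins(requirements_text: str) -> dict[str, str]:
    """Return {package: version} for every `name==version` line."""
    pins = {}
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip().lower()] = version.strip()
    return pins


pins = parse_pins("pandas==1.5.3\nnumpy==1.23.2\nscikit-learn==1.5.0\n")
print(pins)
```

Each pinned version can then be compared with `importlib.metadata.version(name)` from the standard library to spot mismatches before deploying.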


# Step 2: Deploying as a Job

You can deploy jobs on TrueFoundry programmatically via our **Python SDK**.

Create the `deploy.py`, after which our file structure will look like this:

**File Structure**

```Text
.
├── train.py
├── deploy.py
└── requirements.txt
```

### **`deploy.py`**


In [10]:
%%writefile deploy.py
import argparse
import logging

from truefoundry.deploy import Build, Job, PythonBuild, LocalSource

logging.basicConfig(
    level=logging.INFO, format="%(asctime)s [%(name)s] %(levelname)-8s %(message)s"
)

parser = argparse.ArgumentParser()
parser.add_argument("--workspace_fqn", required=True, type=str)
args = parser.parse_args()

job = Job(
    name="iris-train-job",
    image=Build(
        build_source=LocalSource(local_build=False),
        build_spec=PythonBuild(
            python_version="3.11",
            command="python train.py",
            requirements_path="requirements.txt",
        )
    )
)
job.deploy(workspace_fqn=args.workspace_fqn, wait=False)

Overwriting deploy.py


Now, to deploy our Job, run the command below:


In [11]:
!python deploy.py --workspace_fqn $WORKSPACE_FQN

2025-06-23 13:40:39,569 [truefoundry] INFO     Logged in to 'https://platform-admin.live-demo.truefoundry.cloud' as 'tfy-user'
2025-06-23 13:40:39,616 [truefoundry] INFO     Image will be built remotely because `image.build_source.local_build` is set to `false`. For faster builds it is recommended to install Docker locally and set `image.build_source.local_build` to `true` in your YAML spec or equivalently set `image=Build(build_source=LocalSource(local_build=True, ...))` in your `Service` or `Job` definition code.
2025-06-23 13:40:39,616 [truefoundry] INFO     Uploading code for job 'iris-train-job'
2025-06-23 13:40:39,617 [truefoundry] INFO     Archiving contents of dir: '/home/jovyan/train-model'
2025-06-23 13:40:39,770 [truefoundry] INFO     Neither `.tfyignore` file found in /home/jovyan/train-model nor a valid git repository found. We recommend you to create .tfyignore file and add file patterns to ignore
Packaging source code: 2it [00:00, 700.92it/s]
2025-06-23 13:40:39,777 [tru

In [24]:
# Check Git status
!git status

On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   deploy_your_first_job.ipynb

no changes added to commit (use "git add" and/or "git commit -a")


In [23]:
# Commit to Git

!git add .
!git commit -m "init"

[master (root-commit) ccf35e7] init
 6 files changed, 1635 insertions(+)
 create mode 100644 .ipynb_checkpoints/deploy_your_first_job-checkpoint.ipynb
 create mode 100644 dataset-A.csv
 create mode 100644 deploy.py
 create mode 100644 deploy_your_first_job.ipynb
 create mode 100644 requirements.txt
 create mode 100644 train.py


In [5]:
# Push to your Repo
!git push origin main

Everything up-to-date
