## Scenario 3: Multiple data scientists working on multiple ML models

MLflow setup:
* Tracking server: yes, remote server on EC2 (AWS name: mlflow-instance)
* Backend store: PostgreSQL database (AWS DB instance identifier: mlflow-database, database name: mlflow_db)
    * master username: Yezer_master
    * password: stored in AWS Secrets Manager
    * endpoint: mlflow-database.chwwaqgyqw7h.eu-west-2.rds.amazonaws.com

* Artifact store: S3 bucket (AWS name: mlflow-artifacts-remote-yezer)
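The master password is kept in AWS Secrets Manager, so a script can read it at start-up instead of copying it into notes or commands. A minimal sketch, assuming the secret is the JSON document RDS creates when it manages the master password (keys `username` and `password`) and that `boto3` is installed with AWS credentials configured; the helper names and the secret id passed to `fetch_db_credentials` are hypothetical.

```python
import json


def parse_db_secret(secret_string: str) -> tuple[str, str]:
    """Extract (username, password) from an RDS-style secret JSON string."""
    data = json.loads(secret_string)
    return data["username"], data["password"]


def fetch_db_credentials(secret_id: str, region: str = "eu-west-2") -> tuple[str, str]:
    """Read the database credentials from AWS Secrets Manager.

    Assumes boto3 is installed and AWS credentials are available
    (e.g. in ~/.aws or in environment variables).
    """
    import boto3  # imported here so parse_db_secret works without boto3

    client = boto3.client("secretsmanager", region_name=region)
    response = client.get_secret_value(SecretId=secret_id)
    return parse_db_secret(response["SecretString"])


# Offline example with a made-up secret payload (not the real password)
user, password = parse_db_secret(
    '{"username": "Yezer_master", "password": "not-the-real-one"}'
)
print(user)  # Yezer_master
```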

The experiments can be explored in the MLflow UI served by the remote tracking server.

The example uses AWS to host a remote server, so running it requires an AWS account. Follow the steps described in `mlflow_on_aws.md` to create a new AWS account and launch the tracking server.

Only the tracking server (the EC2 instance) can reach the Postgres database.

Command to run on the EC2 instance to start the tracking server with Postgres as the backend store. Special characters in the password (such as `[`, `]`, `)` and `?`) must be percent-encoded in the URI, e.g. `?` -> `%3F`:

`mlflow server -h 0.0.0.0 -p 5000 --backend-store-uri postgresql://Yezer_master:<percent-encoded password>@mlflow-database.chwwaqgyqw7h.eu-west-2.rds.amazonaws.com:5432/mlflow_db --default-artifact-root s3://mlflow-artifacts-remote-yezer`
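Encoding the password by hand is error-prone. A minimal sketch that builds the backend-store URI with the standard library's `urllib.parse.quote_plus`, which is the approach SQLAlchemy's documentation suggests for passwords containing special characters; the function name `build_backend_store_uri` and the sample password are hypothetical.

```python
from urllib.parse import quote_plus


def build_backend_store_uri(
    user: str, password: str, host: str, port: int, database: str
) -> str:
    """Return a PostgreSQL URI with the user and password percent-encoded."""
    return (
        f"postgresql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


uri = build_backend_store_uri(
    "Yezer_master",
    "p@ss[w]?)",  # made-up password with characters that must be encoded
    "mlflow-database.chwwaqgyqw7h.eu-west-2.rds.amazonaws.com",
    5432,
    "mlflow_db",
)
print(uri)
# postgresql://Yezer_master:p%40ss%5Bw%5D%3F%29@mlflow-database.chwwaqgyqw7h.eu-west-2.rds.amazonaws.com:5432/mlflow_db
```

The result can be passed to `--backend-store-uri`; wrapping it in single quotes on the command line stops the shell from interpreting characters such as `$`.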

Note: a machine with more memory may be needed; the instance resources suggested in the tutorial caused problems for me.

In [21]:
import mlflow
from dotenv import load_dotenv

load_dotenv()

# AWS credentials for writing artifacts to S3 are read from environment variables (e.g. loaded from .env above) or from the local ~/.aws folder

TRACKING_SERVER_HOST = "ec2-18-133-195-112.eu-west-2.compute.amazonaws.com" # fill in with the public DNS of the EC2 instance
mlflow.set_tracking_uri(f"http://{TRACKING_SERVER_HOST}:5000")
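Because the cell already calls `load_dotenv()`, the server address could live in `.env` instead of being hard-coded. MLflow itself also reads the `MLFLOW_TRACKING_URI` environment variable when `set_tracking_uri` is not called, so defining that variable in `.env` is one option. A minimal sketch of an alternative that keeps the host separate; the `TRACKING_SERVER_HOST` and `TRACKING_SERVER_PORT` variable names and the helper are hypothetical, not MLflow settings.

```python
import os


def tracking_uri_from_env(default_port: int = 5000) -> str:
    """Build the tracking URI from environment variables (e.g. loaded from .env).

    TRACKING_SERVER_HOST is required; TRACKING_SERVER_PORT is optional.
    Both names are illustrative, not MLflow settings.
    """
    host = os.environ["TRACKING_SERVER_HOST"]
    port = int(os.environ.get("TRACKING_SERVER_PORT", default_port))
    return f"http://{host}:{port}"


# Example: normally this value would come from .env via load_dotenv()
os.environ["TRACKING_SERVER_HOST"] = "ec2-18-133-195-112.eu-west-2.compute.amazonaws.com"
print(tracking_uri_from_env())
```

`mlflow.set_tracking_uri(tracking_uri_from_env())` would then replace the hard-coded host.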

In [22]:
print(f"tracking URI: '{mlflow.get_tracking_uri()}'")

tracking URI: 'http://ec2-18-133-195-112.eu-west-2.compute.amazonaws.com:5000'


In [23]:
mlflow.search_experiments()  # list_experiments() has been removed; use search_experiments() instead

[<Experiment: artifact_location='s3://mlflow-artifacts-remote-yezer/1', creation_time=1748004019978, experiment_id='1', last_update_time=1748004019978, lifecycle_stage='active', name='my-experiment-1', tags={}>,
 <Experiment: artifact_location='s3://mlflow-artifacts-remote-yezer/0', creation_time=1748003307149, experiment_id='0', last_update_time=1748003307149, lifecycle_stage='active', name='Default', tags={}>]

In [12]:
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score

mlflow.set_experiment("my-experiment-1")

with mlflow.start_run():

    X, y = load_iris(return_X_y=True)

    params = {"C": 0.1, "random_state": 42}
    mlflow.log_params(params)

    lr = LogisticRegression(**params).fit(X, y)
    y_pred = lr.predict(X)
    mlflow.log_metric("accuracy", accuracy_score(y, y_pred))

    mlflow.sklearn.log_model(lr, artifact_path="models")
    print(f"default artifacts URI: '{mlflow.get_artifact_uri()}'")



default artifacts URI: 's3://mlflow-artifacts-remote-yezer/1/288883322190498bb6abd691ad0691ab/artifacts'
🏃 View run righteous-panda-174 at: http://ec2-18-133-195-112.eu-west-2.compute.amazonaws.com:5000/#/experiments/1/runs/288883322190498bb6abd691ad0691ab
🧪 View experiment at: http://ec2-18-133-195-112.eu-west-2.compute.amazonaws.com:5000/#/experiments/1
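The printed artifact URI shows how runs are laid out in the bucket: the `--default-artifact-root`, then the experiment id, the run id, and `artifacts`. A small standard-library sketch that splits such a URI into its parts; the layout is inferred from the output above rather than from a documented contract, so the helper `split_artifact_uri` (a hypothetical name) is illustrative only.

```python
from urllib.parse import urlparse


def split_artifact_uri(uri: str) -> dict:
    """Split 's3://<bucket>/<experiment_id>/<run_id>/artifacts' into its parts."""
    parsed = urlparse(uri)
    # Path looks like '/<experiment_id>/<run_id>/artifacts'
    experiment_id, run_id, leaf = parsed.path.strip("/").split("/")
    if leaf != "artifacts":
        raise ValueError(f"unexpected artifact URI layout: {uri}")
    return {"bucket": parsed.netloc, "experiment_id": experiment_id, "run_id": run_id}


parts = split_artifact_uri(
    "s3://mlflow-artifacts-remote-yezer/1/288883322190498bb6abd691ad0691ab/artifacts"
)
print(parts)
# {'bucket': 'mlflow-artifacts-remote-yezer', 'experiment_id': '1', 'run_id': '288883322190498bb6abd691ad0691ab'}
```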


In [13]:
mlflow.search_experiments()

[<Experiment: artifact_location='s3://mlflow-artifacts-remote-yezer/1', creation_time=1748004019978, experiment_id='1', last_update_time=1748004019978, lifecycle_stage='active', name='my-experiment-1', tags={}>,
 <Experiment: artifact_location='s3://mlflow-artifacts-remote-yezer/0', creation_time=1748003307149, experiment_id='0', last_update_time=1748003307149, lifecycle_stage='active', name='Default', tags={}>]

### Interacting with the model registry

In [24]:
from mlflow.tracking import MlflowClient

client = MlflowClient(f"http://{TRACKING_SERVER_HOST}:5000")

In [28]:
from mlflow.entities import ViewType

runs = client.search_runs(
    experiment_ids=["1"],  # a list of experiment ids
    run_view_type=ViewType.ACTIVE_ONLY,
)
for run in runs:
    print(f"run id: {run.info.run_id}, {run.info.run_name}")

run id: 288883322190498bb6abd691ad0691ab, righteous-panda-174
run id: 78d280981c1942f287c9fe2c3f75e842, learned-kite-295


In [30]:
run_id = "78d280981c1942f287c9fe2c3f75e842"
mlflow.register_model(
    model_uri=f"runs:/{run_id}/models",
    name='my-experiment-1'
)

Successfully registered model 'my-experiment-1'.
2025/05/23 14:10:21 INFO mlflow.store.model_registry.abstract_store: Waiting up to 300 seconds for model version to finish creation. Model name: my-experiment-1, version 1
Created version '1' of model 'my-experiment-1'.


<ModelVersion: aliases=[], creation_timestamp=1748005821150, current_stage='None', description='', last_updated_timestamp=1748005821150, name='my-experiment-1', run_id='78d280981c1942f287c9fe2c3f75e842', run_link='', source='s3://mlflow-artifacts-remote-yezer/1/78d280981c1942f287c9fe2c3f75e842/artifacts/models', status='READY', status_message=None, tags={}, user_id='', version='1'>