# Use `ibm-watsonx-ai` to extract data about historical AutoAI experiments and deployments stored in spaces

This notebook contains the steps and code to extract information about historical AutoAI experiment runs and deployments available in the watsonx.ai Runtime service. It uses the [ibm-watsonx-ai](https://pypi.python.org/pypi/ibm-watsonx-ai) library, which is available on PyPI.

Some familiarity with Python is helpful. This notebook uses Python 3.11.


## Learning goals

The learning goals of this notebook are:

-  Work with watsonx.ai Runtime to extract info about past AutoAI experiments
-  Work with watsonx.ai Runtime to extract info about AutoAI models that are deployed into spaces
-  Store the desired data in CSV files for further analysis

## Contents

This notebook contains the following parts:

1.	[Installing and importing the ibm-watsonx-ai and dependencies](#setup)
2.	[Connecting to watsonx.ai Runtime](#client-connection)
3.	[Getting available projects and spaces](#get-projects-spaces)
4.	[Getting all available historical trainings](#get-trainings)
5.	[Getting all available deployments info](#get-deployments)
6.  [Extracting trainings info from all available projects](#extract-projects)
7.  [Extracting trainings and deployments info from all available spaces](#extract-spaces)
8.	[Displaying the results](#results)
9.  [Exporting results to file](#export)
10. [Cleanup](#cleanup)
11. [Summary and next steps](#summary)

<a id="setup"></a>
## 1. Installing and importing the `ibm-watsonx-ai` and dependencies 
**Note:** `ibm-watsonx-ai` documentation can be found <a href="https://ibm.github.io/watsonx-ai-python-sdk/index.html" target="_blank" rel="noopener noreferrer">here</a>.

In [1]:
!pip install ibm-watsonx-ai | tail -n 1



In [1]:
import json
import os
import getpass
import requests

from IPython.display import display
import pandas as pd

from dateutil import parser

from ibm_watsonx_ai import Credentials, APIClient
from ibm_watsonx_ai.wml_client_error import ApiRequestFailure
from ibm_watsonx_ai.utils import create_download_link

<a id="client-connection"></a>
## 2. Connecting to watsonx.ai Runtime 

Authenticate the watsonx.ai Runtime service on IBM Cloud. You need to provide your IBM Cloud `API key` and `location`.

**Tip**: Your `Cloud API key` can be generated by going to the [**Users** section of the Cloud console](https://cloud.ibm.com/iam#/users). From that page, click your name, scroll down to the **API Keys** section, and click **Create an IBM Cloud API key**. Give your key a name and click **Create**, then copy the created key and paste it below. You can also get a service-specific URL by going to the [**Endpoint URLs** section of the watsonx.ai Runtime docs](https://cloud.ibm.com/apidocs/machine-learning). You can check your instance location in your <a href="https://console.ng.bluemix.net/catalog/services/ibm-watson-machine-learning/" target="_blank" rel="noopener noreferrer">watsonx.ai Runtime Service</a> instance details.


You can use [IBM Cloud CLI](https://cloud.ibm.com/docs/cli/index.html) to retrieve the instance `location`.

```
ibmcloud login --apikey API_KEY -a https://cloud.ibm.com
ibmcloud resource service-instance INSTANCE_NAME
```


**NOTE:** You can also get a service-specific API key by going to the [**Service IDs** section of the Cloud Console](https://cloud.ibm.com/iam/serviceids). From that page, click **Create**, and then copy the created key and paste it in the following cell.


**Action**: Enter your `api_key` and `location` in the following cells.

In [2]:
api_key = getpass.getpass("Insert your API key (hit enter): ...")

Insert your API key (hit enter): ... ········


In [None]:
location = "INSERT YOUR LOCATION HERE"
url = f"https://{location}.ml.cloud.ibm.com"
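
A forgotten placeholder or stray whitespace in `location` produces a URL that only fails later, when the client connects. The following is a minimal sketch of an early check. The helper name `build_runtime_url` and the pattern for region identifiers (lowercase tokens joined by hyphens, such as `us-south`) are assumptions made for illustration, not part of the SDK.

```python
import re

# Assumption: region identifiers look like "us-south" or "eu-de",
# i.e. lowercase letters/digits joined by hyphens.
_LOCATION_PATTERN = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)+$")


def build_runtime_url(location: str) -> str:
    """Build the service URL, rejecting values that do not look like a region."""
    cleaned = location.strip().lower()
    if not _LOCATION_PATTERN.match(cleaned):
        raise ValueError(f"Unexpected location value: {location!r}")
    return f"https://{cleaned}.ml.cloud.ibm.com"


print(build_runtime_url("us-south"))
```

With this check, leaving the `INSERT YOUR LOCATION HERE` placeholder in place raises a `ValueError` immediately instead of producing a malformed URL.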

In [None]:
credentials = Credentials(
    api_key=api_key,
    url=url
)

In [5]:
api_client = APIClient(credentials=credentials)

<a id="get-projects-spaces"></a>
## 3. Getting available projects and spaces 

The list of projects is fetched directly through API requests. This section will be updated once listing projects is supported in the SDK. 

In [6]:
projects_url = f"{api_client.PLATFORM_URL}/v2/projects"
params = api_client._params(skip_for_create=True)
params["limit"] = 100
response = requests.get(url=projects_url, params=params, headers=api_client._get_headers())
available_projects = [resource.get("metadata", {}).get("guid") for resource in response.json().get("resources", [])]
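
The list comprehension above keeps a `None` entry for any resource that lacks a `guid`, which would later break `', '.join(...)`. The sketch below shows the same extraction with such entries skipped. The helper name `extract_project_guids` and the sample payload are hypothetical; only the `resources` / `metadata` / `guid` nesting is taken from the cell above.

```python
def extract_project_guids(payload: dict) -> list:
    """Return the GUID of every resource in a projects response, skipping entries without one."""
    guids = []
    for resource in payload.get("resources", []):
        guid = resource.get("metadata", {}).get("guid")
        if guid:  # ignore malformed entries instead of collecting None
            guids.append(guid)
    return guids


# Hypothetical response body, shaped like the one parsed in the cell above.
sample_payload = {
    "resources": [
        {"metadata": {"guid": "project-guid-1"}},
        {"metadata": {}},  # entry without a guid
        {"metadata": {"guid": "project-guid-2"}},
    ]
}

print(extract_project_guids(sample_payload))
```

Note that the request above asks for at most 100 projects (`params["limit"] = 100`), so accounts with more projects would need additional requests.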

In [7]:
print(f"Available projects: [{', '.join(available_projects)}]")

Available projects: [31610097-87bb-45aa-ab76-2aee29b5857f, dbc99479-c4ac-46c1-90b1-1d7a7749587d, eac8bfe2-a00b-43ca-846b-305af5cc6395]


Getting spaces via SDK. 

In [8]:
available_spaces = api_client.spaces.list()["ID"].to_list()

In [9]:
print(f"Available spaces: [{', '.join(available_spaces)}]")

Available spaces: [9f44cc2b-b3d0-4472-824e-4941afb1617b, 7ba02c5f-a50a-4105-b9c3-2fdb54fe1829, d68da17a-ab98-44fa-b2e1-a21ab1b76058]


The following helper sets the client's scope to either a project or a space. 

In [10]:
def set_scope(client, scope_name, scope_id):
    if scope_name == "project": 
        client.set.default_project(scope_id)
    else:
        client.set.default_space(scope_id)
    print(f"Working on {scope_name}_id: {scope_id}")
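
`set_scope` treats every `scope_name` other than `"project"` as a space, so a typo such as `"projects"` silently selects the space branch. A stricter variant could look like the sketch below. The name `set_scope_strict` and the stub client are illustrative assumptions; the stub only mimics the `client.set.default_project` / `client.set.default_space` calls used above so that the example runs without a service connection.

```python
from types import SimpleNamespace


def set_scope_strict(client, scope_name, scope_id):
    """Set the default project or space, rejecting unknown scope names."""
    setters = {
        "project": client.set.default_project,
        "space": client.set.default_space,
    }
    if scope_name not in setters:
        raise ValueError(f"scope_name must be 'project' or 'space', got {scope_name!r}")
    setters[scope_name](scope_id)
    print(f"Working on {scope_name}_id: {scope_id}")


# Stub standing in for APIClient; it only records which setter was called.
calls = []
stub_client = SimpleNamespace(
    set=SimpleNamespace(
        default_project=lambda scope_id: calls.append(("project", scope_id)),
        default_space=lambda scope_id: calls.append(("space", scope_id)),
    )
)

set_scope_strict(stub_client, "space", "space-id-1")
print(calls)
```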

<a id="get-trainings"></a>
## 4. Getting all available historical trainings 

A training can be executed in a project or in a space. The functions below extract the desired data from the training service; only completed AutoAI trainings are collected. 

In [11]:
training_results = pd.DataFrame()

In [15]:
def is_autoai(metrics): 
    return any("ml_metrics" in i or "ts_metrics" in i for i in metrics)

def get_training_info(api_client):
    training_service = api_client.training
    if (scope_id := api_client.default_project_id) is not None:
        scope = "project"
    else:
        scope = "space"
        scope_id = api_client.default_space_id
    trainings = training_service.list(get_all=True)
    trainings_ids_list = trainings["ID (training)"].to_list()
    info = []
    
    for training in trainings_ids_list: 
        details = training_service.get_details(training)
        metadata = details.get("metadata", {})
        created_at = parser.parse(metadata.get("created_at"))
        status = details.get("entity", {}).get("status", {})
        metrics = status.get("metrics", [])
        if (state:=status.get("state")) == "completed" and is_autoai(metrics): 
            completed_at = parser.parse(status.get("completed_at"))
            # collecting only finished trainings 
            info.append({
                    "ID (training)": training,
                    "Created at": created_at,
                    "Finished at": completed_at,
                    "Status": state,
                    "Took": completed_at - created_at,
                    "Scope": scope,
                    "Scope ID": scope_id
            })
            print(".", end="")
    print()
    return pd.DataFrame(info)
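
To make the filtering and the `Took` column concrete, here is a minimal offline sketch that applies the same logic to a single training record. The `sample_details` dictionary is a hypothetical, trimmed-down payload (the metric value is made up), and the standard library's `datetime.fromisoformat` stands in for `dateutil.parser` so the example has no third-party dependencies.

```python
from datetime import datetime


def is_autoai(metrics):
    # Same check as above: AutoAI runs report "ml_metrics" or "ts_metrics".
    return any("ml_metrics" in m or "ts_metrics" in m for m in metrics)


def parse_timestamp(value):
    # Map a trailing "Z" to an explicit UTC offset for older Python versions.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


# Hypothetical payload with only the fields that get_training_info reads.
sample_details = {
    "metadata": {"created_at": "2025-02-03T15:03:18.322Z"},
    "entity": {
        "status": {
            "state": "completed",
            "completed_at": "2025-02-03T15:07:21.003Z",
            "metrics": [{"ml_metrics": {"accuracy": 0.9}}],
        }
    },
}

status = sample_details["entity"]["status"]
if status["state"] == "completed" and is_autoai(status["metrics"]):
    created_at = parse_timestamp(sample_details["metadata"]["created_at"])
    completed_at = parse_timestamp(status["completed_at"])
    took = completed_at - created_at  # a datetime.timedelta
    print(f"Training took {took}")
```

A training whose metrics contain neither key, or whose state is not `completed`, is skipped by `get_training_info` in the same way.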
    

<a id="get-deployments"></a>
## 5. Getting all available deployments info 

Deployments can be stored only in a space. The functions below extract the desired data from each deployment.  

In [16]:
def get_model_deployments_ids(client):
    deployments = client.deployments.list()
    return deployments[deployments["ARTIFACT_TYPE"] == "model"]["ID"]

def get_model_details(client, deployment_id):
    asset_id = client.deployments.get_details(deployment_id).get("entity", {}).get("asset", {}).get("id")
    return client.data_assets.get_details(asset_id)

def is_autoai_pipeline(deployment_details):
    return deployment_details.get("metadata", {}).get("asset_type") == "wml_model"

def extract_details(deployment_details):
    space_id = deployment_details.get("metadata", {}).get("space_id")
    wml_model = deployment_details.get("entity", {}).get("wml_model", {})
    training_id = wml_model.get("training_id")
    pipeline_id = wml_model.get("pipeline", {}).get("id")
    # Fall back to an empty entry when the model has no metrics recorded
    metrics = (wml_model.get("metrics") or [{}])[0]
    model = metrics.get("context", {}).get("intermediate_model", {})
    pipeline_steps = f'[{", ".join(model.get("composition_steps", []))}]'
    pipeline_nodes = f'[{", ".join(model.get("pipeline_nodes", []))}]'
    print(".", end="")
    return {
            "Scope": "space",
            "Scope ID": space_id,
            "ID (pipeline)": pipeline_id,
            "ID (training)": training_id,
            "Pipeline steps": pipeline_steps,
            "Pipeline nodes": pipeline_nodes
            
        }
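
The nested lookups in `extract_details` are easier to follow on a concrete record. The sketch below runs the same traversal on a hypothetical asset payload; the IDs and the `sample_asset` structure are invented for illustration and only mirror the keys read above. The helper name `summarize_pipeline` is also an assumption, and it tolerates a model without metrics.

```python
def summarize_pipeline(asset_details):
    """Extract pipeline identifiers and step names from asset details."""
    wml_model = asset_details.get("entity", {}).get("wml_model", {})
    # Use an empty entry when no metrics are present, so indexing cannot fail.
    metrics = (wml_model.get("metrics") or [{}])[0]
    model = metrics.get("context", {}).get("intermediate_model", {})
    return {
        "Scope ID": asset_details.get("metadata", {}).get("space_id"),
        "ID (pipeline)": wml_model.get("pipeline", {}).get("id"),
        "ID (training)": wml_model.get("training_id"),
        "Pipeline steps": f'[{", ".join(model.get("composition_steps", []))}]',
        "Pipeline nodes": f'[{", ".join(model.get("pipeline_nodes", []))}]',
    }


# Hypothetical payload; real responses contain many more fields.
sample_asset = {
    "metadata": {"space_id": "space-id-1", "asset_type": "wml_model"},
    "entity": {
        "wml_model": {
            "training_id": "training-id-1",
            "pipeline": {"id": "pipeline-id-1"},
            "metrics": [
                {
                    "context": {
                        "intermediate_model": {
                            "composition_steps": ["Linear"],
                            "pipeline_nodes": ["Linear", "BATS"],
                        }
                    }
                }
            ],
        }
    },
}

print(summarize_pipeline(sample_asset))
```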

<a id="extract-projects"></a>
## 6. Extracting trainings info from all available projects 

In [17]:
for project_id in available_projects:
    set_scope(api_client, "project", project_id)
    training_results = pd.concat([training_results, get_training_info(api_client)])

Working on project_id: 31610097-87bb-45aa-ab76-2aee29b5857f
........................................
Working on project_id: dbc99479-c4ac-46c1-90b1-1d7a7749587d
...
Working on project_id: eac8bfe2-a00b-43ca-846b-305af5cc6395
.......................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................

<a id="extract-spaces"></a>
## 7. Extracting trainings and deployments info from all available spaces 

In [None]:
models_list = []
for space_id in available_spaces:
    try:
        set_scope(api_client, "space", space_id)
        training_results = pd.concat([training_results, get_training_info(api_client)])
        deployments_list = get_model_deployments_ids(api_client)
        for deployment_id in deployments_list:
            model_details = get_model_details(api_client, deployment_id)
            if is_autoai_pipeline(model_details):
                # collecting only AutoAI models
                models_list.append(extract_details(model_details))
    except Exception as e:
        print(f"Error for space {space_id}: {e}")
    print()
deployments_df = pd.DataFrame(models_list)
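
In the loop above, one failing deployment aborts the rest of its space, because the `try` block wraps the whole iteration. If finer-grained isolation is wanted, the pattern can be sketched as below. The helper `collect_per_item`, the handler, and the IDs are all hypothetical; the handler stands in for a call such as `get_model_details`.

```python
def collect_per_item(items, handler):
    """Apply handler to each item, recording failures instead of stopping."""
    results, errors = [], {}
    for item in items:
        try:
            results.append(handler(item))
        except Exception as exc:  # broad on purpose: keep processing other items
            errors[item] = f"{type(exc).__name__}: {exc}"
    return results, errors


def fake_fetch(deployment_id):
    # Hypothetical handler: one deployment is made to fail.
    if deployment_id == "deployment-2":
        raise KeyError("asset not found")
    return {"ID (deployment)": deployment_id}


results, errors = collect_per_item(
    ["deployment-1", "deployment-2", "deployment-3"], fake_fetch
)
print(f"{len(results)} collected, {len(errors)} failed")
for deployment_id, message in errors.items():
    print(f"Error for deployment {deployment_id}: {message}")
```

The trade-off is more verbose output when many items fail, in exchange for keeping every result that could be retrieved.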

 <a id="results"></a>
## 8. Displaying the results

In [19]:
training_results

| | ID (training) | Created at | Finished at | Status | Took | Scope | Scope ID |
|---|---|---|---|---|---|---|---|
| 0 | 01e05c71-d40f-4eee-b7bf-b2c94db82d0e | 2025-02-03 15:03:18.322000+00:00 | 2025-02-03 15:07:21.003000+00:00 | completed | 0 days 00:04:02.681000 | project | 31610097-87bb-45aa-ab76-2aee29b5857f |
| 1 | ad636b5d-2959-4c59-8878-48b838fbe357 | 2024-11-14 15:10:23.238000+00:00 | 2024-11-14 15:19:01.076000+00:00 | completed | 0 days 00:08:37.838000 | project | 31610097-87bb-45aa-ab76-2aee29b5857f |
| 2 | fd20811b-7776-4b4b-97ef-693d648d5ef9 | 2024-11-14 15:08:30.573000+00:00 | 2024-11-14 15:16:01.310000+00:00 | completed | 0 days 00:07:30.737000 | project | 31610097-87bb-45aa-ab76-2aee29b5857f |
| 3 | 80a7d894-ce18-45ed-8de9-c185d5d4e1f7 | 2024-11-14 12:36:51.428000+00:00 | 2024-11-14 12:46:22.450000+00:00 | completed | 0 days 00:09:31.022000 | project | 31610097-87bb-45aa-ab76-2aee29b5857f |
| 4 | dacc4d32-b4b7-463d-a810-cf6a67b0501b | 2024-11-14 11:45:51.018000+00:00 | 2024-11-14 11:56:26.961000+00:00 | completed | 0 days 00:10:35.943000 | project | 31610097-87bb-45aa-ab76-2aee29b5857f |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 3209 | cefd36da-75e6-4171-8889-b997900db6a2 | 2023-11-23 12:22:54.510000+00:00 | 2023-11-23 12:38:50.699000+00:00 | completed | 0 days 00:15:56.189000 | project | eac8bfe2-a00b-43ca-846b-305af5cc6395 |
| 3210 | 6d3d86c5-2fd1-4df4-bb16-035f0ba9b1ea | 2023-11-17 15:34:41.270000+00:00 | 2023-11-17 15:48:03.390000+00:00 | completed | 0 days 00:13:22.120000 | project | eac8bfe2-a00b-43ca-846b-305af5cc6395 |
| 3211 | 4f4c933d-ce1d-4c75-9cb9-9cfe616f8f31 | 2023-11-17 15:19:20.356000+00:00 | 2023-11-17 15:25:23.803000+00:00 | completed | 0 days 00:06:03.447000 | project | eac8bfe2-a00b-43ca-846b-305af5cc6395 |
| 3212 | 2f1e079d-081b-4696-8a5a-f51544787b19 | 2023-11-17 15:15:16.718000+00:00 | 2023-11-17 15:17:27.770000+00:00 | completed | 0 days 00:02:11.052000 | project | eac8bfe2-a00b-43ca-846b-305af5cc6395 |


In [20]:
deployments_df

| | Scope | Scope ID | ID (pipeline) | ID (training) | Pipeline steps | Pipeline nodes |
|---|---|---|---|---|---|---|
| 0 | space | 9f44cc2b-b3d0-4472-824e-4941afb1617b | e10eaf24-8d9d-4aba-b120-53c57c791522 | c450a7aa-0091-486c-997c-6628f8630c0a | [Linear] | [Linear, BATS] |
| 1 | space | 9f44cc2b-b3d0-4472-824e-4941afb1617b | b32b1334-33d2-4e6e-bbcd-7ed75a8f1f56 | bfd79331-9d8d-40db-b592-e64f6cc0bd4c | [Split_TrainingHoldout, TrainingDataset_full_8... | [TextTransformer, PreprocessingTransformer, au... |
| 2 | space | 9f44cc2b-b3d0-4472-824e-4941afb1617b | b32b1334-33d2-4e6e-bbcd-7ed75a8f1f56 | bfd79331-9d8d-40db-b592-e64f6cc0bd4c | [Split_TrainingHoldout, TrainingDataset_full_8... | [TextTransformer, PreprocessingTransformer, au... |
| 3 | space | 9f44cc2b-b3d0-4472-824e-4941afb1617b | 08fa5a22-e1b2-4983-a7b2-41a2f5234090 | 865234ef-2319-472d-a824-070db04331a7 | [Split_TrainingHoldout, TrainingDataset_full_1... | [PreprocessingTransformer, XGBRegressor] |
| 4 | space | 9f44cc2b-b3d0-4472-824e-4941afb1617b | 08fa5a22-e1b2-4983-a7b2-41a2f5234090 | 865234ef-2319-472d-a824-070db04331a7 | [Split_TrainingHoldout, TrainingDataset_full_1... | [PreprocessingTransformer, XGBRegressor] |
| 5 | space | 7ba02c5f-a50a-4105-b9c3-2fdb54fe1829 | 73b12405-bc11-4d83-aa5a-79bd06a6acbe | 8718141d-e6f0-49d5-8c6b-7c4e6dea01e8 | [Split_TrainingHoldout, TrainingDataset_full_6... | [ColumnSelector, DateTransformer, Preprocessin... |
| 6 | space | 7ba02c5f-a50a-4105-b9c3-2fdb54fe1829 | 27284b71-a88c-440d-969f-3931d5c15992 | 8dfce72a-7706-40ab-9a06-c170610cc0f7 | [Split_TrainingHoldout, TrainingDataset_full_6... | [ColumnSelector, DateTransformer, Preprocessin... |
| 7 | space | 7ba02c5f-a50a-4105-b9c3-2fdb54fe1829 | 73b12405-bc11-4d83-aa5a-79bd06a6acbe | 8718141d-e6f0-49d5-8c6b-7c4e6dea01e8 | [Split_TrainingHoldout, TrainingDataset_full_6... | [ColumnSelector, DateTransformer, Preprocessin... |


<a id="export"></a>
## 9. Exporting results to file 

In [21]:
trainings_csv = "trainings.csv"
deployments_csv = "deployments.csv"

In [22]:
training_results.to_csv(trainings_csv, index=False)
deployments_df.to_csv(deployments_csv, index=False)
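
CSV stores the timestamp and duration columns as plain text, so they need to be converted again when the file is loaded for analysis. The following sketch round-trips a one-row frame through an in-memory buffer to show the idea; the row is illustrative, and `io.StringIO` is used here in place of `trainings.csv` so the example does not touch the file system.

```python
import io

import pandas as pd

# One illustrative row with the same column names as training_results.
original = pd.DataFrame(
    {
        "ID (training)": ["training-id-1"],
        "Created at": pd.to_datetime(["2025-02-03 15:03:18.322+00:00"]),
        "Finished at": pd.to_datetime(["2025-02-03 15:07:21.003+00:00"]),
    }
)
original["Took"] = original["Finished at"] - original["Created at"]

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
original.to_csv(buffer, index=False)
buffer.seek(0)

# Timestamps are parsed on read; durations come back as text.
restored = pd.read_csv(buffer, parse_dates=["Created at", "Finished at"])
raw_took = restored["Took"].iloc[0]
restored["Took"] = pd.to_timedelta(restored["Took"])

print(restored.dtypes)
```

After the conversion, `Took` supports numeric operations again, for example `restored["Took"].mean()`.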

### Export data from cloud (optional)

If you are running this notebook on Cloud, execute the following cell to download the saved results.

In [23]:
display(create_download_link(trainings_csv, title=f"Download {trainings_csv}"))
display(create_download_link(deployments_csv, title=f"Download {deployments_csv}"))

<a id="cleanup"></a>
## 10. Cleanup 

If you want to clean up all created assets:
- experiments
- trainings
- pipelines
- model definitions
- models
- functions
- deployments

please follow this sample [notebook](https://github.com/IBM/watson-machine-learning-samples/blob/master/cloud/notebooks/python_sdk/instance-management/Machine%20Learning%20artifacts%20management.ipynb).

<a id="summary"></a>
## 11. Summary and next steps 

You successfully completed this notebook! You learned how to use the `ibm-watsonx-ai` library to extract information about historical AutoAI experiments and deployments from watsonx.ai Runtime and export it to CSV files. Check out our _[Online Documentation](https://dataplatform.cloud.ibm.com/docs/content/wsj/getting-started/welcome-main.html?context=wx)_ for more samples, tutorials, documentation, how-tos, and blog posts. 

### Authors

**Marta Tomzik**, Software Engineer at watsonx.ai.

Copyright © 2025 IBM. This notebook and its source code are released under the terms of the MIT License.