# Find Pipelines with Available Updates

This script walks you through the API calls needed to determine which pipelines configured in a given TDP Org have a protocol with a newer artifact version available for use.

This script uses the following Python modules:

- `requests` - for handling TDP API calls
- `pandas` - for data processing

In [1]:
import json

import requests
import pandas as pd

### Create the authorization object

This `auth` object holds the settings used to authorize requests. It is required for (and common to) any API usage in TDP, so it is commonly recommended to save it as a `.json` file for future use.

In [2]:
auth = {
    "api_url": "https://api.tetrascience.com/v1/",
    "auth_token": "",
    "org": "tetrascience",
    "verify_ssl": True,
}
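As a minimal sketch of the save-for-reuse recommendation above, the `auth` dictionary can be written to and read back from a JSON file with the standard library. The helper names `save_auth` and `load_auth` and the file name `auth.json` are hypothetical choices for illustration, not part of any TDP tooling.

```python
import json
import tempfile
from pathlib import Path


def save_auth(auth, path):
    """Write the auth settings to a JSON file (hypothetical helper)."""
    Path(path).write_text(json.dumps(auth, indent=4))


def load_auth(path):
    """Read the auth settings back from a JSON file (hypothetical helper)."""
    return json.loads(Path(path).read_text())


# Round trip in a temporary directory so the example leaves no files behind.
sample_auth = {
    "api_url": "https://api.tetrascience.com/v1/",
    "auth_token": "",
    "org": "tetrascience",
    "verify_ssl": True,
}
with tempfile.TemporaryDirectory() as tmp_dir:
    auth_path = Path(tmp_dir) / "auth.json"  # assumed file name
    save_auth(sample_auth, auth_path)
    loaded_auth = load_auth(auth_path)
```

Because the file stores the token in plain text, it is probably best kept out of version control.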

In [3]:
headers = {
    "ts-auth-token": auth["auth_token"],
    "x-org-slug": auth["org"],
}

In [4]:
pipelines_endpoint = "pipeline/search"
url = auth["api_url"] + pipelines_endpoint

#### Suppress Warnings

If SSL verification is disabled for the TDP environment (`verify_ssl` set to `False`), the loop below can get very noisy unless warnings are suppressed.

In [5]:
# Suppress HTTPS verification warnings
requests.packages.urllib3.disable_warnings()

### Get information for all pipelines via a loop

The pipelines endpoint returns only 10 pipelines at a time. We'll use the returned `"hasNext"` and `"from"` values to loop through the pages of results and capture the information on all the pipelines.

In [6]:
hits = []

r = {
    "hasNext": True,
    "from": 0,
}


while r["hasNext"]:
    
    r = requests.get(
        url,
        params={"from": r["from"]},
        headers=headers,
        verify=auth["verify_ssl"]
    ).json()
    
    hits += r["hits"]
    
len(hits)

65
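The pagination loop above can also be wrapped in a small helper that guards against looping forever. This is a sketch only: `fetch_all_hits`, `fake_fetch_page`, and `max_pages` are hypothetical names, and it assumes, as the loop above does, that the `"from"` value in each response is the offset to request next.

```python
def fetch_all_hits(fetch_page, max_pages=1000):
    """Collect "hits" from every page returned by fetch_page(offset).

    fetch_page must return a dict with "hits", "hasNext" and "from" keys.
    """
    hits = []
    offset = 0
    for _ in range(max_pages):
        page = fetch_page(offset)
        hits += page["hits"]
        if not page["hasNext"]:
            return hits
        # Assumption: the returned "from" is the next offset to request.
        next_offset = page["from"]
        if next_offset <= offset:
            raise RuntimeError("Pagination did not advance; stopping.")
        offset = next_offset
    raise RuntimeError("Reached max_pages before the last page.")


def fake_fetch_page(offset, items=tuple(range(25)), page_size=10):
    """Stand-in for the real API call, serving 25 items in pages of 10."""
    end = offset + page_size
    return {
        "hits": list(items[offset:end]),
        "hasNext": end < len(items),
        "from": end,
    }


all_hits = fetch_all_hits(fake_fetch_page)
```

With the real endpoint, `fetch_page` could be something like `lambda offset: requests.get(url, params={"from": offset}, headers=headers, verify=auth["verify_ssl"]).json()`.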

### Subset on relevant info

The Pipelines endpoint returns a lot of information. We'll subset to only get the info we need.

- `id` - Useful for future API calls to manage the pipeline
- `name` - The name assigned to the pipeline when it was created
- `masterScriptNamespace` - `common`, `client` or `private` depending on how the protocol was deployed
- `masterScriptSlug` - The internal unique name for the protocol
- `masterScriptVersion` - The deployed version, which we can compare to the latest version
- `updatedAt` - The last time this pipeline was updated/managed
- `status` - `null` = enabled, `"disabled"` = disabled
    - We'll fill in these `null` values with `"enabled"` to make the status explicit

We'll also rename some of these columns to:

1. Better reflect their function (masterScript is an old name for protocol)
2. Avoid confusion with artifact information we'll be extracting next

In [7]:
PIPELINE_FIELDS = [
    "id",
    "name",
    "masterScriptNamespace",
    "masterScriptSlug",
    "masterScriptVersion",
    "updatedAt",
    "status",
]

RENAME_FIELDS = {
    "name": "pipelineName",
    "masterScriptNamespace": "namespace",
    "masterScriptSlug": "slug",
    "masterScriptVersion": "version",
    "updatedAt": "pipelineLastUpdate",
}

In [8]:
pipeline_df = pd.json_normalize(hits, max_level=1)[PIPELINE_FIELDS].rename(columns=RENAME_FIELDS)
pipeline_df["status"] = pipeline_df["status"].fillna("enabled")
pipeline_df.head(2)

| | id | pipelineName | namespace | slug | version | pipelineLastUpdate | status |
|---|---|---|---|---|---|---|---|
| 0 | 0a446067-ddc9-47e9-8e20-019b718dd28f | DE-6927-empower-raw-to-ids | common | empower-raw-to-ids | v8.1.1 | 2024-03-07T05:24:35.743Z | enabled |
| 1 | a632c004-4671-46f8-a1c3-b772b17e8ed6 | DE-7271-empower-raw-to-ids | common | empower-raw-to-ids | v8.1.1 | 2024-03-07T05:24:19.153Z | enabled |


## Get Protocol Artifact information

Next, we'll use the `artifacts/protocols` endpoint to get the most recent version of every protocol.

These can be compared with the versions currently deployed to see which pipelines are eligible for an update.

In [9]:
protocols_endpoint = "artifacts/protocols"
url = auth["api_url"] + protocols_endpoint

In [10]:
latest_protocols = requests.get(
    url=url, 
    params={"latest_only": "true"}, 
    headers=headers,
    verify=auth["verify_ssl"],
).json()

In [11]:
len(latest_protocols)

482

### Subset on needed info

The Protocols endpoint also returns a lot of information. We'll again subset to only get the info we need.

- `namespace` - `common`, `client` or `private` depending on how the protocol was deployed
- `slug` - The internal unique name for the protocol
- `version` - The latest version, which we can compare to the deployed versions
- `name` - The display title for the protocol
- `description` - Additional information about the function of the protocol

In [12]:
PROTOCOL_COLUMNS = [
    "namespace",
    "slug",
    "version",
    "name",
    "description",
]

In [13]:
protocol_df = pd.json_normalize(latest_protocols)[PROTOCOL_COLUMNS]

In [14]:
protocol_df.head(2)

| | namespace | slug | version | name | description |
|---|---|---|---|---|---|
| 0 | client-merck | create-empower-ssm-and-send-to-agent | v0.1.0 | Create Empower SSM And Send To Agent | Create a SSM to send to Empower |
| 1 | common | thermofisher-xcalibur-raw-to-ids | v4.0.3 | Thermo Fisher Xcalibur Raw To IDS | Convert Thermo Fisher Xcalibur RAW files to IDS |


### Join the tables

We can now perform a left join of the existing pipelines with the protocol information to add the latest version info. We do this by joining where the `slug` and `namespace` values are the same in both tables.

Because there are `version` columns in both tables, we'll need to add a suffix to each to better distinguish them. We've chosen `_deployed` for the version used in an existing pipeline, and `_latest_protocol` for the latest version we found in the `artifacts` endpoint.

In [15]:
result = pd.merge(
    pipeline_df,
    protocol_df,
    how="left",
    on=["namespace", "slug"],
    suffixes=("_deployed", "_latest_protocol"),
)
result.head(2)

| | id | pipelineName | namespace | slug | version_deployed | pipelineLastUpdate | status | version_latest_protocol | name | description |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0a446067-ddc9-47e9-8e20-019b718dd28f | DE-6927-empower-raw-to-ids | common | empower-raw-to-ids | v8.1.1 | 2024-03-07T05:24:35.743Z | enabled | v9.0.0 | Empower Raw to IDS Protocol | This protocol parses JSON files produced by Em... |
| 1 | a632c004-4671-46f8-a1c3-b772b17e8ed6 | DE-7271-empower-raw-to-ids | common | empower-raw-to-ids | v8.1.1 | 2024-03-07T05:24:19.153Z | enabled | v9.0.0 | Empower Raw to IDS Protocol | This protocol parses JSON files produced by Em... |
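A left join keeps every pipeline, so it can help to check two things: that each `namespace`/`slug` pair appears only once on the protocol side, and which pipelines found no matching protocol. The sketch below uses the `validate` and `indicator` parameters of `pd.merge` on small made-up tables. The rows are hypothetical and only mirror the column names used above.

```python
import pandas as pd

# Hypothetical sample data shaped like pipeline_df and protocol_df.
pipelines = pd.DataFrame({
    "pipelineName": ["pipeline-a", "pipeline-b", "pipeline-c"],
    "namespace": ["common", "common", "private-example"],
    "slug": ["proto-1", "proto-1", "proto-2"],
    "version": ["v1.0.0", "v1.1.0", "v0.1.0"],
})
protocols = pd.DataFrame({
    "namespace": ["common"],
    "slug": ["proto-1"],
    "version": ["v1.1.0"],
})

checked = pd.merge(
    pipelines,
    protocols,
    how="left",
    on=["namespace", "slug"],
    suffixes=("_deployed", "_latest_protocol"),
    validate="many_to_one",  # raises MergeError if protocol keys repeat
    indicator=True,          # adds a "_merge" column: "both" or "left_only"
)

# Pipelines whose protocol was not found in the protocol table.
unmatched = checked[checked["_merge"] == "left_only"]
```

Rows marked `left_only` have no value in `version_latest_protocol`, so they may deserve separate handling rather than being treated as out of date.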


### Filter, reorder, and save

Finally, we'll query for only the rows where the deployed version is not the same as the latest version. We'll then reorder the columns into an order that makes a bit more sense for visual inspection, and save the output as a CSV for further inspection and future use.

In [16]:
DISPLAY_COLUMNS = [
    "id",
    "pipelineName",
    "namespace",
    "slug",
    "version_deployed",
    "version_latest_protocol",
    "name",
    "description",
    "status",
    "pipelineLastUpdate",
]


filtered_result = result.query("version_deployed != version_latest_protocol")[DISPLAY_COLUMNS]
filtered_result.head(5)

| | id | pipelineName | namespace | slug | version_deployed | version_latest_protocol | name | description | status | pipelineLastUpdate |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0a446067-ddc9-47e9-8e20-019b718dd28f | DE-6927-empower-raw-to-ids | common | empower-raw-to-ids | v8.1.1 | v9.0.0 | Empower Raw to IDS Protocol | This protocol parses JSON files produced by Em... | enabled | 2024-03-07T05:24:35.743Z |
| 1 | a632c004-4671-46f8-a1c3-b772b17e8ed6 | DE-7271-empower-raw-to-ids | common | empower-raw-to-ids | v8.1.1 | v9.0.0 | Empower Raw to IDS Protocol | This protocol parses JSON files produced by Em... | enabled | 2024-03-07T05:24:19.153Z |
| 2 | fdda9dcf-94b6-4eea-b67c-92dad17de803 | Empower Protocol v8.0.1 Test | common | empower-raw-to-ids | v8.0.1 | v9.0.0 | Empower Raw to IDS Protocol | This protocol parses JSON files produced by Em... | enabled | 2023-09-12T17:16:03.114Z |
| 4 | 647c49db-8a58-4847-9cc9-99f0ee43f2fd | tmp yury test | client-diagnostic | all-in-one | v1.1.0 | v2.1.0 | All-In-One Diagnostic | A set of tests for a periodical diagnostic | enabled | 2023-08-29T13:51:43.847Z |
| 5 | f2fd26ee-71be-405e-a4be-af6a2bce392b | Agilent Chemstation Raw .D to IDS | common | agilent-chemstation-raw-to-ids | v5.0.0 | v5.0.1 | Agilent Chemstation Raw to IDS | This is a protocol which parses data acquired ... | enabled | 2023-08-15T15:58:15.787Z |


In [17]:
filtered_result.to_csv("pipelines_with_available_updates.csv", index=False)
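The filter above keeps any row where the two version strings differ, which would also keep pipelines with no matching protocol (a missing latest version never equals the deployed one). If only strictly newer versions are of interest, one option is to compare the versions numerically. The sketch below assumes every version has the `vMAJOR.MINOR.PATCH` form seen in the sample output. `parse_version` and `has_newer_version` are hypothetical helpers, and anything that does not fit that form is treated as "no update".

```python
import re


def parse_version(version):
    """Turn "v8.1.1" into (8, 1, 1); return None for missing or odd values."""
    if not isinstance(version, str):
        return None  # e.g. NaN from an unmatched left join
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", version.strip())
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def has_newer_version(deployed, latest):
    """Return True only when latest is strictly newer than deployed."""
    deployed_parts = parse_version(deployed)
    latest_parts = parse_version(latest)
    if deployed_parts is None or latest_parts is None:
        return False
    # Tuples compare element by element, so (1, 10, 0) > (1, 9, 0).
    return latest_parts > deployed_parts


example_newer = has_newer_version("v8.1.1", "v9.0.0")
example_missing = has_newer_version("v1.0.0", float("nan"))
```

Applied to the joined table, this could look like `result[result.apply(lambda row: has_newer_version(row["version_deployed"], row["version_latest_protocol"]), axis=1)]`.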