# Gateways 2024 Hands-on Exercises

In this notebook, you will use Tapis v3 to create two systems and two applications, which will be used to run
a sentiment analysis job both on a VM using Docker and on an HPC-type host using Singularity.

To execute each `In[#]` cell, click inside the cell and press `Shift + Enter`.

Install the Tapis Python SDK. After running the code below, you need to restart the runtime: from the menu, select Runtime -> Restart runtime, or use CTRL+M on the keyboard. You can then execute the code in the notebook and follow the rest of the tutorial.

In [None]:
#!pip install tapipy

## Enter training account information

To get things started, please run the following and enter the training account information provided to you:

In [78]:
import getpass

tenant = 'tacc'
base_url = 'https://' + tenant + '.tapis.io'

# Enter Tapis Username.
# Enter your tacc credentials
username = input('Username: ')
password = getpass.getpass(prompt='Permitted Password: ', stream=None)

Username:  
Permitted Password:  ········


## Authenticate and initialize Tapis v3 client

Using this information, you can now use `tapipy` to authenticate with the tenant and initialize the
Tapis v3 client. You should see your access token displayed. This may take a while to run, but it should take
no more than 30 seconds.

In [79]:
from tapipy.tapis import Tapis
#Create python Tapis client for user
client = Tapis(base_url= base_url, username=username, password=password)
# *** Tapis v3: Call to Tokens API
client.get_tokens()
# Print Tapis v3 token
client.access_token

## Systems

In this section we create two Tapis systems: one for running on a VM host using FORK and one for running on an HPC-type host using BATCH.

Note that, although it is possible, we have not included any login credentials in the system definitions.
Well-crafted system definitions are likely to be copied and reused, so for security reasons it is recommended that
login credentials be registered using separate API calls, as discussed below.

### Create a system for the VM host

In [81]:
# Enter the VM host IP address shared with you
host = input('Host: ')
#if you are creating password authentication based system, uncomment the below line
#password_vm = getpass.getpass(prompt='Password for VM: ', stream=None)

Host:  


In [84]:
# For PKI_KEYS authentication, paste your private and public keys below
# Note that each key should be formatted on a single line
# Read the documentation: https://tapis.readthedocs.io/en/latest/technical/systems.html#use-of-pki-keys-as-credentials

privKey=""

In [85]:
pubKey=""
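Pasting multi-line PEM keys into a string literal is error-prone. As a minimal sketch, assuming your key pair is stored in local files (the paths in the usage comment are hypothetical), `load_key` reads a key as text and `to_single_line` shows the one-line form with literal `\n` sequences that a JSON document would need. Both are illustrative helpers, not part of tapipy; the sketch also assumes tapipy sends the request body as JSON, so real newlines in the loaded string are escaped automatically.

```python
from pathlib import Path


def load_key(path: str) -> str:
    """Read a PEM key file and return its text without surrounding whitespace.

    The result keeps its real newline characters; when tapipy serializes the
    request as JSON (an assumption), they become \\n escapes automatically.
    """
    return Path(path).expanduser().read_text().strip()


def to_single_line(key: str) -> str:
    """Return the key on one line, writing each line break as backslash-n.

    This is the form you would paste between quotes in a JSON file.
    """
    return key.strip().replace("\r\n", "\n").replace("\n", "\\n")


# Hypothetical usage (adjust the paths to where your key pair is stored):
# privKey = load_key("~/.ssh/tapis_key")
# pubKey = load_key("~/.ssh/tapis_key.pub")
```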

In [30]:
# Use your TACC username
user_id = username
num = 1

system_id_vm = "lccfuser-testvm-"+ user_id + str(num)
print(system_id_vm)

# Create the system definition
# exec_system_vm = {
#   "id": system_id_vm,
#   "description": "Test system",
#   "systemType": "LINUX",
#   "host": host,
#   "effectiveUserId":"${apiUserId}",
#   "defaultAuthnMethod": "PASSWORD",
#   "rootDir": "/",
#   "canExec": True,
#   "jobRuntimes": [ { "runtimeType": "DOCKER" } ],
#   "jobWorkingDir": "HOST_EVAL($HOME)/sharetest/workdir"
# }

exec_system_vm = {
  "id": system_id_vm,
  "description": "Test system",
  "systemType": "LINUX",
  "host": host,
  "effectiveUserId":"lccfuser",
  "defaultAuthnMethod": "PKI_KEYS",
  "rootDir": "/home/lccfuser",
  "canExec": True,
  "jobRuntimes": [ { "runtimeType": "DOCKER" } ],
  "jobWorkingDir": "HOST_EVAL($HOME)/sharetest/workdir"
  }

# Use the client to create the system in Tapis
print("****************************************************")
print("Create system: " + system_id_vm)
print("****************************************************")
client.systems.createSystem(**exec_system_vm)


lccfuser-testvm-spadhy1
****************************************************
Create system: lccfuser-testvm-spadhy1
****************************************************



url: http://tacc.tapis.io/v3/systems/lccfuser-testvm-spadhy1

In [31]:
# You can also update just a few attributes using the patchSystem call.
# Note that not all attributes may be updated and some attributes, such as *enabled*,
#   may only be updated using a specific call.
# For example, to update the description, first define the json to be used:
patch_system_vm = {
  "description": "System for running jobs on a VM for LCCF tutorial"
}

# Then use the client to make the update:
client.systems.patchSystem(**patch_system_vm, systemId=system_id_vm)


url: http://tacc.tapis.io/v3/systems/lccfuser-testvm-spadhy1

In [32]:
# List all systems available to you
print("****************************************************")
print("List all systems")
print("****************************************************")
# Uncomment the line below to list all systems (the output can be long)
#client.systems.getSystems()

****************************************************
List all systems
****************************************************


### Register Credentials for the VM system

After creating the system, you will need to register credentials for your username. Tapis uses these to
access the host. Various authentication methods can be used to access a system, such as PASSWORD and PKI_KEYS. For the
VM, this exercise uses an SSH key pair (PKI_KEYS); the commented-out line shows the equivalent call using a password.
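Because the same `createUserCredential` call takes either a password or a key pair, a small helper can build its keyword arguments and catch empty placeholders before any request is sent. This is a sketch: `credential_kwargs` is a hypothetical helper, and its one-method-per-call rule is a design choice of the sketch, not a Tapis restriction. The keyword names (`systemId`, `userName`, `password`, `privateKey`, `publicKey`) are the ones used in this notebook.

```python
def credential_kwargs(system_id, user_name, password=None, private_key=None, public_key=None):
    """Build keyword arguments for client.systems.createUserCredential.

    This sketch accepts one method per call: a password, or a full
    private/public key pair for PKI_KEYS. Empty key strings (such as the
    notebook's "" placeholders) are rejected so a missing key fails early.
    """
    kwargs = {"systemId": system_id, "userName": user_name}
    has_key_args = private_key is not None or public_key is not None
    if password is not None and has_key_args:
        raise ValueError("Pass either a password or a key pair, not both")
    if password is not None:
        kwargs["password"] = password
    elif private_key and public_key:
        kwargs["privateKey"] = private_key
        kwargs["publicKey"] = public_key
    else:
        raise ValueError("Pass a password, or both private_key and public_key")
    return kwargs


# Usage, equivalent to the key-based createUserCredential call in the next cell:
# client.systems.createUserCredential(
#     **credential_kwargs(system_id_vm, "lccfuser", private_key=privKey, public_key=pubKey))
```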

In [33]:
# Register credentials
#client.systems.createUserCredential(systemId=system_id_vm, userName=user_id, password=password_vm)
client.systems.createUserCredential(systemId=system_id_vm, userName="lccfuser", privateKey=privKey,publicKey=pubKey)

{'result': None,
 'status': 'success',
 'message': 'SYSAPI_CRED_UPDATED Credential updated. jwtTenant: tacc jwtUser: spadhy OboTenant: tacc OboUser: spadhy System: lccfuser-testvm-spadhy1 User: lccfuser',
 'version': '1.8.1',
 'commit': '648f675c',
 'build': '2025-01-09T21:49:21Z',
 'metadata': None}

Now you can use the client to list files on the system. This will confirm that the credentials are valid.
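The listing returned by `listFiles` can be long. As an illustrative sketch, `summarize_listing` (a hypothetical helper) condenses it to `(name, type, size)` tuples, assuming each entry exposes `.name`, `.type` and `.size` as in the output of this section, and that directories are reported with type `dir`. The offline sample reuses two entries from that output.

```python
from types import SimpleNamespace


def summarize_listing(entries):
    """Return (name, type, size) tuples, directories first, then by name.

    `entries` is the list returned by client.files.listFiles; each item is
    assumed to expose .name, .type and .size attributes.
    """
    ordered = sorted(entries, key=lambda e: (e.type != "dir", e.name))
    return [(e.name, e.type, e.size) for e in ordered]


# Offline example built from two entries in the listing output:
sample = [
    SimpleNamespace(name=".bashrc", type="file", size=376),
    SimpleNamespace(name=".bash_logout", type="file", size=18),
]
for name, kind, size in summarize_listing(sample):
    print(f"{kind:5} {size:>8} {name}")

# With a live client:
# summarize_listing(client.files.listFiles(systemId=system_id_vm, path='/'))
```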

In [34]:
client.files.listFiles(systemId=system_id_vm,path='/')


[
 group: 1014
 lastModified: 2025-02-06T19:46:09Z
 mimeType: None
 name: .bash_history
 nativePermissions: rw-------
 owner: 1014
 path: .bash_history
 size: 11062
 type: file
 url: tapis://lccfuser-testvm-spadhy1/.bash_history,
 
 group: 1014
 lastModified: 2024-02-10T12:29:44Z
 mimeType: None
 name: .bash_logout
 nativePermissions: rw-r--r--
 owner: 1014
 path: .bash_logout
 size: 18
 type: file
 url: tapis://lccfuser-testvm-spadhy1/.bash_logout,
 
 group: 1014
 lastModified: 2024-02-10T12:29:44Z
 mimeType: None
 name: .bash_profile
 nativePermissions: rw-r--r--
 owner: 1014
 path: .bash_profile
 size: 141
 type: file
 url: tapis://lccfuser-testvm-spadhy1/.bash_profile,
 
 group: 1014
 lastModified: 2024-02-10T12:29:44Z
 mimeType: None
 name: .bashrc
 nativePermissions: rw-r--r--
 owner: 1014
 path: .bashrc
 size: 376
 type: file
 url: tapis://lccfuser-testvm-spadhy1/.bashrc,
 
 group: 1014
 lastModified: 2025-01-24T15:39:53Z
 mimeType: None
 name: .docker
 nativePermissions: rwx---

### Natural Language Processing: Sentiment Analysis
- Sentiment analysis is one of the most popular applications of natural language processing (NLP); it uses text classification to analyze the sentiment or emotion of a given text.
- Sentiment analysis assigns a label such as 🙂 positive, 🙁 negative, or 😐 neutral to a sequence of text.
- It is a useful tool for making business decisions based on customer feedback and reviews.
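The pipeline used in this section returns a list of dictionaries with `label` and `score` keys, and its default model only distinguishes POSITIVE from NEGATIVE. As a sketch of how the three labels above could be produced, `to_emoji_label` (a hypothetical helper) reports low-confidence predictions as neutral; the 0.75 cut-off is an arbitrary assumption, not a property of the model.

```python
# Hypothetical cut-off below which a prediction is reported as neutral.
NEUTRAL_THRESHOLD = 0.75


def to_emoji_label(result, neutral_threshold=NEUTRAL_THRESHOLD):
    """Map one sentiment-pipeline result to a label such as '🙂 positive'.

    `result` is a dict with "label" and "score" keys,
    e.g. {'label': 'POSITIVE', 'score': 0.99}.
    """
    if result["score"] < neutral_threshold:
        return "😐 neutral"
    label = result["label"].upper()
    if label == "POSITIVE":
        return "🙂 positive"
    if label == "NEGATIVE":
        return "🙁 negative"
    return "😐 neutral"  # any other label, e.g. NEUTRAL from other models


# Usage with the pipeline created in this section:
# [to_emoji_label(r) for r in sentiment_pipeline(["Glad to see you", "This is not fun"])]
```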



In [34]:
#!pip install -q transformers

In [35]:
from transformers import pipeline

In [36]:
sentiment_pipeline = pipeline("sentiment-analysis")
#pipeline(model="FacebookAI/roberta-large-mnli")
#

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
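The warning above recommends naming the model and revision explicitly. A minimal sketch that pins the defaults reported in that message, using the `model` and `revision` arguments of `transformers.pipeline`; the import is deferred inside the function so the cell can be defined even where `transformers` is not installed, and calling it downloads the model on first use.

```python
# Model and revision reported by the default-model warning above.
MODEL_ID = "distilbert/distilbert-base-uncased-finetuned-sst-2-english"
MODEL_REVISION = "714eb0f"


def make_pinned_pipeline():
    """Create a sentiment-analysis pipeline pinned to an explicit model and revision."""
    from transformers import pipeline  # requires `pip install transformers`

    return pipeline("sentiment-analysis", model=MODEL_ID, revision=MODEL_REVISION)


# Usage:
# sentiment_pipeline = make_pinned_pipeline()
# sentiment_pipeline("Glad to see you at Gateways24")
```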


In [37]:
text= "Glad to see you at Gateways24"
sentiment_pipeline(text)

[{'label': 'POSITIVE', 'score': 0.9997883439064026}]

### Let's try this with Tapis now

In [40]:
app_id = "lccfuser-sentiment-analysis-" + username
app_def= {
    "id": app_id,
    "version": "0.1",
    "description": "Application utilizing the sentiment analysis model from Hugging Face.",
    "jobType": "FORK",
    "runtime": "DOCKER",
    "containerImage": "tapis/sentiment-analysis:1.0.2",
    "jobAttributes": {
        "parameterSet": {
            "archiveFilter": {
                "includeLaunchFiles": False
            }
        },
        "memoryMB": 1,
        "nodeCount": 1,
        "coresPerNode": 1,
        "maxMinutes": 10
    }
}

In [41]:
client.apps.createAppVersion(**app_def)

# To update this app version, patch only the attributes you want to change, e.g.:
#client.apps.patchApp(appId=app_id, appVersion='0.1', description='Updated description')


url: http://tacc.tapis.io/v3/apps/lccfuser-sentiment-analysis-spadhy

In [106]:
client.apps.getAppLatestVersion(appId=app_id)


containerImage: tapis/sentiment-analysis:1.0.1
created: 2025-01-23T21:46:49.883970Z
deleted: False
description: Application utilizing the sentiment analysis model from Hugging Face.
enabled: True
id: gateways24-sentiment-analysis-spadhy
isPublic: False
jobAttributes: 
archiveOnAppError: False
archiveSystemDir: None
archiveSystemId: None
cmdPrefix: None
coresPerNode: 1
description: None
dtnSystemInputDir: !tapis_not_set
dtnSystemOutputDir: !tapis_not_set
dynamicExecSystem: False
execSystemConstraints: None
execSystemExecDir: None
execSystemId: None
execSystemInputDir: None
execSystemLogicalQueue: None
execSystemOutputDir: None
fileInputArrays: []
fileInputs: []
isMpi: False
maxMinutes: 10
memoryMB: 1
mpiCmd: None
nodeCount: 1
parameterSet: 
appArgs: []
archiveFilter: 
excludes: []
includeLaunchFiles: False
includes: []
containerArgs: []
envVariables: []
logConfig: 
stderrFilename: 
stdoutFilename: 
schedulerOptions: []
subscriptions: []
tags: []
jobType: FORK
locked: False
maxJobs: 214

In [42]:
# Arguments for the sentiment analysis application
pa = {
    "parameterSet": {
        "appArgs": [
            {"arg": "--sentences"},
            {"arg": "\"This is great\" \"This is not fun\""}
        ]
    }
}

# Submit a job to the VM system
job_response_vm = client.jobs.submitJob(
    name='sentiment analysis',
    description='sentiment analysis with hugging face transformer pipelines',
    appId=app_id, appVersion='0.1', execSystemId=system_id_vm,
    **pa
)
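The `pa` dictionary is written by hand for two sentences. As a sketch, a hypothetical `build_parameter_set` helper produces the same structure from any list of sentences: a `--sentences` flag followed by one argument holding every sentence in double quotes. Escaping embedded quotes with a backslash is an assumption about how the application's command line is parsed.

```python
def build_parameter_set(sentences):
    """Build a job parameterSet that passes each sentence as a double-quoted string."""
    # Escape embedded double quotes (assumed shell-style parsing by the app).
    quoted = " ".join('"' + s.replace('"', '\\"') + '"' for s in sentences)
    return {
        "parameterSet": {
            "appArgs": [
                {"arg": "--sentences"},
                {"arg": quoted},
            ]
        }
    }


# Usage: equivalent to the hand-written `pa` in the cell above.
# pa = build_parameter_set(["This is great", "This is not fun"])
```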


In [43]:
# Get Job submission response
print("****************************************************")
print("Job Submitted: " + app_id)
print("****************************************************")
print(job_response_vm)

****************************************************
Job Submitted: lccfuser-sentiment-analysis-spadhy
****************************************************

_fileInputsSpec: None
_parameterSetModel: None
appId: lccfuser-sentiment-analysis-spadhy
appVersion: 0.1
archiveCorrelationId: None
archiveOnAppError: False
archiveSystemDir: /home/lccfuser/sharetest/workdir/jobs/22358a59-fa58-4f65-847e-e55b31364511-007/output
archiveSystemId: lccfuser-testvm-spadhy1
archiveTransactionId: None
blockedCount: 0
cmdPrefix: None
condition: None
coresPerNode: 1
created: 2025-02-06T20:15:14.581675934Z
createdby: spadhy
createdbyTenant: tacc
description: sentiment analysis with hugging face transformer pipelines
dtnInputCorrelationId: None
dtnInputTransactionId: None
dtnOutputCorrelationId: None
dtnOutputTransactionId: None
dtnSystemId: None
dtnSystemInputDir: None
dtnSystemOutputDir: None
dynamicExecSystem: False
ended: None
execSystemConstraints: None
execSystemExecDir: /home/lccfuser/sharetest/workdir/

In [44]:
# Get job uuid from the job submission response
print("****************************************************")
job_uuid_vm=job_response_vm.uuid
print("Job UUID: " + job_uuid_vm)
print("****************************************************")

****************************************************
Job UUID: 22358a59-fa58-4f65-847e-e55b31364511-007
****************************************************


In [52]:
client.jobs.getJobStatus(jobUuid=job_uuid_vm)


condition: NORMAL_COMPLETION
status: FINISHED
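Rather than re-running the status cell by hand, you can poll until the job stops changing. A minimal sketch, assuming FINISHED, FAILED and CANCELLED are the terminal job statuses and that the object returned by `client.jobs.getJobStatus` exposes `.status`, as in the output above. `wait_for_job` and its default interval and timeout are hypothetical choices.

```python
import time

# Statuses assumed to be final for a Tapis job.
TERMINAL_STATUSES = {"FINISHED", "FAILED", "CANCELLED"}


def wait_for_job(client, job_uuid, poll_seconds=10, timeout_seconds=1800):
    """Poll client.jobs.getJobStatus until the job reaches a terminal status.

    Returns the final status string; raises TimeoutError if the job is still
    not finished after timeout_seconds.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = client.jobs.getJobStatus(jobUuid=job_uuid).status
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job {job_uuid} still {status} after {timeout_seconds}s")
        time.sleep(poll_seconds)


# Usage:
# final_status = wait_for_job(client, job_uuid_vm)
```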

In [53]:
client.jobs.getJob(jobUuid=job_uuid_vm)


_fileInputsSpec: None
_parameterSetModel: None
appId: lccfuser-sentiment-analysis-spadhy
appVersion: 0.1
archiveCorrelationId: None
archiveOnAppError: False
archiveSystemDir: /home/lccfuser/sharetest/workdir/jobs/22358a59-fa58-4f65-847e-e55b31364511-007/output
archiveSystemId: lccfuser-testvm-spadhy1
archiveTransactionId: None
blockedCount: 0
cmdPrefix: None
condition: NORMAL_COMPLETION
coresPerNode: 1
created: 2025-02-06T20:15:14.581676Z
createdby: spadhy
createdbyTenant: tacc
description: sentiment analysis with hugging face transformer pipelines
dtnInputCorrelationId: None
dtnInputTransactionId: None
dtnOutputCorrelationId: None
dtnOutputTransactionId: None
dtnSystemId: None
dtnSystemInputDir: None
dtnSystemOutputDir: None
dynamicExecSystem: False
ended: 2025-02-06T20:16:00.696462Z
execSystemConstraints: None
execSystemExecDir: /home/lccfuser/sharetest/workdir/jobs/22358a59-fa58-4f65-847e-e55b31364511-007
execSystemId: lccfuser-testvm-spadhy1
execSystemInputDir: /home/lccfuser/shar

In [54]:
client.jobs.getJobOutputDownload(jobUuid=job_uuid_vm, outputPath='results.csv')

b'SENTENCE,anger,disgust,fear,joy,neutral,sadness,surprise,anger,disgust,fear,joy,neutral,sadness,surprise\r\nThis is great,0.002,0.003,0.001,0.901,0.071,0.003,0.019\r\nThis is not fun,0.032,0.076,0.004,0.003,0.031,0.845,0.008\r\n'
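The downloaded `results.csv` holds one row per sentence with a score per emotion. As an illustrative sketch (a hypothetical helper, not part of the Tapis API), `top_emotions` picks the highest-scoring emotion for each sentence. It assumes the layout shown above, where the header appears to list the emotion columns twice while each row carries one value per emotion, so values are paired with the first header names only.

```python
import csv
import io


def top_emotions(csv_bytes):
    """Return {sentence: (emotion, score)} for the highest-scoring emotion per row."""
    rows = list(csv.reader(io.StringIO(csv_bytes.decode("utf-8"))))
    header, data = rows[0], rows[1:]
    best = {}
    for row in data:
        if not row:
            continue
        sentence, values = row[0], row[1:]
        # zip stops at the shorter sequence, so the repeated header names are ignored.
        scores = {name: float(v) for name, v in zip(header[1:], values)}
        emotion = max(scores, key=scores.get)
        best[sentence] = (emotion, scores[emotion])
    return best


# Usage:
# top_emotions(client.jobs.getJobOutputDownload(jobUuid=job_uuid_vm, outputPath='results.csv'))
```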

### Create a system for the HPC cluster

With just a few changes to the system definition, you can create a second system that can be used to run the
sentiment analysis application on an HPC-type host. Note the changes:

* **id** - A unique id is required.
* **host** - Main hostname for the HPC system.
* **effectiveUserId** - `${apiUserId}`, so Tapis connects to the host as the Tapis user making the request.
* **rootDir** - Using the root directory of the host gives us flexibility in setting **jobWorkingDir**.
  Note that you still need LINUX permissions.
* **jobWorkingDir** - Still determined dynamically on the host using the Tapis v3 function HOST_EVAL().
* **jobRuntimes** - Most HPC systems support Singularity and not Docker.
* **canRunBatch**, **batchScheduler**, **batchSchedulerProfile** - Enable batch jobs through the SLURM scheduler using the `tacc` profile.
* **batchDefaultLogicalQueue** - Tapis logical queue to use by default.
* **batchLogicalQueues** - Logical queue definitions for this HPC system; **hpcQueueName** maps each one to an HPC queue.
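Because most fields carry over, one way to express these changes is to copy the VM definition and override fields. `derive_system` is a hypothetical helper; it deep-copies so the VM definition is left untouched, and the dictionary it returns would be passed to `client.systems.createSystem` just like a hand-written one.

```python
import copy


def derive_system(base, overrides, drop=()):
    """Return a new system definition: a deep copy of `base` with `overrides` applied.

    Keys listed in `drop` are removed first, for fields the new system
    should not inherit from the base definition.
    """
    derived = copy.deepcopy(base)
    for key in drop:
        derived.pop(key, None)
    derived.update(copy.deepcopy(overrides))
    return derived


# Hypothetical usage, reusing names defined elsewhere in this notebook:
# exec_system_hpc = derive_system(exec_system_vm, {
#     "id": system_id_hpc,
#     "description": "System for testing jobs on an HPC type host for LCCF Test",
#     "host": host,
#     "effectiveUserId": "${apiUserId}",
#     "rootDir": "/",
#     "jobRuntimes": [{"runtimeType": "SINGULARITY"}],
#     "canRunBatch": True,
#     "batchScheduler": "SLURM",
#     "batchSchedulerProfile": "tacc",
#     "batchDefaultLogicalQueue": "tapisNormal",
#     "batchLogicalQueues": [...],  # same queue list as in the next cell
# })
```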

In [66]:
user_id = username  # your TACC username
system_id_hpc = "lccf-hpc-" + user_id + str(1)
host="129.114.63.133"

# Create the system definition
exec_system_hpc = {
  "id": system_id_hpc,
  "description": "System for testing jobs on an HPC type host for LCCF Test",
  "systemType": "LINUX",
  "host": host,
  #"defaultAuthnMethod": "PASSWORD",
  "defaultAuthnMethod":"PKI_KEYS",  
  "effectiveUserId": "${apiUserId}",
  "rootDir": "/",
  "canExec": True,
  "jobRuntimes": [ { "runtimeType": "SINGULARITY" } ],
  "jobWorkingDir": "HOST_EVAL($HOME)/sharetest/workdir",
  "canRunBatch": True,
  "batchScheduler": "SLURM",
  "batchSchedulerProfile": "tacc",
  "batchDefaultLogicalQueue": "tapisNormal",
  "batchLogicalQueues": [
    {
      "name": "tapisNormal",
      "hpcQueueName": "normal",
      "maxJobs": 50,
      "maxJobsPerUser": 10,
      "minNodeCount": 1,
      "maxNodeCount": 16,
      "minCoresPerNode": 1,
      "maxCoresPerNode": 68,
      "minMemoryMB": 1,
      "maxMemoryMB": 16384,
      "minMinutes": 1,
      "maxMinutes": 60
    }
  ]
}

# Use the client to create the system in Tapis
print("****************************************************")
print("Create system: " + system_id_hpc)
print("****************************************************")
client.systems.createSystem(**exec_system_hpc)

# If you need to update the system,
# - modify the above definition as needed
# - comment out the above line
# - uncomment the below line
# - re-run the cell
#client.systems.patchSystem(**exec_system_hpc, systemId=system_id_hpc)


****************************************************
Create system: lccf-hpc-spadhy1
****************************************************



url: http://tacc.tapis.io/v3/systems/lccf-hpc-spadhy1

In [67]:
# List all systems available to you
print("****************************************************")
print("List all systems")
print("****************************************************")
# Uncomment the line below to list all systems (the output can be long)
#client.systems.getSystems()

****************************************************
List all systems
****************************************************


In [68]:
# Get details for the system you created
print("****************************************************")
print("Fetch system: " + system_id_hpc)
print("****************************************************")
client.systems.getSystem(systemId=system_id_hpc)

****************************************************
Fetch system: lccf-hpc-spadhy1
****************************************************



allowChildren: False
authnCredential: None
batchDefaultLogicalQueue: tapisNormal
batchLogicalQueues: [
hpcQueueName: normal
maxCoresPerNode: 68
maxJobs: 50
maxJobsPerUser: 10
maxMemoryMB: 16384
maxMinutes: 60
maxNodeCount: 16
minCoresPerNode: 1
minMemoryMB: 1
minMinutes: 1
minNodeCount: 1
name: tapisNormal]
batchScheduler: SLURM
batchSchedulerProfile: tacc
bucketName: None
canExec: True
canRunBatch: True
created: 2025-02-06T20:33:39.036412Z
defaultAuthnMethod: PKI_KEYS
deleted: False
description: System for testing jobs on an HPC type host for LCCF Test
dtnSystemId: None
effectiveUserId: spadhy
enableCmdPrefix: False
enabled: True
host: 129.114.63.133
id: lccf-hpc-spadhy1
importRefId: None
isDynamicEffectiveUser: True
isPublic: False
jobCapabilities: []
jobEnvVariables: []
jobMaxJobs: 2147483647
jobMaxJobsPerUser: 2147483647
jobRuntimes: [
runtimeType: SINGULARITY
version: None]
jobWorkingDir: HOST_EVAL($HOME)/sharetest/workdir
mpiCmd: None
notes: 

owner: spadhy
parentId: None
port: 

### Register Credentials for the HPC system

As before, you now need to register credentials for your username. Tapis uses these to
access the host.

In [72]:
pubKey_hpc=""

In [73]:
privKey_hpc=""

In [74]:
# Register credentials
client.systems.createUserCredential(systemId=system_id_hpc, userName=username, privateKey=privKey_hpc, publicKey=pubKey_hpc)

In [75]:
#password_hpc = password_vm
# Register credentials
#client.systems.createUserCredential(systemId=system_id_hpc, userName=username, password=password_hpc)


Now you can use the client to list files on the system. This will confirm that the credentials are valid.
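If you want a yes/no answer instead of a raw listing, you can wrap the same call. A sketch under stated assumptions: `credentials_work` is a hypothetical helper, and it catches exceptions broadly because any failure of the Files call (bad credentials most often, but also network or permission problems) means the check did not pass.

```python
def credentials_work(client, system_id, path="/"):
    """Return (True, None) if listing `path` succeeds, else (False, error message)."""
    try:
        client.files.listFiles(systemId=system_id, path=path)
    except Exception as exc:  # tapipy raises its own exception types; caught broadly here
        return False, str(exc)
    return True, None


# Usage:
# ok, error = credentials_work(client, system_id_hpc)
# print("Credentials OK" if ok else f"Credential check failed: {error}")
```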

In [76]:
# List files at the rootDir for the system
path_to_list = "/"
client.files.listFiles(systemId=system_id_hpc, path=path_to_list)

## Application

In order to run a job on a system you will need to create a Tapis application.

### Create an application that can be run on the HPC cluster

In [None]:
app_id_hpc = "lccf-sentiment-analysis-hpc-" + username
app_def_hpc= {
    "id": app_id_hpc,
    "version": "0.1",
    "description": "Application utilizing the sentiment analysis model from Hugging Face.",
    "jobType": "BATCH",
    "runtime": "SINGULARITY",
    "runtimeOptions": ["SINGULARITY_RUN"],
    "containerImage": "/tmp/sentiment-analysis_1.0.1.sif",
    "jobAttributes": {
            "parameterSet": {
            "archiveFilter": {
                "includeLaunchFiles": False
            }
        },
        "memoryMB": 1,
        "nodeCount": 1,
        "coresPerNode": 1,
        "maxMinutes": 10
    }
}
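The HPC application differs from the VM application defined earlier mainly in `jobType`, `runtime`, `runtimeOptions` and `containerImage`. As a sketch, a hypothetical `make_app_def` helper produces either definition from one template; the attribute values are copied from the two definitions in this notebook, and the container image is passed in because the two apps use different images.

```python
def make_app_def(app_id, container_image, runtime, version="0.1"):
    """Build a sentiment-analysis app definition for the VM or the HPC system.

    runtime="DOCKER" gives a FORK app (VM); runtime="SINGULARITY" gives a
    BATCH app that runs the image with SINGULARITY_RUN (HPC).
    """
    if runtime not in ("DOCKER", "SINGULARITY"):
        raise ValueError(f"Unsupported runtime: {runtime}")
    app = {
        "id": app_id,
        "version": version,
        "description": "Application utilizing the sentiment analysis model from Hugging Face.",
        "jobType": "FORK" if runtime == "DOCKER" else "BATCH",
        "runtime": runtime,
        "containerImage": container_image,
        "jobAttributes": {
            "parameterSet": {"archiveFilter": {"includeLaunchFiles": False}},
            "memoryMB": 1,
            "nodeCount": 1,
            "coresPerNode": 1,
            "maxMinutes": 10,
        },
    }
    if runtime == "SINGULARITY":
        app["runtimeOptions"] = ["SINGULARITY_RUN"]
    return app


# Usage:
# app_def_hpc = make_app_def(app_id_hpc, "/tmp/sentiment-analysis_1.0.1.sif", "SINGULARITY")
```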

In [None]:
client.apps.createAppVersion(**app_def_hpc)

#client.apps.patchApp(**app_def_hpc, appId=app_id_hpc, appVersion='0.1')

In [None]:
# List all applications available to you
print("****************************************************")
print("List all applications")
print("****************************************************")
client.apps.getApps()

In [None]:
# Get details for the application you created
print("****************************************************")
print("Fetch application: " + app_id_hpc)
print("****************************************************")
client.apps.getAppLatestVersion(appId=app_id_hpc)

In [None]:
# Arguments for the sentiment analysis application
pa = {
    "parameterSet": {
        "appArgs": [
            {"arg": "--sentences"},
            {"arg": "\"This is great\" \"This is not fun\""}
        ]
    }
}

# Submit a job to the HPC system
job_response_hpc = client.jobs.submitJob(
    name='sentiment analysis',
    description='sentiment analysis with hugging face transformer pipelines',
    appId=app_id_hpc, appVersion='0.1', execSystemId=system_id_hpc,
    **pa
)


### Get Job submission response


In [None]:
# Get Job submission response
print("****************************************************")
print("Job Submitted: " + app_id_hpc)
print("****************************************************")
print(job_response_hpc)

### Get Job UUID from the submission response


In [None]:
# Get job uuid from the job submission response
print("****************************************************")
job_uuid_hpc=job_response_hpc.uuid
print("Job UUID: " + job_uuid_hpc)
print("****************************************************")

### Check the status of the job


In [None]:
# Check the status of the job
print("****************************************************")
print(client.jobs.getJobStatus(jobUuid=job_uuid_hpc))
print("****************************************************")

### Download output of the job


In [None]:
# Once the job is in the FINISHED state, you can download the job's output
print("Job Output file:")

print("****************************************************")
jobs_output_hpc= client.jobs.getJobOutputDownload(jobUuid=job_uuid_hpc,outputPath='results.csv')
print(jobs_output_hpc)
print("****************************************************")

### Setting Notifications on Job events


Note: Replace `<Enter your email>` in the submitJob call with your own email address.

In [None]:
pa = {
    "parameterSet": {
        "appArgs": [
            {"arg": "--sentences"},
            {"arg": "\"This is great\" \"This is not fun\""}
        ]
    }
}

# Submit a job with an email subscription for all job events
job_response_hpc_email = client.jobs.submitJob(
    name='sentiment analysis',
    description='sentiment analysis with hugging face transformer pipelines',
    appId=app_id_hpc, appVersion='0.1', execSystemId=system_id_hpc,
    subscriptions=[{
        "description": "Test subscriptions",
        "eventCategoryFilter": "ALL",
        "deliveryTargets": [{"deliveryMethod": "EMAIL", "deliveryAddress": "<Enter your email>"}]
    }],
    **pa
)
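To avoid submitting a job with the `<Enter your email>` placeholder still in place, the subscription can be built by a small function that checks the address first. This is a sketch: `email_subscription` is a hypothetical helper, and its field names simply mirror the subscription in the submitJob call of this section.

```python
def email_subscription(address, description="Test subscriptions", category="ALL"):
    """Build one entry for the `subscriptions` list passed to client.jobs.submitJob."""
    if "@" not in address:
        raise ValueError("Replace the placeholder with a real email address")
    return {
        "description": description,
        "eventCategoryFilter": category,
        "deliveryTargets": [
            {"deliveryMethod": "EMAIL", "deliveryAddress": address}
        ],
    }


# Usage:
# client.jobs.submitJob(..., subscriptions=[email_subscription("you@example.org")], **pa)
```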


In [None]:
# Get Job submission response
print("****************************************************")
print("Job Submitted: " + app_id_hpc)
print("****************************************************")
print(job_response_hpc_email)

In [None]:
# Get job uuid from the job submission response
print("****************************************************")
job_uuid_hpc_email=job_response_hpc_email.uuid
print("Job UUID: " + job_uuid_hpc_email)
print("****************************************************")

In [None]:
# Check the status of the job
print("****************************************************")
print(client.jobs.getJobStatus(jobUuid=job_uuid_hpc_email))
print("****************************************************")

### Cancel a job


In [None]:
# If necessary, you can cancel a long-running job.
# To cancel the HPC job submitted above, uncomment the line below:
# client.jobs.cancelJob(jobUuid=job_uuid_hpc)

## Share System and App

In [None]:
# Make your execution system public
client.systems.shareSystemPublic(systemId=system_id_hpc)

In [None]:
# Making the app public
client.apps.shareAppPublic(appId=app_id_hpc)

In [None]:
# Get Share info on the app
client.apps.getShareInfo(appId=app_id_hpc)
# Now any user in the tenant should be able to run your application

In [None]:
# Unsharing public app
#client.apps.unShareAppPublic(appId=app_id_hpc)

In [None]:
## You should now be able to run any public apps
'''
pa= {
    "parameterSet": {
    "appArgs": [
            {"arg": "--sentences"},
            {"arg": "\"This is great\" \"This is not fun\""}
            
        ]
    }}

# Submit a job
job_response_hpc_email=client.jobs.submitJob(name='sentiment analysis',description='sentiment analysis with hugging face transformer pipelines',appId=app_id_hpc,appVersion='0.1',execSystemId=system_id_hpc,subscriptions= [ { "description": "Test subscriptions", "eventCategoryFilter": "ALL","deliveryTargets": [ { "deliveryMethod": "EMAIL","deliveryAddress":"<Enter your email>"}] }],**pa)
'''