# Working with processes and jobs
Unity helps teams move quickly from algorithm development and testing in Jupyter into large-scale processing with the Scaled Processing System (SPS). To do this, algorithms developed in Jupyter must be:
1. Registered in the Application Catalog (see 1_working_with_applications.ipynb) as _Applications_,
2. Deployed to an Application Deployment and Execution Service (ADES) in SPS, where they are referred to as _Processes_, and
3. Run on the SPS as _Jobs_.

This tutorial introduces running code at scale using _Processes_ and _Jobs_. Unity lets users execute _Jobs_ to produce data. The steps below show how a job is typically submitted to the Unity Platform.

## 0. Set up imports, predefined variables, and authentication

For this tutorial we will use the Unity-Py client Python package.

### Imports

In [None]:
import requests
import time

from datetime import datetime
from IPython.display import JSON

from unity_py.unity import Unity
from unity_py.unity_services import UnityServices
from unity_py.resources.job_status import JobStatus

### Predefined Variables

In [None]:
unity = Unity()
process_service = unity.client(UnityServices.PROCESS_SERVICE)

### Print Unity Configuration

In [None]:
print(unity)

## 1. Query Dockstore

In [None]:
r = requests.get("http://uads-test-dockstore-deploy-lb-1762603872.us-west-2.elb.amazonaws.com:9998/api/workflows/published")
r.raise_for_status()
json = r.json()

print("\n\nList of Application Packages available in Dockstore:")
JSON(json)

## 2. Select an application from Dockstore

In the previous step we queried Dockstore and retrieved a list of applications. From this list of applications, select one and set the `workflow_id` and `workflow_version_id` in the cell below.

`workflow_id` is a field labeled `id` from Dockstore.
`workflow_version_id` is a field labeled `id` in the `workflowVersions` object.

The ID values set below reference an example Sounder SIPS L1B application registered in Unity's Dockstore Test environment.

In [None]:
workflow_id = 16
workflow_version_id = 31
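
To make the choice of IDs easier, the Dockstore response can be flattened into one row per workflow version. The following is a minimal sketch, not part of Unity-Py: `list_workflow_versions` is a hypothetical helper, the sample data is made up, and the `name` field on each version entry is an assumption about the response shape (only `id`, `workflowName`, and `workflowVersions` are referenced in this tutorial).

```python
def list_workflow_versions(workflows):
    """Return (workflow_id, workflow_name, version_id, version_name) rows."""
    rows = []
    for workflow in workflows:
        # A workflow without versions simply contributes no rows.
        for version in workflow.get("workflowVersions", []):
            rows.append((
                workflow.get("id"),
                workflow.get("workflowName"),
                version.get("id"),
                version.get("name"),  # assumed field name
            ))
    return rows


# Hypothetical sample shaped like the Dockstore response used above.
sample_workflows = [
    {
        "id": 16,
        "workflowName": "example-workflow",
        "workflowVersions": [{"id": 31, "name": "develop"}],
    }
]

for row in list_workflow_versions(sample_workflows):
    print("workflow_id={} name={} workflow_version_id={} version={}".format(*row))
```

In the notebook, the same helper could be called on the list returned by `r.json()` in step 1.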

## 3. Fetch the application's metadata

The code below downloads a ZIP file containing the CWL files associated with the selected package. The ZIP file is named after the application (aka workflow). The information in the ZIP file will be used to create the JSON payload needed to deploy the application to Unity's Science Processing Service.

***Note: this ZIP file contains a file named `.dockstore.yml`. When unpacked, it will not be visible in JupyterLab's Folder/File View, but it can be viewed from a Terminal window.***

In [None]:
r_zip = requests.get("http://uads-test-dockstore-deploy-lb-1762603872.us-west-2.elb.amazonaws.com:9998/api/workflows/{workflowId}/zip/{workflowVersionId}".format(workflowId=workflow_id, workflowVersionId=workflow_version_id))
r_zip.raise_for_status()

# Name the ZIP after the selected workflow (matched on its Dockstore `id`)
workflow_name = next(w['workflowName'] for w in json if w['id'] == workflow_id)
with open("{}.zip".format(workflow_name), 'wb') as f:
    f.write(r_zip.content)
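
Because `.dockstore.yml` is hidden in JupyterLab's file browser, it can help to list the archive contents directly from Python. The sketch below uses only the standard library; `list_zip_members` is a hypothetical helper, and the archive is built in memory with made-up file contents so the example runs without network access.

```python
import io
import zipfile


def list_zip_members(zip_bytes):
    """Return the names of all files in a ZIP archive, hidden ones included."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return archive.namelist()


# Build a tiny in-memory archive standing in for the Dockstore download.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(".dockstore.yml", "version: 1.2\n")
    archive.writestr("workflow.cwl", "cwlVersion: v1.2\n")

demo_members = list_zip_members(buffer.getvalue())
print(demo_members)
```

In the notebook, passing `r_zip.content` to the helper would list the files in the downloaded package.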

## 4. Deploy application

Now that we have collected the information about the application from Dockstore, we can package it up and prep it for deployment to Unity's Science Processing Service.

Coming soon...

## 5. Listing all deployed processes

The Unity-Py client provides the ability to view all deployed application packages (a.k.a., Processes) on the system using the `get_processes` function. After a successful deployment of an application package to SPS, you should see a new entry for the deployed application.

The `id` property is one of the properties needed to execute the Process and see existing Jobs for a given Process.

In [None]:
processes = process_service.get_processes()
for process in processes:
    print("Process ID: {}".format(process.id))
    print("Process Title: {}".format(process.title))
    print("Process Abstract: {}".format(process.abstract))
    print("Process Version: {}".format(process.process_version))
    print("")
    print(process)
    
# For example purposes, we will use the first process returned
my_process = processes[0]
print("\n\nSelected Process:\n\n{}".format(my_process))

### Retrieve a deployed process's information

If you know the ID of the process, the process information can be retrieved using the `get_process(id)` method of `ProcessService`.


In [None]:
process = process_service.get_process('l1b-cwl:develop')
print(process)

## 6. Execute a job
Deploying Applications and working with Jobs assumes that a system administrator has already deployed an ADES. ADES deployments are often called "venues" or "environments"; examples include dev, test, and prod. To run a Job, you also need an available Process (a deployed Application). In this case we use the demo Process selected as `my_process` in step 5 above.

With an ADES and a Process ready to accept our Job, we can submit a Job along with any input data or parameters that are needed (as defined by a template job definition for a particular Application). In this example none are needed, so `inputs` is empty. The response provides a Job object that we store in a variable called `job`; its ID (`job.id`) is used in later steps.

***NOTE*** - the sample application does not provide input parameters or output parameters at this time. These are coming in a future version of the job control endpoint. Future jobs will allow:
- Explicit inputs to be used (no magic)
  - inputs can be Unity Resources or DAAC resources
- Explicit output 'collection' to which the data will be written

In [None]:
data = {
  "mode": "async",
  "response": "document",
  "inputs": [
  ],
  "outputs": [
    {
      "id": "output",
      "transmissionMode": "reference"
    }
  ]
}

try:

    # Store Job ID to use in future steps
    job = my_process.execute(data)
    print(job)

    # If the job submission is successful, print a success message along with the returned JOB-ID
    print("\nJob Submission Successful!\nJOB ID: {}\n".format(job.id))

except requests.exceptions.HTTPError as e:
    # An error has occurred, print the error message that was generated
    # (requests' HTTPError has no `reason` attribute; printing it shows the message)
    print(e)
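
If the same request body is needed for several submissions, it can be built by a small function. This is only a sketch: `build_execute_payload` is a hypothetical helper, it reproduces the keys used in the cell above, and it passes `inputs` through unchanged because the shape of an input entry is not defined in this tutorial.

```python
def build_execute_payload(inputs=None, output_id="output"):
    """Build a job request body with the same keys as the example above."""
    return {
        "mode": "async",
        "response": "document",
        # Inputs are passed through as-is; their structure is not defined here.
        "inputs": list(inputs or []),
        "outputs": [{"id": output_id, "transmissionMode": "reference"}],
    }


payload = build_execute_payload()
print(payload)
```

The result could then be submitted with `my_process.execute(payload)` in the same way as `data` above.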

## 7. Get the job status
The code below will demonstrate how one can check the status of the Job. The potential status responses are documented [here] _need a reference to process status_.

In this example, we look up the status of the Job that was submitted in the previous step. The cell below loops (blocks) while the Job is running and prints the status every 5 seconds.

In [None]:
try:

    job_status = job.get_status()
    
    while job_status == JobStatus.RUNNING:
        print("Status for job \"{}\" ({}): {}".format(job.id, datetime.now().strftime("%H:%M:%S"), job_status.value))
        time.sleep(5)
        job_status = job.get_status()
    
    # Print the job status
    print("\nStatus for job \"{}\" ({}): {}\n".format(job.id, datetime.now().strftime("%H:%M:%S"),job_status.value))
    
except requests.exceptions.HTTPError as e:
    # An error has occurred, print the error message that was generated
    # (requests' HTTPError has no `reason` attribute; printing it shows the message)
    print(e)
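
The loop above blocks for as long as the Job keeps running. One possible variation adds a timeout so that a stuck Job does not block the notebook forever. This is a sketch under stated assumptions: `wait_for_terminal_status` is a hypothetical helper, not part of Unity-Py, and the status strings in the demo are placeholders rather than documented Unity values.

```python
import time


def wait_for_terminal_status(get_status, active_statuses, interval=5.0,
                             timeout=600.0, sleep=time.sleep,
                             clock=time.monotonic):
    """Poll get_status() until it returns a status not in active_statuses.

    Raises TimeoutError if the status is still active after `timeout` seconds.
    """
    deadline = clock() + timeout
    status = get_status()
    while status in active_statuses:
        if clock() >= deadline:
            raise TimeoutError(
                "Job still {} after {} seconds".format(status, timeout))
        sleep(interval)
        status = get_status()
    return status


# Demo with a fake status source (placeholder strings, no real job involved).
fake_statuses = iter(["accepted", "running", "running", "succeeded"])
final_status = wait_for_terminal_status(
    lambda: next(fake_statuses),
    active_statuses={"accepted", "running"},
    interval=0,
)
print(final_status)
```

With the objects from this notebook, a call might look like `wait_for_terminal_status(job.get_status, {JobStatus.RUNNING})`.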


## 8. Get job results
Now that we've monitored the status of a Job and verified that it has completed, we can retrieve information about where the generated data is located.

In this example, we use the `job` object that was returned previously.

In [None]:
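# Initialize so the cell still runs if the request fails
job_data = None

try:

    job_data = job.get_result()

except requests.exceptions.HTTPError as e:
    # requests' HTTPError has no `reason` attribute; printing it shows the message
    print(e)

print("\nFull JSON output data object:")
JSON(job_data)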

## So... where are the results?

Currently the results are not made available to the endpoint, but a future release will include a STAC document of generated files and their locations. For now, the results are written to the `SNDR_SNPP_ATMS_L1B_OUTPUT___1` product type in the Unity system. Again, future releases will allow the specification of the output collection (by name or version).

## 9. Dismiss a job

What if a job is no longer needed after it has started? The job can be dismissed, as shown in the example below, as long as it is still running.

In [None]:
print("{},{}".format(my_process.id, job.id))
response = job.dismiss()

print("Response: ", response, "\n")

## 10. List all jobs for a given process

What if you restart your notebook and no longer have the Job ID? The process endpoint can list the Jobs for a given Process (deployed Application).


In [None]:
print("Jobs running for Process: ", my_process.id, "\n")

jobs = process_service.get_jobs(my_process)

for job in jobs:
    print(job)