# Submitting an Extractor Job
This Jupyter Notebook tutorial uses Python to show the steps needed to get your data processed by an extractor. In this tutorial we will be using the OpenDroneMap extractor. The same process can be used for any extractor, although some extractor-specific details may vary.

---
### Contents
- [Overview](#overview)
- [Audience](#audience)
- [What to expect](#expect)
- [Prerequisites](#prerequisites)
- [Cautions](#cautions)
- [Step 1 - Python Imports and Setup](#step1)
- [Step 2 - Specify your Experiment](#step2)
- [Step 3 - Required Request Parameters](#step3)
- [Step 4 - Optional Request Parameters](#step4)
- [Step 5 - Making the Request](#step5)
- [Completed](#completed)
- [Feedback](#feedback)
- [References](#references)
- [Acknowledgements](#acknowledgements)
---

## Overview <a name="overview"></a>
This tutorial covers how to use Python within a dockerized Jupyter notebook to send a request to Clowder to start processing a set of previously loaded drone data.
Completing this tutorial will provide the background for submitting other extractor requests and determining if the requests are successful.

## Audience <a name="audience"></a>
This notebook is for people who want to learn how to process drone data using the Clowder-based Drone Pipeline.

It's helpful, but not necessary, to be familiar with Jupyter Notebooks and, perhaps, have some experience with Python.

## What to expect <a name="expect"></a>
We will be using a Python library to do most of the work for us.
Each step of this tutorial contains text describing what needs to be done and then presents code that performs those actions.

In the code cells below, we will be loading the pipelineutils library, defining variables that provide information about the experiment and our Clowder credentials, and then making the request to start the extractor using the variables we defined.

You will need to modify the code cells to match your actual data (sample data will work as well).

## Prerequisites <a name="prerequisites"></a>
To successfully complete this tutorial you will need to have an existing Clowder account and have data loaded into a dataset.
Additionally, the Python `pipelineutils` library will need to have been installed on the Jupyter Notebook instance this tutorial is running on.

>Perform the following steps to install the `pipelineutils` library:
>1. Click the "New Launcher" icon and select a terminal.
>2. In the terminal window, execute the command <em>'pip install pipelineutils'</em> to install the library.
>
>If you are having trouble installing, try adding a version number to the install request. Assuming the latest version is 1.0.4, your command would look like <em>'pip install pipelineutils==1.0.4'</em>

You can create a Clowder account at the [Drone Processing Pipeline](https://dronepipeline.cyverse.org/) instance of Clowder. Once you have your account, create a dataset and load a flight's worth of data into the dataset.
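
Before moving on, it can be worth confirming that the library is visible to the Python kernel this notebook runs on. The sketch below uses only the Python standard library; `is_installed` is a hypothetical helper name introduced here for illustration and is not part of `pipelineutils`.

```python
import importlib.util


def is_installed(package_name):
    """Return True when this interpreter can find the named top-level package."""
    return importlib.util.find_spec(package_name) is not None


if is_installed("pipelineutils"):
    print("pipelineutils is available")
else:
    print("pipelineutils was not found; run 'pip install pipelineutils' in a terminal")
```

If the library is reported as missing right after installing it, restarting the notebook kernel is a reasonable first thing to try.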

## Cautions <a name="cautions"></a>
Two files in the Clowder dataset being processed will be overwritten if they already exist: *experiment.yaml* and *extractors-opendronemap.txt*.
If you have placed either of these files in the dataset this tutorial will process, download a copy first to preserve it.

---
## Step 1 - Python Imports and Setup <a name="step1"></a>
The first step is to let Python know which libraries you will be needing for your commands.

We are also going to define the Clowder URL so the calls we make know which instance to access.
You will need to replace the endpoint with the URL of your Clowder instance.

In [None]:
# Importing the libraries we will need
import pipelineutils.pipelineutils as dpu

clowder_url="https://dronepipeline.cyverse.org"    # Replace this value with your Clowder URL

---
## Step 2 - Specify your Experiment <a name="step2"></a>
There are several pieces of information needed by the extractor for its processing.
We are focused on the OpenDroneMap extractor in this tutorial and are providing the information that it needs.
Other extractors have different requirements which can be found with their documentation.

The timestamp needed is an ISO 8601 timestamp, formatted as a complete date with hours, minutes, and seconds: `YYYY-MM-DDThh:mm:ssTZD`.
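
If you would rather build the timestamp than type it by hand, Python's standard `datetime` module can produce this format. The snippet below is a small sketch; the date, time, and UTC offset are example values that you would replace with those of your own data capture.

```python
from datetime import datetime, timedelta, timezone

# Example capture time: 31 May 2019 at 14:20:40, eight hours behind UTC
capture_time = datetime(2019, 5, 31, 14, 20, 40,
                        tzinfo=timezone(timedelta(hours=-8)))

# isoformat() includes the UTC offset when the datetime carries a time zone
timestamp = capture_time.isoformat()
print(timestamp)  # prints 2019-05-31T14:20:40-08:00
```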

Each angle-bracketed placeholder shown below, including the brackets themselves, needs to be replaced with your own value.
For example, if your study name is "Height 2019", you would replace "&lt;study name&gt;" with "Height 2019".

In [None]:
# Provide experiment information for the extractor
experiment = dpu.prepare_experiment("<study name>",   # Replace <study name> with your study name
                                    "<season name>",  # Replace <season name> with your season name
                                    "<timestamp>"     # Replace <timestamp> with your timestamp
                                   )

# Display what we have
print(experiment)

Assuming a study name of "Height 2019", a season of "Season 3", and a data capture timestamp of "2019-05-31T14:20:40-08:00", you would have the following as your experiment data after making the call:
```python
experiment = {
    "studyName": "Height 2019",
    "season": "Season 3",
    "observationTimeStamp": "2019-05-31T14:20:40-08:00"
}
```
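
A quick sanity check on the experiment data can catch a forgotten placeholder before a request is sent. The sketch below assumes the experiment is a dictionary with the three keys shown above and that Python 3.7 or later is in use; `check_experiment` is a hypothetical helper written for this tutorial, not a `pipelineutils` function.

```python
from datetime import datetime

REQUIRED_KEYS = ("studyName", "season", "observationTimeStamp")


def check_experiment(experiment):
    """Return a list of problems found in an experiment dictionary (empty if none)."""
    problems = []
    for key in REQUIRED_KEYS:
        if not experiment.get(key):
            problems.append("missing or empty value for '%s'" % key)

    stamp = experiment.get("observationTimeStamp")
    if stamp:
        try:
            parsed = datetime.fromisoformat(stamp)
        except (TypeError, ValueError):
            problems.append("timestamp is not in the expected format: %s" % stamp)
        else:
            if parsed.tzinfo is None:
                problems.append("timestamp has no time zone offset")
    return problems


sample = {
    "studyName": "Height 2019",
    "season": "Season 3",
    "observationTimeStamp": "2019-05-31T14:20:40-08:00"
}
print(check_experiment(sample))
```

Note that `datetime.fromisoformat` only accepts a trailing `Z` as the time zone from Python 3.11 onward, so a numeric offset such as `-08:00` is the safer choice here.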

---
## Step 3 - Required Request Parameters <a name="step3"></a>
We have encountered two of the call parameters above when we configured the Clowder URL and the experiment.

### What they are
The additional required parameters are your Clowder credentials, the dataset name, the name of a space in Clowder, and the extractor name.
- username and password: these are your Clowder login credentials
- dataset: the name of the loaded drone data to process
- extractor: the shorthand name of the extractor
- space name: location where the results of processing will be organized

### Why they're needed
The credentials are used to access Clowder on your behalf; the dataset name is used to identify where the data resides that should be processed; a space name is where resulting data is organized in Clowder; the extractor name identifies which extractor we'll be running.

In [None]:
# Specify required parameters
username="email@address"         # The Clowder username portion of credentials
password="password"              # The password associated with the Clowder username
dataset="my dataset"             # The dataset to associate with the extractor request
extractor="opendronemap"         # The extractor to run. Note that this is not the full Clowder name
space_name="Processed"           # The space name for processed data organization
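
Typing a password directly into a notebook cell makes it easy to share by accident. As one possible alternative, the sketch below reads the credentials from environment variables and falls back to prompting. The variable names `CLOWDER_USERNAME` and `CLOWDER_PASSWORD` and the helper `read_credentials` are assumptions made for this example; neither Clowder nor `pipelineutils` defines them.

```python
import os
from getpass import getpass


def read_credentials(environ=None):
    """Return (username, password), preferring environment variables over prompts."""
    environ = os.environ if environ is None else environ
    username = environ.get("CLOWDER_USERNAME")   # hypothetical variable name
    password = environ.get("CLOWDER_PASSWORD")   # hypothetical variable name
    if not username:
        username = input("Clowder username: ")
    if not password:
        password = getpass("Clowder password: ")  # input is not echoed
    return username, password


# In the notebook you could then write:
# username, password = read_credentials()
```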

---
## Step 4 - Optional Request Parameters <a name="step4"></a>
In addition to the required parameters described above, there are other parameters that could be specified when we make the call.

### What they are
The `space_must_exist` optional parameter accepts one of three values: *None*, *False*, and *True*.
The default value for this parameter is `None`, indicating that an attempt will be made to create the space in Clowder if it doesn't already exist.
If the value for this parameter is changed to `True`, the space must already exist in Clowder when the call is made or an error will be returned.
If the value for this parameter is `False`, then the space must *not* exist when the call is made or an error is returned.
If `False` is specified and the space does not exist, it's created before the extractor is run.
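
The three settings can be easier to compare side by side. The sketch below restates the rules described above as a small function; it is an illustration of the documented behavior only, not the library's implementation, and `plan_space_action` is a hypothetical name.

```python
def plan_space_action(space_exists, space_must_exist=None):
    """Return 'use' or 'create' for a space, or raise when the rule is violated."""
    if space_must_exist is True and not space_exists:
        raise ValueError("the space must already exist, but it was not found")
    if space_must_exist is False and space_exists:
        raise ValueError("the space must not exist, but it was found")
    # None accepts either case: use the space if present, otherwise create it
    return "use" if space_exists else "create"


# Show the outcome for every combination of setting and space state
for flag in (None, True, False):
    for exists in (True, False):
        try:
            outcome = plan_space_action(exists, flag)
        except ValueError as ex:
            outcome = "error: %s" % ex
        print("space_must_exist=%s, space exists=%s -> %s" % (flag, exists, outcome))
```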

The `config_file` optional parameter defaults to `None`, indicating that there isn't a configuration file specified.
This parameter can be overridden with the path to a configuration file or with a configuration string.
In our case we will use an empty string as our OpenDroneMap configuration override - indicating we will accept the default configuration.
Refer to the [extractors-opendronemap.txt.sample](https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-opendronemap/browse/extractors-opendronemap.txt.sample?at=refs%2Fheads%2Fupdate_odm_extractor) file in BitBucket for more information on the contents of the OpenDroneMap extractor configuration overrides.

The `api_key` optional parameter is used when a specific key is to be used when making calls to Clowder.
By default, the library fetches a key associated with the username and password and then uses that key to make calls.
Specifying a key will override this behavior.
The default value for this parameter is `None`.

In [None]:
# Defining optional parameters
space_must_exist=None           # The variable name does not need to be the same as the parameter name
config_file=""                  # We are using a string to indicate acceptance of the default configuration
api_key=None                    # The Clowder API key to use when making requests
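
Since `None` is the documented default for all three optional parameters, one convenient pattern is to collect them in a dictionary and keep only those that were actually set. This is a sketch of that pattern using plain Python; the name `request_kwargs` is introduced here for illustration.

```python
# The optional parameters from the cell above, keyed by parameter name
optional_parameters = {
    "space_must_exist": None,
    "config_file": "",
    "api_key": None,
}

# Keep only the parameters that differ from the default of None
request_kwargs = {name: value
                  for name, value in optional_parameters.items()
                  if value is not None}
print(request_kwargs)  # prints {'config_file': ''}
```

The resulting dictionary could then be unpacked with `**request_kwargs` at the end of the `start_extractor` call, assuming the keyword names match the parameter names described above.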

---
## Step 5 - Making the Request <a name="step5"></a>
We are now ready to make the call to schedule the OpenDroneMap extractor.
In our example below we pass the required parameters together with the `config_file` override, but you are free to experiment with the other optional parameters.

In [None]:
# Make the call
res = dpu.start_extractor(clowder_url,      # The URL of Clowder instance
                          experiment,       # Experiment configuration
                          username,         # The username portion of Clowder credentials
                          password,         # The password associated with the username
                          dataset,          # The dataset to associate with the extractor
                          extractor,        # Name of the extractor to schedule
                          space_name,       # Name of the target space
                          config_file=config_file # The configuration to submit the job with
                         )

# Check the result for a problem
if res == False:
    raise RuntimeError("The extractor request could not be submitted")

# Everything is OK
print("Extractor request submitted")
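
When submitting several requests in a row, it may be more convenient to get a message back than to have the cell stop on the first error. The sketch below wraps any zero-argument callable; it assumes, as in the cell above, that a return value of `False` signals a problem. `submit_request` is a hypothetical helper, and the lambdas are stand-ins for a real call.

```python
def submit_request(submit):
    """Call submit() and return (ok, message) instead of raising."""
    try:
        result = submit()
    except Exception as ex:
        return False, "request raised %s: %s" % (type(ex).__name__, ex)
    if result is False:
        return False, "request was rejected (returned False)"
    return True, "request submitted"


# Stand-in callables; in the notebook, wrap the real call in a lambda instead:
# submit_request(lambda: dpu.start_extractor(clowder_url, experiment, username,
#                                            password, dataset, extractor, space_name))
print(submit_request(lambda: True))
print(submit_request(lambda: False))
```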

---
## Completed <a name="completed"></a>
Congratulations! You have successfully submitted a request to process data.

You should now be able to submit OpenDroneMap jobs on other drone flights.
Additionally, you can take this approach and start other extractor jobs.
Finally, you can customize the order in which extractors are run to produce your own custom workflow.

We invite you to take a look at the other tutorials we have available.

## Feedback <a name="feedback"></a>
We always enjoy hearing how much people like our tutorials, or how to improve them.
If you would like to suggest something new or have something changed, please [record an issue](https://github.com/terraref/computing-pipeline/issues).

## References <a name="references"></a>
The main site for the TERRA REF project, including the Drone Pipeline, is on [GitHub](https://github.com/terraref).

Non-technical documentation for the Drone Pipeline is on [OSF](https://osf.io/xdkcy/).

## Acknowledgements <a name="acknowledgements"></a>
This tutorial was written by Christophe Schnaufer, University of Arizona, Tucson AZ