# Harmonized Data Access (HDA) API Tutorial

#### This Jupyter notebook demonstrates how to use the Harmonized Data Access (HDA) API to query collection metadata and then submit a user-defined query for data. The query is constructed from the collection metadata. The end result is a list of URLs for the matching data.
HDA-API version: 0.1.0

Dataset used for demonstration: EO:ESA:DAT:SENTINEL-2:MSI1C

The complete list of datasets that can be accessed through the HDA API is given here: https://github.com/WEkEO/WEkEO-Datasets/blob/master/WEkEO-Collections-HDA.txt

Rather than editing this notebook directly, create a copy of it and experiment with the copy.

# 1: Initialization

In [None]:
import requests, re, json, urllib3, sys
import shutil
import time
import urllib.parse
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

CONST_HTTP_SUCCESS_CODE = 200
# HDA-API endpoint
apis_endpoint="https://apis.wekeo.eu"

# Data broker address
broker_address = apis_endpoint + "/databroker/0.1.0"

# Terms and conditions
acceptTandC_address = apis_endpoint + "/dcsi-tac/0.1.0/termsaccepted/Copernicus_General_License"

# Access-token address
accessToken_address = apis_endpoint + '/token'

# We are going to use the Sentinel-2 Dataset
dataset_id = "EO:ESA:DAT:SENTINEL-2:MSI1C"  
# ":" is percent-encoded as %3A, so "EO:ESA:DAT:SENTINEL-2:MSI1C" becomes EO%3AESA%3ADAT%3ASENTINEL-2%3AMSI1C
encoded_dataset_id = urllib.parse.quote(dataset_id)

# Default API key; it will be removed once users can generate their own key via the WEkEO portal
api_key = "aTMzOHdPZUViZFQ0UmtBWnZ4Zjl1VV9XX1JjYTpmVzJSUW92d09NZHBXN3BDZzlCcjI1MFVMS3Nh"  
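
The percent-encoding step above can be checked in isolation. The following is a minimal sketch using only the standard library; passing `safe=""` is an optional variation (not used in the cell above) that also escapes `/`, which would matter only if an identifier contained a slash.

```python
import urllib.parse

dataset_id = "EO:ESA:DAT:SENTINEL-2:MSI1C"

# quote() leaves "/" unescaped by default; safe="" escapes it as well.
# For this dataset ID both variants give the same result.
encoded = urllib.parse.quote(dataset_id, safe="")

# unquote() reverses the encoding, which makes for an easy round-trip check.
decoded = urllib.parse.unquote(encoded)

print(encoded)
print(decoded == dataset_id)
```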

# 2: Get Access Token
An access token is generally valid for only one hour.

In [None]:
headers = {
    'Authorization': 'Basic ' + api_key
}
data = [
  ('grant_type', 'client_credentials'),
]

print("Step 1: Getting an access token. This token is valid for one hour only.")
response = requests.post(accessToken_address, headers=headers, data=data, verify=False)

# If the HTTP response code is 200 (i.e. success), then retrieve the token from the response
if (response.status_code == CONST_HTTP_SUCCESS_CODE):
    access_token = json.loads(response.text)['access_token']
    print("Success: Access token is " + access_token)
else:
    print("Error: Unexpected response {}".format(response))
    print(response.headers)    
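
Because the token expires after about an hour, a long-running session may need to request a new one. The sketch below shows one way to cache a token and refetch it shortly before expiry. `TokenCache`, `lifetime_s`, and `margin_s` are hypothetical names introduced here for illustration, and the 3600-second default simply mirrors the one-hour validity stated above. The fetch function is injected, so the example runs without network access.

```python
import time


class TokenCache:
    """Caches a token and refetches it shortly before it expires (illustrative helper)."""

    def __init__(self, fetch_token, lifetime_s=3600, margin_s=60, clock=time.time):
        self._fetch_token = fetch_token  # callable returning a token string
        self._lifetime_s = lifetime_s    # assumed validity period in seconds
        self._margin_s = margin_s        # refetch this many seconds early
        self._clock = clock
        self._token = None
        self._fetched_at = None

    def get(self):
        now = self._clock()
        expired = (
            self._token is None
            or now - self._fetched_at >= self._lifetime_s - self._margin_s
        )
        if expired:
            self._token = self._fetch_token()
            self._fetched_at = now
        return self._token


# Demonstration with a fake clock and a fake fetch function (no HTTP involved).
fake_now = [0.0]
fetch_calls = []


def fake_fetch():
    fetch_calls.append(1)
    return "token-{}".format(len(fetch_calls))


cache = TokenCache(fake_fetch, clock=lambda: fake_now[0])
first = cache.get()    # nothing cached yet, so this fetches
fake_now[0] = 1800.0
second = cache.get()   # half an hour later: cached token is reused
fake_now[0] = 3550.0
third = cache.get()    # inside the 60 s safety margin: a new token is fetched
print(first, second, third)
```

In the notebook, the injected function could wrap the `requests.post` call from the cell above and return the `access_token` field of the response.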

# 3: Get "Query Metadata"

In [None]:
headers = {
    'Authorization': 'Bearer ' + access_token,
}

response = requests.get(broker_address + '/querymetadata/' + encoded_dataset_id, headers=headers)

print('Step 2: Getting query metadata, URL is ' + broker_address + '/querymetadata/' + encoded_dataset_id +"?access_token="+access_token)

print("************** Query Metadata for " + dataset_id +" **************")

if (response.status_code == CONST_HTTP_SUCCESS_CODE):
    parsedResponse = json.loads(response.text)
    print(json.dumps(parsedResponse, indent=4, sort_keys=True))
    print("**************************************************************************")
else:
    print("Error: Unexpected response {}".format(response))

# 4: Accept Terms and Conditions

In [None]:
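# Accept Terms and Conditions for the dataset (if not already accepted)
response = requests.get(acceptTandC_address, headers=headers)

# 'accepted' may arrive as a JSON boolean or as a string, so normalise it before testing
isTandCAccepted = json.loads(response.text)['accepted']
if str(isTandCAccepted).lower() != 'true':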
    print("Accepting Terms and Conditions of Copernicus_General_License")
    response = requests.put(acceptTandC_address, headers=headers)
else:
    print("Copernicus_General_License Terms and Conditions already accepted")

# 5: Send Query for Products
A successfully submitted query for products returns a response containing the job ID and the job status.

Example response:
```json
{
   "jobId":"341552be-7ce4-470d-8c32-7e6a31c836f0",
   "complete":false,
   "status":"STARTED",
   "message":null,
   "resultNumber":null,
   "created":"2018-11-21T15:11:38.208"
}
```

In [None]:
# Example query for Sentinel-2 data, constructed from the response to the metadata query
data = {
    "datasetId": dataset_id,
    "stringChoiceValues": [
        {
            "name": "collection",
            "value": "MSI_L1C"
        }
    ],
    "dateRangeSelectValues": [
        {
            "name": "dtrange",
            "start": "2018-08-19T14:06:23.646Z",
            "end": "2018-11-19T14:06:23.646Z"
        }
    ],
    "boundingBoxValues": [
        {
            "name": "bbox",
            "bbox": [
                -180,
                -90,
                180,
                90
            ]
        }
    ]
}

response = requests.post(broker_address + '/datarequest', headers=headers, json=data, verify=False)

if (response.status_code == CONST_HTTP_SUCCESS_CODE):    
    job_id = json.loads(response.text)['jobId']
    print ("Query successfully submitted. Job ID is " + job_id)
else:
    print("Error: Unexpected response {}".format(response))
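
The query body above is written out by hand. As a rough sketch, it could also be assembled by a small helper that formats the timestamps and checks the bounding box before anything is sent. `build_query` and `to_hda_timestamp` are hypothetical helpers, not part of the HDA API. The field names are copied from the query above. The bounding-box order (west, south, east, north) is an assumption inferred from the example values, and the datetimes are assumed to be in UTC already.

```python
from datetime import datetime


def to_hda_timestamp(moment):
    """Format a datetime (assumed UTC) like 2018-08-19T14:06:23.646Z."""
    millis = moment.microsecond // 1000
    return moment.strftime("%Y-%m-%dT%H:%M:%S.") + "{:03d}Z".format(millis)


def build_query(dataset_id, collection, start, end, bbox):
    """Assemble a query dictionary with the same fields as the example above."""
    # Assumed order: west, south, east, north (inferred from [-180, -90, 180, 90]).
    west, south, east, north = bbox
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError("longitudes must lie within [-180, 180]")
    if not (-90 <= south <= north <= 90):
        raise ValueError("latitudes must lie within [-90, 90] with south <= north")
    if start > end:
        raise ValueError("start must not be later than end")
    return {
        "datasetId": dataset_id,
        "stringChoiceValues": [{"name": "collection", "value": collection}],
        "dateRangeSelectValues": [
            {
                "name": "dtrange",
                "start": to_hda_timestamp(start),
                "end": to_hda_timestamp(end),
            }
        ],
        "boundingBoxValues": [{"name": "bbox", "bbox": [west, south, east, north]}],
    }


query = build_query(
    "EO:ESA:DAT:SENTINEL-2:MSI1C",
    "MSI_L1C",
    datetime(2018, 8, 19, 14, 6, 23, 646000),
    datetime(2018, 11, 19, 14, 6, 23, 646000),
    [-180, -90, 180, 90],
)
print(query["dateRangeSelectValues"][0])
```

The resulting dictionary can be passed as the `json=` argument of the `requests.post` call in the same way as the hand-written `data` dictionary.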

# 6: Check Job Status
Example response of a successfully completed job/query:
```json
{
   "jobId":"341552be-7ce4-470d-8c32-7e6a31c836f0",
   "complete":true,
   "status":"COMPLETED",
   "message":null,
   "resultNumber":729327,
   "created":"2018-11-21T15:11:38.208"
}
```

In [None]:
isComplete = False
while not isComplete:
    response = requests.get(broker_address + '/datarequest/status/' + job_id, headers=headers)
    # Parse the status response once and read both fields from it
    status = json.loads(response.text)
    results = status['resultNumber']
    isComplete = status['complete']
    print("Has the job " + job_id + " completed? " + str(isComplete))
    # Sleep for 2 seconds before checking the job status again
    if not isComplete:
        time.sleep(2)

numberOfResults = str(results)
print("Total number of products/results: " + numberOfResults)
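
The loop above polls indefinitely at a fixed interval. A possible variation, sketched below, adds an overall timeout and a growing delay between checks. `wait_for_job` and its parameters are hypothetical, and only the `complete` field from the example response is relied on. The status lookup is passed in as a function, so the sketch can be run against canned responses.

```python
import time


def wait_for_job(get_status, timeout_s=300, initial_delay_s=2, max_delay_s=30,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it reports completion or the timeout is exceeded."""
    deadline = clock() + timeout_s
    delay = initial_delay_s
    while True:
        status = get_status()
        if status.get("complete"):
            return status
        # Give up if the next wait would run past the deadline.
        if clock() + delay > deadline:
            raise TimeoutError("job did not complete within {} s".format(timeout_s))
        sleep(delay)
        # Double the delay after each unsuccessful check, up to max_delay_s.
        delay = min(delay * 2, max_delay_s)


# Demonstration with canned status responses shaped like the example above.
canned = iter([
    {"complete": False, "status": "STARTED", "resultNumber": None},
    {"complete": False, "status": "STARTED", "resultNumber": None},
    {"complete": True, "status": "COMPLETED", "resultNumber": 729327},
])
delays = []  # records the requested waits instead of actually sleeping
final = wait_for_job(lambda: next(canned), sleep=delays.append)
print(final["status"], final["resultNumber"], delays)
```

In the notebook, `get_status` could be a function that issues the `requests.get` call on `/datarequest/status/` and returns the parsed JSON. How a failed job is reported is not covered here, so the sketch does not try to detect it.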

# 7: Get Results List
The query results are paginated. The page number and page size parameters can be used to fetch only the results that are needed. Pages are numbered from 0 (i.e. the first page is page 0).

In the example below, each page contains 5 results and we show the results from the 3<sup>rd</sup> page (page number 2 with zero-based numbering).

In [None]:
params = {'page':'2', 'size':'5'}
response = requests.get(broker_address + '/datarequest/jobs/' + job_id + '/result', headers=headers, params = params)
results = json.loads(response.text)

print("************** Results  *******************************")
print(json.dumps(results, indent=4, sort_keys=True))
print("*********************************************")
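
To walk through every page rather than a single one, the number of pages can be derived from the total result count and the page size. The sketch below illustrates the arithmetic and a simple iterator. `page_count`, `iter_results`, and `fetch_page` are hypothetical names, and the only assumption about the response is that each page carries its items under a `content` key, as the results above do.

```python
def page_count(total_results, page_size):
    """Number of pages needed to hold total_results items (integer ceiling division)."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return (total_results + page_size - 1) // page_size


def iter_results(fetch_page, total_results, page_size):
    """Yield every item, requesting pages 0, 1, 2, ... in order."""
    for page in range(page_count(total_results, page_size)):
        for item in fetch_page(page, page_size).get("content", []):
            yield item


# Demonstration against an in-memory stand-in for the results endpoint.
all_items = list(range(12))


def fake_fetch_page(page, size):
    return {"content": all_items[page * size:(page + 1) * size]}


pages_needed = page_count(len(all_items), 5)
third_page = fake_fetch_page(2, 5)["content"]  # page number 2 is the third page
collected = list(iter_results(fake_fetch_page, len(all_items), 5))
print(pages_needed, third_page, len(collected))
```

With real data, `fetch_page` would issue the `requests.get` call shown above with `params={'page': page, 'size': size}` and return the parsed JSON. For very large result sets it is usually preferable to fetch only the pages that are actually needed.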

# 8: Get Results Download Link

In [None]:
for result in results['content']:
    externalUri = result['externalUri']
    product_size = result['fileSize']/(1024*1024)
    product_name = result['fileName']
    download_url = broker_address + '/datarequest/result/' + job_id + '?externalUri=' + urllib.parse.quote(externalUri) +"&access_token="+access_token
    print("Download link for " + product_name + " (" + "{:.2f}".format(product_size) + " MB):")
    print(download_url)
    print("")
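
The notebook stops at printing the download links. As a sketch of a possible next step, the helper below copies a binary stream to disk in fixed-size chunks, so that a large product does not have to be held in memory. `save_stream` is a hypothetical helper, and an in-memory buffer stands in for the HTTP response body so that the example runs without network access.

```python
import io
import os
import shutil
import tempfile


def save_stream(stream, path, chunk_size=1024 * 1024):
    """Copy a binary file-like object to path in chunks and return the bytes written."""
    with open(path, "wb") as out:
        shutil.copyfileobj(stream, out, chunk_size)
    return os.path.getsize(path)


# Stand-in for a streamed response body: 3 MB of dummy bytes.
payload = b"x" * (3 * 1024 * 1024)
target = os.path.join(tempfile.mkdtemp(), "product.zip")
written = save_stream(io.BytesIO(payload), target)
print("{:.2f} MB written to {}".format(written / (1024 * 1024), target))
```

With `requests`, one option would be to call `requests.get(download_url, stream=True)` and pass `response.raw` as the stream. This assumes that the download URL returns the file body directly, which this notebook does not confirm.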