<img src='https://gitlab.eumetsat.int/eumetlab/oceans/ocean-training/tools/frameworks/-/raw/main/img/Standard_banner.png' align='right' width='100%'/>

<font color="#138D75">**WEkEO Training**</font> <br>
**Copyright:** 2024 EUMETSAT <br>
**License:** MIT <br>
**Authors:** Anna-Lena Erdmann (EUMETSAT)

<html>
  <div style="width:100%">
    <div style="float:left"><a href="https://jupyterhub.prod.wekeo2.eu/hub/user-redirect/lab/tree/public/wekeo4data/wekeo-eocanvas/03_EOCanvas_Using_S3_buckets.ipynb"><img src="https://img.shields.io/badge/launch-WEKEO-1a4696.svg?style=flat&logo=" alt="Open in WEkEO"></a></div>
    <div style="float:left"><p>&emsp;</p></div>
  </div>    
</html>

<div class="alert alert-block alert-success">
<h3> WEkEO EOCanvas - Processing in the Cloud for Copernicus Data</h3></div>

<div class="alert alert-block alert-warning">
    
<b>PREREQUISITES </b>
    
This notebook has the following prerequisites:
  - **<a href="https://my.wekeo.eu/user-registration" target="_blank">A WEkEO account</a>**
  - access and secret key to a private S3 bucket (e.g. from a WEkEO tenant, an AWS S3 bucket, etc.)

  

</div>
<hr>

# 4 Using S3 remote object storage buckets within the EOCanvas

### Learning outcomes

At the end of this notebook you will know:

* how to use data located on a private or public S3 bucket as input to the EOCanvas functions
* how to send processing results to an S3 bucket



### Outline

The EOCanvas is a WEkEO service to process Copernicus data in the cloud. The input to the EOCanvas functions can come either from WEkEO's <a href='https://www.wekeo.eu/docs/harmonised-data-access-api' target='_blank'>Harmonised Data Access (HDA) API</a> or from remote object storage. This notebook provides an example of how to use data in S3 buckets as input for the EOCanvas functions. It also shows how to send results to an S3 bucket, which enables processing without downloading data to your machine. 

<div class="alert alert-info" role="alert">

### Contents <a id='totop'></a>

</div>
    
 1. [Setting Up the EOCanvas](#section0)
 2. [Connect to an S3 bucket](#section1)
 3. [Define the EOCanvas input from S3 bucket](#section2)
 4. [Execute the EOCanvas Function](#section3)
 5. [Send the Results to an S3 bucket](#section4)

<hr>

<div class="alert alert-info" role="alert">

## 1. <a id='section0'></a>Setting Up the EOCanvas
[Back to top](#totop)
    
</div>

This example notebook shows you how to use a SNAP function of the EOCanvas using input data from a public S3 bucket. 

Load the necessary libraries:

In [1]:
from eocanvas import API, Credentials
from eocanvas.api import Input, Config, ConfigOption
from eocanvas.processes import SnapProcess
from eocanvas.snap.graph import Graph

You must replace `<your_user_name>` and `<your_password>` with the information from your WEkEO account (if you don't have one yet, register <a href="https://www.wekeo.eu/" target="_blank">here</a>).

Save your credentials. They will be automatically loaded when required.

In [2]:
c = Credentials(username="<your_user_name>", password="<your_password>")
c.save()

Credentials are written to file C:\Users\erdmann\.hdarc
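
Before saving credentials again, it can be handy to check whether a credentials file is already present. The sketch below assumes the file is `.hdarc` in your home directory, which is what the output above suggests. The helper name `hdarc_path` is hypothetical and not part of `eocanvas`.

```python
from pathlib import Path


def hdarc_path() -> Path:
    """Return the assumed location of the credentials file (~/.hdarc)."""
    return Path.home() / ".hdarc"


if hdarc_path().exists():
    print(f"Found stored credentials at {hdarc_path()}")
else:
    print("No credentials file found; run the cell above to create one.")
```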


<div class="alert alert-info" role="alert">

## 2. <a id='section1'></a>Connect to an S3 bucket
[Back to top](#totop)
    
</div>



In this section, we explore how to set up and configure the keys that give the **EOCanvas** library access to S3 buckets. We will cover two key aspects:

1. **Accessing Data from an S3 Bucket**: Learn how to set up a key that allows the EOCanvas functions to find and read data stored in an S3 bucket.
   
2. **Writing Results to a Private S3 Bucket**: Discover how to configure a dedicated key for accessing private buckets, enabling you to write and store results securely.

> **Important Note**: EOCanvas currently relies on **OpenSSL** for encryption, so it is essential to have OpenSSL installed on your machine. If you are working within the **WEkEO JupyterHub**, OpenSSL is already preinstalled.

Let’s dive into the details to get started!
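
If you are working outside the WEkEO JupyterHub, a quick check like the following can tell you whether OpenSSL is available. This is a minimal sketch that assumes OpenSSL is found through the `openssl` executable on your `PATH`; how `eocanvas` itself locates OpenSSL is not stated here, and the helper name `find_openssl` is hypothetical.

```python
import shutil
import subprocess


def find_openssl():
    """Return the path of the `openssl` executable, or None if it is not on PATH."""
    return shutil.which("openssl")


openssl_path = find_openssl()
if openssl_path is None:
    print("OpenSSL was not found on PATH; install it before creating keys.")
else:
    # `openssl version` prints the installed version string.
    result = subprocess.run([openssl_path, "version"], capture_output=True, text=True)
    print(result.stdout.strip())
```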

### 2.1 Connect to a public S3 bucket

To read input data from a public bucket, you first have to create a key. The required inputs are `bucket`, `region`, and `endpoint`. The arguments `access_key` and `secret_key` must also be passed, but should be left empty, as a public bucket does not require access credentials. 

The `Key()` name has to be unique in the system. You only have to define the key once, and it will be valid for one hour. If you want to keep working with the key for longer than one hour, either execute the cell again or pass the argument `"expire"` with the duration (in seconds) for which the key should be valid. 

In [None]:
from eocanvas.api import Key, WebDavKeyConfig, S3KeyConfig

# Set all the required parameters to configure a specific key
config = S3KeyConfig(
    access_key="",
    secret_key="",
    bucket="wekeo",
    region="waw3-2",
    endpoint="https://s3.waw3-2.cloudferro.com",
)

# Note that the name must be unique. You might want to prefix your username.
key = Key(name="input-s3-key", config=config)

# Calling 'create' will download the public key from EOCanvas, encrypt the configuration and
# send it to the API.
key.create()
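
Two details from the text above lend themselves to small helpers: prefixing the key name with your user name to keep it unique, and expressing the validity period in seconds. The sketch below only computes these values. Both function names are hypothetical, and because the exact place where `"expire"` is passed is not shown in this notebook, the sketch does not call the `eocanvas` API.

```python
from datetime import timedelta


def make_key_name(username: str, purpose: str) -> str:
    """Build a key name prefixed with the user name (hypothetical helper)."""
    return f"{username}-{purpose}"


def expire_seconds(hours: float) -> int:
    """Convert a validity period in hours to whole seconds."""
    return int(timedelta(hours=hours).total_seconds())


print(make_key_name("jane", "input-s3-key"))  # jane-input-s3-key
print(expire_seconds(3))  # 10800
```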

### 2.2 Connect to a private S3 bucket


Connecting to a private bucket follows the same template as for the public bucket. Make sure that you pass `access_key` and `secret_key`. 

In [3]:
# Set all the required parameters to configure a specific key
config = S3KeyConfig(
    access_key="*************************",
    secret_key="**************************",
    bucket="wekeo_private",
    region="waw3-2",
    endpoint="https://s3.waw3-2.cloudferro.com",
)

# Note that the name must be unique. You might want to prefix your username.
key = Key(name="output-s3-key", config=config)

# Calling 'create' will download the public key from EOCanvas, encrypt the configuration and
# send it to the API.
key.create()
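
To avoid typing secrets directly into the notebook, the settings can be read from environment variables instead. This is a minimal sketch: the variable names (`S3_ACCESS_KEY`, `S3_SECRET_KEY`, `S3_BUCKET`, `S3_REGION`, `S3_ENDPOINT`) and the helper `s3_settings_from_env` are assumptions made for illustration, not something `eocanvas` defines.

```python
import os


def s3_settings_from_env(env=None) -> dict:
    """Collect S3 settings from environment variables (names are hypothetical)."""
    env = os.environ if env is None else env
    required = {
        "access_key": "S3_ACCESS_KEY",
        "secret_key": "S3_SECRET_KEY",
        "bucket": "S3_BUCKET",
    }
    missing = [var for var in required.values() if not env.get(var)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    settings = {name: env[var] for name, var in required.items()}
    # Region and endpoint fall back to the values used in this notebook.
    settings["region"] = env.get("S3_REGION", "waw3-2")
    settings["endpoint"] = env.get("S3_ENDPOINT", "https://s3.waw3-2.cloudferro.com")
    return settings
```

The returned dictionary uses the same argument names as the cell above, so it could be unpacked as `S3KeyConfig(**s3_settings_from_env())`.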

<div class="alert alert-info" role="alert">

## 3. <a id='section2'></a>Define the EOCanvas input from S3 bucket
[Back to top](#totop)
    
</div>

First, list all the products in the S3 bucket to see which products are available for the function:

In [7]:
import boto3
import botocore
# Define bucket, region, and endpoint
bucket_name = "wekeo"
region = "waw3-2"  # Replace with the bucket's region
endpoint = "https://s3.waw3-2.cloudferro.com"  # Replace if using a non-default endpoint


# Create the S3 client with custom settings
s3_client = boto3.client("s3", region_name=region, endpoint_url=endpoint)
s3_client.meta.events.register('choose-signer.s3.*', botocore.handlers.disable_signing)
# List objects in the public bucket
try:
    response = s3_client.list_objects_v2(Bucket=bucket_name)
    if 'Contents' in response:
        print(f"Objects in bucket '{bucket_name}':")
        for obj in response['Contents']:
            print(obj['Key'])
    else:
        print(f"No objects found in bucket '{bucket_name}'.")
except Exception as e:
    print(f"Error accessing bucket: {e}")

Objects in bucket 'wekeo':
S3B_OL_2_WFR____20240705T092739_20240705T093039_20240706T161845_0180_095_036_1980_MAR_O_NT_003.SEN3.zip
S3B_OL_2_WFR____20240705T092739_20240705T093039_20240706T161845_0180_095_036_1980_MAR_O_NT_003.SEN3/
S3B_OL_2_WFR____20240705T092739_20240705T093039_20240706T161845_0180_095_036_1980_MAR_O_NT_003.SEN3/Oa01_reflectance.nc
S3B_OL_2_WFR____20240705T092739_20240705T093039_20240706T161845_0180_095_036_1980_MAR_O_NT_003.SEN3/Oa02_reflectance.nc
S3B_OL_2_WFR____20240705T092739_20240705T093039_20240706T161845_0180_095_036_1980_MAR_O_NT_003.SEN3/Oa03_reflectance.nc
S3B_OL_2_WFR____20240705T092739_20240705T093039_20240706T161845_0180_095_036_1980_MAR_O_NT_003.SEN3/Oa04_reflectance.nc
S3B_OL_2_WFR____20240705T092739_20240705T093039_20240706T161845_0180_095_036_1980_MAR_O_NT_003.SEN3/Oa05_reflectance.nc
S3B_OL_2_WFR____20240705T092739_20240705T093039_20240706T161845_0180_095_036_1980_MAR_O_NT_003.SEN3/Oa06_reflectance.nc
S3B_OL_2_WFR____20240705T092739_20240705T093039_
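
A single `list_objects_v2` call returns at most 1,000 keys. For larger buckets, a boto3 paginator can collect the full listing. The sketch below takes the client as an argument, so it can be used with the `s3_client` created in the cell above; the function name `list_all_keys` is hypothetical.

```python
def list_all_keys(client, bucket: str, prefix: str = "") -> list:
    """Return every object key in `bucket`, following pagination."""
    paginator = client.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages of an empty listing have no "Contents" entry.
        for obj in page.get("Contents", []):
            keys.append(obj["Key"])
    return keys
```

With the client from the listing cell, `list_all_keys(s3_client, bucket_name)` would return the keys as a plain Python list.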

You can see that there is one Sentinel-3 satellite tile stored in the bucket, once in zipped format and once unzipped. The SNAP function takes the zipped tile as input. 

We prepare the inputs to the EOCanvas by following the same workflow as described in the notebook <a href="https://github.com/wekeo/wekeo4data/blob/main/wekeo-eocanvas/01_Introduction_to_EOCanvas.ipynb" target='_blank'>01_Introduction_to_EOCanvas</a>. 

In [4]:
graph = Graph.from_uri("input_graphs/subset_olci.xml")

When we define the input image, we use the path to the file in the S3 bucket. We add the parameter `keystore` and set it to the name of the key we defined above. 

In [5]:
inputs = Input(keystore="input-s3-key", key="img1", url="/S3B_OL_2_WFR____20240705T092739_20240705T093039_20240706T161845_0180_095_036_1980_MAR_O_NT_003.SEN3.zip")
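
The step from the bucket listing to the `url` value can be sketched with two small helpers: one keeps only the zipped products, the other adds the leading `/` used in the cell above. Both helper names are hypothetical, and the leading-slash form is simply copied from the example; whether other path forms are accepted is not stated here.

```python
def zipped_products(keys):
    """Keep only the keys that point at zipped products."""
    return [key for key in keys if key.endswith(".zip")]


def to_input_url(key: str) -> str:
    """Prefix the object key with a single '/' as in the `url` above."""
    return "/" + key.lstrip("/")


product = (
    "S3B_OL_2_WFR____20240705T092739_20240705T093039_"
    "20240706T161845_0180_095_036_1980_MAR_O_NT_003.SEN3"
)
# A shortened version of the listing shown earlier.
keys = [product + ".zip", product + "/", product + "/Oa01_reflectance.nc"]

urls = [to_input_url(key) for key in zipped_products(keys)]
print(urls)
```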

Finally, define the config parameters, which are unchanged from the introductory notebook. 

In [6]:
config = Config(key="img1", options=ConfigOption(uncompress=True, sub_path="xfdumanifest.xml"))

<div class="alert alert-info" role="alert">

## 4. <a id='section3'></a>Execute the EOCanvas Function
[Back to top](#totop)
    
</div>

Here, we put together the function and execute it. We can download the result to our file directory. 

In [7]:
process = SnapProcess(snap_graph=graph, eo_config=config, eo_input=inputs)

Run the process and save the results to the directory given in `download_dir`:

In [7]:
process.run(download_dir="result")

Job: 8ed2fabf-bbc2-5974-a7f9-cf0d581a04ae - Status: accepted at 2024-10-21T10:59:05.778442
Job: 8ed2fabf-bbc2-5974-a7f9-cf0d581a04ae - Status: running at 2024-10-21T10:59:16.291469
Job: 8ed2fabf-bbc2-5974-a7f9-cf0d581a04ae - Status: running at 2024-10-21T10:59:27.825847
Job: 8ed2fabf-bbc2-5974-a7f9-cf0d581a04ae - Status: running at 2024-10-21T10:59:40.743590
Job: 8ed2fabf-bbc2-5974-a7f9-cf0d581a04ae - Status: running at 2024-10-21T10:59:54.600702
Job: 8ed2fabf-bbc2-5974-a7f9-cf0d581a04ae - Status: running at 2024-10-21T11:00:09.794172
Job: 8ed2fabf-bbc2-5974-a7f9-cf0d581a04ae - Status: running at 2024-10-21T11:00:26.498304
Job: 8ed2fabf-bbc2-5974-a7f9-cf0d581a04ae - Status: running at 2024-10-21T11:00:45.019567
Job: 8ed2fabf-bbc2-5974-a7f9-cf0d581a04ae - Status: running at 2024-10-21T11:01:06.315133
Job: 8ed2fabf-bbc2-5974-a7f9-cf0d581a04ae - Status: running at 2024-10-21T11:01:29.908011
Job: 8ed2fabf-bbc2-5974-a7f9-cf0d581a04ae - Status: running at 2024-10-21T11:01:54.153116
Downloadi
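
Once the job has finished, you may want to see which files were downloaded. The sketch below lists everything under the download directory, assuming the `result` directory name used in the cell above; the helper name `list_results` is hypothetical.

```python
from pathlib import Path


def list_results(download_dir: str = "result") -> list:
    """Return all files found under the download directory, sorted by path."""
    root = Path(download_dir)
    if not root.is_dir():
        # Nothing has been downloaded yet.
        return []
    return sorted(path for path in root.rglob("*") if path.is_file())


for path in list_results("result"):
    print(path)
```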

<div class="alert alert-info" role="alert">

## 5. <a id='section4'></a>Send the Results to an S3 bucket
[Back to top](#totop)
    
</div>

You can also send the results directly to an S3 bucket. This requires a key for that bucket which includes the credentials needed to write to it. 

The option `output` can be added when you put together the process. 

In [9]:
# Reference the key created in section 2.2 by its name
process = SnapProcess(
    snap_graph=graph,
    eo_config=config,
    eo_input=inputs,
    output=Key(name="output-s3-key"),
)

In [None]:
process.run()