# Azure Data Factory integration
**Note**: This notebook runs on Python 3.9 and uses UbiOps Client Library 3.15.0.

In this notebook we will show you how to create a pipeline that consists of 2 deployments: one that does preprocessing on the input data and the other that uses a KNN classifier to predict whether someone has diabetes.


If you run this entire notebook after filling in your access token, the pipeline will be deployed to your UbiOps environment. You can thus check your environment after running the notebook to explore. You can also check the individual steps in this notebook to see what we did exactly and how you can adapt it to your own use case.

We recommend running the cells one by one, as some of them can take a few minutes to finish. Running everything in one go also works; just allow a few minutes for the deployments to build.

## Establishing a connection with your UbiOps environment
Add your API token and project name below. We then initialize the client library, which lets us create the two deployments and the pipeline in your environment.
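
The comment in the next cell asks for the token in the form `Token <token-code>`. A minimal sketch of a sanity check you could run before creating the client, assuming the only requirement is the `Token ` prefix followed by a code without whitespace. `check_token_format` is a hypothetical helper, not part of the UbiOps client library:

```python
def check_token_format(token: str) -> bool:
    """Return True if the token looks like 'Token <token-code>'.

    Assumption: only the 'Token ' prefix and a non-empty code without
    whitespace are checked; the token is not verified against UbiOps.
    """
    prefix = "Token "
    if not token.startswith(prefix):
        return False
    code = token[len(prefix):]
    return bool(code) and not any(ch.isspace() for ch in code)


# Made-up token values, for illustration only
print(check_token_format("Token abc123"))  # True
print(check_token_format("abc123"))        # False
```

A check like this only catches a missing prefix early; whether the token is valid and has project editor rights is still decided by the API.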

In [None]:
API_TOKEN = '<INSERT API_TOKEN WITH PROJECT EDITOR RIGHTS>' # Make sure this is in the format "Token token-code"
PROJECT_NAME = '<INSERT PROJECT NAME IN YOUR ACCOUNT>'
# Import all necessary libraries
import ubiops
import requests

client = ubiops.ApiClient(ubiops.Configuration(api_key={'Authorization': API_TOKEN}, 
                                               host='https://api.ubiops.com/v2.1'))
api = ubiops.CoreApi(client)

## Initiate local repositories and download the deployment packages

In [None]:
prep_pack = requests.get('https://storage.googleapis.com/ubiops/data/Integration%20with%20cloud%20provider%20tools/azure-data-factory/preprocessing_package.zip')
pred_pack = requests.get('https://storage.googleapis.com/ubiops/data/Integration%20with%20cloud%20provider%20tools/azure-data-factory/predictor_package.zip')
funct = requests.get('https://storage.googleapis.com/ubiops/data/Integration%20with%20cloud%20provider%20tools/azure-data-factory/function.zip')

with open('preprocessing_package.zip', 'wb') as f:
  f.write(prep_pack.content)

with open('predictor_package.zip', 'wb') as f:
  f.write(pred_pack.content)

with open('functions.zip', 'wb') as f:
  f.write(funct.content)

## Create the preprocessing deployment

First of all, we will create the deployment that pre-processes the CSV files, before passing this input to the next deployment consisting of a KNN classifier.

The deployment has the following input:
- data: a CSV file with the training data or with test data
- training: a boolean indicating whether we are using the data for training. When this boolean is set to true, the target outcome is split off from the training data.

The use of the boolean input "training" allows us to reuse this block later in a production pipeline. 
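
To make the role of the "training" flag concrete, here is a minimal sketch of the split it controls. It is not the code inside `preprocessing_package.zip`; `split_target` is a hypothetical helper, the target column is assumed to be named `Outcome`, and the other column names and values are made up:

```python
import csv
import io


def split_target(csv_text: str, training: bool, target_column: str = "Outcome"):
    """Parse CSV text and, when training is True, split off the target column.

    Returns (feature_rows, target_values). target_values is None when
    training is False, because prediction data has no target column.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not training:
        return rows, None
    targets = [row.pop(target_column) for row in rows]
    return rows, targets


# Made-up sample data with an assumed 'Outcome' target column
sample = "Glucose,BMI,Outcome\n148,33.6,1\n85,26.6,0\n"
features, targets = split_target(sample, training=True)
print(targets)                     # ['1', '0']
print("Outcome" in features[0])    # False
```

With `training=False` the same function returns the rows untouched, which is why one preprocessing block can serve both training and production.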

In [None]:
DEPLOYMENT_NAME = 'data-preprocessor'
DEPLOYMENT_VERSION = 'v1'

deployment_template = ubiops.DeploymentCreate(
    name=DEPLOYMENT_NAME,
    description='Pre-process incoming csv file',
    input_type='structured',
    output_type='structured',
    input_fields=[
        {'name':'data', 'data_type':'string'},
        {'name':'training', 'data_type':'bool'}
    ],
    output_fields=[
        {'name':'cleaned_data', 'data_type':'file'},
        {'name':'target_data', 'data_type':'file'}
    ],
    labels={'demo': 'azure-data-factory'}
)

api.deployments_create(
    project_name=PROJECT_NAME,
    data=deployment_template
)

# Create the version
version_template = ubiops.DeploymentVersionCreate(
    version=DEPLOYMENT_VERSION,
    environment='python3-9',
    instance_type='512mb',
    minimum_instances=0,
    maximum_instances=1,
    maximum_idle_time=1800, # = 30 minutes
    request_retention_mode='full', # input/output of requests will be stored
    request_retention_time=3600 # = 1 hour
)

api.deployment_versions_create(
    project_name=PROJECT_NAME,
    deployment_name=DEPLOYMENT_NAME,
    data=version_template
)

# The deployment package was downloaded as a zip file above, so it does not need to be zipped here

# Upload the zipped deployment package
file_upload_result1 = api.revisions_file_upload(
    project_name=PROJECT_NAME,
    deployment_name=DEPLOYMENT_NAME,
    version=DEPLOYMENT_VERSION,
    file='preprocessing_package.zip'
)

## Preprocessing Deployment created and deployed!

Now that the preprocessing deployment has been successfully created and deployed, we can move to the next step.

In the next step, we will create the deployment that uses an already trained KNN classifier to predict whether someone has diabetes or not.
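
For readers unfamiliar with KNN, the idea behind the classifier can be sketched in a few lines: a new point gets the label that is most common among its k nearest training points. This is a toy illustration with made-up two-dimensional points and a hypothetical `knn_predict` function, not the trained model that ships in `predictor_package.zip`:

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair every training point's Euclidean distance to the query with its label
    distances = sorted(
        ((math.dist(point, query), label)
         for point, label in zip(train_points, train_labels)),
        key=lambda pair: pair[0],
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Made-up training data: label 1 = diabetes, label 0 = no diabetes
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.5, 4.5)]
labels = [0, 0, 0, 1, 1]

print(knn_predict(points, labels, (5.2, 4.8)))  # 1
print(knn_predict(points, labels, (1.0, 0.9)))  # 0
```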

## Creating the KNN classifier deployment

In [None]:
deployment_template = ubiops.DeploymentCreate(
    name='knn-model',
    description='KNN model for diabetes prediction',
    input_type='structured',
    output_type='structured',
    input_fields=[
        {'name':'data', 'data_type':'file'},
        {'name':'data_cleaning_artefact', 'data_type':'file'}
    ],
    output_fields=[
        {'name':'prediction', 'data_type':'file'},
        {'name':'predicted_diabetes_instances', 'data_type':'int'}
    ],
    labels={'demo': 'azure-data-factory'}
)

api.deployments_create(
    project_name=PROJECT_NAME,
    data=deployment_template
)

# Create the version
version_template = ubiops.DeploymentVersionCreate(
    version=DEPLOYMENT_VERSION,
    environment='python3-9',
    instance_type='512mb',
    minimum_instances=0,
    maximum_instances=1,
    maximum_idle_time=1800, # = 30 minutes
    request_retention_mode='full', # input/output of requests will be stored
    request_retention_time=3600 # = 1 hour
)

api.deployment_versions_create(
    project_name=PROJECT_NAME,
    deployment_name='knn-model',
    data=version_template
)

# The deployment package was downloaded as a zip file above, so it does not need to be zipped here

# Upload the zipped deployment package
file_upload_result2 = api.revisions_file_upload(
    project_name=PROJECT_NAME,
    deployment_name='knn-model',
    version=DEPLOYMENT_VERSION,
    file='predictor_package.zip'
)

## KNN classifier deployment successfully created and deployed!

In the next step, we will wait for the `data-preprocessor` and `knn-model` deployments to finish building.
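
The client library's `ubiops.utils.wait_for_deployment_version` does the waiting for us. Conceptually, waiting for a build amounts to polling a status until it becomes "available". The sketch below shows that idea with a hypothetical `get_status` callable standing in for the API; the "failed" status and the timeout values are assumptions made for illustration:

```python
import time


def wait_until_available(get_status, timeout=600.0, interval=5.0):
    """Poll get_status() until it returns 'available'.

    Raises RuntimeError on 'failed' (assumed status name) and
    TimeoutError when the timeout has passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status == "available":
            return status
        if status == "failed":
            raise RuntimeError("Build failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Still '{status}' after {timeout} seconds")
        time.sleep(interval)


# Stand-in status source: reports 'building' twice, then 'available'
statuses = iter(["building", "building", "available"])
print(wait_until_available(lambda: next(statuses), interval=0))  # available
```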

In [None]:
ubiops.utils.wait_for_deployment_version(
    client=api.api_client,
    project_name=PROJECT_NAME,
    deployment_name=DEPLOYMENT_NAME,
    version=DEPLOYMENT_VERSION,
    revision_id=file_upload_result1.revision
)

ubiops.utils.wait_for_deployment_version(
    client=api.api_client,
    project_name=PROJECT_NAME,
    deployment_name='knn-model',
    version=DEPLOYMENT_VERSION,
    revision_id=file_upload_result2.revision
)

## Deployments passed the building stage

If the output of the previous cell is "available" for each of the deployments, the deployments have been successfully created, deployed and built.

In the next step, we will create a pipeline which uses the preprocessing & KNN deployments that we have created in the previous cells of this notebook.
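
The pipeline version connects objects by mapping each source field to a destination field. A small local check can catch misspelled field names before anything is sent to UbiOps. This is a sketch with a hypothetical `check_mapping` helper; it assumes that mapped fields must exist on both sides and have identical data types, and it does not replace the validation UbiOps performs itself:

```python
def check_mapping(source_fields, destination_fields, mapping):
    """Return a list of problems found in one attachment mapping."""
    source_types = {f["name"]: f["data_type"] for f in source_fields}
    destination_types = {f["name"]: f["data_type"] for f in destination_fields}
    problems = []
    for entry in mapping:
        src = entry["source_field_name"]
        dst = entry["destination_field_name"]
        if src not in source_types:
            problems.append(f"unknown source field: {src}")
        elif dst not in destination_types:
            problems.append(f"unknown destination field: {dst}")
        elif source_types[src] != destination_types[dst]:
            problems.append(
                f"type mismatch: {src} ({source_types[src]}) -> "
                f"{dst} ({destination_types[dst]})"
            )
    return problems


# Field definitions copied from the two deployments in this notebook
preprocessor_outputs = [
    {"name": "cleaned_data", "data_type": "file"},
    {"name": "target_data", "data_type": "file"},
]
knn_inputs = [
    {"name": "data", "data_type": "file"},
    {"name": "data_cleaning_artefact", "data_type": "file"},
]
mapping = [
    {"source_field_name": "cleaned_data", "destination_field_name": "data"},
    {"source_field_name": "target_data",
     "destination_field_name": "data_cleaning_artefact"},
]
print(check_mapping(preprocessor_outputs, knn_inputs, mapping))  # []
```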

In [None]:
PIPELINE_NAME = "example-pipeline"

pipeline_template = ubiops.PipelineCreate(
    name=PIPELINE_NAME,
    description="A simple pipeline that cleans up data and lets a KNN model predict on it.",
    input_type='structured',
    input_fields=[
        {'name':'data', 'data_type':'string'},
        {'name':'training', 'data_type':'bool'}
    ],
    output_type='structured',
    output_fields=[
        {'name':'prediction', 'data_type':'file'},
        {'name':'predicted_diabetes_instances', 'data_type':'int'}
    ],
    labels={'demo': 'azure-data-factory'}
)

api.pipelines_create(project_name=PROJECT_NAME, data=pipeline_template)

PIPELINE_VERSION = DEPLOYMENT_VERSION

In [None]:
pipeline_template = ubiops.PipelineVersionCreate(
    version=PIPELINE_VERSION,
    request_retention_mode='full',
    objects=[
        # Preprocessor
        {
            'name': DEPLOYMENT_NAME,
            'reference_name': DEPLOYMENT_NAME,
            'version': DEPLOYMENT_VERSION
        },
        # KNN model
        {
            'name': 'knn-model',
            'reference_name': 'knn-model',
            'version': DEPLOYMENT_VERSION
        }
    ],
    attachments=[
        # start --> preprocessor
        {
            'destination_name': DEPLOYMENT_NAME,
            'sources': [{
                'source_name': 'pipeline_start',
                'mapping': [
                    {"source_field_name": 'data','destination_field_name': 'data'},
                    {"source_field_name": 'training','destination_field_name': 'training'}
                ]
            }]
        },
        # preprocessor --> KNN model
        {
            'destination_name': 'knn-model',
            'sources': [{
                'source_name': DEPLOYMENT_NAME,
                'mapping': [
                    {"source_field_name": 'cleaned_data','destination_field_name': 'data'},
                    {"source_field_name": 'target_data','destination_field_name': 'data_cleaning_artefact'}
                ]
            }]
        },
        # KNN model --> pipeline end
        {
            'destination_name': 'pipeline_end',
            'sources': [{
                'source_name': 'knn-model',
                'mapping': [
                    {"source_field_name": 'prediction','destination_field_name': 'prediction'},
                    {"source_field_name": 'predicted_diabetes_instances','destination_field_name': 'predicted_diabetes_instances'}
                ]
            }]
        }
    ]
)

api.pipeline_versions_create(project_name=PROJECT_NAME, pipeline_name=PIPELINE_NAME, data=pipeline_template)

## All done! Let's close the client properly.

In [None]:
client.close()

**IMPORTANT**: If you get an error like `"error":"Version is not available: The version is currently in the building stage"`, your deployment is still building and is not yet available.
Check in the UI whether your deployment is ready, then rerun the cell that raised the error.

# Pipeline successfully created!

## Making a request and exploring further
You can now open the UbiOps web app and inspect what you have just built in the user interface. If you want, you can create a request to the pipeline using data from ["dummy_data_for_predicting.csv"](https://storage.googleapis.com/ubiops/data/Integration%20with%20cloud%20provider%20tools/azure-data-factory/dummy_data_for_predicting.csv) and set the "training" input to "False". The dummy data is just the diabetes data without the Outcome column.
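
A request can also be made from Python instead of the user interface. The sketch below assumes the `pipeline_requests_create` method of the client library's `CoreApi`; the two helper functions are hypothetical, and the value for the `data` field is left as a placeholder because its expected content is defined by the preprocessing package and the README:

```python
def build_pipeline_request(data: str, training: bool = False) -> dict:
    """Build the payload for the pipeline's two input fields."""
    return {"data": data, "training": training}


def send_pipeline_request(api, project_name: str, pipeline_name: str, payload: dict):
    """Send the payload through the client's pipeline request call.

    Assumption: `api` is a ubiops.CoreApi instance whose client is still open.
    """
    return api.pipeline_requests_create(
        project_name=project_name,
        pipeline_name=pipeline_name,
        data=payload,
    )


payload = build_pipeline_request("<VALUE FOR THE data FIELD>", training=False)
print(payload)  # {'data': '<VALUE FOR THE data FIELD>', 'training': False}

# With the notebook's objects, before client.close() has been called:
# result = send_pipeline_request(api, PROJECT_NAME, PIPELINE_NAME, payload)
```

Passing `api` in as an argument keeps the helper independent of the notebook's global state, so it can be reused with a freshly opened client.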

So there we have it! We have made a pipeline in UbiOps that can be connected to Azure Data Factory. Be sure to run the rest of the steps of this tutorial as mentioned in the README, to explore how to set up the integration with Azure Data Factory. 

For any questions, feel free to reach out to us via the customer service portal: https://ubiops.atlassian.net/servicedesk/customer/portals