# Lab 3 - Build and run a fully integrated workflow for our QC-AFQMC experiment

In the previous labs, we learned how to perform a classical AFQMC simulation using AWS Batch, and a quantum AFQMC simulation with Amazon Braket and AWS Batch. In each case, we executed the individual steps of the workflow manually in the notebook. For the QC-AFQMC run, we first generated matchgate shadows using quantum computing resources in a hybrid job on Amazon Braket. We then post-processed the collected shadows classically on AWS Batch. Finally, we collected the results of the parallel child jobs locally and reduced the data for comparison with the classical AFQMC run from the first lab.

In this lab, we build a fully integrated workflow to execute the entire QC-AFQMC pipeline in one consistent and reproducible run, with the individual steps orchestrated automatically. For that purpose, we leverage another service, __[AWS Step Functions](https://docs.aws.amazon.com/step-functions/latest/dg/welcome.html)__. 

With AWS Step Functions, you can create workflows, also called *state machines*, to build distributed applications, automate processes, orchestrate microservices, and create data and machine learning pipelines. Particularly interesting for our use case is the [integration of Step Functions with Batch](https://docs.aws.amazon.com/step-functions/latest/dg/connect-batch.html).
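
To make the Batch integration more concrete, here is a minimal sketch of a single Step Functions task state that submits a Batch array job, written as a Python dictionary and serialized to Amazon States Language JSON. The resource `arn:aws:states:::batch:submitJob.sync` is the documented integration pattern that waits for the job to finish. The helper name, job name, array size, and placeholder ARNs are assumptions for illustration and are not taken from `hybrid-workflow.yaml`.

```python
import json


def batch_task_state(job_name, job_queue_arn, job_definition_arn, array_size, next_state=None):
    """Return an Amazon States Language task state that runs a Batch array job."""
    state = {
        "Type": "Task",
        # The ".sync" suffix makes Step Functions wait until the Batch job completes.
        "Resource": "arn:aws:states:::batch:submitJob.sync",
        "Parameters": {
            "JobName": job_name,
            "JobQueue": job_queue_arn,
            "JobDefinition": job_definition_arn,
            # An array job fans out into `array_size` parallel child jobs.
            "ArrayProperties": {"Size": array_size},
        },
    }
    # A state either ends the workflow or names its successor.
    if next_state is None:
        state["End"] = True
    else:
        state["Next"] = next_state
    return state


# Placeholder values (assumptions), not the ones used by the workshop template.
example_state = batch_task_state(
    job_name="post-processing",
    job_queue_arn="arn:aws:batch:us-east-1:123456789012:job-queue/example-queue",
    job_definition_arn="arn:aws:batch:us-east-1:123456789012:job-definition/example:1",
    array_size=200,
    next_state="ReduceResults",
)
print(json.dumps(example_state, indent=2))
```

The actual state machine in this lab is created by CloudFormation, so this sketch is only meant to help you read the definition in the template.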

In [None]:
import boto3

cfn_client = boto3.client('cloudformation')
batch_client = boto3.client('batch')
sts_client = boto3.client("sts")
s3_client = boto3.client("s3")
sfn_client = boto3.client('stepfunctions')

ACCOUNT_ID = sts_client.get_caller_identity().get("Account")
print("Account ID:", ACCOUNT_ID)

my_session = boto3.session.Session()
WORKING_REGION = my_session.region_name
print("Region:", WORKING_REGION)

## Create the resources for the workflow integration

Let's build a Step Functions workflow to run the QC-AFQMC simulation from the second lab in a compute pipeline. In lab 2 we already learned how to run the individual steps:
1. We generate matchgate shadows using quantum computing resources in an Amazon Braket hybrid job.
2. We run large-scale classical post-processing on the collected shadows in an AWS Batch job.
3. We have to collect the output files of the individual child jobs and reduce the samples containing the local energies.

While we executed the third step locally in our notebook in lab 2, we now want to run it in the cloud, too. In our example, this data reduction step doesn't require a lot of compute resources, and its runtime is expected to be only a few minutes. For that purpose, we use __[AWS Lambda](https://docs.aws.amazon.com/lambda/latest/dg/welcome.html)__, which allows us to easily run code without provisioning or managing compute infrastructure. We just have to provide our source code, and Lambda takes care of the rest.
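
As a rough illustration of what such a reduction step computes, the following sketch averages the complex local energies of all walkers per time step, weighted by the walker weights, and keeps the real part. It is not the code from the `lambda_container_image` directory. The function name is hypothetical, the key names are borrowed from the lab 1 result files, and it assumes that each child result holds one list of energies and one list of weights per walker, with one entry per time step.

```python
def reduce_energies(child_results):
    """Weighted average of complex local energies per time step (real part)."""
    energies = []  # one list of complex local energies per walker
    weights = []   # one list of weights per walker (assumed: one weight per time step)
    for result in child_results:
        for real, imag, weight in zip(
            result["local_energies_real"],
            result["local_energies_imag"],
            result["weights"],
        ):
            energies.append([complex(r, i) for r, i in zip(real, imag)])
            weights.append(weight)

    averaged = []
    for step in range(len(energies[0])):
        total_weight = sum(w[step] for w in weights)
        weighted_sum = sum(w[step] * e[step] for w, e in zip(weights, energies))
        averaged.append((weighted_sum / total_weight).real)
    return averaged


# Two toy child results with one walker and two time steps each.
toy_results = [
    {"local_energies_real": [[1.0, 2.0]], "local_energies_imag": [[0.0, 0.0]], "weights": [[1.0, 1.0]]},
    {"local_energies_real": [[3.0, 4.0]], "local_energies_imag": [[0.0, 0.0]], "weights": [[3.0, 1.0]]},
]
print(reduce_energies(toy_results))
```

In a Lambda function, logic like this would typically sit between reading the child job outputs from Amazon S3 and writing one reduced result file back.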

![](./images/workflow-architecture.png)

In [None]:
for output in cfn_client.describe_stacks(StackName='batch-environment').get('Stacks')[0].get('Outputs'):
    print(f"{output.get('OutputKey')}: {output.get('OutputValue')}")

<div class="alert alert-block alert-success">
<b>Activity:</b> Assign the stack outputs to the related variables below.
</div>

In [None]:
data_bucket_name = ""  # FIXME
lambda_image_repository_uri = ""  # FIXME
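
If you prefer not to copy the values by hand, the stack outputs can also be turned into a dictionary and looked up by key. The sketch below shows the idea on a made-up response. The helper name, the output keys `ExampleBucket` and `ExampleRepositoryUri`, and their values are placeholders, so use the keys printed by the cell above instead.

```python
def stack_outputs(describe_stacks_response):
    """Map OutputKey to OutputValue for the first stack in a describe_stacks response."""
    outputs = describe_stacks_response["Stacks"][0].get("Outputs", [])
    return {output["OutputKey"]: output["OutputValue"] for output in outputs}


# Made-up response with placeholder keys and values (assumptions).
example_response = {
    "Stacks": [
        {
            "Outputs": [
                {"OutputKey": "ExampleBucket", "OutputValue": "my-example-bucket"},
                {"OutputKey": "ExampleRepositoryUri", "OutputValue": "123456789012.dkr.ecr.us-east-1.amazonaws.com/example"},
            ]
        }
    ]
}
example_outputs = stack_outputs(example_response)
print(example_outputs["ExampleBucket"])
```

With the real client, the call would be `stack_outputs(cfn_client.describe_stacks(StackName='batch-environment'))`.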

### Provide the Lambda function code

Similar to our Batch setup, we provide the code and runtime definition for our Lambda function with a Docker image which we build locally and upload to the dedicated image repository we have created earlier.

<div class="alert alert-block alert-success">
<b>Activity:</b> (Optional) Check out the content in the <code>lambda_container_image</code> directory to familiarize yourself with the data reduction logic we run on AWS Lambda.
</div>

In [None]:
%%time
import os

print("Authenticating Docker to your Amazon ECR private registry...")
os.system(f"aws ecr get-login-password --region {WORKING_REGION} | docker login --username AWS --password-stdin {ACCOUNT_ID}.dkr.ecr.{WORKING_REGION}.amazonaws.com")

print("Building your Docker image locally...")
os.system(f"docker build --quiet --platform linux/amd64 -f lambda_container_image/Dockerfile -t {lambda_image_repository_uri} .")

print("Pushing your Docker image to your ECR repository...")
os.system(f"docker push --quiet {lambda_image_repository_uri}")

print("All done.")
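
Note that `os.system` does not stop when a command fails, so the cell above prints "All done." even if the build or push went wrong. One possible alternative, sketched below, runs the same commands through `subprocess.run` with `check=True`, which raises an exception on a non-zero exit code. The function names are hypothetical, and the command-line flags are the ones used in the cell above.

```python
import subprocess


def ecr_push_commands(account_id, region, image_uri):
    """Build the argument lists for logging in to ECR, building, and pushing the image."""
    registry = f"{account_id}.dkr.ecr.{region}.amazonaws.com"
    return {
        "password": ["aws", "ecr", "get-login-password", "--region", region],
        "login": ["docker", "login", "--username", "AWS", "--password-stdin", registry],
        "build": [
            "docker", "build", "--quiet", "--platform", "linux/amd64",
            "-f", "lambda_container_image/Dockerfile", "-t", image_uri, ".",
        ],
        "push": ["docker", "push", "--quiet", image_uri],
    }


def build_and_push(account_id, region, image_uri):
    """Run all steps in order and raise CalledProcessError on the first failure."""
    commands = ecr_push_commands(account_id, region, image_uri)
    # Capture the login password and pass it to `docker login` via stdin.
    password = subprocess.run(
        commands["password"], check=True, capture_output=True, text=True
    ).stdout
    subprocess.run(commands["login"], input=password, check=True, text=True)
    subprocess.run(commands["build"], check=True)
    subprocess.run(commands["push"], check=True)
```

In the notebook, you would call `build_and_push(ACCOUNT_ID, WORKING_REGION, lambda_image_repository_uri)`.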

<div class="alert alert-block alert-success">
<b>Activity:</b> Navigate to the <a href="https://us-east-1.console.aws.amazon.com/ecr/private-registry/repositories">Amazon ECR management console</a> and check that the image has been successfully pushed to the repository. Note that we are using a different repository than the one we used previously for our Batch image.
</div>

### Create the workflow

Like in the first lab, we use CloudFormation to create the AWS resources for our workflow.

In [None]:
batch_queue_arn = batch_client.describe_job_queues().get('jobQueues')[0].get('jobQueueArn')
batch_job_definition_arn = batch_client.describe_job_definitions(status='ACTIVE').get('jobDefinitions')[0].get('jobDefinitionArn')

with open('hybrid-workflow.yaml', 'r') as file:
    template_body = file.read()

stack_name = 'braket-batch-workflow'

try:
    print(f"Creating CloudFormation stack {stack_name}")
    cfn_client.create_stack(
        StackName=stack_name,
        TemplateBody=template_body,
        Parameters=[
            {'ParameterKey': 'BatchJobQueueArn', 'ParameterValue': batch_queue_arn},
            {'ParameterKey': 'BatchJobDefinitionArn', 'ParameterValue': batch_job_definition_arn},
            {'ParameterKey': 'DataBucket', 'ParameterValue': data_bucket_name},
            {'ParameterKey': 'LambdaFunctionImageUri', 'ParameterValue': lambda_image_repository_uri},
        ]
    )
    print("Waiting for CloudFormation stack to complete...")
    waiter = cfn_client.get_waiter("stack_create_complete")
    waiter.wait(
        StackName=stack_name,
        WaiterConfig={
            'Delay': 10,
            'MaxAttempts': 150
        }
    )
    print("CloudFormation stack completed.")
except cfn_client.exceptions.AlreadyExistsException:
    print("Stack already exists. Updating CloudFormation stack")
    try:
        cfn_client.update_stack(
            StackName=stack_name,
            TemplateBody=template_body,
            Parameters=[
                {'ParameterKey': 'BatchJobQueueArn', 'ParameterValue': batch_queue_arn},
                {'ParameterKey': 'BatchJobDefinitionArn', 'ParameterValue': batch_job_definition_arn},
                {'ParameterKey': 'DataBucket', 'ParameterValue': data_bucket_name},
                {'ParameterKey': 'LambdaFunctionImageUri', 'ParameterValue': lambda_image_repository_uri},
            ]
        )
        print("Waiting for CloudFormation stack to be updated...")
        waiter = cfn_client.get_waiter("stack_update_complete")
        waiter.wait(
            StackName=stack_name,
            WaiterConfig={
                'Delay': 10,
                'MaxAttempts': 150
            }
        )
        print("CloudFormation stack updated.")
    except cfn_client.exceptions.ClientError as e:
        print(e)

print()
for output in cfn_client.describe_stacks(StackName=stack_name).get('Stacks')[0].get('Outputs'):
    print(f"{output.get('OutputKey')}: {output.get('OutputValue')}")
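
The cell above repeats the same parameter list for `create_stack` and `update_stack`. A small helper, sketched here under the assumption that all parameters are plain strings, builds the list once from a dictionary so that both calls stay in sync. The helper name and the placeholder values are hypothetical; the parameter keys are the ones used above.

```python
def to_cfn_parameters(values):
    """Convert a dict into the Parameters list expected by create_stack and update_stack."""
    return [
        {"ParameterKey": key, "ParameterValue": value}
        for key, value in values.items()
    ]


# Placeholder values (assumptions); in the notebook, use the variables defined above.
example_parameters = to_cfn_parameters(
    {
        "BatchJobQueueArn": "arn:aws:batch:us-east-1:123456789012:job-queue/example-queue",
        "BatchJobDefinitionArn": "arn:aws:batch:us-east-1:123456789012:job-definition/example:1",
        "DataBucket": "my-example-bucket",
        "LambdaFunctionImageUri": "123456789012.dkr.ecr.us-east-1.amazonaws.com/example",
    }
)
print(example_parameters[0])
```

The resulting list can be passed as `Parameters=example_parameters` to either call. If you rerun the cell without any changes, `update_stack` raises a `ClientError` stating that no updates are to be performed, which the cell above simply prints.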

<div class="alert alert-block alert-success">
<b>Activity:</b> As before, navigate to the <a href="https://us-east-1.console.aws.amazon.com/cloudformation/home">AWS CloudFormation management console</a> to check the status of your stack and see the individual resources being created. Again, you may review the template <code>hybrid-workflow.yaml</code> to learn how we describe the resources that CloudFormation creates for us.
</div>

<div class="alert alert-block alert-success">
<b>Activity:</b> Navigate to the <a href="https://us-east-1.console.aws.amazon.com/states/home">AWS Step Functions management console</a> to check the definition of your workflow. See the screenshot below.
</div>

![](./images/stepfunctions-console.png)

<div class="alert alert-block alert-success">
<b>Activity:</b> Assign the stack outputs to the related variables below.
</div>

In [None]:
state_machine_arn = ""  # FIXME

### Provide the Braket hybrid jobs code

In the second lab, the code that generates matchgate shadows with quantum computing resources was uploaded on the fly when we created the hybrid job on Amazon Braket. This time, we have to upload a tarball of the code ourselves, because it is referenced in our workflow definition (see line 46 in `hybrid-workflow.yaml`).

In [None]:
import tarfile
import tempfile
from pathlib import Path

abs_path = Path("afqmc").resolve(strict=True)

with tempfile.TemporaryDirectory() as temp_dir:
    filename = "afqmc.tar.gz"
    filepath = f"{temp_dir}/{filename}"
    print(f"Tarring up source code: {filepath}")
    with tarfile.open(filepath, "w:gz", dereference=True) as tar:
        tar.add(abs_path, arcname=abs_path.name)
    print(f"Uploading tarball to S3: {data_bucket_name}/{filename}") 
    s3_client.upload_file(filepath, data_bucket_name, filename)
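
Before uploading, it can be useful to check what actually ended up in the archive. The following sketch packs a directory in the same way as the cell above and then lists the member names with `tarfile.getnames()`. It runs on a throwaway demo directory, and the helper name and the file `entry.py` are made up for illustration.

```python
import tarfile
import tempfile
from pathlib import Path


def make_tarball(source_dir, tar_path):
    """Pack source_dir into a gzipped tarball and return the sorted member names."""
    source_dir = Path(source_dir).resolve(strict=True)
    with tarfile.open(tar_path, "w:gz", dereference=True) as tar:
        # arcname keeps only the directory name, not its absolute path, in the archive.
        tar.add(source_dir, arcname=source_dir.name)
    with tarfile.open(tar_path, "r:gz") as tar:
        return sorted(tar.getnames())


with tempfile.TemporaryDirectory() as temp_dir:
    # Demo directory with a single made-up file.
    demo_source = Path(temp_dir) / "afqmc"
    demo_source.mkdir()
    (demo_source / "entry.py").write_text("print('hello')\n")
    members = make_tarball(demo_source, Path(temp_dir) / "afqmc.tar.gz")
    print(members)
```

All member names start with the directory name, which is what the `arcname` argument in the cell above achieves for the real `afqmc` directory.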

## Execute the workflow described in Step Functions

Now we're set to start the execution of our workflow.

In [None]:
response = sfn_client.start_execution(stateMachineArn=state_machine_arn)

execution_arn = response.get('executionArn')

You can check the status of the workflow execution in the Step Functions management console or programmatically.

<div class="alert alert-block alert-success">
<b>Activity:</b> Navigate to the <a href="https://us-east-1.console.aws.amazon.com/states/home">AWS Step Functions management console</a> and monitor the workflow execution. See the screenshot below.
</div>

![](./images/workflow-execution.png)

<div class="alert alert-block alert-success">
<b>Activity:</b> Wait until the workflow execution completes.
</div>

In [None]:
sfn_client.describe_execution(executionArn=execution_arn).get('status')

<div class="alert alert-block alert-info">
<b>Note:</b> It will take about 45 minutes for the entire workflow to complete. You can leave it running and come back later to retrieve the results.
</div>
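
Instead of rerunning the status cell by hand, you could poll the execution until it reaches a terminal state. The sketch below is one way to do that. The function name, polling interval, and maximum number of polls are assumptions, while the status values are those returned by `describe_execution`.

```python
import time

# Statuses after which the execution no longer changes.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "TIMED_OUT", "ABORTED"}


def wait_for_execution(client, execution_arn, poll_seconds=60, max_polls=90):
    """Poll describe_execution until the execution reaches a terminal state."""
    for _ in range(max_polls):
        status = client.describe_execution(executionArn=execution_arn)["status"]
        if status in TERMINAL_STATES:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"Execution still running after {max_polls} polls")
```

In the notebook, `wait_for_execution(sfn_client, execution_arn)` would block the cell until the workflow finishes or the polling limit is reached.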

## Retrieve results

Loading the results of our QC-AFQMC run is trivial this time. We only have to retrieve one single file.

In [None]:
from pathlib import Path

# Download the reduced result only if the workflow execution succeeded
workflow_execution_status = sfn_client.describe_execution(executionArn=execution_arn).get('status')
if workflow_execution_status == "SUCCEEDED":
    Path("results/lab-3").mkdir(parents=True, exist_ok=True)
    s3_client.download_file(
        data_bucket_name,
        "lambda/final_result.json",
        "results/lab-3/result.json")
else:
    print(f"Your workflow execution is in status {workflow_execution_status}")

Let's plot the energies again and compare them to the results from the classical AFQMC simulation and the reference value.

In [None]:
import json
import numpy as np
import matplotlib.pyplot as plt
from pathlib import Path
%matplotlib inline

with open("results/lab-3/result.json", "r") as file:
    data = json.load(file)

qc_energies = data["energies"]

local_energies_real = []
local_energies_imag = []
weights = []

for i in range(200):
    with open(f"results/lab-1/result_{i}.json", "r") as file:
        data = json.load(file)
    local_energies_real.extend(data["local_energies_real"])
    local_energies_imag.extend(data["local_energies_imag"])
    weights.extend(data["weights"])

local_energies = [[ii+1.j*jj for ii, jj in zip(i, j)] for i, j in zip(local_energies_real, local_energies_imag)]   
classical_energies = np.real(np.average(local_energies, weights=weights, axis=0))


plt.plot(
    0.005 * np.arange(600),
    classical_energies,
    linestyle="dashed",
    marker=".",
    color="tab:blue",
    label="classical",
)
plt.plot(
    0.005 * np.arange(200),
    qc_energies,
    linestyle="dashed",
    marker=".",
    color="tab:orange",
    label="quantum",
)
plt.axhline(-1.137117067345732, linestyle="dashed", color="black")
plt.title(r"Ground state estimation of H$_2$ using AFQMC", fontsize=16)
plt.legend(fontsize=14, loc="upper right")
plt.xlabel(r"$\tau$", fontsize=14)
plt.ylabel("Energy", fontsize=14)
plt.yticks(fontsize=14)
plt.tick_params(direction="in", labelsize=14)
plt.show()
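
To complement the plot with a number, one simple option is to average the energies over the last part of the imaginary-time evolution and compare the result to the reference value. The sketch below does this with the standard library. The function name and the choice of averaging over the last half of the points are assumptions, and the toy input is made up, so treat the output as a rough indicator only.

```python
from statistics import mean

# Reference value used for the horizontal line in the plot above.
REFERENCE_ENERGY = -1.137117067345732


def tail_deviation(energies, tail_fraction=0.5, reference=REFERENCE_ENERGY):
    """Return the mean of the last tail_fraction of energies and its deviation from reference."""
    n_tail = max(1, int(len(energies) * tail_fraction))
    tail_mean = mean(energies[-n_tail:])
    return tail_mean, tail_mean - reference


# Made-up toy energies, not real results.
toy_mean, toy_deviation = tail_deviation([0.0, -1.0, -1.0, -1.0], reference=-1.0)
print(toy_mean, toy_deviation)
```

With the data loaded above, you could call `tail_deviation(list(qc_energies))` and `tail_deviation(list(classical_energies))`.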

## Wrapping up

<div class="alert alert-block alert-info">
<b>You reached the end of lab 3 and our tutorial. Well done!</b>
</div>

<div class="alert alert-block alert-success">
<b>Activity: Tell us how we did today.</b>

Please take <a href="https://pulse.aws/promotion/F9HYM7N0">this survey</a> and help us improve for similar tutorials at future events.

<b>In return for your participation you'll receive a promotion code with which you can redeem credits worth $20 in your AWS account.</b>
</div>