# Quantum Computing-based Optimization for Sustainable Data Workflows in Cloud Infrastructures

by [Valter Uotila](https://researchportal.helsinki.fi/en/persons/valter-johan-edvard-uotila), PhD student, [Unified Database Management Systems](https://www2.helsinki.fi/en/researchgroups/unified-database-management-systems-udbms/news), University of Helsinki

This is essentially a specialized and modified shortest-path search, applied to the problem presented in the document that accompanies this implementation.

Possible quantum software-hardware combinations to solve the problem:

1. Amazon Braket's D-Wave — Advantage and D-Wave — 2000Q
    1. Ocean implementation of this code
2. Amazon Braket's IonQ and Rigetti machines
    1. Qiskit implementation of this code
3. Amazon Braket's simulators
    1. Qiskit implementation of this code
4. D-Wave's Leap Advantage
    1. Ocean implementation of this code
5. IBM Quantum systems
    1. Qiskit implementation of this code
6. Local machine
    1. Both Ocean and Qiskit versions

Because I am familiar with the Ocean framework and it is specially designed for formulating QUBOs, I initially formulated the problem using it.

In [1]:
import dimod
from dimod.generators.constraints import combinations

from dwave.system import LeapHybridSampler
from hybrid.reference import KerberosSampler

import json
import itertools
import os
import math

from ipynb.fs.defs.emission_simulator import emission_simulator

notebook_path = os.path.abspath("main.ipynb")

def append_linear_safe(variable, value, linear_dict):
    # Accumulate the bias instead of overwriting an existing entry
    linear_dict[variable] = linear_dict.get(variable, 0) + value

def append_quadratic_safe(variable, value, quadratic_dict):
    # Same accumulation, keyed by a pair of variables
    quadratic_dict[variable] = quadratic_dict.get(variable, 0) + value

## Importing data

In [2]:
cloud_partners_file_path = os.path.join(os.path.dirname(notebook_path), "data/cloud_partners.json")
# Use a context manager so the file handle is closed after loading
with open(cloud_partners_file_path) as f:
    partners_root = json.load(f)
cloud_partners = partners_root["cloud_partners"]

workload_name = "workload1.json"
workload_file_path = os.path.join(os.path.dirname(notebook_path), "data/workloads/" + workload_name)
with open(workload_file_path) as f:
    workload_root = json.load(f)
workload = workload_root["workload"]

#print(cloud_partners)
#print(workload)

## Creating variables for the binary quadratic model

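We define a binary variable $x_{i,j} = (w_i, d_j)$ for every pair of a work $w_i$ and a data center $d_j$. The value $x_{i,j} = 1$ means that the work $w_i$ is executed on the data center $d_j$.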

In [3]:
vartype = dimod.BINARY
bqm = dimod.BinaryQuadraticModel({}, {}, 0.0, vartype)
variables = dict()
workload_order = []

# We assume that any work can be executed on any data center
for work in workload:
    variables[str(work["work_id"])] = list()
    workload_order.append(str(work["work_id"]))
    for partner in cloud_partners:
        for center in partner["data_centers"]:
            # Each key in the variables dictionary corresponds to a level in a tree, i.e. a time step in the workflow
            variables[str(work["work_id"])].append((str(work["work_id"]), center["center_id"]))
            
#print(json.dumps(variables, indent=1))

## Constructing constraints 

### Constraint 1

This constraint implements the requirement that for every work $w_i$ exactly one variable $x_{i,j} = (w_i, d_j)$ has the value 1. In other words, every work is executed on exactly one data center.

In [4]:
strength = 3.0
for work_id in variables:
    one_work_bqm = combinations(variables[work_id], 1, strength=strength)
    bqm.update(one_work_bqm)
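
To see where the coefficients of this constraint come from, the penalty for one work can be expanded by hand. For binary variables, $s(\sum_j x_{i,j} - 1)^2$ becomes a linear bias of $-s$ per variable, a quadratic bias of $2s$ per pair and a constant $s$. The following is a minimal pure-Python sketch of that expansion under this assumption; it does not call dimod, and `one_hot_penalty`, `penalty_energy` and the three example centers are hypothetical names used only for illustration.

```python
import itertools

def one_hot_penalty(variables, strength):
    """Expand strength * (sum(x) - 1)**2 for binary x into QUBO terms."""
    linear = {v: -strength for v in variables}
    quadratic = {pair: 2 * strength for pair in itertools.combinations(variables, 2)}
    offset = strength
    return linear, quadratic, offset

def penalty_energy(assignment, linear, quadratic, offset):
    """Evaluate the expanded penalty for a 0/1 assignment."""
    energy = offset
    energy += sum(bias * assignment[v] for v, bias in linear.items())
    energy += sum(bias * assignment[u] * assignment[v] for (u, v), bias in quadratic.items())
    return energy

# Hypothetical example: work "0" and three data centers
example_variables = [("0", center) for center in ["00", "10", "20"]]
penalty_linear, penalty_quadratic, penalty_offset = one_hot_penalty(example_variables, strength=3.0)

# Only assignments with exactly one selected center reach the minimum energy 0
for bits in itertools.product([0, 1], repeat=len(example_variables)):
    assignment = dict(zip(example_variables, bits))
    print(bits, penalty_energy(assignment, penalty_linear, penalty_quadratic, penalty_offset))
```

With `strength = 3.0` the expansion gives linear biases of -3.0 and quadratic biases of 6.0 within one work.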

### Constraint 2

This constraint associates the estimated emission coefficient $e(x_{i,j}, x_{i+1,k})$ with every pair of variables $x_{i,j} = (w_i, d_j)$ and $x_{i+1,k} = (w_{i+1}, d_k)$. The coefficient is calculated by the `emission_simulator` function. Note that we need to calculate it only for those pairs where the works $w_i$ and $w_{i+1}$ are consecutive works in the workload.

In [5]:
A = 1
linear = dict()
quadratic = dict()
offset = 0.0

for work_id_current in range(len(workload_order) - 1):
    work_id_next = work_id_current + 1
    
    key_current = workload_order[work_id_current]
    key_next = workload_order[work_id_next]
    
    for work1 in variables[key_current]:
        for work2 in variables[key_next]:
            coeff = emission_simulator(work1, work2)
            #print("Works", work1, work2)
            #print("Coefficient", coeff)
            append_quadratic_safe((work1, work2), coeff, quadratic)
    
bqm_c2 = dimod.BinaryQuadraticModel(linear, quadratic, offset, vartype)
bqm_c2.scale(A)
bqm.update(bqm_c2)
#print(bqm)
#print(bqm.to_numpy_vectors())

BinaryQuadraticModel({('0', '00'): -3.0, ('0', '10'): -3.0, ('0', '20'): -3.0, ('0', '30'): -3.0, ('0', '40'): -3.0, ('1', '00'): -3.0, ('1', '10'): -3.0, ('1', '20'): -3.0, ('1', '30'): -3.0, ('1', '40'): -3.0, ('2', '00'): -3.0, ('2', '10'): -3.0, ('2', '20'): -3.0, ('2', '30'): -3.0, ('2', '40'): -3.0, ('3', '00'): -3.0, ('3', '10'): -3.0, ('3', '20'): -3.0, ('3', '30'): -3.0, ('3', '40'): -3.0, ('4', '00'): -3.0, ('4', '10'): -3.0, ('4', '20'): -3.0, ('4', '30'): -3.0, ('4', '40'): -3.0, ('5', '00'): -3.0, ('5', '10'): -3.0, ('5', '20'): -3.0, ('5', '30'): -3.0, ('5', '40'): -3.0}, {(('0', '10'), ('0', '00')): 6.0, (('0', '20'), ('0', '00')): 6.0, (('0', '20'), ('0', '10')): 6.0, (('0', '30'), ('0', '00')): 6.0, (('0', '30'), ('0', '10')): 6.0, (('0', '30'), ('0', '20')): 6.0, (('0', '40'), ('0', '00')): 6.0, (('0', '40'), ('0', '10')): 6.0, (('0', '40'), ('0', '20')): 6.0, (('0', '40'), ('0', '30')): 6.0, (('1', '00'), ('0', '00')): 1.0, (('1', '00'), ('0', '10')): 1.0, (('1', '00
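
Because only consecutive works are coupled, every feasible assignment is a path through a layered graph with one layer per work. For small instances the optimum can therefore be cross-checked classically with dynamic programming. The sketch below is such a reference check and not part of the quantum workflow. `cheapest_assignment` is a hypothetical helper, and `toy_emission` is a made-up stand-in for `emission_simulator`, whose implementation is not shown here.

```python
def cheapest_assignment(works, centers, emission):
    """Return (total_emission, [center per work]) minimizing the sum of
    emission((w_i, d_j), (w_{i+1}, d_k)) over consecutive works."""
    # best[c] = (cost of the cheapest path ending at center c, that path)
    best = {c: (0.0, [c]) for c in centers}
    for current, nxt in zip(works, works[1:]):
        new_best = {}
        for c_next in centers:
            candidates = (
                (cost + emission((current, c_prev), (nxt, c_next)), path + [c_next])
                for c_prev, (cost, path) in best.items()
            )
            new_best[c_next] = min(candidates, key=lambda item: item[0])
        best = new_best
    return min(best.values(), key=lambda item: item[0])

def toy_emission(var1, var2):
    # Hypothetical numbers: switching centers costs 2, staying costs 1,
    # and arriving at center "10" saves 0.5.
    (_, c1), (_, c2) = var1, var2
    return (1.0 if c1 == c2 else 2.0) - (0.5 if c2 == "10" else 0.0)

works = ["0", "1", "2"]
centers = ["00", "10", "20"]
total, path = cheapest_assignment(works, centers, toy_emission)
print(total, list(zip(works, path)))
```

The running time grows linearly in the number of works and quadratically in the number of data centers, so this check is only meant as a baseline for comparing sampler results.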

## Updating BQM while workflow proceeds

To make the problem and its solution less trivial, we include a time component in the algorithm. One time step means that a single work has been executed on one of the data centers. At each time step, we check how sustainably the data centers are currently running. For example, weather conditions (wind, the amount of water in rivers, etc.) affect the production of green energy, and the characteristics of a data center's machines determine part of the emissions. In real-life cases, other workloads also affect the decision, and we might need to switch to another data center. This demonstration possibly varies these conditions more than they vary in real life, but it shows how the algorithm works.

In [6]:
def update(bqm):
    return None
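
The `update` function above is left as a stub. One possible reading of the description is that, after some works have been executed, the couplings that involve works still pending are recomputed with the current emission estimates. The following pure-Python sketch illustrates that idea on a plain dictionary with the same key structure as `quadratic`. It is an assumption about the intended behavior, not the author's method, and `refresh_couplings` and `windy_day` are hypothetical names.

```python
def refresh_couplings(quadratic, workload_order, executed_steps, emission):
    """Return a new quadratic dict in which every coupling that touches a
    work not yet executed is recomputed with the current emission estimate."""
    pending = set(workload_order[executed_steps:])
    refreshed = {}
    for (var1, var2), bias in quadratic.items():
        work1, work2 = var1[0], var2[0]
        if work1 != work2 and (work1 in pending or work2 in pending):
            # Transition into a pending work: use the up-to-date estimate
            refreshed[(var1, var2)] = emission(var1, var2)
        else:
            # Both works already executed (or a same-work penalty term): keep as is
            refreshed[(var1, var2)] = bias
    return refreshed

# Hypothetical example: three works, two centers, all couplings initially 1.0
example_order = ["0", "1", "2"]
example_centers = ["00", "10"]
example_quadratic = {
    ((w1, c1), (w2, c2)): 1.0
    for w1, w2 in zip(example_order, example_order[1:])
    for c1 in example_centers
    for c2 in example_centers
}

def windy_day(var1, var2):
    # Made-up estimate: center "10" currently runs on wind power
    return 0.2 if var2[1] == "10" else 1.5

updated = refresh_couplings(example_quadratic, example_order, executed_steps=2, emission=windy_day)
print(updated)
```

In dimod, the variable chosen for an executed work could additionally be pinned with `BinaryQuadraticModel.fix_variable`; that step is not shown here.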

## Demonstrating algorithm

In [7]:
#bqm.normalize()
#print(bqm)

# KerberosSampler().sample returns a sample set, so name the result accordingly
sampleset = KerberosSampler().sample(bqm, max_iter=10, convergence=3, qpu_params={'label': 'Data workflow optimization'})
sample = sampleset.first.sample

#sampler = LeapHybridSampler()
#sampleset = sampler.sample(bqm)
#sample = sampleset.first.sample

#print(sampleset)
#print(best_solution)
#sample = best_solution
print()
# energy = sampleset.first.energy
print("Result")
i = 0
for varname, value in sample.items():
    if value == 1:
        i+=1
        print(varname, value)
#print(i)


Result
('0', '20') 1
('1', '30') 1
('2', '30') 1
('3', '40') 1
('4', '00') 1
('5', '30') 1
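
A sampler can return a low-energy sample that still violates Constraint 1, so it may be worth checking the result before using it. The sketch below decodes a sample into an ordered list of (work, data center) pairs and raises an error if some work does not have exactly one data center. `decode_assignment` is a hypothetical helper, and the example sample is rebuilt by hand from the result printed above, assuming five data centers per work.

```python
from collections import defaultdict

def decode_assignment(sample, workload_order):
    """Map each work to its selected center and check the one-hot constraint."""
    chosen = defaultdict(list)
    for (work_id, center_id), value in sample.items():
        if value == 1:
            chosen[work_id].append(center_id)
    violations = [w for w in workload_order if len(chosen[w]) != 1]
    if violations:
        raise ValueError(f"works without exactly one data center: {violations}")
    return [(w, chosen[w][0]) for w in workload_order]

# Variables with value 1 in the printed result; all other variables are 0
selected = [("0", "20"), ("1", "30"), ("2", "30"), ("3", "40"), ("4", "00"), ("5", "30")]
example_order = [str(i) for i in range(6)]
example_centers = ["00", "10", "20", "30", "40"]
example_sample = {(w, c): 0 for w in example_order for c in example_centers}
example_sample.update({var: 1 for var in selected})

assignment = decode_assignment(example_sample, example_order)
print(assignment)
```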


## Transferring problem to Qiskit

In this part of the code I rely on the [Qiskit Tutorials](https://qiskit.org/documentation/optimization/tutorials/10_warm_start_qaoa.html).

### Importing Qiskit and Amazon Braket

In [8]:
from qiskit import IBMQ

provider = IBMQ.load_account()
backend = provider.get_backend('ibmq_qasm_simulator')

### Transforming QUBO in Ocean to QUBO in Qiskit 

In [None]:
# TODO
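
The transformation itself is left as a TODO in this notebook. A possible first step, sketched here under assumptions, is to flatten the tuple labels used in the Ocean model into string names, since Qiskit's optimization module identifies variables by name. The sketch is pure Python so that it runs without Qiskit. `flatten_label` and `to_named_qubo` are hypothetical helpers, and the small input dictionaries stand in for `bqm.linear`, `bqm.quadratic` and `bqm.offset`.

```python
def flatten_label(label):
    """Turn an Ocean label such as ('0', '20') into the string 'x_0_20'."""
    work_id, center_id = label
    return f"x_{work_id}_{center_id}"

def to_named_qubo(linear, quadratic, offset=0.0):
    """Rename tuple-labelled QUBO terms to string-labelled ones."""
    named_linear = {flatten_label(v): bias for v, bias in linear.items()}
    named_quadratic = {
        (flatten_label(u), flatten_label(v)): bias for (u, v), bias in quadratic.items()
    }
    # Collect every variable name, including ones that only appear in couplings
    names = set(named_linear)
    for u, v in named_quadratic:
        names.update((u, v))
    return sorted(names), named_linear, named_quadratic, offset

# Hypothetical miniature model with the same label structure as the BQM above
example_linear = {("0", "00"): -3.0, ("0", "10"): -3.0}
example_quadratic = {(("0", "10"), ("0", "00")): 6.0, (("1", "00"), ("0", "00")): 1.0}

names, named_linear, named_quadratic, constant = to_named_qubo(
    example_linear, example_quadratic, offset=3.0
)
print(names)
print(named_linear)
print(named_quadratic)
```

The resulting names and dictionaries have the shape expected by `QuadraticProgram.binary_var` and `QuadraticProgram.minimize(constant=..., linear=..., quadratic=...)` in `qiskit_optimization`; those calls are deliberately not made here and would need to be checked against the installed Qiskit version.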