# SQL to circuit ansätze

This notebook implements the part of the algorithm that translates the join order benchmark (JOB) queries and the simplified JOB queries into pregroup diagrams, and then translates the pregroup diagrams into circuit ansätze. A simpler example can be found in the `sql_to_circuit_simple_example` notebook.

The following code generates diagrams for all the SELECT-FROM-WHERE queries in the join order benchmark and for their simplified versions. Running it takes some time, and it also serves as a test suite for the code. The diagrams have already been generated and are stored in the folders `join-order-benchmark-diagrams` and `simplified-JOB-diagrams`.

Unfortunately, the JOB queries produce circuits that are too large for the quantum computing resources available to us, which is why we created the set of simplified queries. As with data generation, this notebook exists for reproducibility: users do not need to rerun it unless they want to change the underlying queries or the mappings.

In [2]:
import json
import os
import glob
from pathlib import Path
from discopy import Ty, Box, Functor
from functools import reduce
from discopy.utils import dumps, loads
from multiprocessing import Pool
from math import ceil

import diagramGenerators

this_folder = os.path.abspath(os.getcwd())
num_processors = 8

In [2]:
query_path_job = "\\join-order-benchmark-queries\\[0-9]*.sql"
cfg_folder_name_job = "join-order-benchmark-diagrams//cfg-diagrams"
pregroup_folder_job = "join-order-benchmark-diagrams//pregroup-diagrams"
pregroup_cup_removed_folder_job = "join-order-benchmark-diagrams//cup-removed-pregroup-diagrams"
circuit_job = "join-order-benchmark-diagrams//circuits"

In [3]:
query_path_training = "\\queries\\training_queries\\[0-9]*.sql"
query_path_validation = "\\queries\\validation_queries\\[0-9]*.sql"
query_path_test = "\\queries\\test_queries\\[0-9]*.sql"

cfg_folder_name_training = "simplified-JOB-diagrams//cfg-diagrams//training"
cfg_folder_name_validation = "simplified-JOB-diagrams//cfg-diagrams//validation"
cfg_folder_name_test = "simplified-JOB-diagrams//cfg-diagrams//test"

pregroup_folder_training = "simplified-JOB-diagrams//pregroup-diagrams//training"
pregroup_folder_validation = "simplified-JOB-diagrams//pregroup-diagrams//validation"
pregroup_folder_test = "simplified-JOB-diagrams//pregroup-diagrams//test"

pregroup_cup_removed_folder_training = "simplified-JOB-diagrams//cup-removed-pregroup-diagrams//training"
pregroup_cup_removed_folder_validation = "simplified-JOB-diagrams//cup-removed-pregroup-diagrams//validation"
pregroup_cup_removed_folder_test = "simplified-JOB-diagrams//cup-removed-pregroup-diagrams//test"

circuit_training = "simplified-JOB-diagrams//circuits//binary_classification//training"
circuit_validation = "simplified-JOB-diagrams//circuits//binary_classification//validation"
circuit_test = "simplified-JOB-diagrams//circuits//binary_classification//test"

In [4]:
query_path_training = "\\queries\\small\\training_queries\\[0-9]*.sql"
query_path_validation = "\\queries\\small\\validation_queries\\[0-9]*.sql"
query_path_test = "\\queries\\small\\test_queries\\[0-9]*.sql"

cfg_folder_name_training = "simplified-JOB-diagrams//small//cfg-diagrams//training"
cfg_folder_name_validation = "simplified-JOB-diagrams//small//cfg-diagrams//validation"
cfg_folder_name_test = "simplified-JOB-diagrams//small//cfg-diagrams//test"

pregroup_folder_training = "simplified-JOB-diagrams//small//pregroup-diagrams//training"
pregroup_folder_validation = "simplified-JOB-diagrams//small//pregroup-diagrams//validation"
pregroup_folder_test = "simplified-JOB-diagrams//small//pregroup-diagrams//test"

pregroup_cup_removed_folder_training = "simplified-JOB-diagrams//small//cup-removed-pregroup-diagrams//training"
pregroup_cup_removed_folder_validation = "simplified-JOB-diagrams//small//cup-removed-pregroup-diagrams//validation"
pregroup_cup_removed_folder_test = "simplified-JOB-diagrams//small//cup-removed-pregroup-diagrams//test"

circuit_training = "simplified-JOB-diagrams//small//circuits//binary_classification//training"
circuit_validation = "simplified-JOB-diagrams//small//circuits//binary_classification//validation"
circuit_test = "simplified-JOB-diagrams//small//circuits//binary_classification//test"
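
The path strings above mix Windows-style `\\` separators with `//`, so the glob patterns are tied to Windows. A minimal sketch of a portable alternative, assuming the same folder layout; the helper name `numbered_files` is hypothetical and not part of this notebook or of `diagramGenerators`:

```python
from pathlib import Path


def numbered_files(base, folder, extension):
    """Return sorted paths in base/folder whose file names start with a digit.

    Mirrors the `[0-9]*.sql` and `[0-9]*.json` patterns used in this notebook,
    but lets pathlib pick the separator for the current operating system.
    """
    return sorted(Path(base, folder).glob(f"[0-9]*.{extension}"))
```

For instance, `numbered_files(this_folder, "queries/small/training_queries", "sql")` could stand in for `glob.glob(this_folder + query_path_training)`. The helper returns `Path` objects, so wrap them in `str()` if the downstream code expects plain strings.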

In [5]:
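def split(list_a, chunk_size):
    """Yield consecutive chunks of list_a, each with at most chunk_size items."""
    # An empty input ends the generator before range() is given a zero step.
    if not list_a:
        return
    for i in range(0, len(list_a), chunk_size):
        yield list_a[i:i + chunk_size]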
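
To illustrate how the chunk size is chosen in the cells that follow, here is a small self-contained sketch. It repeats `split` so that it runs on its own, and the file names are made up for the example.

```python
from math import ceil


def split(list_a, chunk_size):
    """Yield consecutive chunks of at most chunk_size items."""
    if not list_a:
        return
    for i in range(0, len(list_a), chunk_size):
        yield list_a[i:i + chunk_size]


num_processors = 4
files = [f"{n}.sql" for n in range(1, 11)]  # ten made-up file names

# ceil(10 / 4) = 3 items per chunk, so the work fits into at most 4 chunks.
chunk_size = ceil(len(files) / num_processors)
chunks = list(split(files, chunk_size))

for chunk in chunks:
    print(chunk)
# The chunks hold 3, 3, 3 and 1 file names respectively.
```

Rounding up with `ceil` keeps the number of chunks at or below `num_processors`, at the cost of a shorter final chunk when the file count is not an exact multiple.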

## Transformation 1: SQL to context-free grammar diagrams

The following cells split the query files into at most `num_processors` chunks and convert each chunk in a separate worker process.
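
The cells in this notebook all follow one pattern: list the input files, split them into chunks, and pass each chunk together with an output folder to a worker through `Pool.starmap`. A minimal sketch of that pattern as a reusable helper, assuming a worker with the signature `(chunk, out_folder)`; `run_in_chunks`, its `parallel` switch, and the stand-in worker `count_items` are hypothetical and not part of `diagramGenerators`:

```python
from math import ceil
from multiprocessing import Pool


def count_items(chunk, out_folder):
    """Stand-in worker: report how many items were assigned to out_folder."""
    return out_folder, len(chunk)


def run_in_chunks(worker, items, out_folder, num_processors, parallel=True):
    """Split items into at most num_processors chunks and process each one."""
    if not items:
        return []
    chunk_size = ceil(len(items) / num_processors)
    tasks = [
        (items[i:i + chunk_size], out_folder)
        for i in range(0, len(items), chunk_size)
    ]
    if not parallel:
        # Serial fallback, handy for debugging a worker inside a notebook.
        return [worker(*task) for task in tasks]
    # Call the parallel path from under an `if __name__ == "__main__":` guard,
    # as the cells in this notebook do. The context manager cleans up the
    # worker processes once starmap has returned all results.
    with Pool(processes=num_processors) as pool:
        return pool.starmap(worker, tasks)
```

With this helper, a cell would reduce to something like `run_in_chunks(diagramGenerators.create_CFG_diagrams, queries_training, cfg_folder_name_training, num_processors)`. The worker must be importable at module level so that it can be pickled and sent to the worker processes.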

In [6]:
queries_training = glob.glob(this_folder + query_path_training)

if __name__ ==  '__main__':
    chunks = split(queries_training, ceil(len(queries_training)/num_processors))
    p = Pool(processes = num_processors)
    p.starmap(diagramGenerators.create_CFG_diagrams, [(chunk, cfg_folder_name_training) for chunk in chunks])
    p.close()
    p.join()

In [7]:
queries_validation = glob.glob(this_folder + query_path_validation)

if __name__ ==  '__main__':
    chunks = split(queries_validation, ceil(len(queries_validation)/num_processors))
    p = Pool(processes = num_processors)
    p.starmap(diagramGenerators.create_CFG_diagrams, [(chunk, cfg_folder_name_validation) for chunk in chunks])
    p.close()
    p.join()

In [8]:
queries_test = glob.glob(this_folder + query_path_test)

if __name__ ==  '__main__':
    chunks = split(queries_test, ceil(len(queries_test)/num_processors))
    p = Pool(processes = num_processors)
    p.starmap(diagramGenerators.create_CFG_diagrams, [(chunk, cfg_folder_name_test) for chunk in chunks])
    p.close()
    p.join()

## Transformation 2: Context-free grammar diagrams to pregroup grammar diagrams

The following cells convert the context-free grammar diagrams produced by Transformation 1, again processing the chunks in parallel.

In [9]:
cfg_diagrams_training = glob.glob(this_folder + "\\" + cfg_folder_name_training + "\\[0-9]*.json")

if __name__ ==  '__main__':
    chunks = split(cfg_diagrams_training, ceil(len(cfg_diagrams_training)/num_processors))
    p = Pool(processes = num_processors)
    p.starmap(diagramGenerators.create_pregroup_grammar_diagrams, [(chunk, pregroup_folder_training) for chunk in chunks])
    p.close()
    p.join()

In [10]:
cfg_diagrams_validation = glob.glob(this_folder + "\\" + cfg_folder_name_validation + "\\[0-9]*.json")

if __name__ ==  '__main__':
    chunks = split(cfg_diagrams_validation, ceil(len(cfg_diagrams_validation)/num_processors))
    p = Pool(processes = num_processors)
    p.starmap(diagramGenerators.create_pregroup_grammar_diagrams, [(chunk, pregroup_folder_validation) for chunk in chunks])
    p.close()
    p.join()

In [11]:
cfg_diagrams_test = glob.glob(this_folder + "\\" + cfg_folder_name_test + "\\[0-9]*.json")

if __name__ ==  '__main__':
    chunks = split(cfg_diagrams_test, ceil(len(cfg_diagrams_test)/num_processors))
    p = Pool(processes = num_processors)
    p.starmap(diagramGenerators.create_pregroup_grammar_diagrams, [(chunk, pregroup_folder_test) for chunk in chunks])
    p.close()
    p.join()

## Transformation 3: Pregroup diagram rewriting, cup removal and simplification

The following cells rewrite the pregroup diagrams produced by Transformation 2, again processing the chunks in parallel.

In [12]:
pregroup_diagrams_training = glob.glob(this_folder + "\\" + pregroup_folder_training + "\\[0-9]*.json")

if __name__ ==  '__main__':
    chunks = split(pregroup_diagrams_training, ceil(len(pregroup_diagrams_training)/num_processors))
    p = Pool(processes = num_processors)
    p.starmap(diagramGenerators.remove_cups_and_simplify, [(chunk, pregroup_cup_removed_folder_training) for chunk in chunks])
    p.close()
    p.join()

In [13]:
pregroup_diagrams_validation = glob.glob(this_folder + "\\" + pregroup_folder_validation + "\\[0-9]*.json")

if __name__ ==  '__main__':
    chunks = split(pregroup_diagrams_validation, ceil(len(pregroup_diagrams_validation)/num_processors))
    p = Pool(processes = num_processors)
    p.starmap(diagramGenerators.remove_cups_and_simplify, [(chunk, pregroup_cup_removed_folder_validation) for chunk in chunks])
    p.close()
    p.join()

In [14]:
pregroup_diagrams_test = glob.glob(this_folder + "\\" + pregroup_folder_test + "\\[0-9]*.json")

if __name__ ==  '__main__':
    chunks = split(pregroup_diagrams_test, ceil(len(pregroup_diagrams_test)/num_processors))
    p = Pool(processes = num_processors)
    p.starmap(diagramGenerators.remove_cups_and_simplify, [(chunk, pregroup_cup_removed_folder_test) for chunk in chunks])
    p.close()
    p.join()

## Transformation 4: Pregroup diagrams to circuit ansätze

The following cells map the cup-removed pregroup diagrams from Transformation 3 to circuit ansätze, again processing the chunks in parallel.

In [15]:
pregroup_diagrams_training = glob.glob(this_folder + "\\" + pregroup_cup_removed_folder_training + "\\[0-9]*.json")

if __name__ ==  '__main__':
    chunks = split(pregroup_diagrams_training, ceil(len(pregroup_diagrams_training)/num_processors))
    p = Pool(processes = num_processors)
    p.starmap(diagramGenerators.create_circuit_ansatz, [(chunk, circuit_training) for chunk in chunks])
    p.close()
    p.join()

In [16]:
pregroup_diagrams_validation = glob.glob(this_folder + "\\" + pregroup_cup_removed_folder_validation + "\\[0-9]*.json")

if __name__ ==  '__main__':
    chunks = split(pregroup_diagrams_validation, ceil(len(pregroup_diagrams_validation)/num_processors))
    p = Pool(processes = num_processors)
    p.starmap(diagramGenerators.create_circuit_ansatz, [(chunk, circuit_validation) for chunk in chunks])
    p.close()
    p.join()

In [17]:
pregroup_diagrams_test = glob.glob(this_folder + "\\" + pregroup_cup_removed_folder_test + "\\[0-9]*.json")

if __name__ ==  '__main__':
    chunks = split(pregroup_diagrams_test, ceil(len(pregroup_diagrams_test)/num_processors))
    p = Pool(processes = num_processors)
    p.starmap(diagramGenerators.create_circuit_ansatz, [(chunk, circuit_test) for chunk in chunks])
    p.close()
    p.join()
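
Since the cells above differ only in the worker function, the input folder, and the output folder, the whole pipeline can also be described as data. The sketch below only builds that description and does not call `diagramGenerators`. The function names are taken from the cells above, while `STAGES`, `SPLITS`, and `build_plan` are hypothetical names introduced for illustration, and the default roots assume the small workload layout.

```python
from pathlib import PurePosixPath

SPLITS = ["training", "validation", "test"]

# (worker function name, input folder, output folder); None marks the SQL input.
STAGES = [
    ("create_CFG_diagrams", None, "cfg-diagrams"),
    ("create_pregroup_grammar_diagrams", "cfg-diagrams", "pregroup-diagrams"),
    ("remove_cups_and_simplify", "pregroup-diagrams", "cup-removed-pregroup-diagrams"),
    ("create_circuit_ansatz", "cup-removed-pregroup-diagrams", "circuits/binary_classification"),
]


def build_plan(diagram_root="simplified-JOB-diagrams/small", query_root="queries/small"):
    """Return one (worker, input glob pattern, output folder) tuple per stage and split."""
    plan = []
    for worker, source, target in STAGES:
        for split_name in SPLITS:
            if source is None:
                # The first stage reads the SQL files themselves.
                pattern = PurePosixPath(query_root, f"{split_name}_queries", "[0-9]*.sql")
            else:
                # Later stages read the JSON files written by the previous stage.
                pattern = PurePosixPath(diagram_root, source, split_name, "[0-9]*.json")
            output = PurePosixPath(diagram_root, target, split_name)
            plan.append((worker, str(pattern), str(output)))
    return plan


for worker, pattern, output in build_plan():
    print(f"{worker}: {pattern} -> {output}")
```

Each stage reads what the previous stage wrote, so the stages must run in order, while the three splits within a stage are independent of one another.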