# Index

1. [Quick Intro](#1.-Quick-Intro)
2. [Setup](#2.-Setup)
3. [Cut and Evaluation](#3.-Cut-and-Evaluation)
4. [Initializing CutQC object for Reconstruction](#4.-Initializing-CutQC-object-for-Reconstruction)

# 1. Quick Intro

CutQC is a 'full stack' circuit cutting framework. It has three components: (1) cut, (2) subcircuit evaluation, and (3) reconstruction. The last of these has a distributed implementation that can run on multi-node compute clusters. This document discusses how to set up and run the distributed reconstruction component. All Slurm and Python excerpts shown in this notebook can be found in the examples directory.

# 2. Setup

This section briefly covers the environment variables and message passing backend required to run CutQC distributed reconstruction.

At its core, a 'distributed application' is the execution of a program on a set of interconnected but independent machines that compute in parallel. Independent in this case means each machine executes a separate instance of the program with its own execution context.

As a result, different compute nodes require a way to talk to each other. In many cases, this communication can be broken down into two parts:
  1. Initialization: Compute nodes need to 'find' each other, i.e., perform a handshake.
  2. Post-initialization: The manner in which data is actually sent. This depends partially on the available hardware.

What this means in practice is that, when using CutQC, you must supply some extra information to facilitate these two parts.

Apart from that, everything else is transparent.

During the execution of distributed CutQC, this communication is handled by the [communication backend](https://pytorch.org/docs/stable/distributed.html).
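
To make the idea of a handshake concrete, here is a toy sketch that uses only Python's standard library. It is not how PyTorch or CutQC implement initialization, and the function names are made up for illustration. One 'master' listens on an address and port, and every other participant connects and announces its rank. In this toy version the participants are threads on one machine rather than separate nodes.

```python
import socket
import threading
from typing import List


def run_master(server: socket.socket, world_size: int, seen: List[int]) -> None:
    """Accept one connection per worker and record the rank it announces."""
    for _ in range(world_size - 1):
        conn, _addr = server.accept()
        with conn:
            rank = int(conn.recv(16).decode())
            seen.append(rank)
            conn.sendall(b"ok")  # acknowledge the worker


def run_worker(addr: str, port: int, rank: int) -> None:
    """Connect to the master, announce this worker's rank, and wait for the ack."""
    with socket.create_connection((addr, port)) as sock:
        sock.sendall(str(rank).encode())
        sock.recv(16)


def demo_handshake(world_size: int = 3) -> List[int]:
    """Run a master (rank 0) and world_size - 1 workers; return the ranks seen."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        server.listen(world_size)
        addr, port = server.getsockname()
        seen = [0]  # the master itself is rank 0
        threads = [threading.Thread(target=run_master, args=(server, world_size, seen))]
        threads += [
            threading.Thread(target=run_worker, args=(addr, port, rank))
            for rank in range(1, world_size)
        ]
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()
    return sorted(seen)
```

The point of the sketch is that every worker needs to know the master's address and port in advance, and needs a rank that distinguishes it from the others.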

### Initialization

When running distributed reconstruction, you must start a separate instance of `cutqc` on each node. As stated above, these instances are initially unaware of each other's existence. `cutqc` relies on PyTorch to exchange connection information (the handshake). Each node must have access to the following information for the handshake:

- `MASTER_PORT`: Port on the host machine
- `MASTER_ADDR`: IP address of the host machine
- `WORLD_SIZE`: Total number of nodes involved
- `RANK`: Unique process identifier 

At the moment, this is done by setting these values as environment variables on all machines involved and also passing them to `cutqc` when the CutQC object is constructed.

More information about environment variable initialization can be [found here](https://pytorch.org/docs/stable/distributed.html#environment-variable-initialization).
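
As a minimal sketch of what reading these four values might look like, the helper below collects them from the environment and fails early if any are missing. The names `RendezvousInfo` and `read_rendezvous_info` are hypothetical and are not part of CutQC or PyTorch.

```python
import os
from typing import Mapping, NamedTuple


class RendezvousInfo(NamedTuple):
    master_addr: str
    master_port: int
    world_size: int
    rank: int


REQUIRED_VARS = ("MASTER_ADDR", "MASTER_PORT", "WORLD_SIZE", "RANK")


def read_rendezvous_info(env: Mapping[str, str] = os.environ) -> RendezvousInfo:
    """Read the handshake values from env, raising if any are missing or invalid."""
    missing = [name for name in REQUIRED_VARS if name not in env]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    info = RendezvousInfo(
        master_addr=env["MASTER_ADDR"],
        master_port=int(env["MASTER_PORT"]),
        world_size=int(env["WORLD_SIZE"]),
        rank=int(env["RANK"]),
    )
    # Ranks are expected to be unique integers in the range [0, world_size).
    if not 0 <= info.rank < info.world_size:
        raise ValueError(f"RANK {info.rank} is outside [0, {info.world_size})")
    return info
```

Passing the environment in as an argument keeps the helper easy to test with a plain dictionary.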


Thankfully, a compute cluster scheduler like [slurm](https://slurm.schedmd.com/documentation.html) automatically exposes the rank and world size information through its own environment variables (for example, `SLURM_PROCID` and `SLURM_NTASKS`), from which `RANK` and `WORLD_SIZE` can be set. Slurm won't automatically set the `MASTER_PORT` and `MASTER_ADDR` environment variables; however, it does make it easy to do so.

Below is an excerpt from the Slurm script used in the example:
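
A small sketch of how a driver might obtain the rank and world size under Slurm: `SLURM_PROCID` and `SLURM_NTASKS` are variables Slurm sets for each launched task. The fallback order used here (explicit `RANK`/`WORLD_SIZE` first, then the Slurm variables, then single-process defaults) is an assumption for illustration, not CutQC's documented behaviour.

```python
import os
from typing import Mapping, Tuple


def rank_and_world_size(env: Mapping[str, str] = os.environ) -> Tuple[int, int]:
    """Return (rank, world_size), preferring explicit values over Slurm's."""
    # Fall back to Slurm's per-task variables, then to a single-process default.
    rank = int(env.get("RANK", env.get("SLURM_PROCID", "0")))
    world_size = int(env.get("WORLD_SIZE", env.get("SLURM_NTASKS", "1")))
    return rank, world_size
```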

In [None]:
#SBATCH --output=_output/%x.%j.out
#SBATCH --nodes=1                 # node count
#SBATCH --ntasks-per-node=1       # number of tasks per node
#SBATCH --cpus-per-task=13        # cpu-cores per task (>1 if multi-threaded tasks)

export MASTER_PORT=$(get_free_port)  # get_free_port: helper assumed to be defined elsewhere in the script
export MASTER_ADDR=$(scontrol show hostnames "$SLURM_JOB_NODELIST" | head -n 1)  # first node in the allocation acts as the host

If done correctly, each node should be able to access the port, address, and world size information in the Python driver by reading its environment variables.

Below is an excerpt from the example driver that does this:


In [None]:
# Title: 'dist_run.py'
import os

# Environment variables set by slurm script
gpus_per_node = int(os.environ["SLURM_GPUS_ON_NODE"])
WORLD_RANK = int(os.environ["SLURM_PROCID"])
WORLD_SIZE = int(os.environ["WORLD_SIZE"])
MASTER_RANK = 0

The `WORLD_RANK` and `WORLD_SIZE` values are passed as arguments to the CutQC constructor. This is shown below in the last section.
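
The driver also reads `gpus_per_node`. A common use for that value in multi-GPU jobs is to map each global rank to a local device, as in the sketch below. It assumes one process per GPU with ranks assigned to nodes in contiguous blocks; whether CutQC uses this mapping is not stated here, and `local_device` is a hypothetical helper.

```python
def local_device(world_rank: int, gpus_per_node: int) -> str:
    """Map a global rank to a device string, falling back to CPU without GPUs."""
    if world_rank < 0:
        raise ValueError("world_rank must be non-negative")
    if gpus_per_node <= 0:
        return "cpu"
    # Assumption: ranks 0..gpus_per_node-1 live on the first node, and so on.
    return f"cuda:{world_rank % gpus_per_node}"
```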


Then, in the driver python file, each node

Currently, the distributed reconstructor requires that the subcircuits be evaluated prior to its execution.

## Message Passing Backend

The steps shown above are strictly for the initial coordination of nodes; a message passing backend is used to facilitate the actual exchange of data between nodes after initialization.
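
As a hedged sketch of how a backend might be selected, the helper below follows PyTorch's general guidance of NCCL for GPU communication and Gloo for CPU communication. How CutQC itself chooses a backend is not stated here, and `choose_backend` and `init_distributed` are hypothetical names. The second function requires PyTorch and expects `MASTER_ADDR` and `MASTER_PORT` to be set in the environment.

```python
def choose_backend(gpus_per_node: int) -> str:
    """Pick a torch.distributed backend name based on GPU availability."""
    return "nccl" if gpus_per_node > 0 else "gloo"


def init_distributed(rank: int, world_size: int, gpus_per_node: int) -> None:
    """Join the process group; MASTER_ADDR and MASTER_PORT are read from the environment."""
    import torch.distributed as dist  # imported lazily so the helper above works without PyTorch

    dist.init_process_group(
        backend=choose_backend(gpus_per_node),
        init_method="env://",
        rank=rank,
        world_size=world_size,
    )
```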

# 3. Cut and Evaluation

# 4. Initializing CutQC object for Reconstruction