# PoissonML Notebook
This Jupyter notebook serves as the Python sandbox for developing the ML model. 

### Simulation Runs and Generating Data Files

The cell below compiles the C++ source with optimization flags. Pass the correct path to `mpic++`; it may differ on your machine.

In [2]:
!/usr/lib64/openmpi/bin/mpic++ -O3 -march=native -funroll-loops -o domain_decomp_heat domain_decomp_heat.cpp
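
If the compiler path is not known in advance, it can be looked up from Python before running the compile cell. The following is a minimal sketch, assuming the MPI compiler wrapper is on `PATH` under the name `mpic++` or `mpicxx`. The helper `find_mpi_compiler` is hypothetical and not part of this notebook, and the fallback path is simply the one used in the cell above.

```python
import os
import shutil

def find_mpi_compiler(candidates=("mpic++", "mpicxx"),
                      fallback="/usr/lib64/openmpi/bin/mpic++"):
    """Return a path to an MPI C++ compiler wrapper, or None if none is found.

    Hypothetical helper: the candidate names and fallback path are assumptions.
    """
    for name in candidates:
        path = shutil.which(name)  # searches the directories listed in PATH
        if path is not None:
            return path
    # Use the hard-coded location only if it actually exists on this machine.
    if fallback and os.path.isfile(fallback):
        return fallback
    return None

compiler = find_mpi_compiler()
print("MPI compiler:", compiler if compiler else "not found; set the path manually")
```

In IPython, a Python variable can be interpolated into a shell command with braces, for example `!{compiler} -O3 ...`, so the result can stand in for the hard-coded path.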

The snippet below populates the `generated_data/` directory with one text output file per simulation. The first parameter of `run_mpi_simulation` sets the number of MPI processes used to run the compiled program, and the second sets how many simulations to run.

For example, `run_mpi_simulation(4, 100)` uses 4 processes and runs 100 simulations.

In [None]:
import subprocess
import sys
import time

def run_mpi_simulation(num_processes, num_simulations):
    print(f"Starting {num_simulations} simulations with {num_processes} processes...")
    start_time = time.time()
    
    process = subprocess.Popen(['mpirun', '-np', str(num_processes), 
                               '--bind-to', 'core', './domain_decomp_heat', str(num_simulations)], 
                              stdout=subprocess.PIPE, 
                              stderr=subprocess.STDOUT, 
                              universal_newlines=True)
    
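    # Stream the simulation's output line by line as it is produced.
    for line in process.stdout:
        print(line, end='')
        sys.stdout.flush()

    # Wait for mpirun to exit so that returncode is set (it is None until then).
    process.wait()

    end_time = time.time()
    print(f"\nCompleted in {end_time - start_time:.2f} seconds")
    return process.returncode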

run_mpi_simulation(4, 100)

Starting 100 simulations with 4 processes...
Running 100 simulations with 4 MPI processes
Output will be saved to generated_data/ directory
Domain decomposition: 4 processes, 250 rows per process
Starting simulation 1/100
Simulation 1: iteration 1000
Simulation 1: iteration 2000
Simulation 1: iteration 3000
Simulation 1: iteration 4000
Simulation 1: iteration 5000
Simulation 1: iteration 6000
Simulation 1: iteration 7000
Simulation 1: iteration 8000
Simulation 1: iteration 9000
Simulation 1: iteration 10000
Simulation 1: iteration 11000
Simulation 1: iteration 12000
Simulation 1: iteration 13000
Simulation 1: iteration 14000
Simulation 1: iteration 15000
Simulation 1: iteration 16000
Simulation 1: iteration 17000
Simulation 1: iteration 18000
Simulation 1: iteration 19000
Simulation 1: iteration 20000
Simulation 1: iteration 21000
Simulation 1: iteration 22000
Simulation 1: iteration 23000
Simulation 1: iteration 24000
Simulation 1: iteration 25000
Simulation 1: iteration 26000
Simulat

### Making Tensors from Data Files
We need a way to build input tensors for the machine learning model from the generated `.txt` files. The function below parses a simulation file and returns the grid value at a specified query point. That value is the model's target; the features are the charge locations, the charge magnitudes, and the query point.

In [5]:
import torch

torch.set_printoptions(precision=9) ## match the precision of the .txt files

def read_simulation_file(filepath, query_point):
    with open(filepath, 'r') as file:
        lines = file.readlines()

    meta_parts = lines[0].strip().split()
    if meta_parts[0] != "Simulation":
        raise ValueError("Invalid file format: Missing 'Simulation' keyword on first line")

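    # Header tokens 3..14 are read as the 12 charge values (locations and magnitudes).
    charge_info = list(map(float, meta_parts[3:15]))
    input_tensor = torch.tensor(charge_info + list(query_point), dtype=torch.float32)

    # The grid starts on the third line; each line is one row of the grid.
    grid_lines = lines[2:]
    grid_data = [list(map(float, line.strip().split())) for line in grid_lines]
    grid_tensor = torch.tensor(grid_data, dtype=torch.float32)

    # query_point is (row, column), so the row index comes first.
    y, x = query_point
    target_value = grid_tensor[y, x]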

    return input_tensor, target_value

Verify that the function works by inspecting a couple of files.

In [6]:
input_tensor, target_value = read_simulation_file("generated_data/simulation_1.txt", query_point=(1, 1))
print("Input Tensor:", input_tensor)
print("Target Value:", target_value)

Input Tensor: tensor([1000.000000000,  700.000000000,  500.000000000,   52.907398224,
         300.000000000,  400.000000000,   14.718600273,  200.000000000,
         800.000000000,   76.158996582,  500.000000000,  100.000000000,
           1.000000000,    1.000000000])
Target Value: tensor(0.000168939)


In [7]:
input_tensor, target_value = read_simulation_file("generated_data/simulation_64.txt", query_point=(2, 2))
print("Input Tensor:", input_tensor)
print("Target Value:", target_value)

Input Tensor: tensor([1000.000000000,  700.000000000,  500.000000000,   68.822601318,
         300.000000000,  400.000000000,   40.139701843,  200.000000000,
         800.000000000,   94.590499878,  500.000000000,  100.000000000,
           2.000000000,    2.000000000])
Target Value: tensor(0.000960697)
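
The two outputs above can also be compared to see which features change between simulations. The sketch below is plain Python, with the twelve charge features copied from the printed tensors; the helper name `varying_columns` is hypothetical. Because the charge locations are fixed and only the magnitudes are random, one would expect four of the twelve charge features to vary. If fewer show up, the `meta_parts[3:15]` slice may be worth checking against the header layout of the `.txt` files.

```python
# Charge features (first 12 entries) copied from the two printed input tensors.
sim_1 = [1000.0, 700.0, 500.0, 52.907398224, 300.0, 400.0,
         14.718600273, 200.0, 800.0, 76.158996582, 500.0, 100.0]
sim_64 = [1000.0, 700.0, 500.0, 68.822601318, 300.0, 400.0,
          40.139701843, 200.0, 800.0, 94.590499878, 500.0, 100.0]

def varying_columns(a, b, tol=1e-6):
    """Return the indices at which two feature vectors differ by more than tol."""
    return [i for i, (u, v) in enumerate(zip(a, b)) if abs(u - v) > tol]

changed = varying_columns(sim_1, sim_64)
print(changed)  # → [3, 6, 9]
```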


In [8]:
import os

def load_dataset(directory, query_points):
    inputs = []
    targets = []
    
    # Sort so the sample order is reproducible; os.listdir returns entries in arbitrary order.
    for filename in sorted(os.listdir(directory)):
        if filename.endswith(".txt"):
            filepath = os.path.join(directory, filename)
            for qp in query_points:
                input_tensor, target = read_simulation_file(filepath, qp)
                inputs.append(input_tensor)
                targets.append(target)
    
    X = torch.stack(inputs)
    y = torch.stack(targets)
    return X, y
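
`load_dataset` calls `read_simulation_file` once per query point, so every file is re-read and re-parsed for each of its query points. One possible variation parses each file once and then indexes the grid repeatedly. The sketch below is plain Python and assumes the same file layout that `read_simulation_file` expects (a header line starting with `Simulation`, one line that is skipped, then the grid rows); the name `read_simulation_file_once` is hypothetical.

```python
def read_simulation_file_once(filepath, query_points):
    """Parse one simulation file and return (inputs, targets) for all query points.

    Hypothetical variant of read_simulation_file; assumes the same file layout.
    """
    with open(filepath, "r") as file:
        lines = file.readlines()

    meta_parts = lines[0].strip().split()
    if not meta_parts or meta_parts[0] != "Simulation":
        raise ValueError("Invalid file format: missing 'Simulation' keyword on first line")

    # Same header slice as read_simulation_file.
    charge_info = [float(token) for token in meta_parts[3:15]]

    # Parse the grid a single time; blank lines are skipped.
    grid = [[float(token) for token in line.split()]
            for line in lines[2:] if line.strip()]

    inputs, targets = [], []
    for y, x in query_points:  # (row, column), as in read_simulation_file
        inputs.append(charge_info + [float(y), float(x)])
        targets.append(grid[y][x])
    return inputs, targets
```

The returned lists can be converted with `torch.tensor(inputs, dtype=torch.float32)` and `torch.tensor(targets, dtype=torch.float32)` before the per-file results are concatenated.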

In [None]:
# (row, column) pairs on a 10 x 10 lattice, in the order read_simulation_file unpacks them.
query_points = [(y, x) for y in range(0, 1000, 100) for x in range(0, 1000, 100)]
X, y = load_dataset("generated_data", query_points)

## TODO
### Model Creation

As a prototype, a Kernel Ridge Regression (KRR) model will be used to predict the electric potential map produced by four point charges. The charges keep the same locations in every simulation, but their magnitudes are random. More advanced problems include randomizing the charge locations and introducing additional point charges.
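
For reference, one common formulation of KRR is sketched below. The symbols $k$, $\lambda$, and $\alpha$ are standard notation introduced here for illustration and are not taken from this notebook's code. Given training inputs $x_1, \dots, x_n$ (here the 14-entry feature vectors) with targets $y_1, \dots, y_n$, a kernel function $k$, and a regularization strength $\lambda > 0$:

$$
K_{ij} = k(x_i, x_j), \qquad \alpha = (K + \lambda I)^{-1} y, \qquad \hat{f}(x) = \sum_{i=1}^{n} \alpha_i \, k(x_i, x)
$$

The prediction at a new query point is therefore a weighted sum of kernel evaluations against all $n$ training samples.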

Basic KRR scales poorly "due to a $O(n^2)$ memory requirement and $O(n^3)$ arithmetic operations", according to [this paper](https://arxiv.org/pdf/1805.00569). A search for a modern, ready-to-use, scalable KRR implementation turned up [`ASkotch`](https://github.com/pratikrathore8/fast_krr), which will be used in this notebook.
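
To get a feel for these costs, the arithmetic can be worked out for the dataset built above. This is a rough sketch, assuming 100 simulations with 100 query points each (the values used in the earlier cells) and a dense kernel matrix stored as 32-bit floats; the variable names are illustrative.

```python
num_simulations = 100            # as in run_mpi_simulation(4, 100)
points_per_simulation = 10 * 10  # range(0, 1000, 100) in each direction

n = num_simulations * points_per_simulation  # number of training samples
kernel_entries = n ** 2                      # dense n x n kernel matrix
kernel_bytes = kernel_entries * 4            # 4 bytes per float32 entry

print(f"n = {n}")
print(f"kernel matrix: {kernel_entries:.1e} entries, {kernel_bytes / 1e6:.0f} MB")
print(f"direct solve: on the order of n^3 = {n ** 3:.1e} operations")
```

Under these assumptions, doubling the number of simulations quadruples the kernel memory and multiplies the direct-solve cost by about eight.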