# Simcal
Simcal is a simulation calibration framework designed to calibrate arbitrary simulators.  The simulator itself is defined outside of simcal and can be anything.  For the purposes of this walkthrough, we will assume the simulator is some other program that can be invoked from the command line and outputs a value.

Simcal provides a `Simulator` wrapper class that must be implemented to call this simulator.  The wrapper must return a scalar value representing the loss of the calibration.  It is helpful (but not required) for the loss computation to be a separate function.

The `sklearn.metrics` package provides many useful error functions that can be used for this.  For this example, we will use `mean_squared_error`.

In [1]:
from sklearn.metrics import mean_squared_error as sklearn_mean_squared_error
import simcal as sc
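
To make the loss concrete, the following pure-Python sketch shows what a mean squared error computes.  The name `mean_squared_error_sketch` and the sample values are made up for illustration; in the rest of this walkthrough the sklearn function imported above is used instead.

```python
def mean_squared_error_sketch(y_true, y_pred):
    """Average of the squared differences between expected and simulated values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one value is required")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


expected_makespans = [10.0, 20.0]   # made-up ground-truth values
simulated_makespans = [12.0, 18.0]  # made-up simulator outputs
loss = mean_squared_error_sketch(expected_makespans, simulated_makespans)
print(loss)  # 4.0
```

A lower value means the simulated outputs are closer to the ground truth, which is what the calibration tries to achieve.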

# Ground-Truth data
Simcal makes no assumptions about your ground-truth data.  It doesn't even assume it exists, since some simulators can assess their accuracy from another source.

However, for our simulators, ground-truth data is needed for calibration.  We recommend storing this data in a well-structured directory of multiple scenarios, where each scenario file contains both the arguments the simulator needs to run that scenario and the scenario's expected output.

Example Dir:
* ground_truth
	+ single_machine
		- 10_tasks_0_data.json
		- 10_tasks_10_data.json
		- 10_tasks_100_data.json
		- 100_tasks_0_data.json
		- 100_tasks_10_data.json
		- 100_tasks_100_data.json
	+ two_machine
		- ...
	+ four_machine
		- ...

Example 10_tasks_100_data.json:
```json
{
	"makespan":"10s",
	"tasks":10,
	"data":100
}
```
This organized data is then loaded into a well-structured data structure.

Example:
```json
{
	"single_machine":{
		"10_tasks_0_data":{
			"makespan":"10s",
			"tasks":10,
			"data":0
		},...
	},
	"two_machine":{
		...
	},
	"four_machine":{
		...
	}
}
```
We assume `ground_truth_loader(path)` produces such a structure.


In [None]:
ground_truth=ground_truth_loader("ground/truth/path")
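
The loader itself is left to the user.  A minimal sketch of one is shown here, assuming the directory layout from the example above and that every scenario file is valid JSON (with the makespan stored as a string such as `"10s"`).  Nothing in this block comes from simcal; it only uses the Python standard library.

```python
import json
from pathlib import Path


def ground_truth_loader(path):
    """Load <path>/<platform>/<scenario>.json files into a nested dict.

    The result maps platform folder name -> scenario file stem -> parsed JSON.
    """
    ground_truth = {}
    for platform_dir in sorted(Path(path).iterdir()):
        if not platform_dir.is_dir():
            continue  # ignore stray files next to the platform folders
        scenarios = {}
        for scenario_file in sorted(platform_dir.glob("*.json")):
            with scenario_file.open() as f:
                # Key each scenario by its file name without the extension
                scenarios[scenario_file.stem] = json.load(f)
        ground_truth[platform_dir.name] = scenarios
    return ground_truth
```

Keying by folder and file name keeps the scenario labels available when debugging a failed simulator run.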

# Simulator
Simcal provides an abstract `Simulator` class to extend when creating the wrapper.  The `run` method of this wrapper must have the signature `run(self, env, args)` and return a scalar value.  The `__call__` method is reserved.  Otherwise, the implementation of this class is up to the user.  For this example, we will define a simulator that takes the path to the simulator, a reference ground-truth dataset, and a functor to evaluate the loss.  The `run` function will run all scenarios given in the ground truth and then compute the loss over them.  The `run` function will be provided with a dictionary of formatted parameter values in the `args` parameter.


In [None]:
class ExampleSimulator(sc.Simulator):
    def __init__(self, simulator_path, ground_truth, loss):
        self.simulator_path = simulator_path
        self.ground_truth = ground_truth
        self.loss = loss
    # Assume args is
    # {
    #     "network_speed":ParameterValue,
    #     "cpu_speed":ParameterValue
    # }
    def run(self, env, args):
        results=[]
        makespans=[]
        for machine_count in self.ground_truth:
            for scenario in self.ground_truth[machine_count]:
                gtdata = self.ground_truth[machine_count][scenario]
                std_out, std_err, exit_code = env.bash(self.simulator_path,
                                                       "--tasks",gtdata["tasks"],
                                                       "--data",gtdata["data"],
                                                       "--network",args["network_speed"],
                                                       "--cpu",args["cpu_speed"])
                if std_err: # This check is not required, but is helpful for debugging simulators; beware of printing large outputs
                    print(gtdata, std_out, std_err, exit_code)
                    raise Exception("Error running simulator") # any normal exception raised in the run function will cause the calibration process to stop
                results.append(float(std_out))
                # parse_united_float is assumed to be a user-defined helper that strips the unit, e.g. "10s" -> 10.0
                makespans.append(parse_united_float(gtdata["makespan"]))
        return self.loss(makespans,results)
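
The `run` method above relies on a `parse_united_float` helper that this walkthrough does not define.  A minimal sketch of such a helper follows, assuming ground-truth values are strings made of a number followed by an optional unit (for example `"10s"`).  It discards the unit rather than converting it, so it only works if every value uses the same unit as the simulator output.

```python
import re

# Leading number (optional sign, decimals, exponent) followed by an optional unit
_UNITED_FLOAT = re.compile(
    r"\s*([+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?)\s*([A-Za-z%/]*)\s*"
)


def parse_united_float(text):
    """Return the numeric part of a string such as "10s" or "2.5 ms" as a float.

    The unit is discarded, not converted.
    """
    match = _UNITED_FLOAT.fullmatch(str(text))
    if match is None:
        raise ValueError(f"cannot parse a number with unit from {text!r}")
    return float(match.group(1))


print(parse_united_float("10s"))     # 10.0
print(parse_united_float("2.5 ms"))  # 2.5
```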
        

`env` provides an environment unique to each invocation of `run`.  It provides many useful features, such as temporary file and directory handling, as well as a `bash` function.  If your simulator requires input files for some of its arguments, `open_file=env.tmp_file()` will create a temporary file to write them to, and if your simulator produces files as outputs, `env.tmp_dir()` will make a temporary folder for `env.bash` to use as a cwd.

# Parameter Values
Parameter values are given to the `run` function in various formats depending on how they are configured.  Most often, they are provided as numeric values with a format, wrapped in a `sc.parameter.Value`.  These support arithmetic operations as if they were numbers, but will automatically attach a unit when cast to a string, jsonified, or passed to `env.bash`.  If required, they can also be used to access the base parameter distribution they originate from, which allows them to carry additional metadata.

In [5]:
parameter=sc.parameter.Value("%.1fMbps",10.0,None)# For demonstration purposes, we manually create a parameter value with a unit of Mbps, value 10, and no base parameter
print(parameter) # 10.0Mbps

parameter *= 10
print(parameter) # 100.0Mbps

print(float(parameter)) # 100.0

parameter.value = 5
print(parameter) # 5.0Mbps

10.0Mbps
100.0Mbps
100.0
5.0Mbps
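
To illustrate the idea behind this behavior, here is a small stand-in class written from scratch.  `UnitValueSketch` is a hypothetical name and is not simcal's implementation of `sc.parameter.Value`; it only mimics the three properties described above (numeric arithmetic, a formatted string with a unit, and a reference to the originating parameter), and its constructor argument order simply mirrors the call shown in the cell above.

```python
class UnitValueSketch:
    """Hypothetical stand-in, NOT simcal's sc.parameter.Value."""

    def __init__(self, formatter, value, parameter=None):
        self.formatter = formatter  # printf-style template, e.g. "%.1fMbps"
        self.value = value          # plain numeric value
        self.parameter = parameter  # originating parameter, if any

    def __float__(self):
        return float(self.value)

    def __str__(self):
        # The unit is attached only when the value is rendered as text
        return self.formatter % self.value

    def __mul__(self, other):
        return UnitValueSketch(self.formatter, self.value * float(other), self.parameter)

    __rmul__ = __mul__

    def __add__(self, other):
        return UnitValueSketch(self.formatter, self.value + float(other), self.parameter)

    __radd__ = __add__


bandwidth = UnitValueSketch("%.1fMbps", 10.0)
bandwidth *= 10          # falls back to __mul__ and rebinds the name
print(bandwidth)         # 100.0Mbps
print(float(bandwidth))  # 100.0
print(bandwidth + 0.5)   # 100.5Mbps
```

Keeping the format string separate from the number is what lets the same object be used both in arithmetic and as a command-line argument.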


# Calibrator
Simcal provides a standard calibration wrapper